Abstract
This study aims to improve the performance of the Proximal Policy Optimization (PPO) algorithm on continuous control tasks, focusing on the Pendulum-v1 environment. We introduce a clustering-based approach that simplifies the state-action space in order to improve learning efficiency and increase cumulative reward. State-action pairs are clustered with k-means, and regions with higher rewards are emphasized to guide the agent's policy optimization. Experimental results show that clustering significantly improves PPO's performance, with the 10-cluster configuration giving the best results in terms of reward consistency and efficiency. After tuning, the clustering-enhanced PPO achieves near-optimal performance and outperforms the baseline PPO by a wide margin.
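The abstract does not specify how cluster rewards are fed back into PPO, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It clusters synthetic Pendulum-v1 state-action pairs with plain k-means (k = 10, the configuration the abstract reports as best) and converts each cluster's mean reward into a softmax weight, so higher-reward regions receive more emphasis. The feature scaling, the softmax weighting, the `temperature` parameter, the uniformly sampled data, and all function names (`kmeans`, `cluster_weights`, `demo`) are hypothetical. The reward function follows the Pendulum-v1 cost: squared normalized angle, plus 0.1 times squared angular velocity, plus 0.001 times squared torque.

```python
import math
import random


def angle_normalize(x):
    """Wrap an angle to [-pi, pi)."""
    return ((x + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Pendulum-v1 reward: negative cost of angle, angular velocity, and torque."""
    return -(angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def cluster_weights(labels, rewards, k, temperature=1.0):
    """HYPOTHETICAL weighting: softmax over each cluster's mean reward."""
    means = []
    for c in range(k):
        rs = [r for r, lab in zip(rewards, labels) if lab == c]
        means.append(sum(rs) / len(rs) if rs else float("-inf"))
    top = max(means)  # subtract the max for numerical stability
    exps = [math.exp((m - top) / temperature) for m in means]
    total = sum(exps)
    return means, [e / total for e in exps]


def demo(n=500, k=10, seed=0):
    """Cluster synthetic (state, action) samples; data is random, not from the paper."""
    rng = random.Random(seed)
    points, rewards = [], []
    for _ in range(n):
        theta = rng.uniform(-math.pi, math.pi)
        theta_dot = rng.uniform(-8.0, 8.0)  # Pendulum-v1 speed limit
        torque = rng.uniform(-2.0, 2.0)     # Pendulum-v1 torque limit
        # Observation (cos, sin, velocity) plus action, rescaled to comparable ranges.
        points.append([math.cos(theta), math.sin(theta), theta_dot / 8.0, torque / 2.0])
        rewards.append(pendulum_reward(theta, theta_dot, torque))
    centroids, labels = kmeans(points, k, seed=seed)
    means, weights = cluster_weights(labels, rewards, k)
    return centroids, labels, means, weights


if __name__ == "__main__":
    centroids, labels, means, weights = demo()
    best = max(range(len(weights)), key=weights.__getitem__)
    print(f"best cluster {best}: mean reward {means[best]:.2f}, weight {weights[best]:.3f}")
```

Softmax weighting is just one simple way to emphasize high-reward clusters; in a real agent, such weights might, for example, scale sample importance in the PPO loss, but that coupling is an assumption here.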
| Original language | English |
|---|---|
| Title of host publication | 2024 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9798331541064 |
| DOIs | |
| State | Published - 2024 |
| Event | 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024, Batna, Algeria (3 Dec 2024 → 5 Dec 2024) |
Publication series
| Name | 2024 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024 |
|---|---|
Conference
| Conference | 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024 |
|---|---|
| Country/Territory | Algeria |
| City | Batna |
| Period | 3 Dec 2024 → 5 Dec 2024 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Continuous control
- Pendulum-v1 environment
- Proximal Policy Optimization (PPO)
- Reinforcement learning (RL)
- State space simplification
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition
- Human-Computer Interaction
- Signal Processing
- Software