Enhancing Proximal Policy Optimization in the Pendulum-v1 Environment Through Clustering-Based State Space Simplification

Mohamed Nadhir Abid, Mounir Beggas, Abdelkader Laouid, Mostefa Kara, Ahcene Bounceur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This study aims to enhance the performance of the Proximal Policy Optimization (PPO) algorithm in continuous control tasks, particularly in the Pendulum-v1 environment. We introduce a clustering-based approach that simplifies the state-action space, improving learning efficiency and maximizing cumulative rewards. State-action pairs are clustered using k-means, with emphasis on regions that yield higher rewards, to guide the agent's policy optimization. Experimental results demonstrate that clustering significantly improves PPO's performance, with the 10-cluster configuration yielding the best outcomes in terms of reward consistency and efficiency. After tuning, the clustering-enhanced PPO achieves near-optimal performance, outperforming the baseline PPO by a wide margin.
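The abstract does not say how reward emphasis enters the clustering step, so the following is only a minimal sketch under one plausible reading, not the authors' method: k-means over (observation, torque) pairs in which each pair's pull on its centroid is scaled by an exponentiated reward. The reward function follows the standard Gymnasium Pendulum-v1 definition; the uniformly sampled transitions stand in for PPO rollouts, and `weighted_kmeans`, `assign`, `temperature`, and the exponential weighting are hypothetical choices made for illustration. Only the cluster count (10) comes from the abstract.

```python
import numpy as np


def angle_normalize(theta):
    # Wrap angles into [-pi, pi), matching the Pendulum-v1 reward definition.
    return ((theta + np.pi) % (2 * np.pi)) - np.pi


def pendulum_reward(theta, theta_dot, torque):
    # Pendulum-v1 reward: -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2).
    return -(angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)


def assign(x, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for each row of x.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def weighted_kmeans(x, weights, k, n_iter=50, seed=0):
    """Hypothetical helper: Lloyd's k-means with per-sample weights on the centroid update."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(x, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            mask = labels == j
            # Empty or zero-weight clusters keep their previous centroid.
            if weights[mask].sum() > 0:
                # Weighted mean: high-weight (high-reward) samples pull the centroid harder.
                new_centroids[j] = np.average(x[mask], axis=0, weights=weights[mask])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute labels so they match the final centroids.
    return centroids, assign(x, centroids)


# Synthetic stand-in for PPO rollouts (hypothetical): uniformly sampled transitions.
rng = np.random.default_rng(42)
n = 2000
theta = rng.uniform(-np.pi, np.pi, n)
theta_dot = rng.uniform(-8.0, 8.0, n)  # Pendulum-v1 angular-velocity bound
torque = rng.uniform(-2.0, 2.0, n)     # Pendulum-v1 torque bound
rewards = pendulum_reward(theta, theta_dot, torque)

# State-action features: observation (cos, sin, theta_dot) plus torque, standardized.
x = np.column_stack([np.cos(theta), np.sin(theta), theta_dot, torque])
x_scaled = (x - x.mean(axis=0)) / x.std(axis=0)

# Hypothetical reward emphasis: exponentiated rewards, so the best pair has weight 1.
temperature = 2.0
weights = np.exp((rewards - rewards.max()) / temperature)

k = 10  # cluster count the abstract reports as best
centroids, labels = weighted_kmeans(x_scaled, weights, k)
cluster_mean_reward = np.array(
    [rewards[labels == j].mean() if np.any(labels == j) else -np.inf for j in range(k)]
)
```

In this sketch, the clusters with the highest `cluster_mean_reward` mark the regions of state-action space that reward emphasis pulls centroids toward. The `temperature` controls how sharply that emphasis applies. How such clusters are fed back into PPO's objective is not stated in the abstract and is deliberately left out here.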

Original language: English
Title of host publication: 2024 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331541064
State: Published - 2024
Event: 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024 - Batna, Algeria
Duration: 3 Dec 2024 to 5 Dec 2024

Publication series

Name: 2024 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024

Conference

Conference: 1st International Conference on Innovative and Intelligent Information Technologies, IC3IT 2024
Country/Territory: Algeria
City: Batna
Period: 3/12/24 to 5/12/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Continuous control
  • Pendulum-v1 environment
  • Proximal Policy Optimization (PPO)
  • Reinforcement learning (RL)
  • State space simplification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Signal Processing
  • Software
