Integral reinforcement learning solutions for a synchronisation system with constrained policies

Mohammed Abouheaf*, Magdi S. Mahmoud, Wail Gueaieb

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

A class of differential games is considered in which the agents employ constrained control strategies and the mutual interactions between the agents are restricted by an undirected graph topology. The dynamical behaviour of the agents and the applied control policies are evaluated using local non-linear performance indices. The solution of the differential game is obtained via a game-theoretic mathematical framework based on adaptive integral reinforcement learning (IRL) schemes. The constrained optimality conditions for the graphical game are found using Bellman's optimality principles. It is demonstrated that solving the game's coupled IRL-Bellman optimality equations with constrained control policies yields a Nash equilibrium solution. Online adaptive learning solutions are developed using value iteration processes and by means of adaptive critics. Neural network structures are adopted to approximate the constrained optimal control strategies and the respective optimal value functions for each agent in a distributed fashion. The robustness of the proposed solutions is tested using an uncertain dynamical learning environment and a graph with large time-varying deviations in the connectivity weights.
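
For orientation, the kind of relation that an IRL-Bellman equation and its value-iteration update usually express can be sketched as follows. This is a generic textbook form under assumed notation, not the formulation used in the article: the local error state \delta_i, the neighbours' controls u_{-i}, the local utility U_i, the admissible (constrained) control set \mathcal{U}_i, the reinforcement interval T and the iteration index k are all symbols introduced here for illustration.

```latex
% Assumed notation (not taken from the article):
%   \delta_i : local synchronisation error of agent i
%   u_i, u_{-i} : control of agent i and of its graph neighbours
%   U_i : local non-linear utility, V_i : value function of agent i
%   \mathcal{U}_i : constrained control set, T : reinforcement interval
\begin{align}
  % Integral (IRL) form of the Bellman equation over one interval [t, t+T]
  V_i\big(\delta_i(t)\big)
    &= \int_{t}^{t+T} U_i\big(\delta_i(\tau), u_i(\tau), u_{-i}(\tau)\big)\, d\tau
       + V_i\big(\delta_i(t+T)\big),
    \qquad u_i(\tau) \in \mathcal{U}_i, \\
  % Value-iteration update: the new value estimate reuses the previous one
  V_i^{k+1}\big(\delta_i(t)\big)
    &= \int_{t}^{t+T} U_i\big(\delta_i(\tau), u_i^{k}(\tau), u_{-i}^{k}(\tau)\big)\, d\tau
       + V_i^{k}\big(\delta_i(t+T)\big).
\end{align}
```

In adaptive-critic schemes of this type, a critic network typically approximates V_i and an actor network approximates the constrained policy u_i; how the constraint is enforced in the article is not stated in the abstract.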

Original language: English
Pages (from-to): 1599-1611
Number of pages: 13
Journal: IET Control Theory and Applications
Volume: 14
Issue number: 12
State: Published - 13 Aug 2020

Bibliographical note

Publisher Copyright:
© The Institution of Engineering and Technology 2020

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Control and Optimization
  • Electrical and Electronic Engineering
