Policy iteration solution for differential games with constrained control policies

Mohammed I. Abouheaf, Magdi S. Mahmoud, Frank L. Lewis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

8 Scopus citations

Abstract

Graphical games are a special class of standard differential games. The underlying neural network solutions are complicated and do not employ straightforward tuning laws. This issue becomes more challenging when the control strategies of the agents are constrained. An integral adaptive learning approach is developed to find an online solution for differential graphical games with constrained control strategies. This solution employs a distributed adaptive policy iteration process in real time. Local performance indices are utilized to assess the coupling between the agents and to account for the constrained policies. Adaptive critics are used to develop a solution platform for each agent using single-layer neural networks that are adapted using a gradient descent tuning approach. This framework handles the main concerns related to the complexity and scalability of the distributed solution. The convergence of the adaptive learning solution is shown to hold under some graph-based conditions.

Original language: English
Title of host publication: 2019 American Control Conference, ACC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4301-4306
Number of pages: 6
ISBN (Electronic): 9781538679265
State: Published - Jul 2019

Publication series

Name: Proceedings of the American Control Conference
Volume: 2019-July
ISSN (Print): 0743-1619

Bibliographical note

Publisher Copyright:
© 2019 American Automatic Control Council.

Keywords

  • Adaptive Critics
  • Differential Games
  • Integral Reinforcement Learning
  • Optimal Control
  • Policy Iteration

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
