Abstract
In this paper, we present a Markov Decision Process (MDP) based formulation for solving the stochastic root-finding problem with predefined stopping time. In the problem that we pose, we need to find only one root of a given finite valued function f (p,u,h). Here, p is a known Markov chain, u is the adjustable variable, and h is the unknown random variable with known distribution. Hence we cannot measure the true value of the function because h is unknown. We assume that we have a way of measuring whether or not f is within some bound ε from zero. We also present a formulation for the problem where f (p,u) is measurable but the adjustment of u is stochastic with known distribution. Another challenge in our problem is the introduction of finite stopping time. This means that the MDP policy has only a predefined finite number of actions available for adjusting u to find the root (or bring | f | < ε). We have included a price control example in the paper to demonstrate the behavior of the resulting MDP policy in response to the available time steps and the variable values of u and p. The results show reasonable trajectories for our simulated environment.
| Original language | English |
|---|---|
| Pages (from-to) | 81-88 |
| Number of pages | 8 |
| Journal | Proceedings of the Pakistan Academy of Sciences: Part A |
| Volume | 57 |
| Issue number | 2 |
| State | Published - 2020 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright:© 2020. Pakistan Academy of Sciences.
Keywords
- Markov decision processes
- Stochastic approximation
- Stochastic root finding
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Physics and Astronomy