Stochastic approximation in Markov decision processes
Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar
Preface
Introduction
Markov decision problems
Learning the optimal policy
Reinforcement learning revisited
Temporal difference learning revisited
Stochastic approximation in Markov decision processes
Large language models: reasoning and reinforcement learning
Deep reinforcement learning
Applications of artificial intelligence in finance
Pricing options with temporal difference backpropagation
Pricing American options
Daily price limits
Portfolio optimisation
Appendix
Let F be a function space equipped with a norm ∥·∥. For instance, if |X| = N, then F can be identified with the vector space R^N. We consider the case when N is large or X is continuous, which leads to different types of approximation error. We need to define an approximation space that can represent functions on X in a compact way; this restricts the set of functions we can learn, introducing an approximation error (or bias). Further, when only a finite number of samples is available, these compact methods incur an additional error due to the inexact estimation of the value function. This second source of error is referred to as estimation error (or variance).
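One way to make this split precise (a sketch via the triangle inequality, not taken from the text) is to write f* for the true value function, f̂_n for the estimate learned from n samples, and f_F for the best approximation of f* within F:

\[
\| \hat f_n - f^* \| \;\le\; \underbrace{\inf_{f \in \mathcal{F}} \| f - f^* \|}_{\text{approximation error (bias)}} \;+\; \underbrace{\| \hat f_n - f_{\mathcal{F}} \|}_{\text{estimation error (variance)}},
\qquad f_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} \| f - f^* \|.
\]

Enlarging F shrinks the first term but typically inflates the second for a fixed sample size.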
As discussed in Section 5.4, we can combine reinforcement learning methods with function approximation. Learning value functions with function approximation methods is called value function approximation (VFA). There are several well-known algorithms for learning approximate value functions in reinforcement learning, eg, approximate dynamic programming (ADP) and Bellman residual minimisation (BRM). In ADP we learn from bootstrapped targets (Bertsekas and Tsitsiklis 1996), while in BRM we minimise the Bellman residual directly.
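To illustrate the distinction, the following sketch (our own, not code from the book) contrasts the two one-step updates for a linear value function V(x) ≈ w·φ(x). The names td_update, brm_update, phi, gamma and alpha are illustrative assumptions, as is the toy data. In the ADP-style semi-gradient TD update the bootstrapped target r + γV(x') is held fixed, whereas BRM differentiates the squared Bellman residual through both V(x) and V(x').

import numpy as np

def td_update(w, phi_x, phi_x_next, reward, gamma=0.99, alpha=0.01):
    """ADP / semi-gradient TD(0): the target r + gamma*V(x') is treated as a constant."""
    delta = reward + gamma * phi_x_next @ w - phi_x @ w  # TD error
    return w + alpha * delta * phi_x                     # gradient flows only through V(x)

def brm_update(w, phi_x, phi_x_next, reward, gamma=0.99, alpha=0.01):
    """BRM / residual gradient: descend on (1/2)*delta^2, differentiating
    through both V(x) and V(x')."""
    delta = reward + gamma * phi_x_next @ w - phi_x @ w
    return w + alpha * delta * (phi_x - gamma * phi_x_next)

# Toy usage on random features, just to show the call pattern.
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(100):
    phi_x, phi_x_next = rng.normal(size=4), rng.normal(size=4)
    w = td_update(w, phi_x, phi_x_next, reward=rng.normal())

The only difference between the two updates is the feature vector multiplying the error: φ(x) for the semi-gradient step versus φ(x) − γφ(x') for the residual-gradient step, which is precisely the bootstrapped-target versus direct-residual distinction drawn above.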