
Stochastic approximation in Markov decision processes

Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar

Let F be a function space equipped with a norm ∥·∥. For instance, if |X| = N, then F is a vector space in ℝ^N. We consider the case when N is large or X is continuous, which leads to different types of approximation error. We need to define an approximation space that can represent functions on X in a compact way; this restricts the set of functions we can learn, introducing an approximation error (or bias). Further, when only a finite number of samples is available, these compact methods incur an additional error due to the inexact estimation of the value function. This second source of error is referred to as estimation error (or variance).
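
To make the two error sources concrete, one common way of writing the split (a sketch only, assuming a linear architecture with a fixed feature map φ: X → ℝ^d and Π_F the projection onto F in the norm ∥·∥) is:

```latex
% Sketch of the bias/variance split for an approximate value function,
% assuming a linear class F and a projection Pi_F onto it in the norm ||.||.
\[
  \mathcal{F} \;=\; \bigl\{\, V_\theta(x) = \theta^{\top}\varphi(x) \;:\; \theta \in \mathbb{R}^d \,\bigr\},
\]
\[
  \bigl\lVert V^{\pi} - V_{\hat\theta} \bigr\rVert
  \;\le\;
  \underbrace{\bigl\lVert V^{\pi} - \Pi_{\mathcal{F}} V^{\pi} \bigr\rVert}_{\text{approximation error (bias)}}
  \;+\;
  \underbrace{\bigl\lVert \Pi_{\mathcal{F}} V^{\pi} - V_{\hat\theta} \bigr\rVert}_{\text{estimation error (variance)}},
\]
```

where θ̂ is estimated from a finite sample, so the first term depends only on how expressive F is, while the second term shrinks as the sample size grows.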

As discussed in Section 5.4, we can combine reinforcement learning methods with function approximation. Learning value functions with function approximation methods is called value function approximation (VFA). There are several well-known algorithms for learning approximate value functions in reinforcement learning; eg, approximate dynamic programming (ADP) and Bellman residual minimisation (BRM). In ADP, we learn from bootstrapped targets (Bertsekas and Tsitsiklis 1996), while in BRM we minimise the Bellman residual directly.
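
The practical difference between the two families is where the gradient flows. The following is a minimal sketch (not the authors' implementation) of one-step updates for each approach, assuming a linear value function V_θ(x) = θ·φ(x); the names phi, gamma, alpha and transitions are illustrative assumptions.

```python
# Hedged sketch: ADP-style semi-gradient update vs direct Bellman residual
# minimisation, with linear value function approximation V_theta(x) = theta @ phi(x).
import numpy as np

def adp_semi_gradient_step(theta, phi, transitions, gamma=0.99, alpha=0.1):
    """ADP-style update: regress towards the bootstrapped target
    r + gamma * V_theta(x'), treating that target as a constant."""
    for x, r, x_next in transitions:
        target = r + gamma * theta @ phi(x_next)   # bootstrapped target (held fixed)
        td_error = target - theta @ phi(x)
        theta = theta + alpha * td_error * phi(x)  # semi-gradient: only through V(x)
    return theta

def brm_gradient_step(theta, phi, transitions, gamma=0.99, alpha=0.1):
    """BRM-style update: minimise the squared Bellman residual directly,
    so the gradient also flows through the next-state value."""
    for x, r, x_next in transitions:
        residual = r + gamma * theta @ phi(x_next) - theta @ phi(x)
        grad = gamma * phi(x_next) - phi(x)        # full gradient of the residual
        theta = theta - alpha * residual * grad
    return theta

# Toy usage on a two-state chain with one-hot features.
phi = lambda x: np.eye(2)[x]
transitions = [(0, 1.0, 1), (1, 0.0, 0)]
theta = np.zeros(2)
theta = adp_semi_gradient_step(theta, phi, transitions)
theta = brm_gradient_step(theta, phi, transitions)
```

In the ADP update the next-state value enters only through the target, whereas in the BRM update it also enters the gradient, which is what "minimising the Bellman residual directly" amounts to here.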
