Temporal difference learning revisited
Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar
Preface
Introduction
Markov decision problems
Learning the optimal policy
Reinforcement learning revisited
Temporal difference learning revisited
Stochastic approximation in Markov decision processes
Large language models: reasoning and reinforcement learning
Deep reinforcement learning
Applications of artificial intelligence in finance
Pricing options with temporal difference backpropagation
Pricing American options
Daily price limits
Portfolio optimisation
Appendix
We have presented temporal difference (TD) procedures as a way of solving the multi-step prediction problem with linear function approximation. In Chapter 4, we revisited the TD procedures in light of the Bellman equations and their operators. However, the initial TD procedures introduced by Sutton (1988) were not derived by directly optimising an objective function. The literature on TD methods has largely ignored the problem of convergence to the true solution. Notable exceptions are Barnard (1993), who showed that TD(λ) methods are not true gradient descent methods, which results in narrow convergence and instability, and Baird (1995), who showed that these methods cannot guarantee convergence under off-policy training. Several non-gradient-descent approaches to this problem have been developed, but none has been completely satisfactory. For example, Bradtke and Barto (1996) introduced least-squares temporal difference (LSTD) as a second-order method that guarantees stability, but at a high computational cost. The theoretical understanding of the optimisation objective in both the linear and non-linear function approximation settings came later. For instance
Copyright Infopro Digital Limited. All rights reserved.
As outlined in our terms and conditions, https://www.infopro-digital.com/terms-and-conditions/subscriptions/ (point 2.4), printing is limited to a single copy.
If you would like to purchase additional rights please email info@risk.net