Reinforcement learning revisited
Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar
For an in-depth treatment of reinforcement learning, we refer the reader to the books by Blackwell (1969) and Sutton and Barto (2018).
4.1 OVERVIEW
Supervised learning is learning from examples provided by a knowledgeable external supervisor, but it does not allow learning from interaction. On its own, it is therefore ill-suited to interactive problems, in which correct and representative examples of the desired behaviour are often impractical to obtain. Reinforcement learning (RL), on the other hand, is learning how to map situations to actions so as to maximise a numerical reward signal. The learner is not told which actions to take; it must discover which actions yield the most reward by trying them. In general, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. RL explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment: agents have explicit goals, can sense aspects of their environments and can choose actions to influence those environments.
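The trial-and-error aspect described above can be illustrated with a minimal sketch. It assumes a stationary k-armed bandit with Gaussian rewards. This is a simplification introduced here for illustration: there is only one situation, so actions affect the immediate reward but not the next situation. The function `run_bandit`, the ε-greedy exploration rule and the parameters `epsilon`, `steps` and `seed` are hypothetical choices of ours, not taken from the text.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy sample-average learner on a hypothetical Gaussian bandit.

    true_means: the (unknown to the agent) expected reward of each action.
    Returns the agent's value estimates, the pull counts and the average reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # agent's estimated value of each action
    n = [0] * k    # number of times each action has been tried
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: try a uniformly random action
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit the current best estimate
        # The environment answers with a noisy numerical reward (unit-variance Gaussian).
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample-average update
        total += r
    return q, n, total / steps


if __name__ == "__main__":
    q, n, avg = run_bandit([0.2, 1.0, 0.5])
    print("estimates:", [round(v, 2) for v in q], "pulls:", n, "avg reward:", round(avg, 3))
```

For example, `run_bandit([0.2, 1.0, 0.5])` should typically finish with its highest estimate on the second action and spend most of its pulls there. The agent discovers this only by trying actions; it is never told which action is best. The small exploration rate matters. With `epsilon=0`, the learner can lock onto whichever action first looked good and never learn the value of the others.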
4.1.1 The agent and its environment
Consider an agent interacting continually with an environment: the agent selects actions, and the environment responds to those actions and presents new situations to the agent. The environment also gives rise to rewards, which are special numerical
Copyright Infopro Digital Limited. All rights reserved.
As outlined in our terms and conditions, https://www.infopro-digital.com/terms-and-conditions/subscriptions/ (point 2.4), printing is limited to a single copy.
If you would like to purchase additional rights please email info@risk.net