
Stochastic approximation in Markov decision processes

Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar

Let F be a function space equipped with a norm ∥·∥. For instance, if |X| = N, then F is a vector space in ℝ^N. We consider the case when N is large or X is continuous, which leads to different types of approximation error. We need to define an approximation space that can represent functions on X in a compact way; this restricts the set of functions we can learn, introducing an approximation error (or bias). Further, when only a finite number of samples is available, these compact methods incur an additional error due to the inexact estimation of the value function. This second source of error is referred to as estimation error (or variance).
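
To make the two error sources concrete, one common way of writing the split (a sketch only, assuming a linear architecture with a fixed feature map φ: X → ℝ^d and Π_F the projection onto F in the norm ∥·∥) is:

```latex
% Sketch of the bias/variance split for an approximate value function,
% assuming a linear class F and a projection Pi_F onto it in the norm ||.||.
\[
  \mathcal{F} \;=\; \bigl\{\, V_\theta(x) = \theta^{\top}\varphi(x) \;:\; \theta \in \mathbb{R}^d \,\bigr\},
\]
\[
  \bigl\lVert V^{\pi} - V_{\hat\theta} \bigr\rVert
  \;\le\;
  \underbrace{\bigl\lVert V^{\pi} - \Pi_{\mathcal{F}} V^{\pi} \bigr\rVert}_{\text{approximation error (bias)}}
  \;+\;
  \underbrace{\bigl\lVert \Pi_{\mathcal{F}} V^{\pi} - V_{\hat\theta} \bigr\rVert}_{\text{estimation error (variance)}},
\]
```

where θ̂ is estimated from a finite sample, so the first term depends only on how expressive F is, while the second term shrinks as the sample size grows.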

As discussed in Section 5.4, we can combine reinforcement learning methods with function approximation. Learning value functions with function approximation methods is called value function approximation (VFA). There are several well-known algorithms for learning approximate value functions in reinforcement learning; eg, approximate dynamic programming (ADP) and Bellman residual minimisation (BRM). In ADP, we learn from bootstrapped targets (Bertsekas and Tsitsiklis 1996), while in BRM we minimise the Bellman residual directly.
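
The practical difference between the two families is where the gradient flows. The following is a minimal sketch (not the authors' implementation) of one-step updates for each approach, assuming a linear value function V_θ(x) = θ·φ(x); the names phi, gamma, alpha and transitions are illustrative assumptions.

```python
# Hedged sketch: ADP-style semi-gradient update vs direct Bellman residual
# minimisation, with linear value function approximation V_theta(x) = theta @ phi(x).
import numpy as np

def adp_semi_gradient_step(theta, phi, transitions, gamma=0.99, alpha=0.1):
    """ADP-style update: regress towards the bootstrapped target
    r + gamma * V_theta(x'), treating that target as a constant."""
    for x, r, x_next in transitions:
        target = r + gamma * theta @ phi(x_next)   # bootstrapped target (held fixed)
        td_error = target - theta @ phi(x)
        theta = theta + alpha * td_error * phi(x)  # semi-gradient: only through V(x)
    return theta

def brm_gradient_step(theta, phi, transitions, gamma=0.99, alpha=0.1):
    """BRM-style update: minimise the squared Bellman residual directly,
    so the gradient also flows through the next-state value."""
    for x, r, x_next in transitions:
        residual = r + gamma * theta @ phi(x_next) - theta @ phi(x)
        grad = gamma * phi(x_next) - phi(x)        # full gradient of the residual
        theta = theta - alpha * residual * grad
    return theta

# Toy usage on a two-state chain with one-hot features.
phi = lambda x: np.eye(2)[x]
transitions = [(0, 1.0, 1), (1, 0.0, 0)]
theta = np.zeros(2)
theta = adp_semi_gradient_step(theta, phi, transitions)
theta = brm_gradient_step(theta, phi, transitions)
```

In the ADP update the next-state value enters only through the target, whereas in the BRM update it also enters the gradient, which is what "minimising the Bellman residual directly" amounts to here.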
