Deep reinforcement learning

Miquel Noguer i Alonso, Daniel Bloch and David Pacheco Aznar

This chapter provides a comprehensive introduction to deep reinforcement learning (DRL), which will be explored in depth; Chapter 3 focuses, in particular, on obtaining optimal reinforcement learning (RL) policies. DRL is a framework that merges deep learning techniques with reinforcement learning principles to solve complex decision-making tasks. At its core, the DRL framework consists of an agent that interacts with an environment, aiming to maximise cumulative rewards over time. The environment is modelled as a Markov decision process (MDP), with a state space representing all possible conditions, an action space outlining all available decisions, a transition function defining the probabilities of moving from one state to another and a reward function that provides immediate feedback based on the agent’s actions. A discount factor accounts for the reduced influence of future rewards.
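
In standard notation (a sketch of the conventions described above, not reproduced from the chapter), the MDP is the tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where $P(s' \mid s, a)$ is the transition function and $R(s, a)$ the reward function. The agent's objective of maximising cumulative discounted reward can then be written as

\[
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right], \qquad 0 \le \gamma < 1,
\]

where the discount factor $\gamma$ reduces the influence of rewards received further in the future.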

The agent’s behaviour is governed by a policy, which is a probabilistic mapping from states to actions. In DRL, this policy is often represented as a deep neural network with trainable parameters. Through a process of repeated interactions with the environment, the agent learns to refine its policy.
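
As an illustrative sketch (not the chapter's own code), the following shows a policy represented as a small neural network and one simple way of refining it from sampled interactions, using a REINFORCE-style update. The class and function names, the PyTorch and Gymnasium dependencies and the CartPole environment are assumptions made for the example only.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNetwork(nn.Module):
    """Probabilistic mapping from states to action distributions (illustrative)."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        # Logits parameterise a categorical distribution over actions.
        return Categorical(logits=self.net(state))

def run_episode(env, policy):
    """Interact with the environment for one episode, recording log-probs and rewards."""
    log_probs, rewards = [], []
    state, _ = env.reset()
    done = False
    while not done:
        dist = policy(torch.as_tensor(state, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    return log_probs, rewards

def update(policy, optimizer, log_probs, rewards, gamma=0.99):
    """REINFORCE-style refinement: weight log-probabilities by discounted returns."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A hypothetical usage, assuming a Gymnasium environment is available:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
policy = PolicyNetwork(env.observation_space.shape[0], env.action_space.n)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(500):
    update(policy, optimizer, *run_episode(env, policy))
```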
