The case for reinforcement learning in quant finance
The technology behind Google’s AlphaGo has been strangely overlooked by quants
When Google’s DeepMind defeated the world’s top Go player in 2016, it was seen as a breakthrough for artificial intelligence. But the technique used to train its AlphaGo program, known as reinforcement learning (RL), has gained little traction in finance, despite its ability to handle complex, multi-period decisions.
Igor Halperin, a senior quantitative researcher at Fidelity Investments, thinks it’s time for that to change: “RL is the best and most natural solution to most of the problems we have in quantitative finance,” he says.
He argues that nearly all problems in quantitative finance – including option pricing, dynamic portfolio optimisation and dynamic wealth management – can be solved with RL or inverse RL, or a combination of the two.
RL techniques work sequentially. At each step, the algorithm observes the state of its environment and the reward earned by its previous actions, then chooses its next action accordingly; by exploring many combinations of actions over repeated trials, it learns a policy that maximises a cumulative reward function.
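To make the mechanics concrete, below is a minimal tabular Q-learning sketch on a hypothetical toy problem – our illustration, not the researchers’ code. The environment, rewards and hyperparameters are all assumptions chosen for brevity.

```python
# Minimal tabular Q-learning on a toy "walk right to the goal" problem.
# Everything here (environment, rewards, hyperparameters) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))     # action-value estimates
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 stays; reward at the goal."""
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        if rng.random() < epsilon:
            action = rng.integers(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action at the next state
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(Q)  # learned action values; "move right" should dominate in every state
```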
Halperin and Matthew Dixon, assistant professor at the Illinois Institute of Technology in Chicago, have published a research paper on the application of RL to dynamic wealth management.
They spotlight two techniques, which can be used either individually or in combination. The first is G-learning, a probabilistic extension of the Q-learning approach popularised by DeepMind. The advantage of G-learning – which is relatively new to finance, despite being well established in other fields – is that it can handle noisy environments and high dimensionality, which Q-learning struggles with.
For this reason, a previous effort by Gordon Ritter to apply Q-learning to dynamic portfolio optimisation was limited to a small number of assets.
“[Q-learning] couldn’t manage a portfolio of 500 stocks and it doesn’t cope well with noisy environments such as financial markets,” says Halperin.
G-learning does not suffer from these problems. Given a reward function – in this case, the maximisation of wealth over a given time horizon – it can find the optimal combination of actions to reach a target outcome using the available historical data.
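The key difference is easy to state in code. Where Q-learning backs up a hard maximum over next actions, G-learning backs up a “soft”, KL-regularised free energy, which keeps the learned policy stochastic and better behaved in noisy markets. The sketch below is our illustration of that idea, not the paper’s implementation; the inverse temperature beta and the prior policy pi0 are assumptions.

```python
# Sketch of the G-learning backup versus Q-learning's hard max.
# beta (inverse temperature) and the prior policy pi0 are assumptions.
import numpy as np

def soft_value(G_row, pi0_row, beta):
    """Free energy: (1/beta) * log sum_a pi0(a) * exp(beta * G(s, a))."""
    return np.log(np.sum(pi0_row * np.exp(beta * G_row))) / beta

def g_update(G, pi0, s, a, reward, s_next, alpha=0.1, gamma=0.95, beta=5.0):
    """One G-learning backup: same shape as the Q-learning update, but with a
    soft value at the next state instead of a hard max over actions."""
    target = reward + gamma * soft_value(G[s_next], pi0[s_next], beta)
    G[s, a] += alpha * (target - G[s, a])
    return G

# Toy usage: two states, two actions, a uniform prior policy
G = np.zeros((2, 2))
pi0 = np.full((2, 2), 0.5)
G = g_update(G, pi0, s=0, a=1, reward=1.0, s_next=1)
```

As beta grows, the soft value approaches the hard max and G-learning collapses back to Q-learning; at finite beta, the smoothing is what buys robustness to noise.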
The second technique, which Halperin and Dixon introduce for the first time in their paper, is called generative inverse reinforcement learning, or GIRL. This works the opposite way to G-learning. GIRL takes the outcomes of strategies – the holdings and returns of a portfolio – and works backwards to infer what investment strategy the manager followed.
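In its simplest form, inverse RL of this kind can be cast as maximum-likelihood estimation: find the reward parameters under which the observed actions would have been most probable. The sketch below illustrates that general idea, not the GIRL algorithm itself; the feature map phi, the softmax policy form and the synthetic data are all assumptions.

```python
# Toy inverse RL by maximum likelihood: recover reward weights theta from
# observed state-action pairs, assuming a softmax policy over linear rewards.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, states, actions, phi, beta=1.0):
    """phi[s] is an (n_actions, n_features) feature matrix for state s;
    the policy is pi(a|s) proportional to exp(beta * theta . phi[s, a])."""
    nll = 0.0
    for s, a in zip(states, actions):
        logits = beta * phi[s] @ theta  # score of every action in state s
        nll -= logits[a] - np.log(np.sum(np.exp(logits)))
    return nll

# Hypothetical "observed" data: 3 states, 2 actions, 2 reward features
rng = np.random.default_rng(1)
phi = rng.normal(size=(3, 2, 2))
theta_true = np.array([1.0, -0.5])
states = rng.integers(3, size=200)
logits = phi[states] @ theta_true
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
actions = np.array([rng.choice(2, p=p) for p in probs])  # softmax "expert"

res = minimize(neg_log_likelihood, x0=np.zeros(2), args=(states, actions, phi))
print(res.x)  # should approximate theta_true, up to scale
```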
Halperin says the tools can be combined to create a robo-advisory solution. GIRL can be used to learn existing strategies, which G-learning can then optimally replicate for clients. The adviser can then potentially tailor solutions to clients’ objectives and level of risk aversion.
Other potential applications include minimising market impact in trade execution. The Royal Bank of Canada’s research centre, Borealis AI, has already used RL to develop a new trade execution system for the bank, called Aiden.
Halperin is also convinced RL can be successfully applied to price derivatives. “RL is better than Black-Scholes and risk-neutral pricing in general, which does more harm than good,” he says. “Option pricing is all about managing risk, but the main assumption of the risk-neutral formulation is that there is no risk, which is self-contradictory.”
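One way to see what that means in practice, in the spirit of Halperin’s earlier QLBS work rather than a reproduction of it: cast hedging as an RL problem whose per-step reward is the hedged portfolio’s profit and loss minus a penalty on its risk, so risk is priced rather than assumed away. The function below is a sketch; lam is an assumed risk-aversion parameter.

```python
# Sketch of a risk-adjusted hedging reward (our paraphrase of the RL view of
# option pricing, not Halperin's code). The agent is short an option and holds
# hedge_ratio units of stock; each step it earns its replication P&L and is
# penalised for that P&L's variance. lam is an assumed risk-aversion parameter.
def hedging_reward(d_option_value, d_stock_price, hedge_ratio, pnl_variance,
                   lam=0.001):
    """One-step reward: hedge gains minus option losses, minus a risk penalty."""
    pnl = hedge_ratio * d_stock_price - d_option_value
    return pnl - lam * pnl_variance
```

Under a reward of this form, the learned policy trades off replication error against residual risk; setting lam to zero recovers the risk-indifferent limit Halperin criticises.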
Halperin and Dixon’s research is still in the experimental phase and has not been tested in practice, but the authors are confident about its effectiveness.
So why is RL missing from most quants’ existing toolkits? Matthew Taylor, associate professor of computer science at the University of Alberta, reckons it might be down to a scarcity of expertise. “In general, RL is not used much in finance, at least publicly,” he says. “There is a barrier to entry for financial institutions, and there aren’t enough reinforcement learning professionals, or enough experts, for all the potential applications.”
The work of Halperin, Dixon and others may fuel wider efforts to apply RL in finance.