Podcast: Halperin on reinforcement learning and option pricing

Fidelity quants working on machine learning techniques to optimise investment strategies

Credit: Alex Towle

To most people, chess and Go are complex strategy games. Mathematically speaking, they are optimisation problems involving sequential, multi-period decision-making, which are hard to solve because of their non-linearity and high dimensionality. But in recent years, artificial intelligence researchers have shown that machines can be trained to master such games using reinforcement learning (RL), setting the stage for wider applications of the technique to solve complex mathematical problems.

Igor Halperin, senior quant analyst at the AI centre of excellence for asset management at Fidelity Investments, has long been convinced that RL could be applied to portfolio management.

He received Risk.net’s Buy-side Quant of the year award in 2021 for his research with Matthew Dixon, professor at the Illinois Institute of Technology, on optimising retirement plans and target date funds using RL and inverse RL (IRL).

In this edition of Quantcast, Halperin discusses his most recent work with Fidelity colleagues Jiayu Liu and Xiao Zhang on applying a similar approach to the problem of asset allocation among equity sectors.

“This is something I have been envisioning since 2018 […] and part of a general plan I had,” says Halperin, adding that the progress to date has been encouraging.

Reinforcement learning links decision-making to a reward function, which must be maximised to obtain the optimal outcome. Commonly, the reward function is pre-determined by the user – it may be, for example, a risk-adjusted measure of return that the algorithm aims to maximise by trying out many possible sequences of decisions.
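
As a rough illustration of a user-defined reward driving a multi-period decision, the sketch below runs tabular Q-learning on a toy allocation problem. It is not the method used by Halperin and his co-authors: the three allowed weights, the return and variance figures, the trading cost and the discount factor are all assumptions chosen to keep the example small and deterministic.

```python
import random

# All numbers below are made up for illustration.
WEIGHTS = [0.0, 0.5, 1.0]  # allowed equity weights
MU, VAR = 0.05, 0.04       # assumed expected return and variance per period
RISK_AVERSION = 1.0        # penalty on variance in the reward
COST = 0.03                # assumed cost per unit of turnover
GAMMA = 0.5                # discount factor


def reward(w_prev, w_next):
    """Risk-adjusted return of the new weight, net of trading costs."""
    risk_adjusted = MU * w_next - RISK_AVERSION * VAR * w_next ** 2
    return risk_adjusted - COST * abs(w_next - w_prev)


def q_learning(n_updates=5000, seed=0):
    """Tabular Q-learning; the state is the current weight, the action the next one."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in WEIGHTS for a in WEIGHTS}
    state = rng.choice(WEIGHTS)
    for _ in range(n_updates):
        action = rng.choice(WEIGHTS)  # explore at random; Q-learning is off-policy
        next_state = action           # the chosen weight becomes the new holding
        best_next = max(q[(next_state, a)] for a in WEIGHTS)
        # A learning rate of 1 is enough because nothing in this toy environment is random.
        q[(state, action)] = reward(state, action) + GAMMA * best_next
        state = next_state
    return q


def greedy_policy(q):
    """For each current weight, pick the next weight with the highest learned value."""
    return {s: max(WEIGHTS, key=lambda a: q[(s, a)]) for s in WEIGHTS}


policy = greedy_policy(q_learning())
print(policy)  # {0.0: 0.5, 0.5: 0.5, 1.0: 1.0}
```

With these assumed numbers, a portfolio that starts fully invested stays fully invested, even though the 50% weight earns a higher risk-adjusted return each period, because the discounted gain does not cover the cost of trading. That dependence on what came before is what makes the problem sequential rather than a one-off optimisation.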

Inverse reinforcement learning does the opposite, taking the strategies of human experts and working backwards to identify the reward function that explains their decisions. Halperin and his co-authors use IRL to essentially crowdsource a robust reward function from the strategies of multiple portfolio managers. “Once you have a reward function you know what you should do,” he explains. The RL algorithm is then used to develop an asset allocation strategy that maximises this general reward function.
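
The idea of working backwards from decisions to a reward can be shown with a deliberately simple stand-in for IRL. The sketch assumes every manager maximises the same quadratic reward, mu * w - lam * var * w^2, so that the ideal weight is w = mu / (2 * lam * var), and that each manager adds a personal bias on top. The reward form, the market figures, the manager names and the biases are all hypothetical, and the least-squares fit is far cruder than the approach described in the podcast.

```python
TRUE_LAMBDA = 2.0  # risk aversion used to generate the synthetic decisions

# Hypothetical (expected return, variance) pairs the managers faced.
MARKETS = [(0.04, 0.04), (0.08, 0.04), (0.12, 0.04)]

# Hypothetical personal quirks: each manager shifts the ideal weight by a fixed amount.
BIASES = {"manager_a": 0.05, "manager_b": -0.05, "manager_c": 0.0}


def simulate(bias):
    """Synthetic observations (mu, var, weight) for one manager."""
    return [(mu, var, mu / (2 * TRUE_LAMBDA * var) + bias) for mu, var in MARKETS]


def fit_risk_aversion(observations):
    """Recover lam from observed weights, assuming w = (1 / (2 * lam)) * (mu / var).

    This is a least-squares regression of w on mu / var through the origin.
    """
    sxx = sxw = 0.0
    for mu, var, w in observations:
        x = mu / var
        sxx += x * x
        sxw += x * w
    slope = sxw / sxx
    return 1.0 / (2.0 * slope)


# Fitting each manager separately picks up that manager's quirk.
individual = {name: fit_risk_aversion(simulate(b)) for name, b in BIASES.items()}

# Pooling all managers lets the offsetting quirks cancel out.
pooled_observations = [obs for b in BIASES.values() for obs in simulate(b)]
pooled = fit_risk_aversion(pooled_observations)

for name, lam in individual.items():
    print(name, round(lam, 3))
print("pooled", round(pooled, 3))
```

In this toy data the manager who systematically overweights looks less risk-averse than the true value, the one who underweights looks more risk-averse, and the pooled fit recovers the value used to generate the data. The biases were chosen to offset exactly; real managers' quirks would not cancel so neatly.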


According to Halperin, this approach can potentially improve the performance of a homogeneous group of fund managers by giving them investment recommendations that can help remove biases and quirks from their investment process.

In this podcast, Halperin also discusses his long-standing criticism of standard option pricing models, which he maintains are fundamentally flawed. “Should I say they are all wrong, or should I say they’re not even wrong?” he muses, channelling the words of theoretical physicist Wolfgang Pauli.

His point is that standard models based on geometric Brownian motion can capture volatility but fail to account for the existence of a drift in asset prices. In 2021, he proposed an alternative approach that resembles geometric Brownian motion while adjusting for the drift term. In his setting, the drift is a non-linear function that accounts for market inflows and outflows, as well as frictions.
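
To make the contrast concrete, the sketch below simulates a price path with a simple Euler scheme in which the drift is passed in as a function of the price. A constant drift gives ordinary geometric Brownian motion. The second drift function is purely hypothetical: it is one arbitrary example of a drift that depends non-linearly on the price, and it is not the specification in Halperin's paper, which the article does not spell out.

```python
import math
import random


def simulate_path(s0, drift, sigma, dt, n_steps, seed=0):
    """Euler scheme for dS = drift(S) * S * dt + sigma * S * dW."""
    rng = random.Random(seed)
    s = s0
    path = [s]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        s = s + drift(s) * s * dt + sigma * s * dw
        path.append(s)
    return path


def constant_drift(s):
    """Geometric Brownian motion: the drift rate ignores the price level."""
    return 0.05


def hypothetical_nonlinear_drift(s, kappa=0.5, level=100.0):
    """Assumed example only: the drift rate falls as the price rises above `level`.

    The resulting drift term, kappa * s * (1 - s / level), is quadratic in s.
    """
    return kappa * (1.0 - s / level)


gbm_path = simulate_path(100.0, constant_drift, sigma=0.2, dt=0.01, n_steps=250)
nonlinear_path = simulate_path(100.0, hypothetical_nonlinear_drift, sigma=0.2, dt=0.01, n_steps=250)
print(round(gbm_path[-1], 2), round(nonlinear_path[-1], 2))
```

Both paths use the same seed, so they share the same random shocks and differ only through the drift. Setting `sigma=0.0` removes the noise and shows the drift in isolation.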

Discussing the influence of physics on quant finance, Halperin notes the differences between linear models, represented by classic parametric models, and non-linear models, which mostly include neural networks. The former offer clear interpretability of the phenomenon they describe but cannot describe complex systems, while the latter can handle complex systems but cannot be controlled. Halperin sees tensor networks, a functional toolset borrowed from physics, as a good middle ground, “because they do non-linearities but in a controlled way”.

For those curious to learn more, the most recent episode of Quantcast with Vladimir Piterbarg and Alexandre Antonov was entirely dedicated to tensor train approaches.

Halperin is now working on research projects that blend concepts from different branches of finance and statistics. One, for example, deals with multi-agent reinforcement learning, where a reinforcement learning algorithm drives the behaviour of agents in a model, allowing them to adapt based on their interactions with each other.


00:00 RL and IRL in fund management

06:27 Application of RL and IRL to portfolio management

10:27 Previous application of RL to wealth management

13:05 Why RL is not a black box

16:20 Further applications of RL in finance

20:25 Option pricing models – not even wrong?

29:45 Physics and finance

36:30 Future research projects

To hear the full interview, listen in the player above, or download. Future podcasts in our Quantcast series will be uploaded to Risk.net. You can also visit the main page here to access all tracks, or go to the iTunes store, Spotify or Google Podcasts to listen and subscribe.
