An ‘optimal’ way to calculate future P&L distributions?

Quants use neural networks to upgrade classic options pricing model

Calculating the future profit and loss (P&L) distribution of non-linear portfolios is a tricky problem – one that is typically handled by running a series of Monte Carlo simulations, with all the computational burden that entails.

In Deep learning profit and loss, published this month, a trio of Italian quants attempt to solve the problem using neural networks for the first time.

“We wanted to compute the future P&L distribution of a portfolio with correlated assets and provide a semi-automatic system where new assets and complex structures can be added and computed easily,” explains Pietro Rossi, a senior analyst in the data science unit at Italian consultancy firm Prometeia, and an adjunct professor of computational finance at the University of Bologna.

The approach developed by Rossi and his co-authors – Giacomo Bormetti and Flavio Cocco, also professors at the University of Bologna – is a generalisation of the simulation-based Longstaff and Schwartz model introduced in 2001 to price American options. Key to that model is the so-called continuation value – the value a portfolio would have if the options within it were not exercised. This is estimated by regressing simulated future cashflows on polynomials of the underlying state variables.
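To make the mechanics concrete, here is a minimal sketch of the standard Longstaff–Schwartz backward induction for a single American put, with a cubic polynomial regression supplying the continuation value. The dynamics and parameters (geometric Brownian motion, strike, volatility) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the paper): American put on a GBM underlying.
s0, strike, r, sigma = 100.0, 100.0, 0.03, 0.2
n_paths, n_steps, maturity = 50_000, 50, 1.0
dt = maturity / n_steps
disc = np.exp(-r * dt)

# Simulate GBM paths, prepending the spot at time zero.
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

payoff = lambda s: np.maximum(strike - s, 0.0)

# Backward induction: at each date, regress discounted future cashflows on a
# cubic polynomial of the spot (in-the-money paths only) to estimate the
# continuation value, and exercise where immediate payoff beats it.
cashflow = payoff(paths[:, -1])
for t in range(n_steps - 1, 0, -1):
    cashflow *= disc
    itm = payoff(paths[:, t]) > 0
    if itm.any():
        coeffs = np.polyfit(paths[itm, t], cashflow[itm], deg=3)
        continuation = np.polyval(coeffs, paths[itm, t])
        exercise = payoff(paths[itm, t])
        cashflow[itm] = np.where(exercise > continuation, exercise, cashflow[itm])

price = disc * cashflow.mean()
print(f"American put (LSM estimate): {price:.3f}")
```

With these parameters the estimate lands a little above the Black–Scholes European put value of roughly 6.5, reflecting the early-exercise premium.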

Rossi and his co-authors eschew the polynomial approach and instead use neural networks to calculate the continuation value. This allows them to handle portfolios of non-linear or path-dependent products without the need to re-run optimisations or devise workarounds for individual options.

“With neural networks, we can easily fit complex products with articulated exercise surfaces without having to look for the clever regressors one would need with the standard Longstaff and Schwartz model,” says Rossi.
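One way to picture the swap Rossi describes: keep the backward induction unchanged but replace the polynomial fit with a small feed-forward network as the continuation-value regressor. This sketch assumes scikit-learn's MLPRegressor with a toy two-layer architecture; the authors' actual network design is not reproduced here:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Same illustrative American put as a standard LSM setup; the only change is
# the regressor for the continuation value. Architecture and solver choices
# below are assumptions for the sketch, not the paper's.
s0, strike, r, sigma = 100.0, 100.0, 0.03, 0.2
n_paths, n_steps, maturity = 10_000, 12, 1.0
dt = maturity / n_steps
disc = np.exp(-r * dt)

z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
payoff = lambda s: np.maximum(strike - s, 0.0)

cashflow = payoff(paths[:, -1])
for t in range(n_steps - 1, 0, -1):
    cashflow *= disc
    itm = payoff(paths[:, t]) > 0
    x = paths[itm, t:t + 1] / s0 - 1.0  # normalised spot as the lone feature
    net = MLPRegressor(hidden_layer_sizes=(16, 16), solver="lbfgs",
                       max_iter=500, random_state=0)
    continuation = net.fit(x, cashflow[itm]).predict(x)
    exercise = payoff(paths[itm, t])
    cashflow[itm] = np.where(exercise > continuation, exercise, cashflow[itm])

price = disc * cashflow.mean()
print(f"American put (neural-network regressor): {price:.3f}")
```

The appeal is that the same network can take a higher-dimensional feature vector – several underlyings, path statistics – without hand-picking basis functions for each product.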

The polynomial approach has the advantage of simplicity: interpolators can be built and the associated linear equations solved relatively easily. But it works only for one or a few vanilla products. Applying it to a whole portfolio quickly runs into the curse of dimensionality, and determining if and when an option will be exercised is itself a high-dimensional problem. It is typically solved by propagating the continuation value backwards and comparing it with the immediate exercise value to see whether it makes sense to exercise.

The Longstaff and Schwartz method uses only one trajectory to compute the backward-propagated continuation value. This is fine when dealing with American options, which are exercisable at any time, but it is unsatisfactory when dealing with Bermudan options that are exercisable at pre-defined dates.

“Our small technical innovation here is that at every point we generate a bunch of trajectories – 1,024, to be precise,” says Rossi. This is analogous to running a series of “small Monte Carlo simulations” to compute the future value, he adds.

By back-propagating all the potential future values, the neural network extracts the full P&L distribution, rather than just an expected price.
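A hedged illustration of the nested-simulation idea: at a future horizon, each outer scenario spawns its own "small Monte Carlo" of 1,024 inner trajectories to revalue the position, and the revalued positions form an empirical P&L distribution rather than a single expected price. Product and parameters are hypothetical, and the inner revaluation here is plain Monte Carlo rather than the authors' neural-network estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: P&L distribution at a 3-month horizon for a European
# call, via nested Monte Carlo -- outer scenarios to the horizon, then 1,024
# inner trajectories per scenario to revalue the option there.
s0, strike, r, sigma = 100.0, 105.0, 0.03, 0.2
horizon, maturity = 0.25, 1.0
n_outer, n_inner = 2_000, 1_024

def terminal(spot, t, z):
    """Terminal spot under GBM after time t, given standard normal draws z."""
    return spot * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)

# Outer scenarios: the spot at the risk horizon.
s_h = terminal(s0, horizon, rng.standard_normal(n_outer))

# Inner revaluation: discounted payoff averaged over 1,024 trajectories
# per outer scenario (broadcast across the inner dimension).
tau = maturity - horizon
z_inner = rng.standard_normal((n_outer, n_inner))
payoffs = np.maximum(terminal(s_h[:, None], tau, z_inner) - strike, 0.0)
value_h = np.exp(-r * tau) * payoffs.mean(axis=1)

# Today's value from one large plain Monte Carlo, as the P&L baseline.
value_0 = np.exp(-r * maturity) * np.maximum(
    terminal(s0, maturity, rng.standard_normal(200_000)) - strike, 0.0).mean()

pnl = value_h - value_0
print(f"mean P&L {pnl.mean():.3f}, 5% quantile {np.quantile(pnl, 0.05):.3f}")
```

The output of interest is the whole array `pnl` – its quantiles give loss measures directly – which is what the authors' network is trained to deliver without the inner simulations at run time.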

The authors tested the model with a sample portfolio of three options – an American put, a European call and a Bermudan call – written on the same underlying to show how it deals with portfolios of highly correlated assets. They found it was particularly beneficial in scenarios with a long time horizon. Using polynomials to calculate the P&L distribution of a product with a one-year maturity over different points in its life is “a tough task”, Rossi says. “By simulating the trajectories the way we do, this becomes easier.”

Optimal stopping time

The paper has generated some mild controversy in quant circles. The authors initially set out to explore potential applications of reinforcement learning in finance before ultimately deciding the calculation of the P&L distribution was best approached as an optimal stopping time problem.

Some quants are not convinced. “I would naturally opt to tackle the problem from a reinforcement learning angle,” the chief data scientist at a large global bank says after reading the paper, which he generally praises as an “original work on an important issue that is relevant to all banks”.

In reinforcement learning, an algorithm attempts to learn the sequence of actions an agent can take to maximise a defined reward function. The setting maps naturally onto the behaviour of an options trader who buys and sells future payoffs with the aim of maximising returns.

But Rossi is adamant the optimal stopping time approach is the right one. “I disagree with the idea that a reinforcement learning approach would have been preferable,” he says. “In the financial literature, the optimal stopping time is a better understood concept, though reinforcement learning is probably more fashionable.”

The model is still in the early stages of development and has not yet been applied to real portfolios. Meanwhile, the authors are looking to apply their method to solve the more complex problem of calculating the P&L distribution of basket options.
