Digging deeper into deep hedging

Quants have harnessed machine learning to hedge vanilla derivatives. But dynamic techniques and GenAI-simulated data can push the limits of deep hedging even further, as derivatives guru John Hull and colleagues explain

Traditionally, derivatives portfolios have been hedged by managing their sensitivity to changes in underlying factors such as volatility or interest rates. These sensitivities are labelled with Greek letters: delta, gamma, vega, etc. The Greeks have the advantage that they are easy to calculate and additive. (For example, if portfolio Z is the sum of portfolios X and Y, the delta of Z is the delta of X plus the delta of Y.) However, they are imperfect tools. They look at the portfolio at one particular point in time, and they fail to distinguish between rises and falls in value of similar size, regarding both as equally “bad”.
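
To make the additivity point concrete, here is a minimal Python sketch, assuming the Black-Scholes model and illustrative parameter values (none of which come from the article):

```python
from math import log, sqrt, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_delta(spot, strike, vol, rate, ttm):
    """Black-Scholes delta of a European call."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * ttm) / (vol * sqrt(ttm))
    return norm_cdf(d1)

# Portfolio X holds one 95-strike call; portfolio Y holds two 105-strike calls.
spot, vol, rate, ttm = 100.0, 0.2, 0.03, 0.5
delta_x = bs_call_delta(spot, 95.0, vol, rate, ttm)
delta_y = 2 * bs_call_delta(spot, 105.0, vol, rate, ttm)

# The delta of the combined portfolio Z is simply the sum of the two deltas.
delta_z = delta_x + delta_y
print(delta_x, delta_y, delta_z)
```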

These flaws are leading dealers to experiment with reinforcement learning (RL), a form of machine learning, for longer-term hedging strategies. Dubbed “deep hedging” by some, the technique gives better results than traditional methods and is likely to become the norm in the near future. The hedger chooses a time horizon, chooses an objective function and makes an assumption about the stochastic process followed by the underlying market variable. RL then derives an optimal hedging strategy. A variety of different objective functions can be used; in particular, losses can be penalised while gains are rewarded. Academics, as well as banks including JP Morgan, have explored the potential of RL for hedging.

How does reinforcement learning work? It might be described as “sophisticated trial and error”. It starts with a random strategy and uses an algorithm to systematically improve it. RL is designed for decisions that have to be made when key variables are changing in an uncertain way. The method is therefore ideally suited for hedging a portfolio of derivatives dependent on a single underlying asset.
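
The sketch below illustrates this trial-and-error loop in miniature; it is a simplified stand-in, not the authors’ implementation. A small neural-network policy starts from random weights and is improved by gradient descent so as to shrink a tail-loss measure of the hedging error on a short call, with the underlying assumed to follow geometric Brownian motion (PyTorch is assumed, and all parameters are illustrative):

```python
import torch

torch.manual_seed(0)
n_paths, n_steps, dt = 5000, 20, 1 / 252
s0, vol, strike = 100.0, 0.3, 100.0

# Policy network: maps (time, normalised spot, current holding) to a new position.
policy = torch.nn.Sequential(
    torch.nn.Linear(3, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    # Simulate zero-drift geometric Brownian motion paths for the underlying.
    z = torch.randn(n_paths, n_steps)
    log_s = torch.cumsum(-0.5 * vol**2 * dt + vol * dt**0.5 * z, dim=1)
    s = s0 * torch.exp(torch.cat([torch.zeros(n_paths, 1), log_s], dim=1))

    # Roll the current strategy forward along every path.
    pnl = torch.zeros(n_paths)
    holding = torch.zeros(n_paths, 1)
    for t in range(n_steps):
        state = torch.cat(
            [torch.full((n_paths, 1), t * dt), s[:, t:t + 1] / s0, holding], dim=1)
        holding = policy(state)
        pnl = pnl + (holding * (s[:, t + 1:t + 2] - s[:, t:t + 1])).squeeze(1)

    # Hedging error of a short call, scored by the tail of the loss distribution.
    loss = torch.clamp(s[:, -1] - strike, min=0.0) - pnl
    var95 = torch.quantile(loss, 0.95)
    cvar95 = loss[loss >= var95].mean()

    # Gradient step: the "systematic improvement" of the current strategy.
    opt.zero_grad()
    cvar95.backward()
    opt.step()
```

Each pass simulates fresh paths, scores the current strategy and nudges the network towards decisions that reduce the tail of the loss distribution, which is the systematic improvement referred to above.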

However, the basic RL approach has drawbacks. It assumes that a dealer derives a strategy at one particular time and sticks to it until the horizon date come what may. It also makes no allowance for new trades entering the portfolio during the period. At the same time, it is challenging for the dealer to estimate whether and how the market may shift; entering a high-volatility regime can render a fixed RL strategy ineffective.

A natural development is to use a dynamic version of the RL approach. This involves incrementally improving the RL model each day (or more frequently) with the latest information on the portfolio and the market. The computational facilities now available in trading rooms mean that, once the model has been trained initially, it typically takes only a few minutes to run the RL approach on the first day. Updating the strategy on subsequent days can be much faster still, because the strategy developed on one day can be used as the starting point for deriving the next day’s. The dynamic RL approach adds an extra dimension to the way hedging decisions are made and improves the value of the objective function.
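
In code, the warm start might look like the hypothetical sketch below. The function names and the toy objective are invented purely for illustration; in practice the objective would be rebuilt each day from the latest portfolio and market information:

```python
import copy
import torch

def make_policy():
    """Same shape of policy network as in the earlier sketch."""
    return torch.nn.Sequential(
        torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def fine_tune(policy, objective, n_steps=50, lr=1e-4):
    """Warm-start update: a few gradient steps from the previous day's weights."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(n_steps):
        loss = objective(policy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Toy stand-in objective so the sketch runs end to end. In practice this would
# be the tail-loss hedging objective, rebuilt daily from the latest portfolio
# and market information (including any new trades).
def todays_objective(policy):
    state = torch.randn(256, 3)
    return policy(state).pow(2).mean()

policy_yesterday = make_policy()                # day 1: trained from scratch
policy_today = copy.deepcopy(policy_yesterday)  # day 2: start from yesterday
policy_today = fine_tune(policy_today, todays_objective)
```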

Reinforcement learning copes well with relatively simple models for the behaviour of the underlying asset, such as geometric Brownian motion. A potential improvement is to use data on the actual observed movements in the underlying market variables. However, the scarcity of relevant historical data hampers the training of RL models, forcing users to turn to market simulators to augment the data. Traditionally, model-based stochastic processes are employed to simulate market evolution. For instance, the Heston model is a common way of simulating equity prices and their volatilities. These stochastic methods served well in an era when computational limitations and data availability dictated a more formulaic approach to understanding market behaviour. But such simulators require repeated recalibration to stay aligned with real market observations.
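
For reference, here is a minimal Euler-scheme simulation of the Heston model, with full truncation to keep the variance non-negative (parameter values are illustrative, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Heston parameters: mean-reversion speed, long-run variance,
# vol-of-vol, spot/variance correlation, then initial values.
kappa, theta, xi, rho = 2.0, 0.04, 0.3, -0.7
s0, v0, r = 100.0, 0.04, 0.02
n_paths, n_steps, dt = 10000, 252, 1 / 252

s = np.full(n_paths, s0)
v = np.full(n_paths, v0)
for _ in range(n_steps):
    z1 = rng.standard_normal(n_paths)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_paths)
    v_pos = np.maximum(v, 0.0)  # full truncation of the variance
    s = s * np.exp((r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
    v = v + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2

print(s.mean(), s.std())  # simulated terminal prices
```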

The advent of generative AI and machine learning has introduced an alternative to traditional market simulators. Unlike traditional approaches, where models are artificially crafted and constantly recalibrated, AI promises a shift towards automation in generating realistic market scenarios in a non-parametric way. While most media headlines focus on the uses of generative AI in natural language and computer vision through applications such as ChatGPT and Midjourney, there are ongoing developments in simulated financial data, typically referred to as synthetic data. Recent studies have demonstrated that generative models can be applied to a wide range of tasks, from completing missing information in daily volatility surfaces to producing realistic arbitrage-free volatility surfaces along with asset prices. While mainstream adoption is still at an early stage, these developments are paving the way for a new era of enhanced RL models for hedging derivatives. Hedging strategies can then be trained on paths for the underlying asset that are derived from those observed, but that may involve market movements more extreme than any actually seen. As a result, this new breed of hedging strategies is more robust, adaptable and effective than those obtained using traditional approaches.
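
As a bare-bones illustration of the generative idea, the sketch below trains a small variational autoencoder on windows of returns and then samples synthetic paths from it. It is one simple member of the generative family, not a production market simulator, and the “historical” data here is random noise purely so the example runs:

```python
import torch

torch.manual_seed(0)
window, latent_dim = 20, 4  # length of each return path; size of latent space

class PathVAE(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = torch.nn.Sequential(torch.nn.Linear(window, 64), torch.nn.ReLU())
        self.mu = torch.nn.Linear(64, latent_dim)
        self.logvar = torch.nn.Linear(64, latent_dim)
        self.dec = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, window),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
        return self.dec(z), mu, logvar

# Stand-in "historical" return windows; in practice, observed market data.
returns = 0.01 * torch.randn(2048, window)

vae = PathVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = vae(returns)
    recon_loss = torch.nn.functional.mse_loss(recon, returns)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + 1e-3 * kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling from the prior produces synthetic return paths, including
# combinations never seen in the training data.
synthetic = vae.dec(torch.randn(1000, latent_dim))
```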

Reinforcement agents

One key advantage of the RL approach is that transaction costs are lower than when the Greek letter approach is used. Tests find that 20–25% fewer transactions are necessary to achieve a certain hedging objective. This is because, by looking several periods ahead, the hedger manages to avoid situations where they place a hedge on one day and subsequently reverse it. This can be particularly important for gamma and vega hedging, which involves trades in options or other derivatives where transaction costs are relatively high.
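
One way the learner can be made aware of such costs is to net them out of the hedging P&L inside the objective, so that churning is penalised directly. Below is a hypothetical helper in the spirit of the earlier sketch (the proportional cost rate is illustrative):

```python
import torch

def hedge_pnl_with_costs(prices, holdings, cost_rate=0.001):
    """Hedging P&L net of proportional transaction costs.

    prices:    (n_paths, n_steps + 1) simulated asset prices
    holdings:  (n_paths, n_steps) hedge position held over each step
    cost_rate: illustrative proportional cost per unit of value traded
    """
    gains = (holdings * (prices[:, 1:] - prices[:, :-1])).sum(dim=1)
    # Trades are the changes in holding, including the initial purchase;
    # each trade is charged in proportion to the value transacted.
    trades = torch.diff(holdings, dim=1, prepend=torch.zeros_like(holdings[:, :1]))
    costs = cost_rate * (trades.abs() * prices[:, :-1]).sum(dim=1)
    return gains - costs
```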

Another advantage of the RL approach is that the user has more flexibility in the choice of the objective function. Two objective functions used are as follows (a short sketch after the list shows how both can be estimated from simulated losses):

  • VaR95: Minimise value at risk when the confidence level is 95% (ie, minimise the 95th percentile point of the loss distribution)
  • CVaR95: Minimise conditional value at risk (also known as expected shortfall or expected tail loss) when the confidence level is 95%. This is the expected loss conditional on the loss being greater than the VaR95 level
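
Both measures are easy to estimate empirically from simulated losses, as in the following sketch (standard-normal losses are used as a stand-in so the output can be checked against theory):

```python
import numpy as np

def var_cvar(losses, confidence=0.95):
    """Empirical VaR and CVaR (expected shortfall) of a loss sample."""
    losses = np.asarray(losses)
    var = np.quantile(losses, confidence)  # e.g. the 95th percentile of losses
    cvar = losses[losses >= var].mean()    # mean loss beyond the VaR level
    return var, cvar

# Standard-normal losses as a check: VaR95 is about 1.645, CVaR95 about 2.06.
rng = np.random.default_rng(0)
print(var_cvar(rng.normal(size=1_000_000)))
```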

Not surprisingly, RL produces better results for these measures than the Greek letter approach. The Greek letter approach can reduce the standard deviation of outcomes, which has the unwelcome effect of making good outcomes, as well as bad outcomes, less likely. An approach that focuses on a particular objective function that is concerned only with losses (or one that penalises losses while rewarding gains) produces a more satisfactory outcome.

Breaking down barriers

Previous research confirmed that RL can improve results for portfolios of vanilla options. A more recent study tested the performance of the RL approach for barrier options. Barrier options are popular exotic options, often used in structured products such as the common Japanese instrument, Uridashi. Barrier options are less expensive than their vanilla counterparts, as they provide the same payoff but only in certain circumstances. In the case of an “in” barrier option, the barrier must be hit for the option to become a vanilla option; in the case of an “out” barrier option, crossing the barrier extinguishes the option. However, the barriers make it harder for market-makers to hedge their risks when the asset price is close to the barrier. In particular, the option’s delta tends to be discontinuous at the barrier. See figure 1, which shows delta as a function of the asset price for a particular down-and-in put. As a result, mechanically implementing delta hedging does not usually work well. RL gives better results.
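
The hedging difficulty can be reproduced in a simple Monte Carlo experiment: a bump-and-revalue delta estimate for a discretely monitored down-and-in put changes abruptly as the spot crosses the barrier. All parameters below are illustrative, and the finite-difference estimates become noisy close to the barrier:

```python
import numpy as np

rng = np.random.default_rng(0)
strike, barrier, vol, r, ttm = 100.0, 90.0, 0.25, 0.02, 0.5
n_paths, n_steps = 100000, 50
dt = ttm / n_steps
z = rng.standard_normal((n_paths, n_steps))  # common random numbers for the bump

def down_and_in_put_price(s0):
    """Monte Carlo price of a discretely monitored down-and-in put."""
    log_paths = np.cumsum((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z, axis=1)
    paths = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
    knocked_in = paths.min(axis=1) <= barrier
    payoff = np.where(knocked_in, np.maximum(strike - paths[:, -1], 0.0), 0.0)
    return np.exp(-r * ttm) * payoff.mean()

# Bump-and-revalue delta at spots straddling the barrier. The estimate jumps
# across the barrier, which is what makes mechanical delta hedging hard there.
h = 0.25
for s0 in [85.0, 88.0, 89.5, 90.5, 92.0, 95.0]:
    delta = (down_and_in_put_price(s0 + h) - down_and_in_put_price(s0 - h)) / (2 * h)
    print(s0, round(delta, 3))
```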

As illustrated in the table below, RL can work well even when the stochastic process assumed for the underlying asset proves wrong. The table shows the results of a 20-day RL strategy developed assuming a 30% volatility when the realised volatility proves to be different from 30%. It is assumed that the delta hedger is able to detect the realised volatility, whereas the RL hedging strategy continues to assume 30%. In most cases the value of the objective function using RL is less than that using Greek letters, indicating that RL hedging avoids big losses more effectively, even though it is not based on the right volatility. Note that these results are not based on dynamic RL; the latter would track volatility and produce better results than the Greek letter approach even when markets experience big changes.

This article was co-written by Jacky Chen, Yu Fu, John Hull, Zissis Poulos, Zeyu Wang and Jun Yuan, from the FinHub Research Centre at the Joseph L. Rotman School of Management, University of Toronto
