In fake data, quants see a fix for backtesting

Traditionally quants have learnt to pick data apart. Soon they might spend more time making it up

  • Quants are experimenting with so-called synthetic data – fake time series data for markets or investments, generated with the same machine learning tools used in deep fake videos.
  • The fake data could fix a key shortcoming in how firms backtest systematic investing strategies, where they are forced to rely on just one version of events.
  • With the newly fashioned data firms could test strategies against what might have happened as well as what did.
  • Early results show the data to be highly realistic. Work remains, though, to adapt to the financial arena techniques that were first designed for image recognition.

It wasn’t a surprise that videos of Tom Cruise playing golf and doing magic tricks should rack up millions of views on TikTok earlier this year. The real surprise was that the clips didn’t feature Tom Cruise at all. They were fakes.

The technology behind these ultra-realistic ‘deep fake’ videos, now common on social media platforms, has already found a home in finance where quants are using it to create parallel universes of data to test investment strategies.

Experts say this synthetic data could help overcome weaknesses in backtesting, which relies on just a single time series of historical data and says nothing about how strategies might have fared in different conditions. What quants refer to as synthetic, artificial, or simply fake data, offers a chance to invent alternative histories for deeper testing.

“The data is anchored to real life but altered in various ways,” says Joseph Simonian, quant research consultant and co-editor of the Journal of Financial Data Science. It gives a “new flavour” to backtesting. A buy-side quant says fake data makes it possible to explore the market’s unknown unknowns.

Firms are starting to experiment with the idea and are achieving some notable results. The models can still be glitchy, but the technology to apply them in investing is moving fast.

“We can create synthetic series that are indistinguishable from the original,” says Blanka Horvath, an academic at King’s College London and the Technical University of Munich, and part of a team of researchers and JP Morgan quants working on data generation models. “That’s where the excitement is coming from.”

Firms of all stripes are trying out fake data. Amundi, Europe’s largest asset manager, has begun using synthetic data to test some of its volatility trading and risk parity strategies. ETS Asset Management Factory, a firm that licenses machine learning algorithms to investors, has used artificial data to test its currency trading algorithms and to develop new models. Deutsche Bank is looking at the new simulation techniques.

And the results demand attention.

Horvath, together with other academics and the team at JP Morgan, has built a data generator for use in training and testing models for options hedging. The models use machine learning to hedge complex derivatives books, in a process known as deep hedging. The data generator produced fake market data that mimicked the original to a 99.9% confidence level.


“These generative models have the potential to approximate time series so much more closely than regular stochastic models,” Horvath says. In fact, there is a theoretical proof that with neural networks quants can replicate a real dataset as closely as they wish. “We can do it as precisely as we want,” she says. 

In 2019, a machine learning model built by Magnus Wiese, a quant researcher who splits his time between JP Morgan and the University of Kaiserslautern, and his colleagues bested a Garch model in faking daily S&P 500 prices. The so-called convolutional neural network model uses techniques initially designed for image recognition. Garch models are widely used by quants to simulate market paths.

Bank of America, ETS and another quant firm, Cohen & Steers, separately have trained machine learning models using fake data and tested how successfully the models would invest. With more data to learn from – and no apparent loss of accuracy – those trained on fake data performed better.

BofA’s neural network achieved a higher hit rate compared with training solely on real data. It also scored a better R-squared, which is a measure of how well a model explains the data it applies to. When used to forecast moves in US Treasury bonds over daily to monthly horizons, the model recorded a hit rate as high as 82%.

According to Stefan Jansen, a consultant on the use of machine learning in quant trading and the author of a widely used text on the topic, up to a quarter of the big hedge funds he speaks to already are exploring the use of synthetic data. “It’s too promising to pass up,” he says.

Making it up

As far back as the 1980s trend followers manufactured simple artificial time series such as basic saw-tooth price patterns, to work out how different market conditions would affect the strategies they were developing.

Rudimentary methods of creating artificial data proved especially useful during the last decade in the aftermath of the European sovereign debt crisis, when the European Central Bank set negative rates and quants had to overhaul financial models that were not built for such a possibility.

But traditional backtesting – where investors test the effectiveness of a strategy by charting how it would have performed in real-world conditions – uses only one version of history. “They see only what actually happened,” says Jacques Joubert, founder of Hudson & Thames, a firm that licenses quant algorithms.

This creates an in-built limitation for testing. To draw an analogy with medicine, scientists tracking Covid vaccine efficacy can access millions of separate time series of data, one for each jabbed individual. Backtesting can be compared to gauging vaccine efficacy based on a single patient.

So-called bootstrapping and Monte Carlo simulations try to get around the problem. In bootstrapping, quants glue together bits of the past to fashion new versions of history against which they can test ideas. In Monte Carlo simulations, they create plausible future paths for time series based on models of how markets work, adding an element of randomness.
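As a hypothetical illustration of those two approaches, the sketch below builds a block-bootstrapped path and a bare-bones Monte Carlo path from the same stand-in return series. All numbers, and the i.i.d. Gaussian model behind the Monte Carlo draw, are arbitrary toy assumptions, not anyone's production setup:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0003, 0.01, 2500)  # stand-in for a real daily return series

def block_bootstrap(returns, n_days, block=20, rng=rng):
    """Glue together randomly chosen blocks of history into a new version of the past."""
    out = []
    while len(out) < n_days:
        start = rng.integers(0, len(returns) - block)
        out.extend(returns[start:start + block])
    return np.array(out[:n_days])

def monte_carlo(n_days, mu=0.0003, sigma=0.01, rng=rng):
    """Draw a path from a pre-programmed model (here: i.i.d. Gaussian returns)."""
    return rng.normal(mu, sigma, n_days)

boot_path = block_bootstrap(returns, 252)   # one fake year stitched from real blocks
mc_path = monte_carlo(252)                  # one fake year from a parametric model
```

The bootstrap inherits real return values but scrambles their order; the Monte Carlo path is only as rich as the model behind it, which is exactly the limitation the quants quoted below describe.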

But even Monte Carlo methods follow a data generation process according to a pre-programmed model. This shapes the distribution of the data that the process creates. The convoluted patterns of real markets are impossible to capture perfectly in a mathematical model. “The real-world data is not so neatly described,” says Anthony Morris, head of quantitative strategies at Nomura.

Bootstrapping runs into the same barrier, he says. “Mixing up historical data in different orders will distort the nature of actual serial and cross-sectional dependencies. These exist but can be quite difficult to specify.”

When markets evolve in ways that make the past a bad proxy for the future, as arguably is the case right now, the problem becomes acute.


Data generators offer something different. They are able to reproduce complex patterns from real markets, says Thierry Roncalli, Amundi's head of quantitative research: non-linear autocorrelation, fat tails, heteroscedasticity – that is, time-varying volatility – shifting cross-sectional correlations, and non-stationarity, the way in which the data's distribution changes over time.
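Two of those features are simple to score, and give a sense of what generators are being asked to match. The checks below – illustrative, with hypothetical function names and a toy Gaussian baseline – measure fat tails and volatility clustering; a plain Gaussian Monte Carlo path scores near zero on both, which is precisely the shortfall of pre-programmed models:

```python
import numpy as np

def stylised_facts(returns):
    """Crude checks for two features a data generator should reproduce."""
    r = np.asarray(returns)
    # Fat tails: excess kurtosis is ~0 for Gaussian data, clearly positive for real returns
    excess_kurtosis = ((r - r.mean()) ** 4).mean() / r.std() ** 4 - 3
    # Volatility clustering: autocorrelation of squared returns at lag 1
    sq = r ** 2
    vol_clustering = np.corrcoef(sq[:-1], sq[1:])[0, 1]
    return {"excess_kurtosis": excess_kurtosis, "vol_clustering": vol_clustering}

rng = np.random.default_rng(1)
gaussian_path = rng.normal(0, 0.01, 5000)   # a plain Monte Carlo path
facts = stylised_facts(gaussian_path)        # both scores come out near zero
```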

Using generators, quants say they can add richer data into simulations, like how two indexes move together. They can incorporate data from outside the price time series, like sentiment scores or trading volumes.

Synthesising data, then, can help prepare for unforeseeable risks – the kind of tail events, like Covid-19, that have no parallel in the historical data.

Amundi is using fake data to test the resilience of its volatility strategies in the face of such events. And in its risk parity strategies, the firm is using the data to calibrate stop-loss and stop-gain mechanisms based on a better grasp of the extremes of the return distribution.

In both cases, testing with synthetic data exposed risks that otherwise would have gone unnoticed, Roncalli says.

“For risk parity, the tail of the probability distribution [of maximum drawdown]… is less fat than that generated by the bootstrap sampling method but has several extreme severe scenarios,” the Amundi team wrote in a paper on their work. 


With fake data, quants might also test strategies against known unknowns – periods of stress for which parallels do exist but where the data is scarce or unreliable. An example is a rise in inflation. Markets have enjoyed low inflation for the past 40 years, but many investors in the US and Europe are concerned this may be about to change.

To test an inflation strategy with a bootstrapping method, investors would cherry-pick relevant periods of data – such as the stagflation of the 1970s. But slicing data into regime-specific chunks cuts into an already sparse body of information.

It’s also unclear that past episodes will necessarily be a good source of information. Previous spells of inflation came before markets globalised, for example.

“You’d like to backtest in periods that are representative of the appropriate market conditions or market regime,” said Yigal Jhirad, head of quantitative and derivative strategies at Cohen & Steers, speaking at a recent conference. “[Synthetic data] provides a mechanism with which to develop backtesting models that are regime specific.”


The new models open the way to more sophisticated, forward-looking testing, compared with traditional backtesting, which is by definition backward-looking. Quants might input into a model the level of market volatility or features of recent price moves, and ask the model to plot future paths based on those readings, Horvath explains. “We can ask the model: if we ended up in a year’s time in [a given state of the world], how would the future look after that?”

Elsewhere, Rob Carver, a former portfolio manager at Man AHL and writer on quant investing, says synthetic data could help in stress-testing strategies for which losses can be highly path dependent.

For trend following and risk parity strategies, markets that slowly drift in the wrong direction can be worse than markets that jump quickly, he points out. (Greater volatility would trigger risk management mechanisms that cause the strategies to reduce leverage.) Simulating market paths and generating a theoretical P&L for such strategies can be more revealing than a point in time stress scenario, such as assuming sudden big falls across markets.

And fake data could also help root out strategies that are spurious. Strategies that quants can identify in only one synthesised dataset are probably a fluke, says Gautier Marti, a quant researcher at the Abu Dhabi Investment Authority. Marti is widely considered to be one of the architects of using image generation technology to create synthetic financial data.

To compare the robustness of different investing ideas, Simonian suggests using machine learning iteratively to generate fake time series that inch closer and closer to replicating the real data. By testing strategies on the fake data in each step, quants could measure how far the future would need to diverge from the past before a strategy might break down.
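A minimal sketch of that iterative idea, under heavy assumptions: a toy momentum rule stands in for the strategy, a shuffled copy of the real series stands in for a maximally divergent fake, and linear blending stands in for a machine learning generator inching toward the real data:

```python
import numpy as np

rng = np.random.default_rng(2)
real = rng.normal(0.001, 0.01, 1000)       # stand-in for a real return series
shuffled = rng.permutation(real)           # same values, history scrambled

def momentum_pnl(returns, lookback=20):
    """Toy strategy: hold the sign of the trailing mean return."""
    signal = np.sign(np.convolve(returns, np.ones(lookback) / lookback, "valid"))[:-1]
    return signal * returns[lookback:]

def sharpe(pnl):
    return pnl.mean() / pnl.std() * np.sqrt(252)

# Step the fake series closer to the real one and re-test at each step,
# watching for the point at which the strategy's performance breaks down
for w in [0.0, 0.25, 0.5, 0.75, 1.0]:
    blended = w * real + (1 - w) * shuffled
    print(f"weight on real data {w:.2f}: Sharpe {sharpe(momentum_pnl(blended)):.2f}")
```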

Trial and error

To be clear, using machine learning to create synthetic financial data is a nascent area. Generative adversarial networks (Gans), the family of models for making fake data that has attracted the most interest, were invented only in 2014.

Gans use so-called generator and discriminator neural networks working in pairs and competing with each other through thousands or millions of iterations. The generator tries to learn to create fake data good enough to fool the discriminator. The discriminator effectively tries to call the generator’s bluff. Their first use to synthesise time series data came as recently as 2017 and not in finance but to train a model to warn of medical emergencies in a hospital intensive care unit.
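Structurally, the adversarial game can be sketched with the simplest possible pair: a linear generator and a logistic-regression discriminator, trained to match a one-dimensional Gaussian. Every number here is a toy assumption; real Gans use deep networks on whole time series:

```python
import numpy as np

rng = np.random.default_rng(5)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def real_batch(n):
    """The 'real' data the generator must learn to imitate (arbitrary toy target)."""
    return rng.normal(2.0, 0.5, n)

a, b = 1.0, 0.0   # generator: x = a*z + b, with noise z ~ N(0, 1)
w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w*x + c)

lr, batch = 0.05, 64
for step in range(2000):
    z = rng.standard_normal(batch)
    fake, real = a * z + b, real_batch(batch)

    # Discriminator step: push D(real) up and D(fake) down ("call the bluff")
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator step: adjust a, b so the fakes fool the discriminator
    d_fake = sigmoid(w * fake + c)
    grad_fake = (1 - d_fake) * w          # gradient of log D(x) at the fakes
    a += lr * np.mean(grad_fake * z)
    b += lr * np.mean(grad_fake)

samples = a * rng.standard_normal(10000) + b   # synthetic data after training
```

After training, the generator's samples sit close to the target distribution even though it never saw the target directly, only the discriminator's verdicts.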

Generating and using synthetic data in finance will take time to get right.

Firms have to experiment with different data, decide which relationships they want to model, choose whether to use other variables such as sector or macro information in the generation process. They must pick a model network architecture, set training rates and learning horizons, and select from multiple algorithms for the task. This isn’t easy to determine in advance, Jansen says: “You have to try it out until you stumble on something that works.”

Firms may need to work on the problem “in a concentrated fashion” for six to 12 months, he says. Jansen reckons such a project should be carried out as part of an effort to build deep learning capabilities, which he estimates would require a team of four or five specialists and cost maybe $2–3 million a year.

The time required to run the generators holds back the pace at which such research can occur. Running models can take many hours, up to more than a day, practitioners say. And Marti estimates that no more than a hundred people working in finance across Asia and Europe are fluent with the most cutting-edge techniques.


Gans in particular can be infuriatingly hard to calibrate. The models are prone to what’s called mode collapse, in which the model picks up too early on limited features of the data and learns to follow only one possible path without exploring further. “The generator creates data that looks real but you miss lots of possible scenarios,” says Marti.

A neural network with too many nodes effectively learns in too much detail, just like any overparameterised model. This means it might pick up on noise in the training data, like homing in on fuzziness in an image, and learn to generate new data in which the fuzziness dominates.

Quants might calibrate the discriminator poorly. “If you use the wrong optimisation objectives you might think your Gan has converged but you produce synthetic data that turns out to be missing the features that really matter,” Horvath says.

In one set of experiments where Jansen scaled up a basic Gan time-series model to synthesise prices and returns for 50 stocks, the fake histories appeared less convincing, losing some of the fat-tailed nature of real market data.

“You may need to overlay these models with some heuristics,” says a buy-side quant. “You might condition the model to make it more prone to come up with appropriate solutions and not to get stuck somewhere in left field. There’s a lot of work you need to do to get these things to output practical results and not just a bunch of gibberish.”

The biggest challenge, though, is simply knowing whether the output data from models can be trusted.

To train a Gan, or to be confident in data from other simpler generative models, quants need a way to determine whether the fake data captures the properties that are relevant from the original. “Honestly, that’s an open problem,” says George Lentzas, chief investment officer at quant firm Springfield Capital Management and an adjunct professor at Columbia Business School.


That said, innovations and advances are coming thick and fast.

Academics in 2020 built Stock-Gan, a neural network that creates synthetic stock market data by using a machine learning module to concoct order-book histories.

Separately, researchers at the Nanyang Technological University and the Chinese University of Hong Kong built a Gan to synthesise data for the S&P 500 and constructed an optimised portfolio using the information. The portfolios proved more resilient in real-world stress periods, they found.

And on the critical question of how to measure the realism of output data, Horvath and the quants at JP Morgan say they have made a breakthrough.

Mathematical signatures – a concept developed by Terry Lyons at Oxford University that was originally applied to reading characters in written Chinese – can be used to encode the essence of a sequence of data, Horvath says. That allows models to measure precisely how similar one time series is to another. A forthcoming paper will detail Horvath’s work with Lyons and the JP Morgan team.
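The exact construction is in the forthcoming paper, but the basic object can be sketched. Assumptions in the sketch below: piecewise-linear paths, a signature truncated at level 2, and plain Euclidean distance standing in for whatever similarity measure the researchers actually use:

```python
import numpy as np

def signature_level2(path):
    """Truncated (level-2) signature of a piecewise-linear d-dimensional path.

    Level 1 records the net move in each dimension; level 2 records iterated
    integrals, which capture the order in which the dimensions moved."""
    path = np.asarray(path, dtype=float)
    dx = np.diff(path, axis=0)                          # increments, shape (n-1, d)
    s1 = dx.sum(axis=0)                                 # level-1 terms
    before = np.vstack([np.zeros(dx.shape[1]), np.cumsum(dx, axis=0)[:-1]])
    s2 = before.T @ dx + 0.5 * dx.T @ dx                # level-2 terms, shape (d, d)
    return np.concatenate([s1, s2.ravel()])

def signature_distance(path_a, path_b):
    """Crude similarity measure between two series (lower = more alike)."""
    return np.linalg.norm(signature_level2(path_a) - signature_level2(path_b))

t = np.linspace(0, 1, 100)
series = np.column_stack([t, np.sin(6 * t)])   # a toy two-dimensional path
```

A series compared against itself scores zero, while a time-reversed copy does not, which is what makes the encoding useful for judging whether fake data follows the same dynamics as the real thing.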

Already, academics at University College London, the University of Kaiserslautern and the University of Edinburgh have developed a Gan that uses signatures in its discriminator. The method consistently beats “state of the art benchmarks” in forging realistic data, the authors state in a working paper released last year. And that includes Gans that learn based on a less sophisticated mechanism.

Several researchers are exploring models, or combinations of models, that reduce the noise in financial markets at the same time as learning to replicate it.

Horvath and her colleagues see promise in variational autoencoders, a type of model that compresses data into core elements, then rebuilds an alternative version following the blueprint the model has created. In its data generator, the team combined such a model with the discriminator element of a Gan.
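The data flow of a variational autoencoder can be sketched without training: compress, sample, rebuild, and score. All weights and dimensions below are untrained placeholders; the point is the shape of the computation, not a working generator:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical dimensions: a window of 20 daily returns compressed to 3 latent factors
d_in, d_latent = 20, 3
x = rng.standard_normal(d_in)               # stand-in for a real return window

# Untrained weights, purely to show the data flow
W_mu = rng.standard_normal((d_latent, d_in))
W_logvar = rng.standard_normal((d_latent, d_in))
W_dec = rng.standard_normal((d_in, d_latent))

# Encode: compress the window into a distribution over core latent factors
mu, logvar = W_mu @ x, W_logvar @ x
# Reparameterise: draw the factors in a way that keeps training differentiable
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(d_latent)
# Decode: rebuild an alternative version of the window from the factors
x_hat = W_dec @ z

# The training objective (ELBO) balances reconstruction quality against
# keeping the latent distribution close to a standard normal
recon = -0.5 * np.sum((x - x_hat) ** 2)
kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
elbo = recon - kl
```

Once trained, sampling fresh latent factors and decoding them yields new synthetic windows; in the JP Morgan team's generator, a Gan-style discriminator sits on top of this pipeline.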

Jhirad sees a future in allying autoencoders with Gans to help pick out salient features in markets and stop the Gan learning its way down blind alleys.

Amundi has been working with restricted Boltzmann machines, another type of generative model championed by quants at Standard Chartered in a study on synthetic data from 2019.

Boltzmann machines operate in a similar way to autoencoders, Roncalli explains, learning the probability distributions of interlinking elements of the data, from which synthetic data samples can be drawn.
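That learning loop can be sketched for the simplest binary case – here, hypothetically, binarised daily up/down moves for a handful of assets, trained with one-step contrastive divergence (CD-1). This is a generic textbook restricted Boltzmann machine, not Amundi's or Standard Chartered's model:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy training data: binarised daily moves (1 = up) for 4 correlated assets
n_visible, n_hidden = 4, 8
data = (rng.random((500, n_visible)) < 0.55).astype(float)

W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)   # visible bias
c = np.zeros(n_hidden)    # hidden bias

def sample(p, rng=rng):
    return (rng.random(p.shape) < p).astype(float)

# CD-1: one Gibbs step per update, nudging the model's distribution toward the data's
for epoch in range(30):
    for v0 in data:
        ph0 = sigmoid(v0 @ W + c); h0 = sample(ph0)
        pv1 = sigmoid(h0 @ W.T + b); v1 = sample(pv1)
        ph1 = sigmoid(v1 @ W + c)
        lr = 0.05
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)

# Draw a synthetic sample from the learned distribution via a short Gibbs chain
v = sample(np.full(n_visible, 0.5))
for _ in range(100):
    h = sample(sigmoid(v @ W + c))
    v = sample(sigmoid(h @ W.T + b))
```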

The technicalities of faking data, then, remain to be pinned down. “Gans are really difficult to calibrate,” Roncalli says. “When you change the parameters, Gans are more sensitive. But we can’t say one method is better than another because this is a work in progress.” Quants are optimistic about succeeding, though.

Bigger datasets, a few human-imposed nudges in how the models learn, and neural networks with architecture built for the task at hand mean there’s a good chance of solving any teething issues, Jansen thinks. Gan-generated images of faces have come a long way from the blurry smudges first created in 2014.

Editing by Alex Krohn
