Synthetic data enters its Cubist phase

Quants are using the theory of rough paths to distil the essence of financial datasets

Pablo Picasso was untroubled by the mixed reaction to his portrait of the art patron Gertrude Stein. “Everybody says that she does not look like it,” Picasso said, “but that does not make any difference – she will.”

Years later, his subject agreed: “For me it is I, and it is the only reproduction of me which is always I, for me.” Thoughtful representation, Stein came to realise, is sometimes better than exact replication.  

Quants working on synthetic datasets seem to be having their own Picasso moment. These representations of financial history can be used to overcome the paucity of real-world data needed to train deep-learning algorithms in finance. But so-called generator models – which produce synthetic datasets that closely resemble the statistical properties of real market data – seem to miss something, especially when sampling under certain conditions.  

The theory of rough paths, developed by University of Oxford professor Terry Lyons, may offer a solution to this problem. Rough path theory describes how irregular streams of data interact with non-linear systems. A new paper based on the theory proposes using ‘signatures’ – mathematical objects that are able to encode financial data in a parsimonious and efficient way – to create synthetic datasets that capture the essence of financial markets. The paper is the fruit of a collaboration between Lyons, Hans Buehler and Ben Wood of JP Morgan, Blanka Horvath of King’s College London and Imanol Perez, who contributed while at the University of Oxford.

Signatures capture multiple characteristics of a time series without losing the implicit narrative thread in the data. In practice, a signature is a series of coefficients that summarises a stream of data, such as market prices.
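To make "a series of coefficients" concrete, here is a minimal sketch, assuming the stream is recorded as a piecewise-linear path in d dimensions (for example time and price). It computes the first two levels of the signature by walking the path segment by segment and applying Chen's identity. The function name `truncated_signature` is illustrative and not taken from the paper; real work would typically rely on a dedicated signature library.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def truncated_signature(path: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Return levels 1 and 2 of the signature of a piecewise-linear path.

    Level 1 holds the total increment in each coordinate.
    Level 2 holds the iterated integrals S[i][j] = integral of (X^i_t - X^i_0) dX^j_t.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) delta + delta (x) delta / 2, using S1 before it is updated.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# A (time, price) stream with five observations.
stream = [(0, 100.0), (1, 101.0), (2, 99.5), (3, 102.0), (4, 103.0)]
s1, s2 = truncated_signature(stream)
print(s1)  # [4.0, 3.0]: total change in time and in price
print(s2)  # 2 x 2 block of second-order coefficients
```

However many observations the stream contains, the output has d + d² coefficients, and reparametrising the path (for example, adding an extra point in the middle of a straight segment) leaves them unchanged.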

Lyons draws an analogy with films to illustrate how this works. Imagine each frame in a film is a sample value. Periodically sampling a single frame every few minutes would make little sense. An approach using signatures would instead sample a few minutes at a time and attempt to summarise what happens within each interval.


The idea is that describing a stream of data as a succession of effects or trends provides more meaningful information than isolated snapshots taken at fixed intervals.
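The film analogy can be sketched in a few lines. This is a minimal illustration, assuming prices are observed at equally spaced times and split into windows of a hypothetical length `window`; each window is summarised by the level-1 and level-2 signature of its (time, price) path. The helper names are illustrative. Two series that agree at every window boundary, and so look identical to a fixed-interval sampler, still receive different window summaries.

```python
from typing import List, Sequence


def signature_features(path: Sequence[Sequence[float]]) -> List[float]:
    """Flattened levels 1 and 2 of the signature of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1 + [x for row in s2 for x in row]


def windowed_signatures(prices: Sequence[float], window: int) -> List[List[float]]:
    """Summarise each block of `window` steps by its signature (hypothetical helper)."""
    feats = []
    for start in range(0, len(prices) - 1, window):
        # Consecutive windows share their boundary point, so no price move is lost.
        chunk = prices[start:start + window + 1]
        path = [(float(start + k), p) for k, p in enumerate(chunk)]
        feats.append(signature_features(path))
    return feats


calm = [0.0, 0.0, 0.0, 0.0, 0.0]
choppy = [0.0, 1.0, 0.0, 1.0, 0.0]

# Point samples every two steps are identical for both series.
print(calm[::2], choppy[::2])

# The window summaries are not.
print(windowed_signatures(calm, 2))
print(windowed_signatures(choppy, 2))
```

The off-diagonal level-2 terms pick up the back-and-forth inside each window, which fixed-interval snapshots cannot see. The window length here is a modelling choice, not something specified in the article.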

Rough paths don’t make stochastic assumptions about the systems they describe. Rather than sampling prices at precise times, whether hourly or daily, they capture the effects of the data on non-linear systems – for instance, the profit or loss that would arise from applying a hedging strategy – over intervals of time. 
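One way to see how a signature captures the effect of a price stream on a system such as a hedging strategy is through the strategy's profit or loss. The sketch below is an illustration under stated assumptions, not the paper's construction: it applies the lead-lag transform (a standard way of turning a one-dimensional price series into a two-dimensional path) and checks that the P&L of a hypothetical rule holding `a + b * (x_i - x_0)` units over each step is a linear function of level-1 and level-2 signature terms. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def lead_lag(prices: Sequence[float]) -> List[Tuple[float, float]]:
    """Lead-lag transform: the 'lead' coordinate moves first, then the 'lag' catches up."""
    pts = [(prices[0], prices[0])]
    for prev, cur in zip(prices, prices[1:]):
        pts.append((cur, prev))
        pts.append((cur, cur))
    return pts


def level2_signature(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (Chen's identity)."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def pnl_direct(prices, a, b):
    """Hold a + b * (x_i - x_0) units over each step [t_i, t_{i+1}] (hypothetical rule)."""
    x0 = prices[0]
    return sum((a + b * (prev - x0)) * (cur - prev) for prev, cur in zip(prices, prices[1:]))


def pnl_from_signature(prices, a, b):
    s1, s2 = level2_signature(lead_lag(prices))
    # Index 0 = lead, 1 = lag. s1[0] is the net price move; s2[1][0] equals
    # the sum over steps of (x_i - x_0) * (x_{i+1} - x_i).
    return a * s1[0] + b * s2[1][0]


prices = [100.0, 101.0, 99.0, 102.0, 103.0]
print(pnl_direct(prices, a=1.0, b=0.5))          # 1.5
print(pnl_from_signature(prices, a=1.0, b=0.5))  # 1.5
```

The design point is that the strategy is evaluated on the path itself, not on prices read off at particular times: once the signature is known, the P&L of every rule in this family follows without revisiting the raw data.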

“Instead of saying where everything is at a given time, they look at the effects of the stream of data on simple systems,” says Lyons. “A signature is a sort of universal version of what a stream does when it interacts with nonlinear systems. It’s a different way to describe the data.”

The so-called first order of the signature describes the drift up or down in prices from start to finish. The second order, computed on a suitably transformed version of the price path, captures how volatile the journey was. “It’s clear that if prices reach from one point to another with smooth movements, that’s a different story from if you have a volatile journey,” Horvath says. “Higher orders stretch beyond what can be intuitively described.”
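Horvath's contrast between a smooth and a volatile journey can be checked numerically. This is a minimal sketch, assuming a one-dimensional price series and the lead-lag transform; both are illustrative choices, not details given in the article. For a single price coordinate the level-2 term is fixed by the endpoints, (x_n - x_0)²/2, so on its own it cannot tell the two journeys apart. After the lead-lag transform, the difference between the two off-diagonal level-2 terms equals the realised quadratic variation, the sum of squared price moves.

```python
from typing import List, Sequence, Tuple


def lead_lag(prices: Sequence[float]) -> List[Tuple[float, float]]:
    """Lead-lag transform: the 'lead' coordinate moves first, then the 'lag' catches up."""
    pts = [(prices[0], prices[0])]
    for prev, cur in zip(prices, prices[1:]):
        pts.append((cur, prev))
        pts.append((cur, cur))
    return pts


def level2_signature(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (Chen's identity)."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def describe(prices):
    """Return (net move, price-price level-2 term, realised quadratic variation)."""
    s1, s2 = level2_signature(lead_lag(prices))
    net_move = s1[0]                # first order: drift from start to finish
    same_coord = s2[0][0]           # always net_move ** 2 / 2, whatever the path
    quad_var = s2[0][1] - s2[1][0]  # antisymmetric second order: sum of squared moves
    return net_move, same_coord, quad_var


smooth = [100.0, 101.0, 102.0, 103.0, 104.0]
volatile = [100.0, 105.0, 97.0, 108.0, 104.0]
print(describe(smooth))    # (4.0, 8.0, 4.0)
print(describe(volatile))  # (4.0, 8.0, 226.0)
```

Both paths share the same first-order term and the same price-price second-order term; only the lead-lag second-order term separates the smooth journey from the volatile one.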

The authors show that signature-based models can be trained faster than traditional data generators. They also show that models using signatures retain information more efficiently than data generators that learn directly from raw data, which can lose information in the sampling phase.

Crucially, according to Lyons, the low-order terms of a signature can extract useful information even from a small set of features. This can make all the difference when there is limited data to train a neural network. “You can generally do better with deep learning combined with signatures,” he says. “But if you have small data, then sometimes signatures work really very well, are quick to train and economical.”
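Part of the economy Lyons describes comes from the fact that a truncated signature has a fixed number of terms however long the stream is: a d-dimensional path truncated at level N yields d + d² + ... + d^N coefficients. The sketch below, assuming a simulated (time, price) random walk and truncation at level 2, shows a 10-step and a 1,000-step series both compressing to the same six numbers. The random walk and all function names are purely illustrative.

```python
import random
from typing import List, Sequence, Tuple


def signature_features(path: Sequence[Sequence[float]]) -> List[float]:
    """Flattened levels 1 and 2 of the signature of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1 + [x for row in s2 for x in row]


def feature_count(d: int, depth: int) -> int:
    """Number of signature terms for a d-dimensional path, levels 1..depth."""
    return sum(d ** k for k in range(1, depth + 1))


def random_walk(n_steps: int, seed: int) -> List[Tuple[float, float]]:
    """Illustrative (time, price) path with Gaussian steps; not market data."""
    rng = random.Random(seed)
    price, path = 100.0, [(0.0, 100.0)]
    for t in range(1, n_steps + 1):
        price += rng.gauss(0.0, 1.0)
        path.append((float(t), price))
    return path


short_feats = signature_features(random_walk(10, seed=1))
long_feats = signature_features(random_walk(1000, seed=2))
print(len(short_feats), len(long_feats), feature_count(2, 2))  # 6 6 6
print(feature_count(2, 4), feature_count(5, 3))                # 30 155
```

The count grows quickly with dimension and depth, which is one reason low truncation levels are a natural choice when data are scarce.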

The theory of rough paths is being applied in a myriad of fields. “The recognition of hand-written Chinese characters was the first large-scale application of signatures,” says Lyons. Signatures have also been used to recognise actions performed by matchstick people in videos – for example, kicking a ball or swinging a golf club.

In 2019, a team led by James Morrill, a student of Lyons, used signature-based algorithms to detect early signs of sepsis in medical data.

Actionable pattern recognition, the theme underlying these applications, also has many potential uses in finance. Data generated using signatures could be used to train the deep hedging algorithms pioneered by Buehler and Wood at JP Morgan, among others. Other financial applications include simulating market data to price derivatives and test new trading strategies.

Lyons and his co-authors plan to continue their work on perfecting market generators. Future developments in this field may well bear their signatures.
