
Synthetic data enters its Cubist phase
Quants are using the theory of rough paths to distil the essence of financial datasets
Pablo Picasso was untroubled by the mixed reaction to his portrait of the art patron Gertrude Stein. “Everybody says that she does not look like it,” Picasso said, “but that does not make any difference – she will.”
Years later, his subject agreed: “For me it is I, and it is the only reproduction of me which is always I, for me.” Thoughtful representation, Stein came to realise, is sometimes better than exact replication.
Quants working on synthetic datasets seem to be having their own Picasso moment. These representations of financial history can be used to overcome the paucity of real-world data needed to train deep-learning algorithms in finance. But so-called generator models – which produce synthetic datasets that closely resemble the statistical properties of real market data – seem to miss something, especially when sampling under certain conditions.
The theory of rough paths, developed by University of Oxford professor Terry Lyons, may offer a solution to this problem. Rough path theory describes how streams of data interact with non-linear systems. A new paper based on the theory proposes using ‘signatures’ – mathematical objects that encode financial data parsimoniously and efficiently – to create synthetic datasets that capture the essence of financial markets. The paper is the fruit of a collaboration between Lyons, Hans Buehler and Ben Wood of JP Morgan, Blanka Horvath of King’s College London, and Imanol Perez, who contributed while at the University of Oxford.
Signatures capture multiple characteristics of a time-series distribution without losing the narrative thread implicit in the data. In practice, a signature is a series of coefficients that summarise a stream of data, such as market prices.
Lyons draws an analogy with film to illustrate how this works. Imagine each frame in a film is a sample value. Picking out a single frame every few minutes would make little sense. An approach using signatures would instead take a few minutes of footage at a time and summarise what happens within each interval.
The idea is that describing a stream of data as a succession of effects or trends provides more meaningful information than sampling isolated points at fixed intervals.
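The contrast between point sampling and interval summaries can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's code: `signature_depth2` and `windowed_summaries` are hypothetical helpers, the signature is truncated at level 2, and each window is treated as a piecewise-linear path in (time, price).

```python
import numpy as np


def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_points, d).
    level1[i]    = total increment of coordinate i.
    level2[i, j] = iterated integral of dX^i followed by dX^j.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for step in np.diff(path, axis=0):
        # Chen's identity: append one straight segment, whose own level-2
        # term is outer(step, step) / 2, to the signature accumulated so far.
        level2 += np.outer(level1, step) + np.outer(step, step) / 2.0
        level1 += step
    return level1, level2


def windowed_summaries(times, prices, window):
    """Summarise consecutive blocks of `window` steps (hypothetical helper).

    Each block of the (time, price) path is reduced to its net price move
    and its Levy area: the signed area between the block and the straight
    chord joining its endpoints.
    """
    features = []
    for start in range(0, len(prices) - window, window):
        block = np.column_stack([times[start:start + window + 1],
                                 prices[start:start + window + 1]])
        level1, level2 = signature_depth2(block)
        levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
        features.append((float(level1[1]), float(levy_area)))
    return features


times = np.arange(9.0)
prices = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print("point samples:", prices[::4])
print("window summaries:", windowed_summaries(times, prices, window=4))
```

Sampling every fourth price returns 0, 0, 0, so the zigzag first window and the flat second window look identical. Both windows also have a net move of zero, but the first has a signed area of -2 and the second 0, so the interval summary tells them apart.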
Rough path methods make no stochastic assumptions about the data streams they describe. Rather than sampling prices at precise times, whether hourly or daily, they capture the effects of the data on non-linear systems – for instance, the profit or loss that would arise from applying a hedging strategy – over intervals of time.
“Instead of saying where everything is at a given time, they look at the effects of the stream of data on simple systems,” says Lyons. “A signature is a sort of universal version of what a stream does when it interacts with nonlinear systems. It’s a different way to describe the data.”
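Lyons's point that a signature records the effect of a stream on simple systems can be illustrated with a toy trading rule. In the sketch below, which is an illustrative assumption rather than anything taken from the paper, a trader holds a + b*(t - t0) units of an asset and trades continuously along the piecewise-linear price path. The resulting profit or loss is exactly a linear combination of two signature coefficients of the (time, price) path. Realistic hedging strategies respond non-linearly to prices and would need higher-order terms and a fitted map; the function names are hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (n, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for step in np.diff(path, axis=0):
        # Chen's identity for appending one straight segment.
        level2 += np.outer(level1, step) + np.outer(step, step) / 2.0
        level1 += step
    return level1, level2


def pnl_direct(times, prices, a, b):
    """P&L of holding a + b*(t - t0) units along the interpolated path."""
    times = np.asarray(times, dtype=float)
    prices = np.asarray(prices, dtype=float)
    t_mid = 0.5 * (times[:-1] + times[1:])
    # Position is linear in t and price is linear on each segment, so
    # evaluating the position at the segment midpoint integrates exactly.
    position = a + b * (t_mid - times[0])
    return float(np.sum(position * np.diff(prices)))


def pnl_from_signature(times, prices, a, b):
    """Same P&L read off the signature of the (time, price) path."""
    level1, level2 = signature_depth2(np.column_stack([times, prices]))
    # Coordinate 0 is time, coordinate 1 is price:
    # level1[1] is the net price move, level2[0, 1] is the integral of (t - t0) dP.
    return float(a * level1[1] + b * level2[0, 1])


times = [0.0, 1.0, 2.0, 3.0]
prices = [100.0, 101.0, 99.0, 102.0]
print(pnl_direct(times, prices, a=2.0, b=0.5))
print(pnl_from_signature(times, prices, a=2.0, b=0.5))
```

Both calls return 6.5 for the sample path: once the signature is computed, the P&L of any rule in this family costs one multiplication and one addition, which is the sense in which the signature summarises the stream's effect on the system.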
The so-called first-order terms of the signature describe the net drift up or down in prices from start to finish. The second-order terms carry information about the route taken in between, such as how volatile the path was over a given time step. “It’s clear that if prices reach from one point to another with smooth movements, that’s a different story from if you have a volatile journey,” Horvath says. “Higher orders stretch beyond what can be intuitively described.”
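Horvath's distinction between a smooth and a volatile journey can be made concrete. The article does not say how the paper computes its second-order terms; the sketch below assumes one standard construction from the signature literature, the lead-lag transform, in which a "lead" copy of the price moves one step ahead of a "lag" copy. For that transformed path the first-order term is the net price move, and the signed (Lévy) area taken from the second-order terms equals half the sum of squared price changes, a simple realised-volatility measure. Function names are hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (n, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for step in np.diff(path, axis=0):
        # Chen's identity for appending one straight segment.
        level2 += np.outer(level1, step) + np.outer(step, step) / 2.0
        level1 += step
    return level1, level2


def lead_lag(prices):
    """Lead-lag transform: the lead coordinate moves first, then the lag catches up."""
    x = np.asarray(prices, dtype=float)
    doubled = np.repeat(x, 2)
    return np.column_stack([doubled[1:], doubled[:-1]])  # columns: lead, lag


def first_and_second_order(prices):
    """Net move (from level 1) and lead-lag Levy area (from level 2)."""
    level1, level2 = signature_depth2(lead_lag(prices))
    net_move = level1[0]
    levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
    return float(net_move), float(levy_area)


smooth = [100.0, 100.25, 100.5, 100.75, 101.0]
volatile = [100.0, 102.0, 99.0, 103.0, 101.0]
for name, path in [("smooth", smooth), ("volatile", volatile)]:
    move, area = first_and_second_order(path)
    half_sum_sq = 0.5 * float(np.sum(np.diff(path) ** 2))
    print(name, "net move:", move, "area:", area, "half sum of squares:", half_sum_sq)
```

Both paths end 1.0 above where they started, so the first-order term cannot tell them apart, but the lead-lag area is 0.125 for the smooth path and 16.5 for the volatile one, in each case half the sum of squared price changes.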
The authors show that signature-based models can be trained faster than traditional data generators. They also show that models using signatures retain information more efficiently than generators that learn directly from raw data, which can lose information in the sampling phase.
Crucially, according to Lyons, low-order signature terms can extract useful information even from small sets of features. This can make all the difference when there is limited data to train a neural network. “You can generally do better with deep learning combined with signatures,” he says. “But if you have small data, then sometimes signatures work really very well, are quick to train and economical.”
The theory of rough paths is being applied in a myriad of fields. “The recognition of hand-written Chinese characters was the first large-scale application of signatures,” says Lyons. Signatures have also been used to recognise actions performed by matchstick people in videos – for example, kicking a ball or swinging a golf club.
In 2019, a team led by James Morrill, a student of Lyons, used signature-based algorithms to detect early signs of sepsis in medical data.
The underlying theme of actionable pattern recognition also has tremendous uses in finance. Data generated using signatures could be used to train the deep hedging algorithms pioneered by Buehler and Wood at JP Morgan, among others. Other financial applications include simulating market data to price derivatives and test new trading strategies.
Lyons and his co-authors plan to continue their work on perfecting market generators. Future developments in this field may well bear their signatures.
Copyright Infopro Digital Limited. All rights reserved.