A mix of Gaussian distributions can beat GenAI at its own game
Synthetic data is seen as the preserve of AI models. A new paper shows old methods still have legs
The generation of synthetic market data is widely seen as one of the most promising applications of sophisticated artificial intelligence models, such as generative adversarial networks (GANs) and autoencoders.
A new paper from Jörg Kienitz, director of quantitative methods at consultancy m|rig, suggests these new models still have some way to go to beat the old ways.
In his paper, Kienitz shows Gaussian mixture models (GMMs) – a machine learning technique that has been used to fit complex financial distributions for about half a century – do a better job of generating yield curves and volatility surfaces than the latest AI models.
GMMs can approximate nearly any continuous probability distribution, given enough components, as a weighted mixture of Gaussian distributions. The process is gradual. The model starts by identifying the densest part of the probability distribution and assigns a Gaussian to capture that shape as closely as possible. It then successively fills in the tails and other parts of the distribution until the entire distribution is captured to the desired level of accuracy.
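For reference, this is the textbook form of a mixture of K Gaussian components, not a formula taken from Kienitz's paper. The weights w_k, means μ_k and covariances Σ_k are the parameters to be fitted:

```latex
p(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x;\, \mu_k, \Sigma_k\right),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1
```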
“The underlying model is trained on data stemming from some statistical mechanism – real-world tensor time series or model data based on stochastic differential equations – which is then used to produce the output: [in this case] the simulated data, including explainable quantities,” says Kienitz.
The training of the model is based on well-known statistical methods, such as expectation maximisation, and is nearly instantaneous.
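The following minimal one-dimensional sketch in Python shows what expectation maximisation does. It is not Kienitz's implementation. It assumes a fixed number of components k, initialises the means at evenly spaced quantiles of the data, and runs a fixed number of iterations. The function names and the synthetic two-cluster sample are hypothetical. Standard EM updates all components together at each iteration rather than adding them one at a time.

```python
# Illustrative sketch (assumption): plain EM for a univariate GMM, not the paper's code.
import math
import random


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=100, min_var=1e-6):
    """Fit a k-component univariate GMM by expectation maximisation.

    Returns (weights, means, variances). Means start at evenly spaced
    quantiles of the data; every variance starts at the sample variance.
    """
    n = len(data)
    xs = sorted(data)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    for _ in range(n_iter):
        # E-step: posterior probability ("responsibility") of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow for extreme outliers
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted maximum-likelihood update of each component
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    return weights, means, variances


# Usage on synthetic data: two well-separated clusters (illustrative numbers only)
rng = random.Random(42)
sample = [rng.gauss(-3.0, 0.5) for _ in range(600)] + [rng.gauss(3.0, 1.0) for _ in range(600)]
weights, means, variances = fit_gmm_1d(sample, k=2)
```

With only two components and a few hundred points per cluster, the fit takes a fraction of a second even in pure Python. This is consistent with the description of training as nearly instantaneous, although real curve or surface data would be multi-dimensional.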
“Once the training is done, the simulation is very easy because it’s just using simple distributions – namely the uniform and Gaussian distributions,” says Kienitz.
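The simulation step Kienitz describes can be sketched as follows, again in one dimension for brevity. A uniform draw selects a component according to the mixture weights, and a Gaussian draw from that component gives the sample. The function name and parameters are hypothetical. A multi-dimensional version (for example, a whole yield curve) would replace the scalar Gaussian draw with a correlated draw using each component's covariance matrix.

```python
# Illustrative sketch (assumption): sampling a univariate GMM with uniform + Gaussian draws.
import bisect
import itertools
import random


def sample_gmm_1d(weights, means, stdevs, n, seed=None):
    """Draw n samples from a univariate GMM.

    Each draw uses one uniform variate to pick a component (inverse CDF of the
    discrete weight distribution) and one Gaussian variate from that component.
    """
    rng = random.Random(seed)
    cum = list(itertools.accumulate(weights))
    draws = []
    for _ in range(n):
        u = rng.random() * cum[-1]                          # uniform draw on [0, total weight)
        j = min(bisect.bisect_right(cum, u), len(cum) - 1)  # component whose bucket contains u
        draws.append(rng.gauss(means[j], stdevs[j]))        # Gaussian draw from that component
    return draws


# Usage with made-up parameters: 30% weight on a component at -2, 70% at +2
draws = sample_gmm_1d([0.3, 0.7], [-2.0, 2.0], [0.5, 0.5], 20000, seed=7)
```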
The early results have been impressive. “I considered overnight rates, like €STR, SOFR and Sonia, and it took a mixture of about seven distributions to capture them properly,” says Kienitz. “For equity volatility surfaces, it took only three to five distributions.”
Kienitz compared the results of his GMM-based method to more complex techniques. GANs struggled to produce satisfactory results with a dataset of four years of daily market prices, which Kienitz says is insufficient to properly train the model. Autoencoders fared better, giving some reasonable results, but GMMs came out on top.
“The Gaussian mixtures were always really fine,” says Kienitz. “They fitted also to multi-dimensional settings and allowed for real conditional distributions, making it possible to account for dependency in modelling and simulation, whereas autoencoders or GANs just add auxiliary variables but do not use true conditional distributions.”
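Gaussian mixtures keep conditioning tractable. If each component is jointly Gaussian, conditioning on one variable yields another Gaussian mixture in closed form. The components are reweighted, and each takes the usual Gaussian conditional mean and variance. The sketch below illustrates this standard result for a bivariate mixture, where the two variables might be two points on a curve and the numbers are made up. It is not taken from the paper, and the function name is hypothetical.

```python
# Illustrative sketch (assumption): closed-form conditioning of a bivariate GMM on X = x.
import math


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def condition_bivariate_gmm(weights, means, covs, x):
    """Return (weights, means, variances) of the GMM for Y given X = x.

    means: list of (mu_x, mu_y); covs: list of ((s_xx, s_xy), (s_xy, s_yy)).
    """
    new_w, new_mu, new_var = [], [], []
    for w, (mx, my), ((sxx, sxy), (_, syy)) in zip(weights, means, covs):
        new_w.append(w * normal_pdf(x, mx, sxx))  # reweight by how well the component explains x
        new_mu.append(my + sxy / sxx * (x - mx))  # Gaussian conditional mean
        new_var.append(syy - sxy ** 2 / sxx)      # Gaussian conditional variance
    total = sum(new_w)
    return [w / total for w in new_w], new_mu, new_var


# Usage with made-up parameters: two correlated components, conditioning on X = 2
cond_w, cond_mu, cond_var = condition_bivariate_gmm(
    [0.5, 0.5],
    [(-2.0, -1.0), (2.0, 1.0)],
    [((1.0, 0.5), (0.5, 1.0)), ((1.0, 0.5), (0.5, 1.0))],
    x=2.0,
)
```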
When Kienitz reduced the dataset to a single year of daily data, neither the GANs nor the autoencoders could capture the salient features of the distributions, while the GMMs continued to do so.
Marco Bianchetti, head of internal models approach methodologies for market and counterparty risk at Intesa Sanpaolo, says Kienitz’s approach has other important advantages over more complex techniques.
“A combination of Gaussians can, like more complex machine learning algorithms, approximate any distribution,” he says. “But differently from those complex algos, it has the advantage of using tractable and well-understood objects, also reducing the number of model parameters and possible overfitting problems, so that you can find statistical quantities in an analytical way.”
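Bianchetti's point about analytical statistics can be illustrated concretely. The mean and variance of a mixture follow directly from its component parameters. The CDF is a weighted sum of normal CDFs, so a quantile (for example, a value-at-risk style percentile) needs only a one-dimensional root search. The following is a minimal sketch with hypothetical function names and illustrative parameters.

```python
# Illustrative sketch (assumption): analytical moments, CDF and quantile of a univariate GMM.
import math


def gmm_mean_var(weights, means, stdevs):
    """Mean and variance of a mixture from its component parameters."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, stdevs))
    return mean, second - mean * mean


def gmm_cdf(x, weights, means, stdevs):
    """Mixture CDF as a weighted sum of normal CDFs."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, stdevs))


def gmm_quantile(p, weights, means, stdevs, tol=1e-10):
    """Quantile by bisection; the CDF is continuous and increasing."""
    lo = min(m - 12.0 * s for m, s in zip(means, stdevs))
    hi = max(m + 12.0 * s for m, s in zip(means, stdevs))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(mid, weights, means, stdevs) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage with made-up parameters: symmetric two-component mixture
mix = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
mix_mean, mix_var = gmm_mean_var(*mix)
q99 = gmm_quantile(0.99, *mix)
```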
This makes the model more explainable than GANs or autoencoders. “Network-based methods are often opaque regarding how they do their job,” says Bianchetti. “[This] method is not. The function is a mixture of Gaussians whose parameters have a clear financial interpretation. That has clear benefits also for model validation.”
Kienitz agrees with that assessment. “The interpretation is akin to that of principal component analysis,” he says. “But in this case, it’s a probabilistic interpretation, as a Gaussian principal component and the weights are equivalent to the eigenvectors accounting for the importance of the components.”
As for use cases, Bianchetti says the method described by Kienitz could be used to rectify incomplete or sparse datasets when calculating risk measures. “In particular, within FRTB [the Fundamental Review of the Trading Book], this can be an important contribution, because it could provide a tool for dealing with illiquid risk factors,” he says.
Further research
Kienitz is also exploring other applications. One idea is to use GMMs to manipulate volatility surfaces to produce some desired features – for instance, ironing them out to make them more stable.
“Known methods are based on optimal transport solutions. Since it is possible to optimally transport one GMM into another very efficiently [without] leaving the class of GMM distributions, we expect some nice results that increase speed and flexibility of the method,” Kienitz says.
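The one-dimensional case suggests why transport can stay within the class of GMMs. The 2-Wasserstein distance between two Gaussians has a closed form. Moving mass component by component along Gaussian displacement paths produces another Gaussian mixture at every intermediate time. The sketch below assumes the transport plan is restricted to couplings between mixture components. It substitutes an entropy-regularised (Sinkhorn) approximation for an exact discrete transport solver. It is illustrative only, is not the method in Kienitz's follow-up work, and its function names are hypothetical.

```python
# Illustrative sketch (assumption): component-wise transport between two univariate GMMs,
# using a Sinkhorn (entropy-regularised) plan in place of an exact transport solver.
import math


def gaussian_w2_sq(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) in 1D."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def sinkhorn_plan(a, b, cost, eps=1.0, n_iter=500):
    """Entropy-regularised transport plan between weight vectors a and b."""
    kern = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        u = [ai / sum(k * vj for k, vj in zip(row, v)) for ai, row in zip(a, kern)]
        v = [bj / sum(kern[i][j] * u[i] for i in range(len(a))) for j, bj in enumerate(b)]
    return [[u[i] * kern[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


def interpolate_gmm(src, dst, t, eps=1.0):
    """Mixture at time t on a component-wise transport path from src to dst.

    src and dst are (weights, means, stdevs) tuples; the result is again a GMM.
    """
    (wa, ma, sa), (wb, mb, sb) = src, dst
    cost = [[gaussian_w2_sq(ma[i], sa[i], mb[j], sb[j]) for j in range(len(wb))]
            for i in range(len(wa))]
    plan = sinkhorn_plan(wa, wb, cost, eps)
    weights, means, stdevs = [], [], []
    for i in range(len(wa)):
        for j in range(len(wb)):
            weights.append(plan[i][j])
            means.append((1 - t) * ma[i] + t * mb[j])   # 1D Gaussian displacement interpolation
            stdevs.append((1 - t) * sa[i] + t * sb[j])
    return weights, means, stdevs


# Usage with made-up mixtures: halfway between a source and a target GMM
src = ([0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
dst = ([0.5, 0.5], [1.0, 11.0], [1.0, 2.0])
mid_w, mid_m, mid_s = interpolate_gmm(src, dst, t=0.5)
```

The regularisation parameter eps has to be chosen relative to the scale of the costs. If it is too small, the exponentials can underflow and the iteration can fail.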
Kienitz stresses that GMMs are not always preferable to GANs and autoencoders. While GMMs work well with daily price data, they struggle with larger datasets. For instance, fitting GMMs to tick data, which can be enormous, is infeasible, because too many Gaussians would be required and tractability may be lost.
For now, Kienitz’s research has shown that traditional modelling techniques still have plenty of legs when it comes to solving problems on the cutting edge of finance.
“Everybody seems so excited about deep learning – that you need to try deep learning instead of exhausting the more traditional algorithms like Gaussian mixtures,” says Miquel Noguer i Alonso, a generative models expert and founder of research company Artificial Intelligence Finance Institute.
Perhaps Kienitz’s paper will inspire more quants to take their old models out for another spin.
Copyright Infopro Digital Limited. All rights reserved.