When it comes to correlation, cleaning is a chore that pays

Recent trends in research may help firms obtain reliable correlations from limited data

Correlations are some of the most basic pieces of information that help investors understand market moves and build portfolios. However, they lead to some of the most unwieldy estimation problems faced by portfolio managers today. As with many other issues in quantitative finance, it all comes down to a lack of data.

As the number of assets in a portfolio increases, building reliable correlations becomes difficult. For a large portfolio of 500 assets – for example, one that mirrors the S&P 500 index – one would need many years of data to obtain reliable correlations.

"When you try to estimate very large objects, because the object is very large, it would need super-large data sets, going back to the past centuries to be able to pin down the correlation matrix," says Jean-Philippe Bouchaud, chairman and chief scientist at Capital Fund Management in Paris. "[This] obviously makes no sense at all, because over centuries you would expect a lot of things to change."

Modelling using a limited amount of data means one ends up with a huge amount of noise in the estimated correlation matrix, which can lead to massive estimation errors. Portfolio managers have been attempting to tackle the problem for decades, and have tried to solve it through tweaks to the empirical correlation matrix built from observable data. However, that has resulted in both computational challenges and errors in the numbers produced.
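The scale of this noise can be illustrated with a small simulation. The sketch below is not taken from the research discussed here: the portfolio size, the sample length and the use of NumPy are all assumptions made for illustration. It draws returns for assets that are truly uncorrelated, so every eigenvalue of the true correlation matrix equals 1, and then looks at how far the eigenvalues of the empirical matrix stray from that value. The Marchenko-Pastur edges printed for comparison are a standard random-matrix result, not something quoted in the article.

```python
import numpy as np

def sample_correlation(returns):
    """Empirical correlation matrix of a (T, N) array of returns."""
    return np.corrcoef(returns, rowvar=False)

# Hypothetical sizes: 100 assets observed over 200 periods.
rng = np.random.default_rng(0)
n_assets, n_obs = 100, 200

# Truly uncorrelated assets: the true correlation matrix is the identity.
returns = rng.standard_normal((n_obs, n_assets))

emp_corr = sample_correlation(returns)
eigvals = np.linalg.eigvalsh(emp_corr)  # sorted in ascending order

# Marchenko-Pastur edges for pure noise with ratio q = N / T.
q = n_assets / n_obs
mp_lower = (1 - np.sqrt(q)) ** 2
mp_upper = (1 + np.sqrt(q)) ** 2

print(f"true eigenvalues: all equal to 1")
print(f"empirical eigenvalues range from {eigvals.min():.3f} to {eigvals.max():.3f}")
print(f"Marchenko-Pastur edges: {mp_lower:.3f} to {mp_upper:.3f}")
```

Although nothing in the simulated data is correlated, the empirical eigenvalues are spread widely around 1, and a portfolio optimiser would treat that spread as real structure.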

Research on 'cleaning' correlation matrices has been picking up in recent times, with many papers published on the topic in the last five years. Most recently, Bouchaud, along with co-authors Joel Bun, a PhD student at Université Paris-Saclay at the Léonard de Vinci Pôle Universitaire, and Marc Potters, co-chief executive and head of research at Capital Fund Management, analysed a number of correlation matrix cleaning techniques and recommended the one they believe performs the best: a technique proposed by quants Olivier Ledoit and Sandrine Péché in 2011.

The technique relies on what the authors call a "mathematical miracle". It starts with the empirical correlation matrix of the portfolio and aims to get as close as possible to the true correlation matrix – that is, one that is free of estimation errors and truly represents the relationships between assets.

Any correlation matrix can be decomposed into characteristic quantities called eigenvalues and eigenvectors. Because the directions of the true eigenvectors are not known, one keeps the eigenvectors of the empirical correlation matrix and adjusts only the eigenvalues. In their 2011 paper, Ledoit and Péché derived a formula that gives the overlap between the eigenvectors of the empirical matrix and those of the true correlation matrix. This, in turn, gives the eigenvalues of the true correlation matrix.
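The general recipe of keeping the empirical eigenvectors and replacing only the eigenvalues can be sketched in a few lines. The example below does not implement the Ledoit-Péché formula, which the article does not spell out. It substitutes a much simpler rule, eigenvalue clipping, in which every eigenvalue below an assumed noise threshold is replaced by the average of those eigenvalues. The function name `clip_eigenvalues`, the Marchenko-Pastur threshold and the simulated one-factor returns are all assumptions made for illustration.

```python
import numpy as np

def clip_eigenvalues(emp_corr, q):
    """Clean a correlation matrix by eigenvalue clipping (illustrative stand-in).

    emp_corr: (N, N) empirical correlation matrix.
    q: ratio N / T of assets to observations.
    """
    eigvals, eigvecs = np.linalg.eigh(emp_corr)

    # Assumed noise threshold: upper Marchenko-Pastur edge for pure noise.
    upper_edge = (1 + np.sqrt(q)) ** 2
    noisy = eigvals <= upper_edge

    # Replace the noisy eigenvalues by their average, which preserves the trace.
    cleaned = eigvals.copy()
    if noisy.any():
        cleaned[noisy] = eigvals[noisy].mean()

    # Rebuild the matrix with the original (empirical) eigenvectors.
    cleaned_corr = (eigvecs * cleaned) @ eigvecs.T

    # Rescale so that the diagonal is exactly 1 again.
    scale = np.sqrt(np.diag(cleaned_corr))
    return cleaned_corr / np.outer(scale, scale)

# Hypothetical data: 50 assets driven by one common factor, 100 observations.
rng = np.random.default_rng(1)
n_assets, n_obs = 50, 100
market = rng.standard_normal(n_obs)
returns = 0.5 * market[:, None] + rng.standard_normal((n_obs, n_assets))

emp_corr = np.corrcoef(returns, rowvar=False)
cleaned_corr = clip_eigenvalues(emp_corr, q=n_assets / n_obs)

print("smallest eigenvalue before:", np.linalg.eigvalsh(emp_corr).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(cleaned_corr).min())
```

The method described in the article differs in how the replacement eigenvalues are chosen, but the decompose, adjust and rebuild structure is the same.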

"The miracle comes from the fact that with this explicit formula, in the end, you are able to get a clean formula for the eigenvalues which does not require you to know what you try to measure – that is, the true correlation matrix," explains Bouchaud. "As an intermediate step, you think you need this unknown correlation matrix to get the formula, but the miracle is that it cancels out because we are working with very large objects with large matrices. So there is this extension of the law of very large numbers."

Bouchaud and his co-authors tested the performance of their own extension of the Ledoit-Péché method, which accounts for a wider range of eigenvalues than the original, against four other commonly used methods. These include one of the simpler and more commonly used methods – the shrinkage technique proposed by Ledoit and Michael Wolf in 2003, which tries to pull the extreme coefficients in the matrix towards more central values. Even this method was found to be outperformed by the Ledoit-Péché extension, which gave the lowest realised risk among all five methods considered.
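The idea behind shrinkage can be shown with a minimal sketch. The version below blends the empirical matrix with the identity matrix using a fixed weight. Both choices are assumptions made for illustration: the article does not state the shrinkage target, and in the Ledoit-Wolf approach the weight is estimated from the data rather than set by hand. The function name `shrink_toward_identity` and the example matrix are hypothetical.

```python
import numpy as np

def shrink_toward_identity(emp_corr, alpha):
    """Blend a correlation matrix with the identity.

    alpha = 0 returns the empirical matrix unchanged;
    alpha = 1 returns the identity (no correlation at all).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    n = emp_corr.shape[0]
    return (1.0 - alpha) * emp_corr + alpha * np.eye(n)

# Hypothetical empirical matrix with some extreme coefficients.
emp_corr = np.array([
    [1.0, 0.9, -0.7],
    [0.9, 1.0, -0.5],
    [-0.7, -0.5, 1.0],
])

# Hand-picked shrinkage weight, for illustration only.
shrunk = shrink_toward_identity(emp_corr, alpha=0.5)
print(shrunk)
```

Every off-diagonal coefficient is pulled towards zero by the same proportion while the diagonal stays at 1, so the most extreme entries move the furthest in absolute terms.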

Using this method means that, as a rule of thumb, one would require only twice the number of data points as there are assets to get a reasonably good correlation matrix, Bouchaud argues.

The work raises the question of why it has taken so long for the industry to fix something as fundamental as building a reliable correlation matrix out of limited asset returns data. Because of the challenges involved, some practitioners have settled for filling in missing returns using certain assumptions, or for using data from time points that do not line up across different assets. The resulting correlation matrices are considered invalid because they do not satisfy an important property called positive semi-definiteness.
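Positive semi-definiteness means that no eigenvalue of the matrix is negative, which in turn guarantees that no portfolio is assigned a negative variance. A minimal check, assuming NumPy and a small numerical tolerance, is sketched below. The function name `is_positive_semidefinite` and the example matrix are hypothetical. The matrix is built so that its pairwise coefficients contradict one another, as can happen when each pair is estimated from a different stretch of data.

```python
import numpy as np

def is_positive_semidefinite(matrix, tol=1e-10):
    """Return True if the symmetric matrix has no eigenvalue below -tol."""
    eigvals = np.linalg.eigvalsh(matrix)
    return bool(eigvals.min() >= -tol)

# Hypothetical matrix: assets 1 and 2 move together, assets 2 and 3 move
# together, yet assets 1 and 3 are recorded as moving in opposite directions.
inconsistent = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

print("smallest eigenvalue:", np.linalg.eigvalsh(inconsistent).min())
print("valid correlation matrix:", is_positive_semidefinite(inconsistent))
```

Each entry on its own looks like a plausible correlation, but taken together they imply a negative variance for some combination of the three assets, which is why such a matrix cannot be used as it stands.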

"The issue is we don't have deep theory behind these things in terms of what we should do when we are violating some of the basic assumptions in building these portfolios," says Eliott Noma, a managing director at Garrett Asset Management in New York.

By analysing existing methods for cleaning correlations and their drawbacks, Bouchaud and his co-authors are helping to bring this fundamental issue back into focus. That could bring lasting benefits for portfolio managers.
