Quants are still struggling to extract alpha from big and alternative datasets, despite heavy investment in data science.
“How much alpha can we actually make? The answer is, today, not a lot,” said Christine Qi, co-founder and partner at Domeyard, a quant hedge fund specialising in high-frequency strategies.
The problem, according to Michelle McCloskey, president of Man Group, is noisy data. “There’s a tremendous amount of data being utilised in a number of ways and a lot of that data ends up being noise.”
McCloskey and Qi were speaking on a panel at the Cayman Alternative Investment Summit on February 7.
Quants are investing heavily in machine learning to clean and distil these noisy datasets into useful information. “We’re throwing a lot of resource, both from a human perspective and monetary perspective, [into] working around the data and bringing it into a format that’s usable,” McCloskey said.
So far, those efforts have yielded little in the way of returns. Some firms have made “billions of dollars” using alternative data sources, “but it’s really only a handful of lucky firms”, said Qi. “Most firms are really unsuccessful.”
Buy-side spending on alternative data is projected to reach $1.7 billion by 2020, up from $656 million last year. The burgeoning market is attracting the interest of big banks, which are putting proprietary datasets up for sale.
Domeyard is approached daily by vendors peddling everything from social media feeds to credit card data and satellite imagery. While many are drawn to the idea of using parking lot data to forecast retail spending, or measuring the shadows of oil rigs to determine oil supplies, Qi said the reality is more complex.
“There is a lot of noise in the data, a lot of inaccuracy,” she said. “It’s a cool concept but a lot of stuff happens daily in the market that will affect the outcome of what you’re analysing.”
Quant firm PanAgora takes a dim view of alternative datasets sold by third-party vendors. “Think of their business model. They sell to a wide variety of shops, and if 20 people are using exactly the same dataset, the inefficiency will go away in a heartbeat. If too many people chase a certain alpha, the alpha will go away,” said Mike Chen, senior portfolio manager and lead machine learning researcher at PanAgora, who spoke on the same panel as Qi and McCloskey.
As a result, PanAgora collects its own datasets. This is not an easy undertaking, but Chen believes it is preferable to spending vast sums on externally sourced data that comes with no guarantee of yielding alpha.
Michael Weinberg, chief investment officer at Protégé Partners, a fund of funds, agreed that widely available datasets are of little value to investors. “Credit card data, satellite imagery data – a lot of this data that’s very prevalent and widely shopped is not the valuable data. The alpha is largely gone,” he said. “Where we think there are opportunities is data that’s gotten more creatively. Data that’s scraped, exhaust data, internet of things data, but we see these smaller managers are going out and creatively finding data, and that data in our view is very laden with alpha.”
Weinberg was speaking on another panel at the same conference.