Winton sees problems with big data analytics

Social media and live newsfeeds have yet to prove useful, hedge fund says


Social media and news aggregators have so far not proved useful as sources of big data for hedge fund investors, according to David Hand, the chief scientific adviser to quant fund Winton Capital Management.

This is despite a number of hedge funds saying they are increasingly interested in big data analytics, citing social media sources and news aggregators as innovative ways to spot trends and generate alpha.

“There is a lot of big promise around big data but it is not without its risks,” says Hand, stressing that, so far, data quality and selection biases have caused problems for users of big data. “The computer is the intermediary between [the investor] and the data, which means you have to take a lot of it on trust. The data-quality problems are exacerbated by the sheer size of the sorts of data sets we’re talking about.”

He adds that investors are only beginning to handle the problems of high-speed, complex big data, and that in five or 10 years’ time, things may be different, as understanding of big data evolves and more statistics graduates spill into the investment industry.

Winton relies for its investment analysis more on its traditional data sets – such as a comprehensive record of prices, dividends and corporate actions for London-listed equities – than on news aggregators and social media. But the company keeps an open mind and has a lot of people working on different kinds of data sets, such as social media. So far these have proved less successful, because it is “early days”.

Nonetheless Winton has long experience in handling big data. Since its inception, the $30 billion hedge fund has built up data sets ranging from daily Chicago futures prices dating back to 1877 to Babylonian wheat prices and historical weather patterns. These are “big enough to be classified as big data”, says founder and chief executive David Harding. “I challenge you to find anyone who’s got a bigger big data!”

Hand says Winton has been dealing with big data in some sense since its outset: “As our capability has grown, so the size of the data set grows with that and the challenges grow as well. If the phrase had existed 10–15 years ago, then we would have been working in that space.”

Winton’s collection of stock market data may be defined as big data because it is quite complex and because its data sets can be combined. “We have a large data group, whose job it is to look at this data and sort out the quality issues. We stress expertise in coping with these sorts of challenges,” Hand says.

Might new investors stepping into big data be encountering problems that Winton has already overcome? Hand says he is absolutely certain of that.

When asked if the real-time flood of information from social media such as Twitter and Facebook would be useful as a data source for investors, he says: “My personal view is not yet. It is premature. People are looking at this, but when I look at the studies I usually find they don’t hold up.”

There are plenty of news aggregation companies in the market. Some produce indexes of events – such as corporate takeovers, strikes or terrorist attacks – from global newsfeeds. Front-running news aggregators attempt to collect market information before newswires. Do these prove useful to investors? “This is something which is in a very early stage,” says Hand. “So whether it’s useful now or not – I don’t think it is – but I do think it’s something which one would want to keep one’s eye on.”

Hand says such data is often not clean and is prey to selection biases. He sees huge problems in establishing causality rather than simple correlation; much analysis based on real-time newsfeeds may be missing a hidden factor that drives both the news and the market.
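
The hidden-factor problem can be sketched in a few lines of code. The numbers and variable names below are purely illustrative – this is not anything Winton or Hand describes, only a toy simulation in which a common driver makes newsfeed sentiment and returns correlate even though neither causes the other:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical hidden common driver, e.g. broad market stress
hidden = rng.normal(size=n)

# Newsfeed sentiment and asset returns both load on the hidden factor,
# but have no direct causal link to each other
sentiment = 0.8 * hidden + rng.normal(size=n)
returns = 0.8 * hidden + rng.normal(size=n)

# The raw correlation (~0.4) looks like a tradable signal...
print(np.corrcoef(sentiment, returns)[0, 1])

# ...but once the common driver is stripped out, the correlation
# vanishes: neither series says anything about the other
print(np.corrcoef(sentiment - 0.8 * hidden,
                  returns - 0.8 * hidden)[0, 1])
```

A strategy built on the raw correlation alone would be betting on the hidden factor without knowing it.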

Winton is not a company that uses front-running news aggregators, Harding confirms. “We don’t pay to get early access to newsfeeds,” he says. “We don’t send our analysts around to meet the management of companies so as to try to get the jump on other people. We are looking at it in a more nuanced, scientific way and less market [oriented] short-term way. We’re quite long term.”

The key thing about big data analytics – including those investors combing through social media and paying news aggregators to find alpha – is that it is “early days”, Hand says. “We are learning more about these things so it’s an exciting space to be in. Gradually understanding emerges.”

Cutting through the hype

David Hand is also senior research investigator and emeritus professor of mathematics at Imperial College, London. He has been studying big data in one form or another – even if it was not termed that – for decades.

“I’ve got examples of data sets from 20 years ago, which were very, very large. It really isn’t a new thing,” he says, “but the public interest in it is new because someone came up with this wonderful buzzphrase ‘big data’.”

Before being termed ‘big data’, the phenomenon of sweeping up colossal mounds of information had been known as ‘business analytics’ or ‘business intelligence’. It is really a rebranding of data mining, Hand thinks.

It was in 2011 that media interest in big data took off. Google Trends shows a pick-up in searches for the phrase from that year. To some extent it is “media hype”, Hand says.

“I jumped on the big data bandwagon in about 2011,” Harding admits. “I have to say everyone else in the world did at the same time… Big data was the word on everyone’s lips [at Davos this year] – central bankers, politicians, Angela Merkel.” The chief executive sees this as ironic, as he has been studying data for the last 30-odd years, first at the commodity trading adviser AHL, now part of Man Group, and then at Winton.

What does Winton make of renewed interest among investors and the media in machine learning? Point72 Asset Management has hired about 30 people to take advantage of public data using “machine-learning” systems, according to reports. Bridgewater was reported to have started an artificial-intelligence unit, headed by former IBM employee David Ferrucci.

“I find words like ‘machine learning’ a bit gimmicky,” says Harding. “The idea of machine learning, I think, is that you feed some information into a computer and it does something rather magical with it, and out of it, it produces some emergent intelligence.

“It’s certainly not what we do, and it’s not what we’ve ever done. The human learning and the machine learning proceed in tandem with one another. We use computers as a tool to do research.”

“From my point of view, machine learning is a branch of statistics,” says Hand. “The two disciplines have such a big intersection that they are really much the same in many ways. We will use those sorts of tools, as well as more classical statistical tools.”

Big problems

Big data can be defined in many ways: in terms of storage capacity – some say ‘big’ means petabytes of data – or the number of variables, high complexity or high frequency.

Gaining insight from such data is problematic because the sheer volume makes it hard to check for data quality and mismatches. Are data collectors measuring the right thing? Measures of a variable may be more precise, but having a great estimate of the wrong thing may not help.

Selection bias bedevils big data, Hand thinks. The data selected – or available to be measured – may not be representative of the total pool: unselected data also exists. Data collectors rarely define a variable and then set out to measure it; they tend to work with whatever data sources are already available.
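
A short sketch, again with hypothetical numbers rather than any real data set, shows how the selection mechanism alone can distort a statistic: if extreme market moves are more likely to make the newsfeed, measures computed from reported data will be skewed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 'population': daily returns across a stock universe
population = rng.normal(loc=0.0, scale=0.02, size=100_000)

# Selection mechanism: the bigger the move, the more likely it is
# to be reported, so the available sample is not a random draw
report_prob = np.clip(np.abs(population) / 0.04, 0.05, 1.0)
observed = population[rng.random(population.size) < report_prob]

# Volatility estimated from reported data alone comes out inflated
print(f"population volatility: {population.std():.4f}")  # ~0.0200
print(f"observed volatility:   {observed.std():.4f}")    # noticeably higher
```

The distortion here comes entirely from what gets reported, not from any error in the numbers themselves – which is why, as Hand says, clean-looking data can still mislead.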

“The numbers don’t speak for themselves. The numbers can often lie,” Hand says. “You can’t just rely on the big data because of all these potential distortions and data-quality issues. You need people who understand them and you need to do research on how to overcome them.”

He gives the example of Google Flu Trends, which attempted to predict flu outbreaks from Google searches faster than – and as accurately as – centres for disease control. However, it emerged that the search-based model tended to produce over-inflated predictions of flu outbreaks.

In 2010, Google experimented with launching a price index based on web shopping data, one that would rival the Consumer Price Index. There has been little news since about the status of the price index.

Hand says the problem is selection bias. “With Google it is impossible to tell who is making searches, why they are making searches and so on. If you can find ways to overcome that, you might be able to adjust for that,” he says.

In 10 years’ time, users of big data may be able to find ways to combat these problems, he says. He sees grounds for optimism: “There is a growing awareness of the data-quality issues, especially the selection bias issues.”

For all the hype around big data, he thinks there is still a shortfall of statistics graduates able to understand big data and grapple with these problems. But a recent rise in such graduates may prove fruitful.

Harding is similarly upbeat. “We collect a lot of data – which of it is useful is the question. If you are researching data and you find out you can’t use it to predict markets, that doesn’t mean it can’t be used. That just means that you have failed to find a way of using it. The researcher’s job is never done. We never prove anything definitively at all.”
