Lessons from the past – Overcoming historical tick-data challenges with the cloud (Part II)

- Refinitiv
- 18 Aug 2020

Historical tick data brings clear benefits to financial institutions (FIs) – informing key business decisions, improving internal risk models and trading strategies, and feeding into regulatory compliance and reporting. Yet there are still some key challenges that FIs need to overcome to truly realise the value of this data source.

The panel at a recent Risk.net webinar in association with Refinitiv outlined some of these challenges, including the expense of data cleaning, maintenance and storage; access to relevant data; and dataset integration. They agreed that cloud-based solutions have the potential to ease these pain points and are invaluable in terms of bringing efficiencies to the use of tick-history data.

Challenges to the effective use of tick-history data

Managing and storing

During the panel discussion, it was widely accepted that cleaning and managing historical datasets can be a costly and time-consuming endeavour. David Ng, chief operating officer for CSOP Asset Management, explained: “We still expend an enormous amount of effort cleaning and sanitising the data. On the existing infrastructure that we use, it is still very much a chore to transform data into valuable insights for our investment strategies. From an asset manager’s point of view, that is non-value added time.”

From a service provider’s perspective, Catalina Vazquez, historical data proposition director at Refinitiv, noted that customers regularly tell her the extract, transform, load (ETL) challenge is “one of the biggest pain points”. She cited research conducted by Refinitiv, which shows that for every $1 spent on financial market data, another $8 is spent on processing, storing and transforming that data, before the analysis can even begin.

Vinay Srinivas, head of Asia-Pacific global markets quant analytics and digital transformation for UBS, supported these concerns and added: “Given the way things are going, with margins collapsing in various markets, it’s going to be very unprofitable for each of the firms to be doing that on their own.”

While a lot of time and effort is currently spent on managing the underlying infrastructure, storage and compute power, the amount of data companies have to deal with is only going to increase, and the scale of the problem will keep growing with it, cautioned Andy Chow, Google Hong Kong’s customer engineering manager.

Accessing relevant data

In a polling question, just over half (51%) of the webinar audience felt that access to relevant data was the biggest challenge for the effective use of historical tick data. Ng explained that firms now have access to a huge amount of data that was not previously available, and in “all the formats they can possibly think of”. APIs are now pretty much the industry standard, allowing much greater data access not just from the cloud, but from other service providers as well. However, he felt that being able to search easily for relevant datasets remains a key challenge, and said he had not seen many good examples of such functionality in the market.

Integrating different datasets

Another challenge stressed by Srinivas was dataset integration. He outlined the important issues around combining referential and proprietary data. Banks need to combine historical tick data with their client information and their positioning information. “How do you combine them if you have tick history sitting in the cloud, but all your proprietary data sitting on-premise or on a different cloud? How do you make sure there is no data or analytics leakage?” he asked.

He also warned that “we need to be careful we don’t end up creating copies of the data” when joining vendor and proprietary databases. Firms need to consider how to integrate with external data sources when validating their models and investment strategies, he said, describing the need to avoid making copies of the data as “very critical”.

How cloud solutions can help

Cloud-based service providers are increasingly stepping in with solutions to help FIs overcome these challenges.

Vazquez outlined the importance of the work undertaken to produce robust and reliable tick-history data. “There is a significant layer of data aggregation, standardisation and normalisation that needs to happen. Having consistency across multiple venues and asset classes is key,” she said. On top of that, having a consistent data model that provides both real-time and historical data together requires a whole lot of work to succeed, which should not be underestimated, she added.

Vazquez agreed with the concerns raised by Ng and Srinivas around the financial services industry storing multiple copies of the same historical data on-premise or on their own cloud. “Things are changing very rapidly – we now have technology that provides access to a dataset like tick history in a multi-tenanted type of approach. Instead of storing multiple copies of the same dataset, we are moving more towards a facility model for data.”

She also asserted that cloud providers have come a long way in terms of aiding the commingling of datasets at scale, presenting data such as corporate actions and reference data alongside tick history in one place, and reducing the complexity around integration and data cleansing.

Refinitiv’s partnership with Google

Earlier this year Refinitiv launched its Tick History dataset on the Google Cloud Platform (GCP), leveraging the machine learning capabilities of Google Cloud’s BigQuery. The partnership means customers can now benefit from seamless and fast data access and integration, providing an easy and cost-efficient experience given the reduced infrastructure spend and storage required. Customers can access, query and analyse Refinitiv’s extensive archive of pricing and trading data, working across large datasets remotely in a fraction of the time this would typically take. Analytics can be run on top of the data in situ, without having to download and load the whole six petabytes’ worth of historical tick data into an analytics engine.

“We saw a great opportunity in Google’s BigQuery, as you remove that ETL challenge by presenting query-ready data to the end-user,” Vazquez said. “In terms of benefits, it is of course reducing timelines around access and reducing costs, but also leveraging some of the great analytic capabilities provided by BigQuery.”
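To make the idea of querying in situ concrete, the sketch below computes a volume-weighted average price (VWAP) per instrument inside a SQL engine, so that only one summary row per instrument is returned rather than every tick. It is an illustration only: Python's built-in sqlite3 module stands in for a cloud warehouse such as BigQuery, and the table name, column names, instrument codes and sample ticks are all hypothetical, not Refinitiv's actual Tick History schema.

```python
import sqlite3

# Local stand-in for a cloud data warehouse. The schema below is a
# hypothetical example, not Refinitiv's actual Tick History data model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (ric TEXT, ts TEXT, price REAL, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [
        ("AAA.N", "2020-03-02T09:30:00", 100.0, 200),
        ("AAA.N", "2020-03-02T09:30:01", 101.0, 100),
        ("BBB.N", "2020-03-02T09:30:00", 50.0, 400),
        ("BBB.N", "2020-03-02T09:30:02", 52.0, 100),
    ],
)

# The aggregation runs inside the database engine; the caller receives one
# summary row per instrument instead of downloading the raw ticks.
VWAP_SQL = """
    SELECT ric,
           SUM(price * volume) / SUM(volume) AS vwap,
           SUM(volume) AS total_volume
    FROM ticks
    WHERE ts >= ? AND ts < ?
    GROUP BY ric
    ORDER BY ric
"""


def daily_vwap(connection, start, end):
    """Return {ric: (vwap, total_volume)} for ticks with start <= ts < end."""
    rows = connection.execute(VWAP_SQL, (start, end)).fetchall()
    return {ric: (vwap, volume) for ric, vwap, volume in rows}


result = daily_vwap(conn, "2020-03-02", "2020-03-03")
for ric, (vwap, volume) in result.items():
    print(f"{ric}: VWAP={vwap:.4f} volume={volume}")
```

Against a hosted warehouse the same pattern would apply, with the query sent to the provider's engine and only the aggregated result travelling back to the analyst.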

Chow added that one of the key benefits of BigQuery on Google Cloud is the separation of computing and storage, and the ability to scale them separately. “You don’t need to worry about the dependency of compute and storage size,” he said.

“At a petabyte scale, Google helps provide managed solutions, completely serverless,” he added. Customers can leverage Google’s “global scale to be able to parallel process any query ad hoc really fast, like the Google search experience.”

Regulators moving in the right direction

The panel acknowledged that storing data remotely comes with its own set of challenges. Ng noted that regulatory compliance is still holding back cloud usage. “As a regulated FI, we are still required to maintain our data on our premises, despite having an offering in the cloud,” he explained. He asserted that he has only seen regulators coming on board and issuing guidelines on the usage of cloud in the past 12 months, citing his company’s engagement with regulators in Hong Kong and Singapore. Although regulators may still be playing catch-up, and the industry may still be “at the very beginning stages in terms of adoption”, Ng believes things are moving in the right direction. “I want to move my infrastructure into a hybrid model and then eventually onto a multi-tenant full cloud-based server, because it just makes sense,” he said.

Srinivas also stated he was keen to use cloud to a greater extent. Direction from local regulatory bodies has been fairly forthcoming lately, he claimed, yet the route still needs to be defined much more tightly when it comes to third-party vendor clouds. There are clear advantages to cloud-based offerings, he said, and regulatory clarity is going to be very helpful.

The benefits of cloud-based solutions and the desire among the panellists to increase their usage were clear. With regulators increasingly coming on board, and with providers like Refinitiv overcoming the challenges of accessing, integrating and storing historical tick data, the sector is destined to see continued growth and much broader industry adoption.

This is the second in a series of articles exploring the themes discussed during the session. The first is available here.

Listen to the Risk.net webinar Lessons from the past – Utilising historical data and technology to assess market volatility

About Refinitiv

Refinitiv is one of the world’s largest providers of financial markets data and infrastructure, serving more than 40,000 institutions in approximately 190 countries. It provides leading data and insights, trading platforms, and open data and technology platforms that connect a thriving global financial markets community – driving performance in trading, investment, wealth management, regulatory compliance, market data management, enterprise risk and fighting financial crime.
