Grid computing - the networking of many computers or processors into a single virtual machine for the parallel processing of tasks - is now a well-established technology within banks. But financial institutions are behind the curve in how far they have adopted it. While other industries are using grid for a variety of tasks - from research and development to customer relationship management - the siloed nature of banks' operations and the vast quantity of real-time market data with which they deal have hampered its take-up.
The benefits of grid are certainly encouraging financial institutions to take a more enterprise-wide view of their computer resources and operations. However, the volume of data requires a new layer of technology that can manage and distribute information in the same way that grid distributes processing. "Financial services is the most siloed industry I have come across," says Songnian Zhou, chairman and chief executive of Ontario-based grid technology specialist company Platform Computing, whose customers include leading companies in electronics, manufacturing, life sciences, oil and gas, and banks.
More than firms in any other industry, financial services firms have tended to create separate organisational and IT infrastructures around individual business lines. Furthermore, because of the competitive advantage that advanced IT can bring, "banks have tended to keep their IT information close to their chests", says Zhou.
The result is that, while electronics companies such as Sunnyvale, California-based Advanced Micro Devices have grids with 30,000 or more processors that are used for a wide range of research and development, analytical and business applications, few banks have reached the 10,000 processor mark with their grids, and their use tends to be focused on a narrow range of applications - predominantly pricing and risk management.
However, all that is beginning to change. Banks are now starting to tread the same path that electronics and other industries have mapped out in their implementation of grid, and a true enterprise approach to grid is on the horizon for a number of major financial institutions in Europe and the US.
"Wall Street is unlike the scientific community in that the data it uses is very dynamic," says Michael Di Stefano, vice-president for architecture at Oregon-based data management specialist Gemstone. "In order to do anything in financial services, whether it is a risk calculation or a portfolio valuation, you have to pull in data from various sources, such as trades, market data and customer information - most of which move very quickly. By contrast, if the scientific community is using grid to solve some protein folding or geographic survey problem, their data is largely static - it doesn't really change." And because this data is static, it can be comparatively easily distributed to the nodes on the grid where it will be processed. Financial institutions don't have this luxury.
As banks began to improve their grids, they found that while they could garner computing power, either through dedicated hardware or by harvesting idle processor time on desktop computers, they could not match this increased capacity in the sphere of data management. This leads to two problems: data starvation, where processors on the grid cannot get data fast enough to keep them busy; and data saturation, where processors cannot get rid of the results of their calculations quickly enough to move on to the next task.
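Data starvation can be put in back-of-the-envelope terms. The sketch below is purely illustrative: the function name and every figure in it are hypothetical and are not drawn from any bank mentioned here. It estimates how many processors a single data source could keep fully busy, given the source's throughput, the input each task needs and the time each task spends computing.

```python
def max_busy_processors(source_mb_per_s: float,
                        input_mb_per_task: float,
                        compute_s_per_task: float) -> int:
    """Upper bound on processors one data source can keep fully busy."""
    # Average rate at which one continuously busy processor consumes input.
    demand_per_processor = input_mb_per_task / compute_s_per_task
    return int(source_mb_per_s // demand_per_processor)


# Hypothetical figures: a 100 MB/s source, 5 MB of input per task,
# 2 seconds of computation per task.
print(max_busy_processors(100.0, 5.0, 2.0))  # prints 40
```

Under these made-up numbers, any processor added beyond the 40th would sit idle waiting for data, however much computing power the grid has.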
Traditionally, data sits in databases that reside on discs. On the plus side, this means the data is relatively easy to manage since it is all in one place and the discs hold the data permanently, even when the power is cut off. The disadvantage is that it is comparatively slow to read from and write to disc-based sources, and the data has to be transmitted over network connections - an often expensive and time-consuming process because of the constraints of bandwidth. Such delays can be problematic for banks, particularly when pricing time-sensitive complex trades spanning multiple asset classes in volatile markets.
Conventional high-performance applications overcome these data latency problems by holding the relevant data in cache. In other words, they copy information from the database on to the cache memory - a chip that sits next to the processor chip. This makes the access to data much quicker and data is written back to the discs only periodically to keep the database updated.
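The caching pattern described above, reading from memory and writing back to disc only occasionally, can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the class name `WriteBehindCache` and its parameters are hypothetical, a plain dictionary stands in for the disc-based database, and the write-back is triggered by a count of changed entries rather than by a timer.

```python
class WriteBehindCache:
    """In-memory cache in front of a slower store; writes back in batches."""

    def __init__(self, store: dict, flush_every: int = 3):
        self.store = store              # stands in for the disc-based database
        self.cache = {}                 # fast in-memory copies
        self.dirty = set()              # keys changed since the last flush
        self.flush_every = flush_every  # assumed trigger: number of dirty keys

    def get(self, key):
        if key not in self.cache:       # cache miss: one slow read, keep a copy
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        self.cache[key] = value         # update memory immediately
        self.dirty.add(key)
        if len(self.dirty) >= self.flush_every:
            self.flush()

    def flush(self):
        for key in self.dirty:          # write changed entries back to the store
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Between flushes the store lags behind the cache, which is the trade-off the pattern accepts in exchange for faster access.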
However, applying this solution to grid computing presents a number of challenges. For instance, how do you distribute the data to the memory of the grid of processors that will do the work? How do you synchronise data updates, and how do you prevent the network from being clogged with the movement of data? In short, how do you create a data grid to match the computing grid?
"Grid is a difficult beast to feed from a single machine, which is what a database classically is," says Lewis Foti, manager of high-performance computing at Royal Bank of Scotland (RBS) in London. "What we had was a mismatch in terms of the performance of the grid, which is massively parallel with a large amount of computing power, and the network trying to connect it back to the data sources, and the data sources themselves, which had not scaled in the same way as the grid computing."
RBS found that attempting to scale up the database did not work as well as moving the data out of the database altogether and holding it in a data grid, a memory-based distributed data storage system that can be scaled up in a similar way to the computing grid. This represents a dramatic change in the role of the database: no longer is it a live information exchange that is continuously accessed and updated. Instead, the disc-based database functions as a static record that offers a useful back-up in the event of a systems failure.
In effect, a data grid virtualises the database in the same way as a computing grid virtualises the hardware. The database no longer has a permanent physical presence except for the back-up version maintained on disc. The development of data grids is critical to the success of grid in banks, as it brings data management and storage into line with the dynamic nature of financial services information.
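One simple way to picture how a data grid spreads information across the memory of many machines is hash partitioning with replicas. The sketch below is an assumption made for illustration only; the article does not describe how any commercial data grid product places data, and the function name `owners` and the node names are hypothetical.

```python
import hashlib


def owners(key: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct nodes to hold `key` in memory, primary first."""
    # A stable hash, so every machine computes the same placement for a key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    copies = min(copies, len(nodes))    # cannot hold more copies than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


# Hypothetical four-node data grid holding two in-memory copies of each item.
grid_nodes = ["node-0", "node-1", "node-2", "node-3"]
print(owners("trade-42", grid_nodes))
```

Because the placement is computed rather than looked up, any calculation engine can work out which node holds an item without consulting a central database. A production system would also need to handle nodes joining and leaving, which this sketch ignores.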
Timely movement of data was a key issue for Bank of America when it was developing applications for its grid. The bank's global grid comprises 8,500 processors and includes servers, desktop machines and contingency hardware spread across a number of business lines around the world. "We have to ensure the data gets to the calculation engines quickly and that the results are able to be written back to the results database without bottlenecks," says Paul Grumann, head of global markets grid development at Bank of America in New York.
Bank of America decided to develop its own data caching tools to solve the problem. However, a number of third-party products have emerged in recent years that perform these tasks. JP Morgan and Bear Stearns are among the firms that use Oregon-based Gemstone's products. Merrill Lynch and financial services firm AIG use data grid technology from New York-based GigaSpaces, while BNP Paribas and Wachovia use similar technology from Massachusetts-based Tangosol.
A concern when moving data from the security of disc-based databases to the ephemeral medium of memory - where the silicon is wiped clean when the power is switched off - is how to ensure resilience and disaster recovery. Gemstone's Di Stefano says the solution is the same as with grid processing: "It's a numbers game."
In other words, if a bank loses a segment of its primary site to a local systems failure, it would still have the rest of the site, as well as its back-up site, holding the data. If the entire primary site were affected by, say, a flood or terrorist attack, the back-up site might still be available. If not - for instance, if a New York bank had its back-up in New Jersey and the disaster affected the entire region - then a geographically distributed data and computing grid would save the day.
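The "numbers game" can be made concrete with a simple calculation. The sketch below assumes, as a simplification, that each in-memory copy of the data is lost independently with the same probability; the function name and the 1% figure are hypothetical. A regional disaster of the kind described above breaks the independence assumption, which is why spreading copies across geographies matters as much as the raw count.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Chance that every copy is lost, assuming independent failures."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Independent events: multiply the individual loss probabilities.
    return p_single ** copies


# Hypothetical 1% chance of losing any one copy in a given period.
for n in (1, 2, 3, 4):
    print(n, prob_all_copies_lost(0.01, n))
```

Each extra independent copy multiplies the chance of total loss by the single-copy loss probability, so the risk shrinks quickly but never reaches zero.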
"It's never going to be completely fault tolerant and nor are we going to replace disc as the long-term system of record. But as you increase the number [of copies of the data held in distributed memory] and start spanning geographies, you get very close to fault tolerance," explains Di Stefano.
Banks are already beginning to view their grids as secure enough to no longer write certain data back to disc. "We are having a rethink on some of our data and when and how we keep it," says RBS' Foti. "With the view that we don't expect the grid to fail - the hardware is resilient and redundant so the risk is very small - maybe we will no longer write back to a database some data that only has a short life in the company."
Banks are also beginning to extend their grid beyond the physical boundaries of their in-house hardware. The monitoring software of Bank of America's in-house global grid confirmed that the bank's peak workload was in the six hours following the end of the US trading day. To avoid buying further hardware to cater for this limited period when demand increased, the bank made arrangements with El Segundo, California-based technology services vendor Computer Sciences Corporation to rent processing power online from one of its grid computing utilities for the peak demand hours. "This avoids the need for the bank to buy hardware to satisfy its peak utilisation and achieves significant cost savings," explains Grumann.
Other technology vendors such as Palo Alto, California-based Hewlett-Packard, Armonk, New York-based IBM and Santa Clara, California-based Sun Microsystems offer similar utility processing power for rent. For example, BNP Paribas has an arrangement with IBM to supplement the grid for its multi-asset derivatives business, which already has 5,200 processing units, with a further 2,000 on-demand processors.
However, these so-called utility grids do not yet have the flexibility of internal grids. Generally, the processing resources must be specified and booked in advance - although arrangements can be made to use extra power if the vendor has the capacity. "We could ramp up intra-day if we needed to and there is hardware available," says Grumann. "As on-demand becomes more widely used, we expect better hardware availability and pay-for-what-you-use billing."
Another factor driving the use of grid in banks is that many of the major third-party software vendors, and in particular vendors of derivatives trading and risk management systems such as Toronto-based Algorithmics, San Francisco-based Calypso and Paris-based Murex and Sophis, are grid-enabling their applications. In other words, they are rewriting them so they can be implemented directly on to grids without requiring in-house customisation or an intermediate layer of interpretive software.
But this is all still relatively new territory and one of the challenges banks face is finding the skills to create enterprise computing and data grids. "It requires a different way of developing software and it is an area where there is a lack of experience and lack of proven practitioners," says RBS' Foti.
Nonetheless, as financial institutions gain experience with grid, they are beginning to expand its use across the business. Bank of America says grid has enabled it to allocate computer power where and when it is needed. "And the resulting cost savings are estimated to be in the millions of dollars on an annual basis," says Grumann.
Lehman Brothers, meanwhile, had one grid for systems development and a separate one for live running of applications, particularly for overnight batch runs. Using its integrated grid layer, it has been able to merge the grids so that overnight production is able to use the resources of the development team, while the developers can access the batch run resources when they are idle during the day, says Thanos Mitsolides, a senior vice-president in the fixed-income group at Lehman Brothers in New York.
Lehman Brothers' grid scheduling software has enabled it to run side-by-side memory-hungry applications that require a gigabyte of memory per processor with those that require, say, only 100 megabytes. "Because the grid scheduling criteria includes memory availability, we are able to merge these applications," says Mitsolides.
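Memory-aware scheduling of this kind can be sketched as a simple placement routine. This is not Lehman Brothers' scheduler or any vendor's algorithm, which the article does not describe; it is a first-fit illustration with hypothetical names (`Node`, `schedule`) and made-up memory figures, in which a task is placed only on a node with enough free memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int
    tasks: list = field(default_factory=list)


def schedule(tasks: list[tuple[str, int]], nodes: list[Node]) -> dict[str, str]:
    """Place each (task, memory_mb) on the first node with enough free memory."""
    placement = {}
    # Place the most memory-hungry tasks first so small ones do not crowd them out.
    for task, need_mb in sorted(tasks, key=lambda t: t[1], reverse=True):
        for node in nodes:
            if node.free_mb >= need_mb:
                node.free_mb -= need_mb
                node.tasks.append(task)
                placement[task] = node.name
                break
    return placement  # tasks that fit nowhere are left out of the result


# Hypothetical mix: one task needing about a gigabyte, three needing 100 MB.
nodes = [Node("a", 1100), Node("b", 300)]
jobs = [("small1", 100), ("big", 1000), ("small2", 100), ("small3", 100)]
print(schedule(jobs, nodes))
```

In this example the large task and one small task share node "a", while the remaining small tasks go to node "b", so both kinds of workload run side by side on the same pool of machines.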
Other advantages of the scheduling approach include the sophisticated monitoring and reporting tools that enable the firm to more accurately charge its business units for resource usage, and improved resilience and disaster recovery. Mitsolides says the firm's disaster recovery for its grid resources is now near perfect: "We had a single point of failure; now we don't."
Most importantly, grids give firms more flexibility to respond to market events. "When the credit crunch came at the end of 2005, we were able to move a number of quality assurance and contingency servers into our grid to facilitate additional calculations for the credit business," says Grumann at Bank of America. "That kind of flexibility shows the real business benefit of this kind of technology."
Grid's four phases
Grid computing implementation can be seen as a four-phase process. Phase one is the decoupling of individual applications from specific machines and replacing a single server with a small grid of processors, often called a cluster. This generally proves more cost-effective, since the grid can be built with low-cost processors that can be added to incrementally (as opposed to the big cost and capacity leap of buying another conventional machine) to scale up performance.
Phase two is to start sharing the grid resources among more than one application. This is where grid starts to run into resistance: users such as traders are hesitant to share their computing resources with others such as risk management. In most cases, pilot projects have been able to demonstrate to user groups that they will not suffer a degradation of service and, in many cases, will see improvements. The availability of resources is guaranteed through service level agreements, which are enforced by grid scheduling software from companies such as Platform and New York-based DataSynapse.
Phase three is to begin sharing grid computing resources across business lines. This is where grid runs into the silo organisational and IT infrastructure barriers that are characteristic of banks. However, the logic and business benefits are compelling: grid is effective at running multiple applications within one business unit, so it makes sense to extend it, argues Songnian Zhou, chairman and chief executive of Ontario-based grid technology specialist Platform Computing. Banks such as JP Morgan and Lehman Brothers, which use Platform's grid technology, and Bank of America and Royal Bank of Scotland, which use DataSynapse's technology, have achieved this phase.
Phase four is where the grid becomes the enterprise computing platform of choice for all but the most specialised applications. So, standard business applications such as customer relationship management and human resources go on to the grid. Many electronics and pharmaceutical companies have reached this stage, although almost no banks are at this level yet, says Zhou.
One application that is likely to stay off the grid for the foreseeable future is algorithmic trading, where minimising latency in data analysis and trade execution is critical to executing a trading strategy and involves optimising every point in the technological process.