AAD vs GPUs: banks turn to maths trick as chips lose appeal

A maths trick is taking on – and beating – fancy chips as banks try to boost their computing power


At a sombre meeting of its senior staff late last year, the foreign exchange business at one of the smaller European dealers was looking into the costs of managing its derivatives counterparty risk. Banks cover this risk by adding what is known as a credit valuation adjustment (CVA) to each trade – a hugely complex exercise, which involves working out the path every trade in a portfolio will take during its life, as well as the chances of each counterparty blowing up and the extent to which those events are inter-related.

At the time, the bank was computing its CVA numbers and their sensitivities on standard processors using traditional numerical methods, and each portfolio simulation was taking days to run. Bringing this down to a practical timeframe would mean acquiring 1,000 more central processing units (CPUs), the business concluded, at roughly €1,000 apiece plus additional development and maintenance costs.

The budget would not support it, and the team head warned his 14 colleagues the business might have to be wound up. At that point, a senior quant in the room piped up. He suggested using a different method to calculate CVA and its sensitivities, claiming it could be done on existing hardware and would only cost the bank time. "If you can implement it, we'll keep the business," the team head responded.

The quant team threw itself into the task and, a month later, delivered what it had promised. Without needing any new CPUs, it achieved a computational speed-up of a staggering 50,000 times. Today, the desk is still in business.

This might sound like magic, but in fact, it's mathematics. The method the bank used is known as adjoint algorithmic differentiation (AAD), and a growing fanbase is just starting to explore its possibilities.


"The real benefit of AAD is not just that I can do things quicker, but the fact that I can start looking at problems I haven't dared look at before," says Jesper Andreasen, head of quantitative research at Danske Bank in Copenhagen. The bank started calculating CVA and other pricing adjustments using AAD last year.

Another new adopter, Italy's Banca IMI, predicts AAD will help solve a range of looming computational roadblocks: "Apart from traders, who always want more speed, I see new needs coming from the changing market and regulatory environment," says Andrea Bugin, the bank's Milan-based head of financial engineering. "For example, the standardised initial margin model for non-cleared derivatives, the ongoing review of trading book capital, the replication of clearing house initial margin for optimisation purposes all require sensitivity calculation on a firm-wide level. I think AAD can help address these challenges. We simply need to put in the AAD turbo and heat the engine."

If so, it could not be better timed. Facing a growing need for data, and with many calculations increasing in complexity, processing capacity is becoming a limiting factor for many derivatives businesses and a barrier to best-practice risk management.

At the same time, the oft-touted solution – investing in specialist chips known as graphics processing units (GPUs) – is falling out of favour, and is out of the reach of smaller banks. Originally used in high-end computer gaming, GPUs have around 1,000-3,000 cores apiece that can process multiple instructions in parallel, delivering speed-ups of 10-50 times over traditional CPUs. But critics claim they are proving more expensive than users expected.

"Banks have a business to run, and GPUs have unexpectedly high operating costs due to the code maintenance costs, power consumption, space and cooling requirements," says Oskar Mencer, chief executive of Maxeler Technologies. His company specialises in rival hardware – field-programmable gate arrays (FPGAs) – but the claim is corroborated by banks and other chip users.

Years after going into implementation, some of the early GPU adopters – which include BNP Paribas, JP Morgan and Societe Generale – are now regretting it. Three sources point to BNP Paribas as an example, claiming the bank's equity derivatives business made a big investment in GPUs, went into production, but recently shut the project down after two years of unsustainable maintenance costs. The bank declines to comment.

Societe Generale, meanwhile, even has GPU sceptics within the bank: "GPUs are an expensive and onerous solution for speeding up numerical code," says Lorenzo Bergomi, head of quantitative research at the bank.

The chips are, though, already in widespread use. Many major banks and insurance companies today use GPUs in some form to speed up the calculation of unwieldy numbers such as CVA, to run Monte Carlo simulations for the pricing of exotic derivatives, or for Solvency II-related computations – jobs that would otherwise take days to run. Software firm Murex moved its entire analytics library to GPUs some years ago, and Pierre Spatz, head of quantitative research for the firm in Paris, argues chips and AAD cannot be seen as rivals: "A GPU is a piece of hardware, and AAD is a mathematical method, so this is not a straightforward comparison," he says.

Others see it as a straight choice and claim AAD is not only far cheaper than GPUs, it's also more powerful.

XVA express

At Danske Bank, Andreasen and his colleague, Antoine Savine, senior quantitative adviser, recently set up a CVA engine based on AAD. They can now calculate not just the counterparty risk charges but other pricing adjustments – collectively known as the XVAs – as well as their 2,500 risk sensitivities in about an hour, using a computer grid of 200 cores. Previously, it took a fortnight, using the same number of cores.

"If a big bank ran the same exercise on a GPU base, it will probably take weeks. It is sort of unthinkable," says Andreasen. On average, they find AAD is 50 times faster than pure GPU implementations, not just for CVA calculations, but also vanilla interest rate yield curve construction, complex risk calculations, exotic options and capital calculations (see box, CPUs vs GPUs).

Andreasen is already thinking about what more the institution can do with real-time risk calculations – it could, for example, give the bank's traders a competitive edge, allowing them to quickly test multiple versions of a given trade and its net effect on a client's portfolio. Tweaking different specifications of the product, such as term, notional, strike and collateralisation, a trader could see what would bring down the XVAs and immediately offer a better price.

AAD was popularised in the industry by Credit Suisse's Luca Capriotti and the University of Oxford's Mike Giles four years ago in a co-authored Risk article (Risk April 2010). The claimed benefits are eye-catching, but the method itself is not intuitive, making it a harder sell.

Commonly, sensitivities are calculated by bumping each input to a derivative's pricing model in turn and recalculating the output value each time. This is repetitive and time-consuming, especially for more complex products, because the valuation process typically has multiple steps, each waiting for values from the previous step before it can proceed, creating a drag.
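In symbols, the bump approach approximates each sensitivity by a finite difference, re-running the full valuation once per input:

\[
\frac{\partial V}{\partial x_i} \;\approx\; \frac{V(x_1,\dots,x_i+h,\dots,x_n) - V(x_1,\dots,x_n)}{h}, \qquad i = 1,\dots,n,
\]

so computing n sensitivities requires n + 1 full valuations.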

In contrast, AAD works backwards from the output of a single calculation. By applying the chain rule of differentiation – which links the derivatives of parts of a function to the derivative of the whole – it allows the simultaneous, rather than sequential, calculation of a host of sensitivities.

[Figure: GPU usage]

At the risk of making quants cringe, one very rough analogy is the building of a prototype car. First time round, it has to be done in sequence, with each part being made separately and fitted together. After it has been shown to work, new cars can be built and assembled far more rapidly by constructing the various parts at the same time. The car parts, in this analogy, are adjoints – the intermediate steps required in a pricing calculation or other simulation.

What AAD does, in slightly more technical terms, is propagate backwards the sensitivities of the output with respect to the variables in the intermediate steps, until you get the sensitivities with respect to the inputs. The intermediate sensitivities form the adjoints.
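In symbols – a stylised one-input example rather than any bank's model – suppose a valuation is built from two intermediate steps, u = h(x) and v = g(u), with output y = f(v). AAD seeds the adjoint of the output at one and sweeps backwards through the chain rule:

\[
\bar{y} = 1, \qquad \bar{v} = \bar{y}\,f'(v), \qquad \bar{u} = \bar{v}\,g'(u), \qquad \bar{x} = \bar{u}\,h'(x) = \frac{\partial y}{\partial x}.
\]

With many inputs, the same single backward sweep delivers the adjoint of every input at once, which is where the speed-up comes from.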

To get hold of the adjoints, though, a bank will have to add extra lines of code to its models, enabling them to churn out these values. Depending on how the code is structured, this can be relatively easy or hugely time-consuming.
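To see what those extra lines look like, here is a minimal hand-written sketch – a toy pricer invented for illustration, not any bank's production code – with a forward pass that stores its intermediates and a reverse pass that propagates the adjoints:

```cpp
#include <cmath>
#include <cstdio>

// Toy "pricer": y = b * exp(a * x). Illustrative only, not any bank's model.
// The forward pass stores the intermediates the reverse pass will need.
struct Tape { double x, a, b, u, y; };

Tape price(double x, double a, double b) {
    Tape t;
    t.x = x; t.a = a; t.b = b;
    t.u = std::exp(a * x);   // intermediate step
    t.y = b * t.u;           // output (the "present value")
    return t;
}

// Reverse (adjoint) pass: seed the output adjoint with 1 and propagate it
// backwards with the chain rule, producing all input sensitivities at once.
void priceAdjoint(const Tape& t, double& xBar, double& aBar, double& bBar) {
    double yBar = 1.0;         // dy/dy
    bBar = yBar * t.u;         // dy/db
    double uBar = yBar * t.b;  // adjoint of the intermediate u
    aBar = uBar * t.x * t.u;   // dy/da, since du/da = x * exp(a * x)
    xBar = uBar * t.a * t.u;   // dy/dx, since du/dx = a * exp(a * x)
}

int main() {
    Tape t = price(0.5, 2.0, 100.0);
    double xBar, aBar, bBar;
    priceAdjoint(t, xBar, aBar, bBar);
    std::printf("value=%.4f dV/dx=%.4f dV/da=%.4f dV/db=%.4f\n",
                t.y, xBar, aBar, bBar);
}
```

In a real library much of this bookkeeping is generated by tools or templates rather than written by hand, but the principle is the same: one forward sweep, one backward sweep, every sensitivity.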

One of the first to apply the technique, Credit Suisse has been using AAD for CVA risk management since 2007. The bank's other AAD-enabled applications range from its Monte Carlo engine for fixed income and equities to credit product risk management and interest rate curve construction, says Capriotti. And they can achieve speed-ups of 10-1,000 times depending on the product and the task at hand.

Other banks have joined the club. Nomura has been one of the frontrunners in adopting AAD and uses it widely across the bank's fixed-income portfolios and for risk management using CPUs. Barclays and Paris-based Natixis are also using it to boost their computational speeds and Banca IMI went live with its AAD tool for interest rates and credit sensitivities for CVA and debit valuation adjustment in November last year.

Another large European bank is also trying to implement it, according to its head of front-office quants, who adds that the bank has ruled out the option of GPUs due to their high development costs.

"With AAD, the computational saving is so great, you don't have to resort to sophisticated technologies," says Martin Baxter, global head of quantitative research at Nomura.

Getting there and staying there

But AAD is still only starting to make the sort of waves GPUs once did, and many say that is because it is intimidating – throwing more processing power at a problem can seem simpler and more straightforward. Even for quants, it takes some effort to understand AAD, and those who do then have to be prepared to get their hands dirty, drilling down into the risk management library and extending the code.

Credit Suisse's Capriotti says this is not as hard as it might sound. "Old libraries can be easily retrofitted to support the adjoint calculation of sensitivities. This being mechanical in nature, it does not require the same theoretical and tuning work required in the original implementation of a model. And although it involves thinking about the chain rule of calculus in an unusual way, most of the quants on the street can understand AAD," he says (see box, AAD implementation tips).

It still takes time, of course. Both Natixis and Banca IMI took about two years to go into production, while Danske Bank spent a year on it. "As a rule of thumb, the development time is about double the time needed to write the present value-only code. But because it has immense production benefits, I think it's a price well worth paying," says Nomura's Baxter.

Price is the reason some banks are now going cold on GPUs – these wonder chips don't come cheap. A single, high-spec GPU can cost up to £8,000, and large banks will deploy anything up to 1,000 at a time, so the investment in hardware alone could be in the multiple millions.

[Figure: Processing approaches]

Even in these straitened times, many banks would be willing to spend that, but critics say ongoing maintenance adds to the bill. The chips are extremely power-hungry and take up a lot of room, because they need a considerable amount of empty space around them for cooling. They also have to be coded in a language that is just a step away from machine language, whereas quants are typically used to languages such as Java or C++, so a team of specialised developers is needed to set up, maintain and update the system.

"For GPUs, you need a programmer per model per product and every time you change something, you have to recode everything.

They are very labour-intensive manual processors and create a lot of job security for those who know how to work with them," says Danske Bank's Andreasen. At one of the biggest dealers, it is said to have taken three developers to code a single pricing model onto a GPU. Scaling this up to entire asset classes and portfolios poses an obvious challenge.

Those with expertise in developing GPUs, however, argue it is not as hard as all that. "Updating them is not more difficult than CPUs. What you need to think about when you have a new algorithm is whether it can be parallel or not. Once you know that, it is easy to code," says Murex's Spatz.

And many banks have opted not to spend on third-party training courses for their in-house programmers, according to John Ashley, software developer and relations manager at chip-maker Nvidia. "They use a GPU project as a reward for their smart programmers and these guys pick it up very quickly," he says.

Conversion tools such as Xcelerit, meanwhile, aim to cut out the tedious middle step of writing everything in GPU-specific languages such as CUDA or OpenCL by automatically converting Xcelerit-enabled sequential code into parallel code, with a claimed performance drag of less than 5% compared with customised code developed within a bank.

"We have seen many banks that have chosen to go down the path of coding the GPUs in-house struggling to handle their development costs," says Hicham Lahlou, chief executive and co-founder of Xcelerit. "Traditional banks have teams of quants and they also have a separate layer of software developers taking their models and recoding them for effficiency. We cut that layer so quants can do the complete job of writing and optimising the programme – this is more efficient."

Lahlou says a majority of the world's top 20 investment banks use the conversion tool to bring down their in-house development costs, but declines to name clients.

Best of both worlds

The number of risk factors that go into the calculations has a big bearing on the speed gain achieved by the competing approaches. AAD's cost is essentially independent of the number of risk factors, whereas with GPUs the workload still scales with it. So past a certain number of risks, AAD pulls ahead; for smaller numbers of risk factors, GPUs do better.

Typically, GPUs are 50 times faster than a single CPU, while AAD calculates all risk factors at an average cost – in extra computation time – of about six times the runtime of a single valuation. Crucially, that six-times multiplier is constant, regardless of the number of risk factors. With the traditional bump-and-recompute approach, a bank wanting 500 risk factors has to value the derivative 500 times over; with AAD, all 500 come at that same average cost of six valuations.
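On those rule-of-thumb numbers, the arithmetic for a 500-sensitivity portfolio looks like this (with T the runtime of a single valuation):

\[
\text{bump-and-recompute: } (500 + 1)\,T, \qquad \text{AAD: } \approx 6\,T,
\]

a reduction of roughly 80 times in valuation work – and the gap widens as more risk factors are added.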

Exotic products and interest rate derivatives usually have anywhere from a few hundred to thousands of risk factors – CVA alone has around 2,000 risk factors, and the figure could be even higher for big banks. "The more sensitivities you have to compute, the more efficient AAD becomes," says Marc Henrard, head of quantitative research at OpenGamma.

This is one reason some recommend a hybrid approach. "We should not go against GPUs. They should cover a path of the code that is simple to write and maintain. For example, plain real-time swap calculations for a clearing house which needs to compute stress-test scenarios on two counterparties in less than two seconds. But for exotic derivatives, what you gain in terms of speed is lost in the difficulty to write and maintain the system," says Adil Reghai, head of quantitative research at Natixis.

"For GPUs and FPGAs, the increase in speed depends on how much the application at hand exploits the parallelism of this type of hardware. I have heard reports of Monte Carlo applications sped up by two orders of magnitude and partial differential equation applications sped up by one order of magnitude," says Capriotti at Credit Suisse.

Barclays already seeks the best of both worlds by running Monte Carlo simulations on GPUs and risk management on CPUs using AAD.

The constantly changing landscape of technology also means this has diversification benefits. "What is risky in this business is to go for a solution where you spend a lot of time on one platform, but you have to redo it for another one every time the hardware changes, or there is a new CPU or GPU. This is the danger of a hardware-specific approach. We need to be able to work with today's solution and also the one that comes out tomorrow. It is not one size fits all," says Xcelerit's Lahlou.

Truce, after all?

A natural question, then, is whether it makes sense to implement AAD on GPUs. Some of the proponents of AAD dismiss the idea as impractical in the near future, citing the dangers of placing an optimised mathematical algorithm on chips that have less-sophisticated cores and are difficult to programme. Some banks, though, are said to be exploring the idea. Computational solutions company Numerical Algorithms Group (NAG) claims to be supporting a few large institutions that are setting up AAD on GPUs, but declines to name names.

"Many people have been looking into GPUs, and a growing subset is looking into adjoint methods. The holy grail is to run AAD on GPUs," says Uwe Naumann, professor at RWTH Aachen University and a technical consultant at NAG. "Some of our clients have GPU infrastructure. They will not throw away their GPUs. So you might want to work with both," he adds. The combined speed-up for such a combination could be in the thousands.

NAG's tool lets banks code in C or C++ and then automatically generates most of the AAD code. The AAD-on-GPU piece is still hand-coded, however, so it does not entirely eliminate the need for specialised GPU and AAD developers.

"The technical challenges of doing this is that only big players can look into it- you need personnel in quality and numbers, and that costs," says Naumann. At the end of the day, it's a question of matching available resources to requirements. Large banks may be able to stomach these costs. Many mid-sized ones are unlikely to see the benefit.

CPUs versus GPUs

It might seem surprising that central processing units (CPUs) with just 4-12 cores can outperform graphics processing units (GPUs). The reason is that a GPU's simplified architecture allows it to have a large number of cores, but each individual core falls behind on certain features such as clock speed, memory handling and compiler optimisation. So in reality, a handful of cores on a GPU are required to match the performance of a single CPU core.

A CPU's clock speed – the rate at which it executes instructions – is around 3.8 GHz, or 3.8 billion cycles per second, whereas a GPU's is about 1 GHz. CPU performance can be enhanced further through techniques such as multi-threading and vectorisation, which allow work to run in parallel across cores and within each core.
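As a rough illustration of what multi-threading and vectorisation mean in practice – a minimal sketch using OpenMP, one common way of parallelising C++ loops, rather than any bank's actual code – an averaging step over simulated paths might look like this:

```cpp
#include <vector>

// Average a payoff over simulated paths, multi-threaded across CPU cores and
// vectorised within each core via OpenMP (compile with -fopenmp). The
// 'payoffs' vector stands in for whatever per-path values a model produces.
double averagePayoff(const std::vector<double>& payoffs) {
    double sum = 0.0;
    // 'parallel for' splits the loop across threads, 'simd' asks the compiler
    // to vectorise each thread's chunk, 'reduction' merges the partial sums.
    #pragma omp parallel for simd reduction(+:sum)
    for (long long i = 0; i < static_cast<long long>(payoffs.size()); ++i)
        sum += payoffs[i];
    return sum / static_cast<double>(payoffs.size());
}

int main() {
    const std::vector<double> payoffs(1000000, 1.0);   // dummy paths
    return averagePayoff(payoffs) > 0.0 ? 0 : 1;
}
```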

So when a bank uses a well-optimised CPU grid, the benefits of massively parallel architectures that pack in thousands of cores become debatable. "When we implemented adjoint algorithmic differentiation, multi-threaded over modern CPUs, we found the speed-up to be significant enough that we dropped GPU development, which would have us rewrite our libraries down to a low level," says Antoine Savine, senior quantitative adviser at Danske Bank.

"If you have well-designed models and numerical algorithms, and are harnessing all the multi-threading and vectorisation capabilities of current multi-core CPUs, then it is not at all clear to me that GPUs are a cost-effective alternative," says Lorenzo Bergomi, head of quantitative research at Societe Generale.

 

AAD implementation tips

Since adjoint algorithmic differentiation (AAD) is a mathematical technique rather than a brute-force method of running simultaneous calculations, it is language- and platform-independent, meaning it can, in theory, be coded in any language and on any piece of hardware. So although the initial development phase is time-consuming, users say it is easy to maintain once up and running.

"Around 90% of the time is spent on the model and original code. The development time for AAD is just the remaining 10%," says Marc Henrard, head of quantitative research at OpenGamma.

Natixis got over the initial development hurdle by applying the lessons from an AAD prototype for Monte Carlo simulations the bank developed from scratch, and then adapting that to the larger system. "A clean library is hard to find, so it helps to make a prototype and then learn from it," says Adil Reghai, head of quantitative research at Natixis. This also means the bank has to keep the original code so there is a benchmark to check AAD-based values against. "AAD is an unnatural way of calculating things, so the only way to check if you did it right is to keep a benchmark," says Reghai.

One of the things that might scare banks away from AAD is the change in behaviour it requires. The programme has to be functionalised, says Jesper Andreasen, head of quantitative research at Danske Bank in Copenhagen – meaning it must be ordered into distinct functional blocks that can be manipulated individually when needed.

"Quants tend to ignore that, because to optimise speed, they get sloppy with managing memory, so they cache a lot of stuff. The structure has to be of a good quality," says Andreasen.
