Marcelo Cruz, Editor-in-Chief, The Journal of Operational Risk
Gaurav Kapoor, Chief Operating Officer, MetricStream
Theresa O’Rourke, Managing Director, Operational Risk Management, Citi
What makes a good key risk indicator (KRI)?
Marcelo Cruz, The Journal of Operational Risk: According to Basel, there are four mandatory inputs for operational risk measurement: internal loss data; external loss data; scenario analysis; and business environment and internal control factors (BEICFs). KRIs fall into this fourth category. A lot has been done in terms of including internal and external data and scenario analysis management in the measurement framework, but not much has been done around the KRIs. I believe this is a big gap in operational risk on both the management and risk management sides.
KRIs are metrics that measure how good your control environment is and how stressed it can be. For example, if you work in a heavy processing control environment, the volume of trades or the volume of credit card processing each day should be an important indicator of the quality of your operation, as should the number of fails you have in processing trades, the number of people working in a certain department or the number of amendments operation officers need to make to trades before they are OK to settle. These indicators – whether you call them KRIs, key control indicators or key performance indicators – assess how your control environment is at a certain point in time, and how this is linked to your losses and your operational risk overall.
So, good KRIs are those that reflect how your control environment is at a certain moment, whether it is stressed or not.
Theresa O’Rourke, Citi: I find the term KRI too constraining, so I tend not to use the term anymore. In terms of indicators, we need to be looking at risk indicators, control indicators, performance indicators, business environment indicators – the whole gamut. My philosophy is that you need the right tools for the right purpose. Marcelo gave the example of indicators that reflect the control environment, and those are very important indicators for that purpose. But you also want to be looking at other indicators to answer different questions. If I’m a line supervisor, I want very detailed, specific metrics about what is happening in my processing shop. Whereas, if I’m at the board level, I may just want a handful of very high-level indicators.
Gaurav Kapoor, MetricStream: Companies that are doing a good job of managing KRIs have some themes in common. One is that KRIs need to be predictive, not reactive; in addition to telling a story about the past, they should also be able to tell a story about the future of the business. Second, they need to be quantifiable. They need to reflect what is happening in the organisation beyond just numbers. And they also need to be very easy to understand because they may have implications at the board level or senior management level, as well as at the operating levels.
If you have identified employee turnover as an important KRI, you have to make sure it talks about the future as well – not just what the turnover was, but what the impact of the turnover could be on future business, revenue growth, competitive matters, and so on. It needs to be quantifiable, therefore you have to consider completeness and accuracy – not just including full-time employees, but also temporary employees, contractors and sub-contractors to give you the whole picture. And then the representation of this data needs to be easy to understand so it can be easily absorbed by different parts of the organisation.
What should the process be for picking KRIs?
O’Rourke: The person managing the business is going to look at a different set of indicators from what an independent risk function might. From an independent risk function, we do not look at very specific, detailed metrics – we couldn’t, given the vast size of some of the businesses. We look at indicators that trigger us to look deeper and see if something is going wrong. Think of when you go to a doctor – they don’t order the whole round of tests straight away, they look at your blood pressure, your temperature, which are a couple of indicators that signal whether a deeper investigation needs to be carried out.
If you are picking indicators to measure somebody’s performance and decide their compensation, I would suspect the business head might look at revenue generation, but the risk and control function would want an indicator of whether they are taking on too much risk compared to that revenue. So again, depending on what you are using it for, you will have different audiences selecting the indicators and using them for different purposes.
From a risk management perspective, I tend not to use the word ‘predictive’, but instead talk about correlations.
Cruz: Collecting those indicators in a systematic way takes considerable work and involves a lot of investigation. Systems across any firm are not going to be perfect, so collecting this data and investigating it requires interaction between risk teams and the team in the business unit. The teams need to create an automated feed between the local systems and the risk central system for the amount of data analysis and data mining that needs to be done, and you need some resources in your budget to enable you to run this process adequately. So, it’s not the number of indicators themselves but, when you put those numbers into the analysis, they need to start meaning something and tell you a story, and that’s the best value you will get from a good KRI.
In the US, we have the Comprehensive Capital Analysis and Review (CCAR), which requires banks to try and establish some correlation between some macroeconomic factors – for example, inflation or the Dow Jones index – and operational risk losses. It would be very hard to find the Dow Jones correlated directly to losses, but there are much better chances for an analyst to find a correlation between the Dow Jones and your trading because, when the Dow Jones gets more volatile, the volume usually picks up, and a higher volume of trading in your firm involves higher operational risk.
It must be clear that we are not trying to predict loss events. I don’t have a crystal ball and managers should not have a crystal ball. We are trying to find out whether these factors (internal and external) are correlated to operational losses – for example, banks track the number of trades that have been cancelled or amended, and a spike in the number of cancelled trades might eventually result in a larger number of losses. So, in many cases, if those cancel/corrects go above a certain threshold, we know we are going to have larger than average losses in the next few days because we may have historical cases where 10% of our transactions were cancelled due to some error in programme trading, and we knew our risk of losing money would be a lot higher in the following few weeks. The point here is: if you have the historic database of losses and KRIs and are able to articulate a story of cause and consequence for their movements, the participation of operational risk in management decisions can become much more active.
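The kind of check Cruz describes, whether a spike in an indicator tends to be followed by higher losses a few days later, can be illustrated with a lagged correlation. The following is a minimal sketch only, not any bank's method: the function names `pearson` and `lagged_correlation` are hypothetical, the inputs are assumed to be equal-length daily series, and a Pearson coefficient is just one of several ways to measure such a relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no information about co-movement
    return cov / sqrt(var_x * var_y)


def lagged_correlation(kri, losses, lag):
    """Correlate the indicator on day t with losses on day t + lag.

    Assumption: both series are daily and aligned on the same start date.
    """
    if lag < 0 or lag >= len(kri) - 1:
        raise ValueError("lag must be between 0 and len(kri) - 2")
    if lag == 0:
        return pearson(kri, losses)
    return pearson(kri[:-lag], losses[lag:])
```

Scanning a handful of lags (for example 0 to 10 days) shows at which delay, if any, the indicator and the losses move together. A high value supports the "story of cause and consequence" but does not by itself establish causation.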
O’Rourke: And that’s one of the reasons why, in operational risk, we like control indicators because they often have many underlying root causes – so when they hit a threshold it’s a trigger to undertake a more detailed assessment. Again, we don’t only look at risk indicators; control indicators are rich in interesting information that helps us as risk managers.
Cruz: We had some exception rules hard-coded into the system that say: if one day any of these indicators is, for example, 20% higher than the previous day, or 100% higher, or whatever the threshold is, it calls for immediate attention. It sends an email to all of us indicating the need to investigate. Bank trade processing platforms are like a factory: banks have structured their factories to process a certain average number of transactions per day. But, unlike other industries, we might have four or five times the average volume from one day to another. And we have to process these transactions; we cannot afford to just say we are not going to process transactions today. These stresses on the system can cause some errors and some of these are pretty big. Highlighting this allows much more dynamic and active operational risk management. We know exactly where all the stresses are across the firm if we have this data coming to us automatically each day. We will know where the problems are, the potential problems and the risks.
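An exception rule of the kind described here, flagging any indicator that rises by more than a set percentage from one day to the next, might look like the sketch below. It is an illustration under stated assumptions: the function name `spike_alerts` and the 20% default are hypothetical, and the notification step (the email mentioned above) is left out.

```python
def spike_alerts(daily_values, threshold=0.20):
    """Return (day_index, relative_change) for each day-over-day spike.

    A spike is a rise of at least `threshold` (0.20 = 20%) over the
    previous day's value. The threshold is an assumed, tunable parameter.
    """
    alerts = []
    for day in range(1, len(daily_values)):
        previous = daily_values[day - 1]
        current = daily_values[day]
        if previous <= 0:
            # A percentage change is undefined against a zero baseline,
            # so such days are skipped rather than flagged.
            continue
        change = (current - previous) / previous
        if change >= threshold:
            alerts.append((day, change))
    return alerts


# Made-up daily cancel/correct counts, for illustration only.
cancel_corrects = [100, 110, 140, 130]
for day, change in spike_alerts(cancel_corrects):
    print(f"day {day}: up {change:.0%} on the previous day, investigate")
```

Comparing against the previous day alone is the simplest reading of the rule; a production rule would more likely compare against a rolling average so that one unusually quiet day does not trigger an alert on the next.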
Now, just because you have 10% of transactions cancelled or amended one day, it does not necessarily mean you are going to have a huge loss the next day, but the risks increase tremendously. And, if you have history to back it up, and we do, you can say the risk has increased by a certain percentage. That kind of monitoring is changing the game in operational risk because it allows operational risk managers to be proactive and anticipate situations.
Your list of KRIs, and the ones you regard as top level, cannot be static; there needs to be some process of reappraisal. How should that process work?
O’Rourke: It depends on the purpose for which we are using the indicators. If we are giving materials to the board, we are not going to change those indicators very frequently. You will want a lot of consistency in the story you give to senior management. But, if you do a lot of risk monitoring, there is constant change as you dig into these correlations, identify new ones or shut down others. At an operations level, there is some consistency as you will want to look at indicators such as cancel/corrects, fails and reconciliations.
Kapoor: What is relevant today may not be relevant even a week later and, therefore, indicators need to be changed as you go along by stakeholders across the organisation.
Cruz: We need to highlight the main indicators that report what is going on at a particular moment. This story can change from one week to the next, whether because of an initial public offering, something in commercial banking or the network not working well.
Making sure the data is good is important as well. In practice, the systems in the firm can be totally fragmented. If you want to collect the volume of transactions in equities, in any big firm like Goldman Sachs, Morgan Stanley or Citi, you probably need to go to 10 or 15 systems to collect it.
O’Rourke: The KRIs are data elements in the greater framework of operational risk management within a firm. You need to use management judgement, you need to take other information and assessment and add it all together to come up with a story, especially because we face challenges of data quality.
Cruz: The owners of the data – the business – can come back and say that, for example, the data on failed transactions is wrong because it has a system issue, so all your analysis is invalid. Sometimes they have a point, sometimes they don’t, but people can try to defend their actions in a way by discrediting the data you collected. Making sure the data quality is good can become a job in itself.
Kapoor: In a lot of cases, people view the same data but interpret it differently. There are political issues, turf issues, compensation issues and incentive issues that can hamper the proper definition of a risk indicator and of risks, and then later hamper the impact of those risks.
We are working with companies in which the audit group, for example, will view the information differently from the operational risk group, the business group and the legal group. Different organisations or groups each use the data for their own objectives.
What kind of process do you go through in terms of capping the number of KRIs you use to avoid being drowned in the data?
Cruz: We try to understand the process. For example, to understand the process of settling an equity transaction, we break it down into trade capture, trade processing, trade settlement and securities delivery. We have indicators that represent the quality of every piece of this process, so it is like a risk control self-assessment that is done objectively and on a daily basis. And, of course, this is not cheap or easy. If you start drilling down, it becomes really cumbersome, so you need to justify how important this is going to be for you.
The process of capturing the KRIs can be quite a long one. We select the bits that could be important, extract this data into Excel so we can do some initial analysis, and see if there is any story to be told, any correlation. Only after that will we make a more serious commitment to getting the data into our system.
It’s like when you are piloting an aeroplane – you have more than 2,000 gauges in front of you, but you only really look at three or four. We can see the same thing from regression models as well – even if we have 300 factors, there are usually going to be three or four that explain 80% of the variance of your losses. You don’t need 3,000 KRIs, you don’t need 1,000 KRIs – you probably need 20 or 30 and those are the most important, those are the ones you are going to monitor.
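The idea that a few indicators explain most of the variance in losses can be illustrated by ranking candidate indicators by how much loss variance each explains on its own. The sketch below is a deliberate simplification and an assumption: it uses the R² of a one-variable linear fit (the squared Pearson correlation) for each indicator separately, not the multi-factor regression mentioned above, and the names `r_squared` and `rank_indicators` are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of a one-variable linear fit, i.e. the squared Pearson correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant indicator explains nothing
    return (cov * cov) / (var_x * var_y)


def rank_indicators(indicators, losses, top_n=3):
    """Return the top_n (name, r_squared) pairs, strongest first.

    `indicators` maps an indicator name to its series, aligned with `losses`.
    """
    scored = [(name, r_squared(series, losses))
              for name, series in indicators.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up data, for illustration only.
losses = [1, 2, 3, 4, 5]
candidates = {
    "volume": [2, 4, 6, 8, 10],
    "fails": [1, 3, 2, 5, 4],
    "headcount": [5, 5, 5, 5, 5],
}
for name, score in rank_indicators(candidates, losses, top_n=2):
    print(f"{name}: {score:.2f}")
```

Because indicators are often correlated with one another, their individual R² values do not add up; a multiple regression or a variable-selection method would be needed to say how much a shortlist explains jointly.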
O’Rourke: Yes, and I’d also say you have to make it a win-win with the business. We talked earlier about the struggle of trying to make sure we get the right data from a risk perspective and ensuring we are not encouraging people to game the system, so we’re linking all of the different elements of the operational risk framework together.
We use indicators as a key part of risk control self-assessment. We’ve also integrated it as part of our capital process – as indicators are going off the charts, it triggers a formal evaluation by the risk office. And, if we feel that more work should have been done before we stepped in, we actually allocate additional capital to the business, and that incentivises each group to improve their metrics and their story. Rather than us building that story from scratch, we are more of an advisory presence that is asking: ‘how can we help you reduce your losses over time by improving these kinds of metrics?’.
Kapoor: There is no defined number of KRIs you need to track for a business unit to function. It could be one or it could be 100. Of course, a lot of it depends on how you define these KRIs. You will find companies have defined them very well upfront and keep changing them as time goes on. Most have also done a good job of figuring out the right correlations so, even if they are tracking 100 indicators, they can still give you a historic, as well as a predictive, story at the end of the day. And then there are companies that are tracking five indicators and cannot make sense of the data.
I’m working with one bank that pulls in KRIs at seven levels of risk, and then there’s another bank of a similar size that is doing it at only one level. The first bank has a more systematic approach so, even with several streams of data, it will do a good job. The second bank is making some subjective evaluations on data because it doesn’t have all the data coming in. But they are both doing a decent job of managing their risk. A lot of it depends on how the company is structured and how it is thinking about capturing and correlating, and then providing the right story to stakeholders at the end of the day.
We started this discussion by asking about the properties of a good KRI. Turning that around, what is the wrong way to do it? How do you misuse a KRI? What would be an unhelpful or misleading KRI?
O’Rourke: Not using KRIs or indicators is probably the first and biggest hurdle. Back to the adage ‘you can’t manage what you can’t measure’, we still find lots of people who only do it based on experience levels and not the quantitative data. So the first hurdle is ‘are you using anything?’ and the second hurdle is ‘are you using only what you can collect or are you using the right ones?’. A lot of people merely look at what they can collect, which is a very different story.
Another big issue that we find is lack of accountability. I have been in lots of organisations throughout my career and seen a lot of good indicators that nobody looks at. No one feels accountable from a first line of defence or a second line of defence. You can have great indicators but, if you are not actually looking at them and feeling accountable for them, it’s also a failure.
And another point is the unintended consequences. For example, if we have trade booking errors, we might track to see if losses increase as volumes increase. But, if you just measure entry mistakes, then your traders will wait until they have time to enter the trades and put them in late. So, if you’re not also measuring late trade entry, you have another issue.
Cruz: If you are going to automate the collection, you need to understand that automation by itself doesn’t guarantee quality. You need to have some degree of checking in the system. And sometimes indicators correlate well to some losses but then lose this correlation. That can happen anywhere, in the macroeconomic world as well – sometimes interest rates explain inflation and sometimes they don’t.
There are no wrong KRIs, but you need to understand that the importance or the weight of these indicators will change over time. Maybe volume is a very important indicator today because every time volume spikes you have big losses but, if I upgrade the system or hire more people in operations or trading, I might resolve the issue and the correlation goes away.
You also need to understand the differences between products and work out the useful thresholds. For some products like structured notes, 30% of transactions not settling or not having confirmation doesn’t mean that something is wrong because they are very complex. But, for others, having 30% unconfirmed transactions is pretty bad because it is a very straightforward product – it’s traded on an exchange, for example. And, just because you have a 30% fail rate doesn’t mean your organisation is going to have losses. Understanding your products well is very important and most operational risk managers have trouble with this at first.
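The point about product-specific thresholds can be made concrete with a small lookup: the same unconfirmed rate is judged against a different limit depending on the product. This is a sketch only. The product names and threshold values below are invented for illustration and are not taken from the discussion; in practice they would be calibrated from each product's own history.

```python
# Hypothetical thresholds: the maximum share of unconfirmed transactions
# regarded as normal for each product type. Values are assumptions.
UNCONFIRMED_THRESHOLDS = {
    "structured_note": 0.40,         # complex product, slow confirmations expected
    "exchange_traded_equity": 0.05,  # straightforward product, should confirm quickly
}


def breaches_threshold(product, unconfirmed, total):
    """Return True if the unconfirmed rate exceeds the product's threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    if product not in UNCONFIRMED_THRESHOLDS:
        raise KeyError(f"no threshold defined for product {product!r}")
    rate = unconfirmed / total
    return rate > UNCONFIRMED_THRESHOLDS[product]


# The same 30% unconfirmed rate reads very differently by product.
print(breaches_threshold("structured_note", 30, 100))
print(breaches_threshold("exchange_traded_equity", 30, 100))
```

A breach here is a trigger for investigation, not a prediction of loss, which matches the caveat that a 30% fail rate does not by itself mean the organisation will lose money.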
Kapoor: Often a company that is still in the early stages will treat KRIs as stand-alone, but you can’t treat them in isolation. Just calculating the operational loss balance doesn’t mean anything unless it is tied to the right performance indicators on one side and the right controls on the other side.
Also, we often find that incentivisation and accountability related to these risk indicators are either not strong enough or misaligned. We are working with a very large asset management company that has done a good job in changing its incentive structure. The earlier incentive scheme was set up so that whichever group had the smaller number of risks was seen as the better-run business unit or functional group, so the incentive was to not report risks in your unit. It later changed the incentive structure to ensure more risks were highlighted in the organisation, and there was an incentive for that – and that actually increased the visibility of risks in the organisation. The last thing is that it cannot just be reactive; it needs to be correlated or predictive – data analysis of risk indicators has to enable businesses to understand the past impact and also predict the future impact of these risks on the business.
The opinions expressed by Citi in this article are those of the speaker and do not necessarily reflect the policies or practices of the organisation.