Do banks still need to validate GenAI models?
Regulators carved out GenAI models from new risk guidance. Banks shouldn’t see this as a reason to stop validating them.
Banks got Microsoft Copilot into the hands of employees at record speed.
Too fast, if you ask some.
Long-standing US model risk guidance, known as SR 11-7, required banks to validate quantitative models used in decision making before deployment.
As Risk.net reported in April, most banks skipped this step with Copilot and other generative AI applications by classifying them as productivity tools rather than models. This allowed them to quickly deploy the latest AI tools without having to confront thorny questions about how they actually work.
Most in the industry are convinced this was the right thing to do. GenAI is a transformative technology, they argue, and falling behind the adoption curve is simply not an option. Model validation was seen as a roadblock to progress.
That argument is not without merit. SR 11-7 required banks to validate models by evaluating their conceptual soundness and running backtests to see how they perform in different scenarios. That’s a reasonable ask for the traditional quantitative models banks use to calculate everything from value-at-risk to default probabilities. But it’s a tall order for GenAI models, which never seem to give the same answer twice.
An AI head at a large global bank scoffs at the idea that banks would even attempt to validate Copilot, which is powered by OpenAI's GPT large language models.
“I don’t think Copilot should be validated,” this person says. “You cannot validate a foundation model. If other banks want to try to do that, they can. We don’t.”
US banking regulators have given banks a pass on GenAI validation – for now at least. On April 17, they replaced SR 11-7 with new model risk guidance, known as SR 26-2, and excluded GenAI models from its scope.
But not everyone in the industry agrees with the proposition that GenAI models are beyond validation. As Risk.net reported in April, at least two large US banks – Bank of America and Goldman Sachs – were subjecting GenAI models, including AI assistants such as Copilot, to validation by default before SR 11-7 was withdrawn by regulators.
Others feel the same way. In a paper published in the Journal of Operational Risk on March 27, Krishan Kumar Sharma, a model risk leader and senior vice-president at Citi, proposed a six-pillar model risk governance framework for GenAI, which he believes could serve as a foundation for new supervisory guidance tailored specifically for generative models.
One of Sharma's recommendations is that "the GenAI system itself and the specific applications built upon it should be formally classified as a model within the bank's model risk management framework" and undergo a "formal model validation process before deployment".
This recommendation “is based on real-time testing”, Sharma writes in the paper, though the results of the testing were not published “for reasons of confidentiality”.
Validating GenAI models is not easy, Sharma concedes, but he argues this step is necessary to ensure the accuracy and robustness of their outputs.
“I fully agree that foundation models themselves are difficult to validate in the traditional sense – that is not in dispute,” Sharma tells Risk.net. “But the goal is to ensure that the risks specific to GenAI, particularly hallucination and opacity, are surfaced and managed through the model risk management lifecycle rather than left outside of it.”
Sharma does not claim validation is a magic bullet for GenAI risks. On the contrary, he cautions that validation of GenAI applications cannot be as exhaustive as for deterministic models and needs to be reinforced with other controls. These include a human-in-the-loop mandate – which Sharma calls "the single most important control for mitigating risks such as hallucination" – and adherence to a library of approved and validated prompts.
The benefit of validation, he argues, is that it supports other parts of the model risk management framework by clearly identifying the risks and weaknesses of GenAI systems.
Banks, then, may want to think twice about using SR 26-2 as a reason to abandon model risk management for GenAI. The expectation among seasoned model risk managers – including Sharma – is that regulators will in due course publish separate guidance specifically for GenAI and agentic AI use cases. Regulators often allow banks to develop their own frameworks for emerging risks and then use them as a starting point for formalising supervisory requirements. That was the case with SR 11-7: supervisors let industry practice mature, then codified it.
With several large banks voluntarily subjecting GenAI models to validation, this could easily become a supervisory expectation in the near future. Banks that jettison validation to get ahead in the AI race could instead find themselves falling behind the regulators.
Editing by Alex Krohn