Do banks still need to validate GenAI models?

Regulators carved out GenAI models from new risk guidance. Banks shouldn’t see this as a reason to stop validating them.

Banks got Microsoft Copilot into the hands of employees at record speed. 

Too fast, if you ask some. 

Long-standing US model risk guidance, known as SR 11-7, required banks to validate quantitative models used in decision making before deployment. 

As Risk.net reported in April, most banks skipped this step with Copilot and other generative AI applications by classifying them as productivity tools rather than models. This allowed them to quickly deploy the latest AI tools without having to confront thorny questions about how they actually work. 

Most in the industry are convinced this was the right thing to do. GenAI is a transformative technology, they argue, and falling behind the adoption curve is simply not an option. Model validation was seen as a roadblock to progress. 

That argument is not without merit. SR 11-7 required banks to validate models by evaluating their conceptual soundness and running backtests to see how they perform in different scenarios. That’s a reasonable ask for the traditional quantitative models banks use to calculate everything from value-at-risk to default probabilities. But it’s a tall order for GenAI models, which never seem to give the same answer twice. 

An AI head at a large global bank scoffs at the idea that banks would even attempt to validate Copilot, which is built on OpenAI’s GPT family of large language models.

“I don’t think Copilot should be validated,” this person says. “You cannot validate a foundation model. If other banks want to try to do that, they can. We don’t.”

US banking regulators have given banks a pass on GenAI validation – for now at least. On April 17, they replaced SR 11-7 with new model risk guidance, known as SR 26-2, and excluded GenAI models from its scope. 

But not everyone in the industry agrees with the proposition that GenAI models are beyond validation. As Risk.net reported in April, at least two large US banks – Bank of America and Goldman Sachs – were subjecting GenAI models, including AI assistants such as Copilot, to validation by default before SR 11-7 was withdrawn by regulators. 

Others feel the same way. In a paper published in the Journal of Operational Risk on March 27, Krishan Kumar Sharma, a model risk leader and senior vice-president at Citi, proposed a six-pillar model risk governance framework for GenAI, which he believes could serve as a foundation for new supervisory guidance tailored specifically for generative models. 

One of Sharma’s recommendations is that “the GenAI system itself and the specific applications built upon it should be formally classified as a model within the bank’s model risk management framework” and undergo a “formal model validation process before deployment”. 

This recommendation “is based on real-time testing”, Sharma writes in the paper, though the results of the testing were not published “for reasons of confidentiality”.   

Validating GenAI models is not easy, Sharma concedes, but he argues this step is necessary to ensure the accuracy and robustness of their outputs. 

“I fully agree that foundation models themselves are difficult to validate in the traditional sense – that is not in dispute,” Sharma tells Risk.net. “But the goal is to ensure that the risks specific to GenAI, particularly hallucination and opacity, are surfaced and managed through the model risk management lifecycle rather than left outside of it.”

Sharma does not claim validation is a magic bullet for GenAI risks. On the contrary, he cautions that validation of GenAI applications cannot be as exhaustive as for deterministic models and needs to be reinforced with other controls. These include a human-in-the-loop mandate – which Sharma calls “the single most important control for mitigating risks such as hallucination” – and adherence to a library of approved and validated prompts.

The benefit of validation, he argues, is that it supports other parts of the model risk management framework by clearly identifying the risks and weaknesses of GenAI systems.

Banks, then, may want to think twice about using SR 26-2 as a reason to abandon model risk management for GenAI. The expectation among seasoned model risk managers – including Sharma – is that regulators will in due course publish separate guidance specifically for GenAI and agentic AI use cases. That seems the most likely outcome: regulators often allow banks to develop their own frameworks for emerging risks, then use those frameworks as a starting point for formalising supervisory requirements. This was the case with SR 11-7 – supervisors let industry practice mature, then codified it. 

With several large banks opting to voluntarily subject GenAI models to validation, this could easily become a supervisory expectation in the near future. Banks that jettison validation to get ahead in the AI race could find themselves falling behind supervisory expectations. 

Editing by Alex Krohn
