Do banks still need to validate GenAI models?
Regulators carved out GenAI models from new risk guidance. Banks shouldn’t see this as a reason to stop validating them.
Banks got Microsoft Copilot into the hands of employees at record speed.
Too fast, if you ask some.
Long-standing US model risk guidance, known as SR 11-7, required banks to validate quantitative models used in decision making before deployment.
As Risk.net reported in April, most banks skipped this step with Copilot and other generative AI applications by classifying them as productivity tools rather than models. This allowed them to quickly deploy the latest AI tools without having to confront thorny questions about how they actually work.
Most in the industry are convinced this was the right thing to do. GenAI is a transformative technology, they argue, and falling behind the adoption curve is simply not an option. Model validation was seen as a roadblock to progress.
That argument is not without merit. SR 11-7 required banks to validate models by evaluating their conceptual soundness and running backtests to see how they perform in different scenarios. That’s a reasonable ask for the traditional quantitative models banks use to calculate everything from value-at-risk to default probabilities. But it’s a tall order for GenAI models, which never seem to give the same answer twice.
An AI head at a large global bank scoffs at the idea that banks would even attempt to validate Copilot, which is powered by OpenAI’s GPT family of large language models.
“I don’t think Copilot should be validated,” this person says. “You cannot validate a foundation model. If other banks want to try to do that, they can. We don’t.”
US banking regulators have given banks a pass on GenAI validation – for now at least. On April 17, they replaced SR 11-7 with new model risk guidance, known as SR 26-2, and excluded GenAI models from its scope.
But not everyone in the industry agrees with the proposition that GenAI models are beyond validation. As Risk.net reported in April, at least two large US banks – Bank of America and Goldman Sachs – were subjecting GenAI models, including AI assistants such as Copilot, to validation by default before SR 11-7 was withdrawn by regulators.
Others feel the same way. In a paper published in the Journal of Operational Risk on March 27, Krishan Kumar Sharma, a model risk leader and senior vice-president at Citi, proposed a six-pillar model risk governance framework for GenAI, which he believes could serve as a foundation for new supervisory guidance tailored specifically for generative models.
One of Sharma’s recommendations is that “the GenAI system itself and the specific applications built upon it should be formally classified as a model within the bank’s model risk management framework” and undergo a “formal model validation process before deployment”.
This recommendation “is based on real-time testing”, Sharma writes in the paper, though the results of the testing were not published “for reasons of confidentiality”.
Validating GenAI models is not easy, Sharma concedes, but he argues this step is necessary to ensure the accuracy and robustness of their outputs.
“I fully agree that foundation models themselves are difficult to validate in the traditional sense – that is not in dispute,” Sharma tells Risk.net. “But the goal is to ensure that the risks specific to GenAI, particularly hallucination and opacity, are surfaced and managed through the model risk management lifecycle rather than left outside of it.”
Sharma does not claim validation is a magic bullet for GenAI risks. On the contrary, he cautions that validation of GenAI applications cannot be as exhaustive as for deterministic models and needs to be reinforced with other controls. These include a human-in-the-loop mandate – which Sharma calls “the single most important control for mitigating risks such as hallucination” – and adherence to a library of approved and validated prompts.
The benefit of validation, he argues, is that it supports other parts of the model risk management framework by clearly identifying the risks and weaknesses of GenAI systems.
Banks, then, may want to think twice about using SR 26-2 as a reason to abandon model risk management for GenAI. The expectation among seasoned model risk managers – including Sharma – is that regulators will in due course publish separate guidance specifically for GenAI and agentic AI use cases. This seems like the most likely outcome. Regulators often allow banks to develop their own frameworks for emerging risks and then use them as a starting point for formalising supervisory requirements. This was the case with SR 11-7 – supervisors let industry practice mature, then codified it.
With several large banks voluntarily subjecting GenAI models to validation, the practice could easily become a supervisory expectation in the near future. Banks that jettison validation to get ahead in the AI race could find themselves falling behind their supervisors instead.
Editing by Alex Krohn