Journal of Risk Model Validation
ISSN: 1753-9587 (online)
Editor-in-chief: Steve Satchell
Volume 19, Number 4 (December 2025)
Editor's Letter
Steve Satchell
Trinity College, University of Cambridge
This issue of The Journal of Risk Model Validation looks at various novel approaches to risk model validation.
The issue’s first paper, “An aggregated metrics framework for multicriteria model validation using rolling origin evaluation” by Stanisław M. S. Halkiewicz and Mateusz Stachowicz, extends the rolling origin evaluation framework to model validation in multicriteria settings, where performance must be assessed across several scenarios or forecast targets. The authors propose three complementary metrics: the weighted sum of errors; the weighted aggregate performance metric; and the combined error and standard deviation metric. These allow users to balance expected accuracy, fairness across scenarios and stability over repeated data splits. The practical value of these metrics is illustrated by a stress testing case study in which the same gross domestic product (GDP) growth series is forecast under baseline, adverse and prosperity scenarios, with supervisory-style weights reflecting regulatory priorities. The results show how each metric encodes a distinct evaluation philosophy, and thus may recommend a different model depending on whether accuracy, balance or robustness is most required. In addition, Halkiewicz and Stachowicz introduce correlation-adjusted variants that penalize systemic errors across scenarios, ensuring that models vulnerable to structural shifts are not inadvertently selected. This contribution provides a structured, quantitative framework for risk-aware model selection, supporting applications in finance, economics and other domains in which scenario-based evaluation is desirable. These three metrics could well be analyzed in terms of utility functions, and further work could address the decomposition of such utilities by weighting the various stakeholders in the forecasting and model selection exercises.
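For readers who want a concrete picture of rolling origin evaluation aggregated across scenarios, a minimal sketch follows. This letter does not reproduce the authors' formulas, so the two aggregates below (a weighted sum of mean errors, and a weighted mean error plus a dispersion penalty) are assumed definitions for illustration only, not the metrics as defined in the paper. The function names, the naive last-value forecaster, the toy series and the scenario weights are all hypothetical.

```python
from statistics import mean, pstdev

def rolling_origin_errors(series, forecaster, min_train, horizon=1):
    """Absolute forecast errors from an expanding-window rolling origin."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                    # data available at this origin
        actual = series[origin + horizon - 1]      # value being forecast
        errors.append(abs(forecaster(train, horizon) - actual))
    return errors

def weighted_sum_of_errors(errors_by_scenario, weights):
    """ASSUMED definition: weighted sum of each scenario's mean error."""
    return sum(weights[s] * mean(e) for s, e in errors_by_scenario.items())

def error_plus_dispersion(errors_by_scenario, weights, penalty=1.0):
    """ASSUMED definition: weighted mean error plus a penalty on its
    standard deviation across the rolling-origin splits."""
    return sum(
        weights[s] * (mean(e) + penalty * pstdev(e))
        for s, e in errors_by_scenario.items()
    )

def last_value(train, horizon):
    """Naive forecaster (hypothetical): repeat the last observed value."""
    return train[-1]

# Made-up growth paths and weights, for illustration only.
scenarios = {
    "baseline":   [2.0, 2.0, 2.0, 2.0, 2.0],
    "adverse":    [2.0, 1.0, 0.0, -1.0, -2.0],
    "prosperity": [2.0, 3.0, 3.0, 5.0, 5.0],
}
weights = {"baseline": 0.5, "adverse": 0.3, "prosperity": 0.2}

errors = {
    name: rolling_origin_errors(path, last_value, min_train=2)
    for name, path in scenarios.items()
}
wse = weighted_sum_of_errors(errors, weights)
ced = error_plus_dispersion(errors, weights)
```

In this toy setup the adverse and prosperity paths have similar mean errors, but only the prosperity errors vary from split to split, so only the second aggregate penalizes that instability. This is one way to see how different aggregates can rank candidate models differently.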
Our second paper, “Probabilistic classification with discriminative and generative models: credit-scoring application” by Taha Buğra Çelik, investigates the potential of probabilistic classification to enhance credit-scoring accuracy, with a focus on model validation through reliability thresholds. By quantifying prediction confidence as a risk validation metric, the proposed framework provides a robust tool for assessing model performance under various reliability criteria, addressing key challenges in credit risk model validation. A comparison of discriminative models (random forest and logistic regression) and generative models (probabilistic neural networks and naive Bayes) is conducted to determine if leveraging the reliability of classifier predictions can improve their overall performance. The class probability values generated by these models are analyzed as a measure of prediction confidence (reliability level). The author hypothesizes that increasing the reliability threshold (ie, requiring higher class probability values for predictions to be considered) can reduce the number of predictions while improving their accuracy. The findings support this hypothesis for most models: both discriminative and generative approaches demonstrate increased accuracy with higher reliability thresholds. While inconsistent results are observed for the naive Bayes model, the other models exhibit comparable performance, suggesting that these models’ base performance is more influential than their discriminative or generative nature. These findings highlight the potential to incorporate prediction reliability thresholds as a practical approach to risk model validation in credit-scoring contexts. In very simple terms, the above framework could be interpreted as a volatility-of-volatility approach.
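The reliability-threshold idea can be illustrated with a short sketch: keep only those predictions whose class probability reaches a chosen threshold, then report how many predictions survive (coverage) and how accurate they are. The function name, the binary 0/1 labels and the toy probabilities below are assumptions made for illustration. They are not taken from the paper, and the paper's own procedure may differ.

```python
def accuracy_at_threshold(probs, labels, threshold):
    """Return (coverage, accuracy) over predictions with confidence >= threshold.

    probs  : predicted probability of the positive class, one per case
    labels : true class labels (0 or 1)
    """
    kept = []
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)        # probability of the predicted class
        if confidence >= threshold:
            predicted = 1 if p >= 0.5 else 0
            kept.append(predicted == y)     # True when the prediction is correct
    coverage = len(kept) / len(probs)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy

# Made-up scores and outcomes, for illustration only.
probs  = [0.95, 0.90, 0.85, 0.65, 0.55, 0.45, 0.25, 0.10]
labels = [1,    1,    1,    0,    1,    1,    0,    0]

all_cov, all_acc = accuracy_at_threshold(probs, labels, 0.5)
hi_cov, hi_acc = accuracy_at_threshold(probs, labels, 0.7)
```

On this toy data, raising the threshold from 0.5 to 0.7 discards the three least confident cases and removes both misclassifications. This is the coverage-for-accuracy trade that the hypothesis describes.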
The third and final paper in the issue is “Green risk identification and risk measurement in fintech: a particle swarm optimization fuzzy analytic hierarchy process and sparrow search algorithm quantile regression neural network approach” by Li Zeng and Wee-Yeap Lau. Financial technology (fintech) refers to the use of technology to improve financial activities. Its development has given rise to a series of potential risk issues, particularly in the domains of risk identification and risk measurement. Zeng and Lau’s study employs the particle swarm optimization fuzzy analytic hierarchy process (PSO-FAHP) along with the sparrow search algorithm quantile regression neural network (SSA-QRNN) to identify and measure financial risks in the fintech sector. These models allow researchers to examine problems involving multiple criteria. Despite its formidable acronym, the SSA-QRNN model demonstrates stability, precision and adaptability in assessing fintech risks. Suggestions are presented from diverse perspectives, demonstrating the benefits of the authors’ approach. In addition, the study considers not only traditional risk factors but also elements of “green risk”, thereby establishing a more comprehensive model for assessing environmental risks in the fintech industry.
Taken together, these three papers provide interesting extensions to the idea of what constitutes risk model validation.
Papers in this issue
An aggregated metrics framework for multicriteria model validation using rolling origin evaluation
The authors apply the rolling origin evaluation framework to model validation in multicriteria settings, where performance must be assessed through various scenarios or forecast targets.
Probabilistic classification with discriminative and generative models: credit-scoring application
The author investigates how probabilistic classification can be used to enhance credit-scoring accuracy, offering a robust means for assessing model performance under various reliability criteria.
Green risk identification and risk measurement in fintech: a particle swarm optimization fuzzy analytic hierarchy process and sparrow search algorithm quantile regression neural network approach
The authors apply PSO-FAHP and SSA-QRNN models to identify and measure financial risks in the fintech sector.