Journal of Risk Model Validation

Steve Satchell
Trinity College, University of Cambridge

To begin, we would like to offer further apologies to authors and others for any delay in our replies as we struggle through these difficult times. We are now making good progress through our backlog, and we hope to be back to normal – rather than the “new normal” – soon. In this issue of The Journal of Risk Model Validation, we once again have four papers that all make useful contributions to the field of risk model validation.

Our first paper, “A hybrid model for credit risk assessment: empirical validation by real-world credit data” by Guotai Chi, Mohammad Shamsu Uddin, Tabassum Habib, Ying Zhou, Md Rashidul Islam and Md Asad Iqbal Chowdhury, examines which hybridization strategy is most suited to credit risk assessment. The authors use extensive new data sets and develop different hybrid models by combining traditional statistical and modern artificial intelligence methods based on classification and clustering feature selection approaches. They find that a multilayer perceptron combined with discriminant analysis or logistic regression can significantly improve classification accuracy compared with other single and hybrid classifiers. To check the efficiency and viability of their proposed model, the authors utilize three imbalanced credit data sets – Chinese farmer data, Chinese small and medium-sized enterprise data, and German data – as well as Australian credit data. The first two data sets are private and high dimensional, whereas the latter two are widely used, publicly available and low dimensional. The breadth of the authors’ empirical applications justifies their claims about the robustness of their results.

“How accurate is the accuracy ratio in credit risk model validation?”, written by Marco van der Burgt, is the second paper in this issue. Here, the author investigates those stalwarts of risk model validation: the receiver operating characteristic curve and the cumulative accuracy profile. They are essentially rearrangements of the joint probabilities of two binary variables – actual default and predicted default – as some underlying threshold is moved. These tools visualize the ability of a credit scoring model to distinguish defaulting from nondefaulting counterparties. The curves lead to performance metrics such as the accuracy ratio (AR) and the area under the curve (AUC). Since these performance metrics are sample statistics, one cannot draw firm conclusions on model performance without knowing their sampling distributions. Van der Burgt presents four methods to estimate the sample variance of the AR and the AUC. The first method is based on numerical integration, the second and third methods assume specific score distributions, and the fourth method uses a correlation structure, leading to a distribution-independent equation for the sample variance. The author demonstrates by simulation that the first method gives the best estimate of the sample variance. The distribution-independent equation gives reasonable estimates of the sample variance but ignores higher-order effects that are distribution dependent.
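For readers less familiar with these metrics: the AUC can be computed as a Mann–Whitney statistic, and the AR follows from the identity AR = 2·AUC − 1. A minimal Python sketch (the scores below are invented for illustration; higher means riskier) shows the calculation:

```python
def auc_mann_whitney(defaulter_scores, nondefaulter_scores):
    """AUC as the probability that a randomly chosen defaulter scores
    higher (riskier) than a randomly chosen nondefaulter, with ties
    counted as one-half (the Mann-Whitney statistic)."""
    pairs = len(defaulter_scores) * len(nondefaulter_scores)
    wins = sum(1.0 for d in defaulter_scores for n in nondefaulter_scores if d > n)
    ties = sum(0.5 for d in defaulter_scores for n in nondefaulter_scores if d == n)
    return (wins + ties) / pairs

# Hypothetical scores; a perfectly discriminating model gives AUC = 1, AR = 1.
defaulters = [0.9, 0.8, 0.7, 0.4]
nondefaulters = [0.6, 0.5, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(defaulters, nondefaulters)
ar = 2.0 * auc - 1.0  # accuracy ratio (Gini coefficient) from the AUC
print(auc, ar)  # 0.9 0.8
```

Because the AUC here is an average over all defaulter–nondefaulter pairs, it is a sample statistic with its own sampling variance, which is precisely the quantity the paper estimates.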

Our third paper is “Determination of weights for an optimal credit rating model based on default and nondefault distance maximization”. In it, Guotai Chi, Kunpeng Yuan, Ying Zhou and Lingling Gong argue that the reasonableness of indicator weights is a key determinant of the reliability of a credit rating system. This is certainly important if the risk manager has to justify their model to a wider nontechnical audience. Many combinations of weights are possible in such systems: it may be straightforward to determine a reasonable weight for a single indicator but difficult to determine weights for a group of indicators, and especially difficult to determine the optimal weights for such a group in a credit rating system. This study proposes a credit rating model that accurately distinguishes defaulting companies from nondefaulting companies by maximizing intergroup credit score deviations and minimizing intragroup deviations. Empirically, the authors show that nonfinancial indicators have a greater effect on the credit status of Chinese SMEs than do financial indicators. This point is of broader interest as it indicates the presence of a number of sources of risk not fully captured by financial markets. This reflects, to some extent, the mixed-economy nature of China.
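The authors’ optimization is more elaborate than this, but the classical closed-form instance of the “maximize between-group deviation, minimize within-group deviation” criterion is Fisher’s linear discriminant. A sketch on synthetic data (all numbers are illustrative, not from the paper) shows how such indicator weights arise:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic indicator matrices (rows = firms, columns = indicators).
X_def = rng.normal(loc=[2.0, 0.5], scale=0.6, size=(30, 2))  # defaulting group
X_non = rng.normal(loc=[0.0, 0.0], scale=0.6, size=(40, 2))  # nondefaulting group

m_def, m_non = X_def.mean(axis=0), X_non.mean(axis=0)

# Pooled within-group scatter: the intragroup deviation to be minimized.
S_w = (len(X_def) - 1) * np.cov(X_def, rowvar=False) \
    + (len(X_non) - 1) * np.cov(X_non, rowvar=False)

# The Fisher criterion maximizes (w.(m_def - m_non))^2 / (w' S_w w);
# its maximizer has the closed form w proportional to S_w^{-1} (m_def - m_non).
w = np.linalg.solve(S_w, m_def - m_non)
w = w / np.abs(w).sum()  # normalize the indicator weights

# Credit scores under these weights: group means are pushed apart.
scores_def, scores_non = X_def @ w, X_non @ w
print(w, scores_def.mean() - scores_non.mean())
```

By construction the mean score gap between the two groups is positive relative to their internal spread, which is the intuition behind the paper’s deviation-maximization objective.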

“Statistical properties of the population stability index” by Bilal Yurdakul and Joshua Naranjo is the final paper in the issue and the second that investigates the statistical properties of standard measures used in risk model validation. The population stability index (PSI) is a widely used statistic that measures how much a variable has shifted over time. A high PSI may alert a business to some change in the characteristics of a population; such a shift may require investigation and, possibly, a model update. The PSI is commonly used by banks to measure shifts between model development data and current data. Banks may face additional risks if models are used without proper validation, and the incorrect use of the PSI may bring unexpected risks for these institutions. However, few studies examine the statistical properties of the PSI. In practice, the authors claim, the following rule of thumb is used: PSI < 0.10 means a “little change”, 0.10 ≤ PSI < 0.25 means a “moderate change” and 0.25 ≤ PSI means a “significant change, action required”. However, these benchmarks are applied without reference to statistical type I or type II error rates. This paper aims to fill the gap by providing statistical properties of the PSI and some recommendations on its use.
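The PSI compares the binned distribution of a variable in the development sample with its distribution in the current sample. A short sketch (the bin proportions below are invented for illustration; the bands are the rule of thumb quoted above):

```python
import math

def psi(expected, actual):
    """Population stability index between two binned distributions,
    each given as a list of proportions summing to 1."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Hypothetical score distributions over five bins.
development = [0.20, 0.20, 0.20, 0.20, 0.20]  # model-development sample
current     = [0.24, 0.21, 0.20, 0.19, 0.16]  # current sample

value = psi(development, current)

# Conventional rule-of-thumb bands discussed in the paper.
if value < 0.10:
    verdict = "little change"
elif value < 0.25:
    verdict = "moderate change"
else:
    verdict = "significant change, action required"

print(round(value, 4), verdict)  # 0.0172 little change
```

Note that the PSI is itself a sample statistic, so deciding whether an observed value of, say, 0.11 is genuinely “moderate change” or mere sampling noise requires exactly the distributional results the paper supplies.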
