Journal of Risk Model Validation
ISSN: 1753-9579 (print), 1753-9587 (online)
Editor-in-chief: Steve Satchell
Smoothing algorithms by constrained maximum likelihood: methodologies and implementations for Comprehensive Capital Analysis and Review stress testing and International Financial Reporting Standard 9 expected credit loss estimation
Need to know
Smoothing algorithms for monotonic rating level PD and rating migration probability are proposed. The approaches can be characterized as follows:
 These approaches are based on constrained maximum likelihood, with a fair risk scale for the estimates determined fully by constrained maximum likelihood, leading to a fair and more robust credit loss estimation.
 Default correlation is considered by using the asset correlation and the Merton model.
 Quality of smoothed estimates is assessed by the likelihood ratio test, and the impacted credit loss due to the change of risk scale for the estimates.
 These approaches generally outperform the interpolation method and regression models, and are easy to implement by using, for example, SAS PROC NLMIXED.
Abstract
In the process of loan pricing, stress testing, capital allocation, modeling of probability of default (PD) term structure and International Financial Reporting Standard 9 expected credit loss estimation, it is widely expected that higher risk grades carry higher default risks, and that an entity is more likely to migrate to a closer nondefault rating than a more distant nondefault rating. In practice, sample estimates for the rating-level default rate or rating migration probability do not always respect this monotonicity rule, and hence the need for smoothing approaches arises. Regression and interpolation techniques are widely used for this purpose. A common issue with these, however, is that the risk scale for the estimates is not fully justified, leading to a possible bias in credit loss estimates. In this paper, we propose smoothing algorithms for rating-level PD and rating migration probability. The smoothed estimates obtained by these approaches are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to more robust credit loss estimation. The proposed algorithms can be easily implemented by a modeler using, for example, the SAS procedure PROC NLMIXED. The approaches proposed in this paper will provide an effective and useful smoothing tool for practitioners in the field of risk modeling.
1 Introduction
Given a risk-rated portfolio with $k$ ratings $\{{R}_{i}\mid 1\le i\le k\}$, we assume that rating ${R}_{1}$ is the best quality rating and ${R}_{k}$ is the worst, ie, the default rating. It is widely expected that higher risk ratings carry higher default risk, and that an entity is more likely to be downgraded or upgraded to a closer nondefault rating than a more distant nondefault rating. The following constraints are therefore required:
$$0\le {p}_{1}\le {p}_{2}\le \cdots \le {p}_{k-1}\le 1,$$  (1.1)
$${p}_{i,i+1}\ge {p}_{i,i+2}\ge \cdots \ge {p}_{i,k-1},$$  (1.2)
$${p}_{i1}\le {p}_{i2}\le \cdots \le {p}_{i,i-1},$$  (1.3)
where ${p}_{i}$, $1\le i\le k-1$, denotes the probability of default (PD) for rating ${R}_{i}$, and ${p}_{ij}$, $1\le i,j\le k-1$, is the migration probability from a nondefault initial rating ${R}_{i}$ to a nondefault rating ${R}_{j}$.
Estimates that satisfy the above monotonicity constraints are called smoothed estimates. Smoothed estimates are widely expected for rating-level PD and rating migration probability in the process of loan pricing, capital allocation, Comprehensive Capital Analysis and Review (CCAR) stress testing (Board of Governors of the Federal Reserve System 2016), modeling of PD term structure and International Financial Reporting Standard 9 expected credit loss (ECL) estimation (Ankarath et al 2010).
In practice, sample estimates for rating-level PD and rating migration probability do not always respect these monotonicity rules. This calls for smoothing approaches. Regression and interpolation methods have been widely used for this purpose. A common issue with these approaches is that the risk scale for the estimates is not fully justified, leading to a possibly biased estimate of the credit loss.
In this paper, we propose smoothing algorithms based on constrained maximum likelihood (CML). These CML-smoothed estimates are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to a fair and more justified loss estimation. As shown by the empirical examples for rating-level PD in Section 2.3, the CML approach is more robust than the logistic and log-linear models, with quality being measured based on the resulting likelihood ratio, the predicted portfolio-level PD and the impacted ECL.
This paper is organized as follows. In Section 2, we propose smoothing algorithms for rating-level PD, for the cases with and without default correlation. A smoothing algorithm for multinomial probability is proposed in Section 3. Empirical examples are given accordingly in Sections 2 and 3, and in Section 2 we benchmark the CML approach for rating-level PD with a logistic model proposed by Tasche (2013) and a log-linear model proposed by van der Burgt (2008). Section 4 concludes.
2 Smoothing rating-level probability of default
2.1 The proposed smoothing algorithm for rating-level PD assuming no default correlation
Cross-section or within-section default correlation may arise due to some commonly shared risk factors. In this case, we assume that the sample is at a point in time, given the commonly shared risk factors, and that defaults occur independently given the commonly shared risk factors.
Let ${d}_{i}$ and $({n}_{i}-{d}_{i})$ be the observed default and nondefault frequencies, respectively, for a nondefault risk rating ${R}_{i}$. Let ${p}_{i}$ denote the PD for an entity with a nondefault initial rating ${R}_{i}$. With no default correlation, we can assume that the default frequency follows a binomial distribution. Then the sample log-likelihood is given by
$$\mathrm{LL}=\sum _{i=1}^{k-1}[({n}_{i}-{d}_{i})\log (1-{p}_{i})+{d}_{i}\log ({p}_{i})]$$  (2.1)
up to a summand given by the logarithms of the related binomial coefficients, which are independent of $\{{p}_{i}\}$. By taking the derivative of (2.1) with respect to ${p}_{i}$ and setting it to zero, we have
$$-\frac{({n}_{i}-{d}_{i})}{1-{p}_{i}}+\frac{{d}_{i}}{{p}_{i}}=0,$$
$${d}_{i}(1-{p}_{i})=({n}_{i}-{d}_{i}){p}_{i}\Rightarrow {p}_{i}=\frac{{d}_{i}}{{n}_{i}}.$$
Therefore, the unconstrained maximum likelihood estimate for ${p}_{i}$ is just the sample default rate ${d}_{i}/{n}_{i}$.
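As a quick numerical check of this result, the following minimal Python sketch evaluates the log-likelihood (2.1) at the sample default rates and at a few rescaled alternatives. The function name `binomial_loglik` is hypothetical, and the counts are the rounded DF and Count rows of Table 1.

```python
import math

def binomial_loglik(p, n, d):
    """Log-likelihood (2.1), omitting the binomial-coefficient constant."""
    return sum(
        (n_i - d_i) * math.log(1.0 - p_i) + d_i * math.log(p_i)
        for p_i, n_i, d_i in zip(p, n, d)
    )

# Counts and defaults by rating (rounded values from Table 1).
n = [5529, 11566, 29765, 52875, 4846, 4318]
d = [1, 11, 22, 124, 62, 170]

# Unconstrained maximum likelihood estimate: the sample default rate.
p_hat = [d_i / n_i for d_i, n_i in zip(d, n)]
ll_hat = binomial_loglik(p_hat, n, d)

# Any uniform rescaling of the rates should give a lower log-likelihood.
ll_alternatives = {
    scale: binomial_loglik([scale * p_i for p_i in p_hat], n, d)
    for scale in (0.8, 0.9, 1.1, 1.25)
}
print(ll_hat, ll_alternatives)
```

Each summand of (2.1) is strictly concave in its own ${p}_{i}$, so every perturbed log-likelihood falls below the value at ${d}_{i}/{n}_{i}$.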
We propose the following smoothing algorithm for the case when no default correlation is assumed.
Algorithm 2.1 (Smoothing rating-level PD assuming no default correlation).
 (a) Parameterize the PD for a nondefault rating ${R}_{i}$ by
$${p}_{i}=\exp ({b}_{1}+{b}_{2}+\cdots +{b}_{k-i}),$$ (2.2)
where
$${b}_{k-1}\le -{\epsilon}_{1},\ {b}_{k-2}\le -{\epsilon}_{2},\ \dots ,\ {b}_{2}\le -{\epsilon}_{k-2},\ {b}_{1}\le 0$$ (2.3)
for given constants ${\epsilon}_{i}\ge 0$, $1\le i\le k-2$.
 (b) Maximize the log-likelihood (2.1), with ${p}_{i}$ given by (2.2), over the parameters ${b}_{1},{b}_{2},\dots ,{b}_{k-1}$ subject to (2.3), and derive the CML-smoothed estimates by using (2.2). By (2.2) and (2.3), we have
$${p}_{k-1}=\exp ({b}_{1})\le \exp (0)=1,$$
$$\frac{{p}_{i}}{{p}_{i-1}}=\exp (-{b}_{k-i+1})\ge \exp ({\epsilon}_{i-1})\ge 1\Rightarrow 0\le {p}_{1}\le {p}_{2}\le \cdots \le {p}_{k-1}\le 1.$$
Thus, monotonicity (1.1) is satisfied. When ${\epsilon}_{1}={\epsilon}_{2}=\cdots ={\epsilon}_{k-2}=\epsilon \ge 0$, let $\rho =\exp (\epsilon )$. Then $\rho $ is the maximum lower bound for all the ratios $\{{p}_{i}/{p}_{i-1}\}$ of the smoothed estimates $\{{p}_{i}\}$.
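The parameterization in step (a) can be illustrated with a short Python sketch. It only maps a parameter vector to PDs and checks the bounds; it does not perform the constrained optimization. The names `pd_from_b` and `satisfies_bounds` and the numerical values of `b` and `eps` are hypothetical, chosen for illustration with $k=5$.

```python
import math

def pd_from_b(b):
    """Map b = [b_1, ..., b_{k-1}] to [p_1, ..., p_{k-1}] with
    p_i = exp(b_1 + ... + b_{k-i})."""
    m = len(b)  # m = k - 1 nondefault ratings
    partial, total = [], 0.0
    for b_j in b:
        total += b_j
        partial.append(total)  # partial[j] = b_1 + ... + b_{j+1}
    # p_i uses the first k - i = m + 1 - i terms, ie, partial[m - i].
    return [math.exp(partial[m - i]) for i in range(1, m + 1)]

def satisfies_bounds(b, eps):
    """Check b_1 <= 0 and b_{k-j} <= -eps_j for j = 1, ..., k-2."""
    m = len(b)
    return b[0] <= 0.0 and all(b[m - j] <= -eps[j - 1] for j in range(1, m))

b = [-3.0, -1.0, -0.5, -2.0]   # hypothetical b_1, ..., b_4
eps = [0.1, 0.1, 0.1]          # hypothetical eps_1, ..., eps_3
p = pd_from_b(b)
ratios = [p[i] / p[i - 1] for i in range(1, len(p))]
print(p, ratios)
```

With these inputs the worst rating has PD $\exp(-3)$, and every ratio ${p}_{i}/{p}_{i-1}$ is at least $\exp(0.1)$, as the bounds require.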
2.2 The proposed smoothing algorithms for rating-level PD assuming default correlation
Default correlation can be modeled by the asymptotic single risk factor (ASRF) model using asset correlation. Under the ASRF model framework, the risk for an entity is governed by a latent random variable $z$, called the firm’s normalized asset value, which splits into the following two parts (Miu and Ozdemir 2009):
$$z=s\sqrt{\rho}+\epsilon \sqrt{1-\rho},$$  (2.4)
where $s$ denotes the common systematic risk and $\epsilon $ is the idiosyncratic risk independent of $s$. The quantity $\rho $ is called the asset correlation. It is assumed that there exist threshold values (ie, the default points) $\{{b}_{i}\}$ such that an entity with an initial risk rating ${R}_{i}$ will default when $z$ falls below the threshold value ${b}_{i}$. The long-run PD for rating ${R}_{i}$ is then given by ${p}_{i}=\mathrm{\Phi}({b}_{i})$, where $\mathrm{\Phi}$ denotes the standard normal cumulative distribution function (CDF).
Let ${p}_{i}(s)$ denote the PD for an entity with an initial risk rating ${R}_{i}$ given the systematic risk $s$. It is shown in Yang (2017) that
$${p}_{i}(s)=\mathrm{\Phi}({b}_{i}\sqrt{1+{r}^{2}}-rs),$$  (2.5)
where
$$r=\frac{\sqrt{\rho}}{\sqrt{1-\rho}}.$$
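The conditional PD in (2.5) can be explored numerically. The sketch below is a minimal illustration, assuming a standard normal systematic factor $s$; the function names `conditional_pd` and `average_over_s`, the asset correlation of 0.2 and the 1% long-run PD are hypothetical choices. It also checks, by simple trapezoid quadrature, that averaging ${p}_{i}(s)$ over $s$ recovers the long-run PD $\mathrm{\Phi}({b}_{i})$.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def conditional_pd(b, rho, s):
    """PD given the systematic factor s, as in (2.5)."""
    r = (rho / (1.0 - rho)) ** 0.5
    return _N.cdf(b * (1.0 + r * r) ** 0.5 - r * s)

def average_over_s(b, rho, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoid approximation of E[p(s)] for s ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        s = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * conditional_pd(b, rho, s) * _N.pdf(s)
    return total * h

rho = 0.2                      # hypothetical asset correlation
b = _N.inv_cdf(0.01)           # threshold for a 1% long-run PD
stressed = conditional_pd(b, rho, -2.0)   # adverse systematic factor
benign = conditional_pd(b, rho, 2.0)      # favorable systematic factor
long_run = average_over_s(b, rho)
print(stressed, benign, long_run)
```

A negative $s$ raises the conditional PD and a positive $s$ lowers it, while the average over $s$ stays at the long-run level.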
Let ${n}_{i}(t)$ and ${d}_{i}(t)$ denote, respectively, the number of entities and the number of defaults at time $t$ for $t={t}_{1},{t}_{2},\dots ,{t}_{q}$. Given the latent factor $s$, we propose the following smoothing algorithm for rating-level correlated long-run PDs by using (2.5).
Algorithm 2.2 (Smoothing rating-level correlated long-run PDs given the latent systematic risk factor).
 (a) Parameterize ${p}_{i}(s)$ for a nondefault rating ${R}_{i}$ by (2.5) with
$${b}_{i}=({c}_{1}+{c}_{2}+\cdots +{c}_{k-i}),$$ (2.6)
where, for a given constant $\epsilon \ge 0$, the following constraints are satisfied:
$${c}_{k-1}\le -\epsilon ,\ {c}_{k-2}\le -\epsilon ,\ \dots ,\ {c}_{2}\le -\epsilon ,\ {c}_{1}\le 0.$$ (2.7)
 (b) Estimate parameters $\{{c}_{1},{c}_{2},\dots ,{c}_{k-1}\}$ by maximizing, under constraint (2.7), the following log-likelihood:
$$\mathrm{LL}=\sum _{h=1}^{q}\sum _{i=1}^{k-1}[({n}_{i}({t}_{h})-{d}_{i}({t}_{h}))\log (1-{p}_{i}(s))+{d}_{i}({t}_{h})\log ({p}_{i}(s))].$$ (2.8)
Set ${p}_{i}=\mathrm{\Phi}({b}_{i})$. Then monotonicity (1.1) for $\{{p}_{i}\}$, ie, the rating-level long-run PDs, follows from constraints (2.6) and (2.7).
Optimization with a random effect can be implemented by using, for example, SAS PROC NLMIXED (SAS Institute 2009).
When some key risk factors $x=({x}_{1},{x}_{2},\mathrm{\dots},{x}_{m})$, common to all ratings, are observed, we assume the following decomposition for the systematic risk factor $s$:
$$ 
where the common index $\mathrm{ci}(x)=[{a}_{1}{x}_{1}+{a}_{2}{x}_{2}+\cdots +{a}_{m}{x}_{m}-u]/v$ is a linear combination of variables ${x}_{1},{x}_{2},\dots ,{x}_{m}$, with $u$ and $v$ being the mean and standard deviation of ${a}_{1}{x}_{1}+{a}_{2}{x}_{2}+\cdots +{a}_{m}{x}_{m}$.
Let ${p}_{i}(x)$ denote the PD given a scenario $x$. Assume that $\mathrm{ci}(x)$ is standard normal independent of $e$. Then we have (Yang 2017, Theorem 2.2)
$${p}_{i}(x)=\mathrm{\Phi}[{b}_{i}\sqrt{1+{\stackrel{~}{r}}^{2}}+\stackrel{~}{r}\mathrm{ci}(x)]$$  (2.9) 
for some $\stackrel{~}{r}$.
Let $\mathrm{ci}(x(t))$ denote the value of $\mathrm{ci}(x)$ at time $t$ for $t={t}_{1},{t}_{2},\dots ,{t}_{q}$. Given $\mathrm{ci}(x)$, we propose the following smoothing algorithm for rating-level correlated long-run PDs and rating-level point-in-time PDs by using (2.9).
Algorithm 2.3 (Smoothing rating-level correlated PDs given the common index $\mathrm{ci}(x)$).
 (a) Parameterize ${p}_{i}(x(t))$ for a nondefault rating ${R}_{i}$ by (2.9) with
$${b}_{i}=({c}_{1}+{c}_{2}+\cdots +{c}_{k-i}),$$ (2.10)
where, for a given constant $\epsilon \ge 0$, the following constraints are satisfied:
$${c}_{k-1}\le -\epsilon ,\ {c}_{k-2}\le -\epsilon ,\ \dots ,\ {c}_{2}\le -\epsilon ,\ {c}_{1}\le 0.$$ (2.11)
 (b) Estimate parameters $\{{c}_{1},{c}_{2},\dots ,{c}_{k-1}\}$ by maximizing, under constraint (2.11), the following log-likelihood:
$$\mathrm{LL}=\sum _{h=1}^{q}\sum _{i=1}^{k-1}[({n}_{i}({t}_{h})-{d}_{i}({t}_{h}))\log (1-{p}_{i}(x({t}_{h})))+{d}_{i}({t}_{h})\log ({p}_{i}(x({t}_{h})))].$$ (2.12)
Set ${p}_{i}=\mathrm{\Phi}({b}_{i})$. Then monotonicity (1.1) for $\{{p}_{i}\}$, ie, the rating-level long-run PDs, and for $\{{p}_{i}(x({t}_{h}))\}$ at time $t={t}_{h}$, follows from constraints (2.10) and (2.11).
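The two ingredients of Algorithm 2.3, the standardized common index and the point-in-time PD in (2.9), can be sketched in Python as follows. This is an illustration only: the function names, the two macro variables, the weights `a`, the thresholds `b` and the sensitivity `r_tilde` are all hypothetical, and the population standard deviation is assumed for $v$.

```python
from statistics import NormalDist, mean, pstdev

_N = NormalDist()

def common_index(x_by_period, a):
    """ci(x(t)) = (a . x(t) - u) / v, with u and v the mean and (population)
    standard deviation of a . x over the sample window."""
    combo = [sum(a_j * x_j for a_j, x_j in zip(a, x)) for x in x_by_period]
    u, v = mean(combo), pstdev(combo)
    return [(z - u) / v for z in combo]

def pit_pds(b, r_tilde, ci_t):
    """Point-in-time PDs by rating for one period, as in (2.9)."""
    scale = (1.0 + r_tilde ** 2) ** 0.5
    return [_N.cdf(b_i * scale + r_tilde * ci_t) for b_i in b]

# Hypothetical scenario data: two macro variables over four periods.
x_by_period = [(5.0, 2.0), (6.5, 1.0), (9.0, -1.5), (7.0, 0.5)]
a = (1.0, -0.5)                 # hypothetical weights a_1, a_2
ci = common_index(x_by_period, a)

b = [-2.6, -2.0, -1.5]          # increasing thresholds for three ratings
r_tilde = 0.4                   # hypothetical sensitivity
pds_by_period = [pit_pds(b, r_tilde, ci_t) for ci_t in ci]
print(ci, pds_by_period)
```

Since the thresholds increase with the rating index, the point-in-time PDs are monotonic in every period, and a higher common index shifts all of them upward when `r_tilde` is positive.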
2.3 Empirical examples: smoothing of rating-level PDs
Example 1: smoothing rating-level long-run PDs assuming no default correlation
Table 1 shows the record count and default rate (DF rate) for a sample created synthetically with six nondefault risk ratings.

Table 1: Record count and default rate by risk rating.

| Risk rating | 1 | 2 | 3 | 4 | 5 | 6 | Portfolio level |
|---|---|---|---|---|---|---|---|
| DF | 1 | 11 | 22 | 124 | 62 | 170 | 391 |
| Count | 5 529 | 11 566 | 29 765 | 52 875 | 4 846 | 4 318 | 108 899 |
| DF rate (%) | 0.0173 | 0.0993 | 0.0739 | 0.2352 | 1.2833 | 3.9442 | 0.3594 |

Algorithm 2.1 will be benchmarked by the following methods.
 LGL1: with this approach, the PD for rating ${R}_{i}$ is estimated by ${p}_{i}=\exp (a+bx)$, where $x$ denotes the index for rating ${R}_{i}$, ie, $x=i$ for rating ${R}_{i}$. Parameters $a$ and $b$ are estimated by a linear regression of the form below, using the logarithm of the sample default rate ${r}_{i}$ for a rating:
$$\log ({r}_{i})=a+bx+e,\quad e\sim N(0,{\sigma}^{2}).$$
A common issue with this approach is the unjustified uniform risk scale $b$ (in the log space) for all ratings. In addition, this approach generally causes the portfolio-level PD to be underestimated, due to the convexity of the exponential function (the second derivative of the function $\exp (\cdot )$ is positive). Writing $y=\exp (a+bx+e)$ for the default rate implied by the regression, we have
$$E(y\mid x)=E(\exp (a+bx+e)\mid x)=\exp (a+bx+\tfrac{1}{2}{\sigma}^{2})>\exp (a+bx).$$
 LGL2: like method LGL1, rating-level PD is estimated by ${p}_{i}=\exp (a+bx)$. However, parameters $a$ and $b$ are estimated by maximizing the log-likelihood given in (2.1). With this approach, the bias for portfolio PD can generally be avoided, though the issue with the unjustified uniform risk scale remains.
 EXPCDF: this method was proposed by van der Burgt (2008). With this approach, the rating-level PD is estimated by ${p}_{i}=\exp (a+bx)$, where $x$ denotes, for rating ${R}_{i}$, the adjusted sample cumulative distribution,
$$x(i)=\frac{({n}_{1}+{n}_{2}+\cdots +{n}_{i-1}+\frac{1}{2}{n}_{i})}{({n}_{1}+{n}_{2}+\cdots +{n}_{k-1})}.$$ (2.13)
Instead of estimating parameters via a cap ratio (van der Burgt 2008), we estimate parameters by maximizing the log-likelihood given in (2.1).
 LGSTINVCDF: the logistic model proposed by Tasche (2013).
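The adjusted sample cumulative distribution (2.13) used by the EXPCDF benchmark is simple to compute. The sketch below applies it to the record counts of Table 1; the function name `adjusted_cdf` is hypothetical.

```python
def adjusted_cdf(counts):
    """x(i) = (n_1 + ... + n_{i-1} + n_i / 2) / (n_1 + ... + n_{k-1})."""
    total = sum(counts)
    x, below = [], 0
    for n_i in counts:
        x.append((below + 0.5 * n_i) / total)
        below += n_i  # cumulative count of all better ratings
    return x

counts = [5529, 11566, 29765, 52875, 4846, 4318]  # Table 1 record counts
x = adjusted_cdf(counts)
print([round(v, 4) for v in x])
```

Each $x(i)$ is the midpoint of the share of the portfolio occupied by rating ${R}_{i}$, so the values increase strictly and lie between 0 and 1.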
Estimation quality is measured by the following.
 $p$-value: this is the $p$-value calculated from the likelihood ratio chi-squared test with degrees of freedom equal to the number of restrictions. A higher $p$-value indicates a better model.
 ECL ratio: this is the ratio of expected credit loss based on the smoothed rating-level PDs to that based on the realized rating-level PDs, given the exposure at default and loss given default parameters for each rating. A significantly lower ECL ratio value indicates a possible underestimation of the credit loss.
 PD ratio: this is the ratio of the portfolio-level PD aggregated from the smoothed rating-level PDs to the portfolio-level PD aggregated from the realized rating-level PDs. A value significantly lower than 100% for the PD ratio indicates a possible underestimation for the PD at portfolio level.
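Two of these measures can be sketched with the Python standard library alone. The chi-squared tail probability below is computed from the series for the regularized lower incomplete gamma function, which is adequate for moderate test statistics. The function names are hypothetical, and weighting the rating-level PDs by record count when aggregating to portfolio level is an assumption. The example inputs are the Table 1 counts and default rates and the LGL1 row of Table 2.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-squared variable."""
    if x <= 0.0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total and n < 100000:
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def lr_pvalue(ll_unconstrained, ll_constrained, restrictions):
    """p-value of the likelihood ratio test with the given degrees of freedom."""
    stat = 2.0 * (ll_unconstrained - ll_constrained)
    return chi2_sf(stat, restrictions)

def pd_ratio(smoothed, realized, counts):
    """Count-weighted portfolio PD from smoothed PDs over that from realized PDs."""
    num = sum(n_i * p_i for n_i, p_i in zip(counts, smoothed))
    den = sum(n_i * p_i for n_i, p_i in zip(counts, realized))
    return num / den

counts = [5529, 11566, 29765, 52875, 4846, 4318]
realized_pct = [0.0173, 0.0993, 0.0739, 0.2352, 1.2833, 3.9442]
lgl1_pct = [0.0165, 0.0416, 0.1053, 0.2663, 0.6732, 1.7022]
ratio_lgl1 = pd_ratio(lgl1_pct, realized_pct, counts)
print(round(100.0 * ratio_lgl1, 2))
```

With count weighting, the PD ratio for LGL1 comes out close to the 72.57% reported in Table 2, which suggests the weighting assumption is reasonable for this sample.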
Table 2 shows the results for Algorithm 2.1 (labeled “CML”) when ${\epsilon}_{1}={\epsilon}_{2}=\cdots ={\epsilon}_{k-2}=0$ along with the benchmarks, where the smoothed rating-level PDs are listed in columns P1–P6.

Table 2: Smoothed rating-level PDs by method, with portfolio-level $p$-value, ECL ratio and PD ratio (all values in %).

| Method | P1 | P2 | P3 | P4 | P5 | P6 | $p$-value | ECL ratio | PD ratio |
|---|---|---|---|---|---|---|---|---|---|
| CML | 0.0173 | 0.0810 | 0.0810 | 0.2352 | 1.2833 | 3.9442 | 95.92 | 99.91 | 100.00 |
| LGL1 | 0.0165 | 0.0416 | 0.1053 | 0.2663 | 0.6732 | 1.7022 | 0.00 | 46.09 | 72.57 |
| LGL2 | 0.0032 | 0.1468 | 0.2901 | 0.4333 | 0.5763 | 0.7191 | 0.00 | 27.58 | 100.07 |
| EXPCDF | 0.0061 | 0.0086 | 0.0294 | 0.3431 | 1.9081 | 2.5057 | 0.00 | 72.92 | 100.21 |
| LGSTINVCDF | 0.0104 | 0.0188 | 0.0585 | 0.2795 | 1.5457 | 3.4388 | 0.00 | 90.46 | 100.00 |
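Table 3: Smoothed rating-level PDs by Algorithm 2.1 for different values of $\epsilon $, with portfolio-level $p$-value, ECL ratio and PD ratio (all values except $\epsilon $ in %).

| $\epsilon $ | P1 | P2 | P3 | P4 | P5 | P6 | $p$-value | ECL ratio | PD ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0173 | 0.0810 | 0.0810 | 0.2352 | 1.2833 | 3.9442 | 95.92 | 99.91 | 100.00 |
| 0.1 | 0.0173 | 0.0753 | 0.0832 | 0.2352 | 1.2833 | 3.9442 | 89.06 | 99.88 | 100.00 |
| 0.5 | 0.0173 | 0.0552 | 0.0910 | 0.2352 | 1.2833 | 3.9442 | 36.63 | 99.79 | 100.00 |
| 1.0 | 0.0120 | 0.0327 | 0.0890 | 0.2419 | 1.2833 | 3.9442 | 2.54 | 99.63 | 100.00 |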
These results show that Algorithm 2.1 outperforms the other benchmarks significantly by $p$-value, impacted ECL and aggregated portfolio-level PD. The first log-linear model (LGL1) underestimates the portfolio-level PD significantly. All log-linear models (LGL1, LGL2 and EXPCDF) underestimate the ECL significantly.
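For the special case where all ${\epsilon}_{i}$ are zero, the CML row of Table 2 can be cross-checked without a general optimizer. The sketch below uses the pool-adjacent-violators algorithm weighted by record count, a standard technique for order-restricted binomial estimation. It is a swapped-in method, not the reparameterized optimization of Algorithm 2.1, and it does not cover strictly positive ${\epsilon}_{i}$. The function name `pava_increasing` is hypothetical; inputs are the DF rate and Count rows of Table 1.

```python
def pava_increasing(rates, weights):
    """Weighted pool-adjacent-violators fit of a nondecreasing sequence."""
    blocks = []  # each block: [pooled rate, total weight, ratings pooled]
    for rate, weight in zip(rates, weights):
        blocks.append([rate, weight, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            r2, w2, c2 = blocks.pop()
            r1, w1, c1 = blocks.pop()
            pooled = (r1 * w1 + r2 * w2) / (w1 + w2)
            blocks.append([pooled, w1 + w2, c1 + c2])
    fitted = []
    for rate, _, size in blocks:
        fitted.extend([rate] * size)
    return fitted

counts = [5529, 11566, 29765, 52875, 4846, 4318]
rates_pct = [0.0173, 0.0993, 0.0739, 0.2352, 1.2833, 3.9442]
smoothed = pava_increasing(rates_pct, counts)
print([round(v, 4) for v in smoothed])
```

Only ratings 2 and 3 violate monotonicity, so they are pooled into a single count-weighted rate while the other ratings keep their sample default rates. Pooling also leaves the count-weighted portfolio PD unchanged, consistent with a PD ratio of 100%.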
Table 3 illustrates the strictly monotonic smoothed rating-level PDs by Algorithm 2.1 when ${\epsilon}_{1}={\epsilon}_{2}=\cdots ={\epsilon}_{k-2}=\epsilon >0$. However, while the $p$-value deteriorates quickly as $\epsilon $ increases from 0 to 1, the impacted ECL does not change that much.
Example 2: smoothing rating-level long-run PDs in the presence of default correlation
The sample created synthetically contains the quarterly default count by rating for a portfolio with six nondefault ratings between 2005 Q1 and 2014 Q4. The (rating-level or portfolio-level) point-in-time default rate is calculated for each quarter and then averaged over the sample window by dividing by the number of quarters (forty-four) to obtain the estimate for the long-run average realized PD (labeled “AVG PD”). Sample distribution (labeled “overall distribution”) by rating is calculated by combining all forty-four quarters. Table 4 displays sample statistics (with a heavy size concentration at rating ${R}_{4}$).

Table 4: Long-run average realized PD and overall sample distribution by risk rating (%).

| Risk rating | 1 | 2 | 3 | 4 | 5 | 6 | Portfolio level |
|---|---|---|---|---|---|---|---|
| Long-run AVG PD | 0.0215 | 0.1027 | 0.0764 | 0.2731 | 1.1986 | 3.8563 | 0.3818 |
| Overall distribution | 5.07 | 10.61 | 27.47 | 48.32 | 4.52 | 4.01 | 100.00 |
Table 5 shows the smoothed correlated rating-level long-run PD for all six nondefault ratings obtained by using Algorithm 2.2.

Table 5: Smoothed rating-level long-run PDs (P1–P6, %) by Algorithm 2.2, with AIC, portfolio long-run AVG PD (%) and PD ratio (%).

| $\epsilon $ | P1 | P2 | P3 | P4 | P5 | P6 | AIC | Portfolio long-run AVG PD | PD ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 (no correl) | 0.0179 | 0.0836 | 0.0836 | 0.2371 | 1.3076 | 4.0372 | 694.02 | 0.3710 | 97.17 |
| 0.0 (w correl) | 0.0183 | 0.0828 | 0.0828 | 0.2545 | 1.1951 | 3.9340 | 594.62 | 0.3843 | 100.66 |
| 0.1 (w correl) | 0.0183 | 0.0483 | 0.0966 | 0.2541 | 1.1942 | 3.9318 | 600.79 | 0.3842 | 100.64 |
| 0.2 (w correl) | 0.0035 | 0.0176 | 0.0754 | 0.2775 | 1.1859 | 3.9237 | 617.96 | 0.3842 | 100.64 |
| 0.3 (w correl) | 0.0010 | 0.0086 | 0.0560 | 0.2905 | 1.1961 | 3.9342 | 637.25 | 0.3845 | 100.71 |

Estimation quality is measured by the following.
 AIC: the Akaike information criterion. A lower AIC indicates a better model.
 PD ratio: the ratio of the long-run average predicted portfolio-level PD (labeled “AVG PD”) to the long-run average realized portfolio-level PD. A value significantly less than 100% for this ratio indicates a possible underestimation for the PD at portfolio level.
The first row in Table 5 shows results for the case when no default correlation is assumed (labeled “no correl”) and $\epsilon $ is chosen to be 0, while the second row shows those for the case when default correlation is assumed (labeled “$w$ correl”) and $\epsilon =0$.
The results in the first row show that the estimated long-run portfolio-level PD for the case assuming no default correlation is lower than that for the case when default correlation is assumed (second row), which suggests we may have underestimated the long-run rating-level PD when assuming no default correlation. The high AIC value in the first row implies that the assumption of no default correlation may not be appropriate.
Note that, when applying Algorithm 2.2 to the sample used in Example 1, assuming no default correlation, we got exactly the same estimates as in Example 1.
3 Smoothing algorithms for multinomial probability
3.1 Unconstrained maximum likelihood estimates for multinomial probability
For $n$ independent trials, where each trial results in exactly one of $h$ fixed outcomes, the probability of observing frequencies $\{{n}_{i}\}$, with frequency ${n}_{i}$ for the $i$th ordinal outcome, is
$$\frac{n!}{{n}_{1}!{n}_{2}!\mathrm{\cdots}{n}_{h}!}{x}_{1}^{{n}_{1}}{x}_{2}^{{n}_{2}}\mathrm{\cdots}{x}_{h}^{{n}_{h}},$$  (3.1) 
where ${x}_{i}>0$ is the probability of observing the $i$th ordinal outcome in a single trial, and
$$n={n}_{1}+{n}_{2}+\cdots +{n}_{h},\quad {x}_{1}+{x}_{2}+\cdots +{x}_{h}=1.$$
The log-likelihood is
$$\mathrm{LL}={n}_{1}\log {x}_{1}+{n}_{2}\log {x}_{2}+\cdots +{n}_{h}\log {x}_{h}$$  (3.2)
up to a constant given by the logarithm of some multinomial coefficient independent of parameters $\{{x}_{1},{x}_{2},\dots ,{x}_{h}\}$. By using the relation ${x}_{h}=1-{x}_{1}-{x}_{2}-\cdots -{x}_{h-1}$ and setting to zero the derivative of (3.2) with respect to ${x}_{i}$, $1\le i\le h-1$, we have
$$\frac{{n}_{i}}{{x}_{i}}-\frac{{n}_{h}}{(1-{x}_{1}-{x}_{2}-\cdots -{x}_{h-1})}=0\Rightarrow \frac{{n}_{i}}{{x}_{i}}=\frac{{n}_{h}}{{x}_{h}}.$$
Since this holds for each $i$ and for the fixed $h$, we conclude that the vector $({x}_{1},{x}_{2},\mathrm{\dots},{x}_{h})$ is in proportion with $({n}_{1},{n}_{2},\mathrm{\dots},{n}_{h})$. Thus, the maximum likelihood estimate for ${x}_{i}$ is the sample estimate
$${x}_{i}=\frac{{n}_{i}}{({n}_{1}+{n}_{2}+\mathrm{\cdots}+{n}_{h})}=\frac{{n}_{i}}{n}.$$  (3.3) 
3.2 The proposed smoothing algorithm for multinomial probability
We next propose a smoothing algorithm for multinomial probability under the following constraint:
$$0\le {x}_{1}\le {x}_{2}\le \mathrm{\cdots}\le {x}_{h}\le 1.$$  (3.4) 
Algorithm 3.1 (Smoothing multinomial probability).
 (a) Parameterize the multinomial probability by
$${x}_{i}=\frac{\exp ({b}_{1}+{b}_{2}+\cdots +{b}_{h+1-i})}{\exp ({b}_{1})+\exp ({b}_{1}+{b}_{2})+\cdots +\exp ({b}_{1}+{b}_{2}+\cdots +{b}_{h})}.$$ (3.5)
 (b) Maximize (3.2), with ${x}_{i}$ given by (3.5), for parameters ${b}_{1},{b}_{2},\dots ,{b}_{h}$ subject to
$${b}_{h}\le -{\epsilon}_{1},\ {b}_{h-1}\le -{\epsilon}_{2},\ \dots ,\ {b}_{2}\le -{\epsilon}_{h-1},\ {b}_{1}\le 0$$ (3.6)
for ${\epsilon}_{i}\ge 0$, $1\le i\le h-1$. Derive the CML-smoothed estimates by using (3.5). Then the monotonicity (3.4) for the estimates follows from (3.5) and (3.6).
In the case when ${\epsilon}_{1}={\epsilon}_{2}=\cdots ={\epsilon}_{h-1}=\epsilon \ge 0$, let $\rho =\exp (\epsilon )$. Then $\rho $ is the maximum lower bound for all the ratios $\{{x}_{i}/{x}_{i-1}\}$.
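The parameterization (3.5) can be illustrated in the same way as (2.2). The Python sketch below maps a parameter vector to multinomial probabilities and prints the adjacent ratios; it does not perform the maximization in step (b). The function name `multinomial_from_b` and the values of `b` are hypothetical, and subtracting the largest exponent is a numerical-stability choice that leaves the ratios in (3.5) unchanged.

```python
import math

def multinomial_from_b(b):
    """Map b = [b_1, ..., b_h] to [x_1, ..., x_h] via (3.5)."""
    h = len(b)
    partial, total = [], 0.0
    for b_j in b:
        total += b_j
        partial.append(total)  # partial[j] = b_1 + ... + b_{j+1}
    shift = max(partial)       # cancels between numerator and denominator
    weights = [math.exp(s - shift) for s in partial]
    denom = sum(weights)
    # x_i uses b_1 + ... + b_{h+1-i}, ie, partial[h - i].
    return [weights[h - i] / denom for i in range(1, h + 1)]

b = [0.0, -0.2, -1.0, -0.5]    # hypothetical b_1, ..., b_4
x = multinomial_from_b(b)
ratios = [x[i] / x[i - 1] for i in range(1, len(x))]
print(x, ratios)
```

Here the ratios are $\exp(0.5)$, $\exp(1.0)$ and $\exp(0.2)$, so the probabilities are increasing and any common $\epsilon $ up to 0.2 would be respected.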
3.3 An empirical example: smoothing transition probability matrix
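Table 6: Long-run transition probabilities before and after smoothing. Rows are initial ratings; columns p1–p7 are ending ratings, with p7 being default.

(a) Transition probability before smoothing

| Initial rating | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|---|---|---|---|---|---|---|---|
| **R1** | 0.97162 | 0.01835 | 0.00312 | 0.00554 | 0.00104 | 0.00017 | 0.00017 |
| **R2** | 0.00621 | 0.94528 | 0.03071 | 0.01284 | 0.00215 | 0.00257 | 0.00025 |
| R3 | 0.00071 | 0.01028 | 0.93803 | 0.04089 | 0.00659 | 0.00277 | 0.00074 |
| R4 | 0.00024 | 0.00069 | 0.01260 | 0.96726 | 0.01261 | 0.00543 | 0.00118 |
| R5 | 0.00039 | 0.00118 | 0.00790 | 0.07996 | 0.82725 | 0.07048 | 0.01283 |
| **R6** | 0.00022 | 0.00133 | 0.00266 | 0.04498 | 0.01197 | 0.89940 | 0.03944 |

(b) Transition probability after smoothing

| Initial rating | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|---|---|---|---|---|---|---|---|
| R1 | 0.97162 | 0.01835 | 0.00433 | 0.00433 | 0.00104 | 0.00017 | 0.00017 |
| R2 | 0.00621 | 0.94528 | 0.03071 | 0.01284 | 0.00236 | 0.00236 | 0.00025 |
| R3 | 0.00071 | 0.01028 | 0.93803 | 0.04089 | 0.00659 | 0.00277 | 0.00074 |
| R4 | 0.00024 | 0.00069 | 0.01260 | 0.96726 | 0.01261 | 0.00543 | 0.00118 |
| R5 | 0.00039 | 0.00118 | 0.00790 | 0.07996 | 0.82725 | 0.07048 | 0.01283 |
| R6 | 0.00022 | 0.00133 | 0.00266 | 0.02847 | 0.02847 | 0.89940 | 0.03944 |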
Rating migration matrix models (Miu and Ozdemir 2009; Yang and Du 2016) are widely used for International Financial Reporting Standard 9 ECL estimation and CCAR stress testing. Given a nondefault risk rating ${R}_{i}$, let ${n}_{ij}$ be the observed long-run transition frequency from ${R}_{i}$ to ${R}_{j}$ at the end of the horizon, and let ${n}_{i}={n}_{i1}+{n}_{i2}+\cdots +{n}_{ik}$. Let ${p}_{ij}$ be the long-run transition probability from ${R}_{i}$ to ${R}_{j}$. By (3.3), the maximum likelihood estimate for ${p}_{ij}$ observing the long-run transition frequencies $\{{n}_{ij}\}$ for a fixed $i$ is
$${p}_{ij}=\frac{{n}_{ij}}{{n}_{i}}.$$  (3.7)
It is widely expected that higher risk grades carry greater default risk, and that an entity is more likely to be downgraded or upgraded to a closer nondefault rating than a more distant nondefault rating. The following constraints are thus required:
$${p}_{i,i+1}\ge {p}_{i,i+2}\ge \cdots \ge {p}_{i,k-1},$$  (3.8)
$${p}_{i1}\le {p}_{i2}\le \cdots \le {p}_{i,i-1},$$  (3.9)
$${p}_{1k}\le {p}_{2k}\le \cdots \le {p}_{k-1,k}.$$  (3.10)
The constraint (3.10) is for rating-level PD, which was discussed in Section 2.
Smoothing the long-run migration matrix involves the following steps.
 (a) Rescale migration probabilities $\{{p}_{i1},{p}_{i2},\dots ,{p}_{i,i-1}\}$ in (3.9) to make them sum to 1. Then find the CML-smoothed estimates by using Algorithm 3.1, and rescale these CML estimates in return to obtain the same summed value for $\{{p}_{i1},{p}_{i2},\dots ,{p}_{i,i-1}\}$ as that before smoothing. Do the same for (3.8).
 (b) Find the CML-smoothed estimates by using Algorithm 2.1 for the rating-level default rate. Keep these CML default rate estimates unchanged and rescale, for each nondefault rating ${R}_{i}$, the nondefault migration probabilities $\{{p}_{i1},{p}_{i2},\dots ,{p}_{i,k-1}\}$ so that the entire row $\{{p}_{i1},{p}_{i2},\dots ,{p}_{ik}\}$ sums to 1.
Table 6 shows empirical results using Algorithms 2.1 and 3.1 for smoothing the long-run migration matrix, where for Algorithm 3.1 all ${\epsilon}_{i}$ are set to zero.
The sample used here is created synthetically. It consists of the historical quarterly rating transition frequency for a commercial portfolio from 2005 Q1 to 2015 Q4. There are seven risk ratings, with ${R}_{1}$ being the best quality rating and ${R}_{7}$ being the default rating.
Part (a) shows sample estimates for long-run transition probabilities before smoothing, while part (b) shows CML-smoothed estimates. There are three rows, as highlighted in bold in part (a), where sample estimates violate (3.8) or (3.9) (but (3.10) is satisfied). Rating-level sample default rates (the column labeled “p7”) do not require smoothing.
As shown in the table, the CML-smoothed estimates are the simple average of the relevant nonmonotonic sample estimates. (For the structure of CML-smoothed estimates for multinomial probabilities, we show theoretically in a separate paper that the CML-smoothed estimate for an ordinal class is either the sample estimate or the simple average of the sample estimates for some consecutive ordinal classes including the named class.)
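The simple-average structure noted above suggests a compact cross-check of Table 6 for the case where all ${\epsilon}_{i}$ are zero. The Python sketch below applies an unweighted pool-adjacent-violators pass to the upgrade and downgrade segments of a row. It is a swapped-in technique rather than the optimization in Algorithm 3.1, and the function names `pava` and `smooth_row` are hypothetical. Because pooling preserves the sum of each segment, the rescaling in step (a) is not needed here, and the default column is left untouched.

```python
def pava(values, increasing=True):
    """Unweighted pool-adjacent-violators fit (nondecreasing by default)."""
    seq = list(values) if increasing else list(values)[::-1]
    blocks = []  # each block: [pooled value, number of cells pooled]
    for v in seq:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, c2 = blocks.pop()
            v1, c1 = blocks.pop()
            blocks.append([(v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2])
    fitted = [v for v, size in blocks for _ in range(size)]
    return fitted if increasing else fitted[::-1]

def smooth_row(row, i):
    """Smooth row i (1-based) of the matrix; row = [p_i1, ..., p_ik],
    with the last entry being the default probability."""
    k = len(row)
    out = list(row)
    out[:i - 1] = pava(row[:i - 1], increasing=True)      # (3.9), upgrades
    out[i:k - 1] = pava(row[i:k - 1], increasing=False)   # (3.8), downgrades
    return out

row1 = [0.97162, 0.01835, 0.00312, 0.00554, 0.00104, 0.00017, 0.00017]
row6 = [0.00022, 0.00133, 0.00266, 0.04498, 0.01197, 0.89940, 0.03944]
smoothed1 = smooth_row(row1, 1)
smoothed6 = smooth_row(row6, 6)
print([round(v, 5) for v in smoothed1])
print([round(v, 5) for v in smoothed6])
```

For the first row, the third and fourth entries are averaged to about 0.00433, and for the sixth row the fourth and fifth entries are averaged to about 0.0285, in line with part (b) of Table 6.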
4 Conclusions
Regression and interpolation approaches are widely used for smoothing rating transition probability and rating-level probability of default. A common issue with these methods is that the risk scale for the estimates does not have a strong mathematical basis, leading to possible bias in credit loss estimation. In this paper, we propose smoothing algorithms that are based on constrained maximum likelihood for rating-level PD and for rating migration probability. These smoothed estimates are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to a fair and more justified credit loss estimation. These algorithms can be implemented by a modeler using, for example, the SAS procedure PROC NLMIXED.
Declaration of interest
The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper. The views expressed in this paper are not necessarily those of the Royal Bank of Canada or any of its affiliates.
Acknowledgements
The author thanks both referees for suggesting extended discussion to cover both the case when default correlation is assumed and the likelihood ratio test for the constrained maximum likelihood estimates. Special thanks to Carlos Lopez for his consistent input, insights and support for this research. Thanks also go to Clovis Sukam and Biao Wu for their critical reading of this manuscript, and Zunwei Du, Wallace Law, Glenn Fei, Kaijie Cui, Jacky Bai and Guangzhi Zhao for many valuable conversations.
References
 Ankarath, N., Ghosh, T. P., Mehta, K. J., and Alkafaji, Y. A. (2010). Understanding IFRS Fundamentals. Wiley.
 Board of Governors of the Federal Reserve System (2016). Comprehensive Capital Analysis and Review 2016: summary instructions. Report, January, Federal Reserve Bank.
 Miu, P., and Ozdemir, B. (2009). Stress testing probability of default and rating migration rate with respect to Basel II requirements. The Journal of Risk Model Validation 3(4), 3–38 (https://doi.org/10.21314/JRMV.2009.048).
 SAS Institute (2009). SAS 9.2 user’s guide: the NLMIXED procedure. SAS Institute Inc., Cary, NC.
 Tasche, D. (2013). The art of probability-of-default curve calibration. The Journal of Credit Risk 9(4), 63–103 (https://doi.org/10.21314/JCR.2013.169).
 van der Burgt, M. J. (2008). Calibrating low-default portfolios, using the cumulative accuracy profile. The Journal of Risk Model Validation 1(4), 17–33 (https://doi.org/10.21314/JRMV.2008.016).
 Yang, B. H. (2017). Point-in-time probability of default term structure models for multiperiod scenario loss projection. The Journal of Risk Model Validation 11(1), 73–94 (https://doi.org/10.21314/JRMV.2017.164).
 Yang, B. H., and Du, Z. (2016). Rating-transition-probability models and Comprehensive Capital Analysis and Review stress testing. The Journal of Risk Model Validation 10(3), 1–19 (https://doi.org/10.21314/JRMV.2016.155).