Journal of Operational Risk

Risk.net

Measuring expected shortfall under semi-parametric expected shortfall approaches: a case study of selected Southern European/Mediterranean countries

Nikola Radivojević, Borislav Bojić and Marija Lakićević

  • The HS models, which are based on certain transformed historical data, can reliably be used for the estimation of a market risk in terms of the Basel III standards.
  • The incorporation of the volatility models co-opting the leverage effect contributes to the improvement of the applicability of these models.
  • The first step in testing the validity of risk models, in the context of Basel III rules, implies VaR backtesting.

We investigate the applicability of semi-parametric approaches for estimating expected shortfall. More precisely, we examine the applicability of several models based on the historical simulation (HS) approach: one based on untransformed historical data, and others based on transformed historical data. Our research shows that the HS models based on certain transformed historical data can reliably be used for the estimation of market risk in terms of the Basel III standards. This investigation was conducted on the capital markets of selected Southern European/Mediterranean countries and those of Serbia and Ireland. Our backtesting results were verified using Monte Carlo testing and the bootstrap method.

1 Introduction

Any attempt to summarize an entire distribution by a single number implies a loss of potentially important information. Hence, the use of value-at-risk (VaR) as a risk measure has been sharply criticized, particularly since this measure does not control for scenarios exceeding the VaR. For instance, the largest loss exceeding the VaR can be significantly increased but the VaR risk measure will remain unchanged. VaR represents the maximum loss of a portfolio that may arise during the holding period for a precisely defined confidence level. This means that it indicates the size and likelihood of this loss, as well as the likelihood of exceeding it, or the likelihood of the extreme loss that a bank may suffer above a defined confidence level. However, it does not say anything about the magnitude of this overrun. In the eyes of investors and regulators, these extreme losses are precisely what a risk measure should flag. VaR is, however, inherently incapable of distinguishing between the situations where losses in the tail are only slightly worse than the threshold and those where they are overwhelming. It only provides a lower bound for losses in the tail, thus having a bias toward optimism instead of conservatism, which is generally thought to be beneficial in risk management (Žiković and Filer 2013). The consequence of all this is that VaR estimates may misinterpret the risk of a portfolio. For example, two portfolios with the same VaR are not necessarily exposed to the same market risk.

Artzner et al (1999) studied the characteristics of risk measures in the context of capital requirements for risk coverage. They found that any risk measure that could be used to determine capital adequacy should have a coherence characteristic. A coherent risk measure is one that is able to determine the minimum amount of capital that should be invested to ensure a future value of the portfolio is acceptable. In other words, a coherent risk measure determines the minimum amount of capital to invest in risk-free assets in order to preserve the position. According to Artzner, such a risk measure is a measure of economic capital.

The most important condition for coherence is subadditivity. This refers to the requirement that the risk of a portfolio should be less than or, in the worst case, equal to the sum of the risks of the individual positions in its composition. First, this condition guarantees a conservative risk estimation; when positions in a portfolio are added, the upper limit of risk cannot be greater than the sum of the risks of individual positions. Second, the effect of diversification as a means of reducing risk is respected (Voit 2007). Unfortunately, VaR estimates do not universally exhibit subadditivity. They do satisfy this condition if the assumption that returns are elliptically distributed holds.
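A small numerical illustration may help here. The sketch below uses a hypothetical pair of independent bonds, each losing 100 with probability 4% and nothing otherwise (these figures are assumptions chosen for illustration, not data from this study), and compares VaR and ES at the 95% level for one bond and for the two-bond portfolio.

```python
def var_and_es(dist, cl):
    """Return (VaR, ES) for a discrete loss distribution.

    dist: list of (loss, probability) pairs; cl: confidence level.
    """
    tail = 1.0 - cl
    # VaR: smallest loss level whose cumulative probability reaches cl.
    cum, var = 0.0, None
    for loss, prob in sorted(dist):
        cum += prob
        if cum >= cl - 1e-12:
            var = loss
            break
    # ES: average loss over the worst `tail` share of the probability mass.
    remaining, acc = tail, 0.0
    for loss, prob in sorted(dist, reverse=True):
        take = min(prob, remaining)
        acc += take * loss
        remaining -= take
        if remaining <= 1e-12:
            break
    return var, acc / tail


p = 0.04  # hypothetical default probability of each bond
one_bond = [(0.0, 1 - p), (100.0, p)]
two_bonds = [(0.0, (1 - p) ** 2), (100.0, 2 * p * (1 - p)), (200.0, p ** 2)]

var_one, es_one = var_and_es(one_bond, 0.95)
var_two, es_two = var_and_es(two_bonds, 0.95)
print(var_two, "vs", 2 * var_one)  # portfolio VaR vs sum of stand-alone VaRs
print(es_two, "vs", 2 * es_one)    # portfolio ES vs sum of stand-alone ESs
```

In this constructed case the portfolio VaR exceeds the sum of the stand-alone VaRs, so VaR is not subadditive, whereas the portfolio ES stays below the sum of the stand-alone ESs.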

Since VaR does not satisfy all the characteristics of coherent risk measures, the Basel Committee on Banking Supervision recently proposed fundamental changes to the regulatory treatment of financial institutions’ trading book positions (Kellner and Rösch 2016). Among other things, the replacement of 99% VaR with a 97.5% expected shortfall (ES) or conditional VaR (CVaR) for the quantification of market risk is recommended. The ES estimations are also calculated for a one-day-ahead horizon with a confidence level of 97.5%. According to Basel Committee on Banking Supervision (2014, 2017), this confidence level provides a broadly similar level of risk capture as the existing 99% VaR threshold while providing a number of benefits, including a generally more stable model output and often a reduced sensitivity to extreme outlier observations.
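To see why the two thresholds give a broadly similar level of risk capture, one can compare them under the simplifying assumption of zero-mean normally distributed losses. This is only a benchmark sketch: the return series studied in this paper are fat-tailed, and the function names below are illustrative.

```python
from statistics import NormalDist


def normal_var(cl, sigma=1.0):
    """VaR of a zero-mean normal loss at confidence level cl."""
    return sigma * NormalDist().inv_cdf(cl)


def normal_es(cl, sigma=1.0):
    """ES of a zero-mean normal loss: sigma * pdf(z_cl) / (1 - cl)."""
    z = NormalDist().inv_cdf(cl)
    return sigma * NormalDist().pdf(z) / (1.0 - cl)


# Both figures are expressed in units of the standard deviation.
print(round(normal_var(0.99), 3), round(normal_es(0.975), 3))
```

Under normality both measures come out at roughly 2.33 standard deviations, so the switch to a 97.5% ES changes the capital figure mainly when the tails are heavier than normal.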

No matter how conservative a measure is, some residual risk will always remain. Except in certain abnormal cases, the ES is always less than the maximum loss amount; hence, even when the ES is used as a measure of economic capital, we still need to keep in mind that catastrophic events might occur, for which the capital buffer might be insufficient. According to many studies, purely nonparametric ES estimation approaches, such as calculating the ES from an untransformed historical data set of the tail losses, react slowly to sudden shifts in market regimes and to the occurrence of extreme events. This is exactly the criticism leveled at them when they are used for VaR calculation. It shows that the weak points of risk measurement models cannot be ignored, and that they can return to haunt us even when we switch from one risk measure to another. The problems remain the same regardless of whether VaR or ES is being estimated (Žiković 2008). Since the semi-parametric approaches based on the bootstrap method have proven to be very reliable estimation procedures for VaR on emerging markets, as shown by Žiković, it seems natural to extend their application to the calculation of ES. Therefore, this paper aims to examine the applicability of the popular semi-parametric historical simulation approaches based on the bootstrap method, such as bootstrap historical simulation (HS), filtered historical simulation (FHS) and bootstrap hybrid historical simulation (HHS), to the markets of selected Southern European/Mediterranean countries plus Serbia and Ireland. The performance of HHS based on the extreme value theory (EVT) is also examined. An empirical investigation has been conducted on the capital markets of Portugal, Greece, Spain, Turkey, Croatia, Slovenia, Bosnia and Herzegovina, Malta, Serbia and Ireland. We explain in Section 4 why the latter two countries are included in this study even though they do not belong to the Southern European countries, and why the Mediterranean countries of Italy, France, Montenegro, Albania, Gibraltar and Monaco were not included.

2 Literature review

In the literature, a large number of papers are devoted to ES (see, for example, Acerbi and Tasche 2002; Acerbi 2007; Dhaene et al 2008; Acerbi and Székely 2014; Emmer et al 2013; Tasche 2013; Bellini et al 2014; Martin 2014).

However, most of these are dedicated to the study of the properties of ES as a spectral and coherent risk measure. There are also a significant number of papers dedicated to testing the applicability of different ES approaches, but these are mostly implemented on developed markets. For example, a significant piece of research devoted to the applicability of ES approaches was carried out by Nieto and Ruiz (2008). They tested about thirty parametric models, ranging from conditional autoregressive VaR asymmetric to asymmetric power autoregressive conditional heteroscedasticity (APARCH)-EVT-Hill. They found that these models could reliably be used for ES estimation. They also concluded that the HS approach could not be used for this purpose. However, the study was only conducted on an example using the Standard & Poor’s 500 (S&P 500) index. A similar study was carried out by Harmantzis et al (2006). They tested several ES models on the S&P 500, DAX, CAC, Nikkei, TSE and FTSE indexes, as well as on several currencies, and found that the HS model and the EVT-based peaks-over-threshold method gave more correct estimations. They stated that Gaussian models underestimated ES, whereas models based on a stable Pareto distribution overestimated ES. Angelidis and Degiannakis (2007) examined the impact of different volatility forecasting models on the ES estimation within a strictly parametric framework by using the S&P 500 index, gold bullion US dollar per troy ounce and USD/GBP exchange rates. They showed that different volatility models were “optimal” for different assets. Chinhamu et al (2015) studied the applicability of the EVT-ES models on the gold market. Their results indicated that the generalized Pareto distribution (GPD) was superior to the traditional Gaussian and Student t models for ES estimations.

A relatively small number of papers are devoted to testing the applicability of nonparametric ES models, especially on the emerging markets. The first empirical study related to the applicability of a nonparametric approach to the estimation of ES was conducted by Giannopoulos and Tunaru (2005). They tested the applicability of the FHS model proposed by Barone-Adesi et al (1998) and found that the FHS generated ES estimates that satisfied the requirements of a coherent risk measure. However, they conducted their research using the S&P 500 index.

The first investigations related to the applicability of different ES models on emerging markets were conducted by Žiković (2008) and Žiković and Filer (2013). In the former, Žiković analyzed the possibility of applying parametric and nonparametric ES models to the capital markets of the former Yugoslavian states. He analyzed a simple moving average volatility model with the Fréchet distribution and the Gumbel distribution, the bootstrapped HS, the generalized autoregressive conditional heteroscedasticity (GARCH) volatility model with the Fréchet distribution and the Gumbel distribution, as well as the bootstrapped HHS CVaR model. Error statistics show that ES models are quite successful in capturing the extreme losses that occurred on these markets, especially the models based on the generalized extreme value distribution and the HHS CVaR model. To compare the models, Žiković used the four symmetrical error statistics. Žiković and Filer (2013) compared the performance of the ES models by using daily returns data for sixteen stock market indexes (eight from developed markets and eight from emerging markets) prior to and during the 2008 financial crisis. They showed that in the ES estimation the HHS model yielded the smallest error statistics, surpassing even the EVT models, especially for the developed markets. However, no backtesting procedures or procedures for verifying the backtesting results of the ES estimates were used in the two studies.

3 Expected shortfall

VaR estimates do not always fulfill all the characteristics of coherent risk measures or universally exhibit subadditivity. The risk of a portfolio can be greater than the sum of the stand-alone risks of its components. Hence, managing risk by VaR may fail to stimulate diversification. Moreover, VaR does not take into account the severity of an incurred damage event. For a subadditive measure, portfolio diversification always leads to risk reduction, while for measures violating this axiom, diversification may increase their value even when partial risks are triggered by mutually exclusive events (Acerbi and Tasche 2002). Further, Acerbi et al (2008) showed that, in the case of complex portfolios exposed to many risk variables, as in financial institutions, the computation of VaR can often be a formidable task. This is because the computation cannot be split into separate subcomputations due to the twofold nonadditivity of VaR.

  (1) Nonadditivity by a position: for a portfolio made up of two subportfolios, the total VaR is not obtained by summing the two partial VaRs, with the consequence that adding a new instrument to a portfolio often makes it necessary to recompute the VaR for the whole portfolio.

  (2) Nonadditivity by a risk variable: for a portfolio depending on multiple risk variables, the VaR is not the sum of partial VaRs.

In both cases, for the normally distributed returns of the portfolio, it is possible to show that nonadditivity is actually subadditivity. As a response to these limitations of VaR, many authors have suggested that alternative approaches to the management and assessment of market risks in banks and other financial institutions should be applied. Thus, research groups led by Artzner, Albrecht, Acerbi, Embrechts, Rockafellar and Uryasev have advocated the use of a coherent measure of risk that includes losses in excess of the VaR. More precisely, they describe and provide mathematical definitions for four measures of risk that include losses in excess of the VaR:

  • the tail conditional expectation (Artzner et al 1999);

  • the worst conditional expectation (Artzner et al 1999);

  • CVaR (Rockafellar and Uryasev 2002);

  • ES (Acerbi and Tasche 2002).

The risk measure proposed by Artzner et al (1999), known as tail conditional expectation in the literature, measures the expected loss in the 100p% worst cases and is expressed as follows:

  \mathrm{ES}_{\alpha} = -E[\, r \mid r \le \mathrm{VaR}_{\alpha} \,].   (3.1)

This is a coherent measure of risk. However, Voit (2007) points out that, in the event of noncontinuous distributions, there are certain mathematical refinements that may reduce the subadditivity requirement, which is a necessary condition for the coherence characteristic. In those cases, the fulfillment of the coherence condition must be taken into account. For distributions with possible discontinuities, there is a more subtle definition, which may differ depending on whether a loss is strictly greater than VaR (CVaR+) or greater than or equal to VaR (CVaR-) (Žiković 2008). Rockafellar and Uryasev (2002) propose that the ES can be obtained as the weighted average of CVaR+ and VaR, and it is a coherent measure in the sense of Artzner et al (1999), namely

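  \mathrm{ES}_{\alpha} = \frac{F_X(\mathrm{VaR}_{\alpha}(X)) - \alpha}{1 - \alpha}\, \mathrm{VaR}_{\alpha}(X) + \frac{1 - F_X(\mathrm{VaR}_{\alpha}(X))}{1 - \alpha}\, \mathrm{CVaR}^{+}_{\alpha}(X),   (3.2)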

where F(x) is the cumulative distribution function.

The difference between (3.1) and (3.2) embodies an adjustment needed to ensure the coherence of the statistic under noncontinuous distributions (for continuous distributions, the weights are obviously equal to 0 and 1 for VaR and CVaR, respectively).
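As a worked illustration of the weighting in (3.2), consider a hypothetical discrete position whose loss X equals 100 with probability 4% and 0 otherwise, evaluated at the confidence level α = 0.95. Here X is taken to denote a loss, which is an assumption made for this example. The 95% VaR is 0, the distribution function at the VaR equals 0.96, and the mean loss strictly beyond the VaR is 100, so that

```latex
\mathrm{ES}_{0.95}
  = \frac{0.96 - 0.95}{1 - 0.95}\cdot 0 + \frac{1 - 0.96}{1 - 0.95}\cdot 100
  = 0.2 \cdot 0 + 0.8 \cdot 100
  = 80 .
```

Neither naive conditional mean gives this figure: the mean loss strictly above the VaR is 100 and the mean loss at or above the VaR is 4, whereas the average over the worst 5% of outcomes is 80, which is what the weighted form delivers.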

Unlike CVaR, ES is expressed through the following equation:

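  \mathrm{ES}_{\alpha} = -\mathrm{CVaR}_{\alpha} + (\lambda - 1)(\mathrm{CVaR}_{\alpha} - \mathrm{VaR}_{\alpha}),   (3.3)

where \lambda \equiv P_{t-1}[X_t \le \mathrm{VaR}_{\alpha,t}]/\alpha \ge 1. Defined in this way, ES is always a coherent measure. In a downside-risk analysis, the most common choice is that stipulating λ=0.01 (see Rubia and Sanchis-Marco 2017).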

Although Acerbi and Tasche (2002) provided a more rigorous coherent measure, their paper focused on the ES proposed by Artzner et al (1999). The reason for this is that the two measures are the same over the class of continuous distributions, and continuous probability distributions are used when dealing with market risk in practice. Most importantly, Inui and Kijima (2005) proved that ES provides the minimum value of a class of plausible coherent risk measures. Moreover, they showed that any coherent risk measure is given by a convex combination of ESs. Acerbi (2004) remarks that ES is less sensitive to the choice of confidence interval than VaR. This is obvious since the former measure of risk takes into consideration the whole tail. As such, this robustness property is an extra argument in favor of using ES.

Despite the theoretical superiority of CVaR over VaR, it has its own problems. Yamai and Yoshiba (2002a,b) compared the two risk measures in terms of estimation errors, decomposition into risk factors and optimization. They also investigated their validity during periods of market turmoil. CVaR can easily be decomposed and optimized, whereas VaR cannot. CVaR requires a larger sample size than VaR at the same level of accuracy. Both measures seem to underestimate the risk of securities with fat-tailed properties and a high potential for large losses. However, this problem is less acute for ES. Yamai and Yoshiba found that, for a certain number of observations and at a certain confidence level, the accuracy of VaR and ES is about the same when the loss is normally distributed, but that VaR estimates are more accurate than ES estimates when such losses have fat tails. This means that the capital calculated from ES may be less stable than that calculated from VaR. Kondor and Varga-Haszonits (2008) find that whenever there is an asset in a portfolio that, with respect to risk and reward, dominates other risks in a given sample, such a portfolio’s return cannot be maximized under any coherent measure on that sample, CVaR included. In periods of high volatility and/or extreme price spikes, classical, widely used VaR models prove to be overly liberal and optimistic, which is definitely a problem in risk management (Žiković 2008).

4 Estimating the expected shortfall with semi-parametric expected shortfall approaches

The data we used to estimate the VaR and ES was made up of the daily logarithmic returns of the stock indexes from the capital markets of Spain, Portugal, Greece, Turkey, Slovenia, Croatia, Malta and Bosnia and Herzegovina, as well as Serbia and Ireland. There are six Southern European/Mediterranean countries that were not included in our research:

  • France and Italy, since these belong in the highly developed countries category, as their capital markets are significantly more developed, with significantly different characteristics than the countries that are the subject of this research;

  • Albania, since it has no quoted securities on the stock exchange, and thus there is no trading on it;

  • Monaco, since it has no stock exchange;

  • Gibraltar, since there is no defined stock exchange index that could serve as a proxy; and

  • Montenegro, since the data about the market index for the subperiod over which this research was being conducted is missing.

Serbia is included in the research since it belonged to the Mediterranean countries through different state communities until 2006 and has similar characteristics to the remaining ex-Yugoslavian countries belonging to the Mediterranean countries category. Ireland is included because it used to have a development path and market characteristics similar to the leading Mediterranean countries included in this study (Spain, Portugal and Greece).

The stock indexes tested were IBEX35 (Spain), PSI20 (Portugal), ATHEX20 (Greece), CROBEX (Croatia), SAXS (Bosnia and Herzegovina), XU100 (Turkey), SBITOP (Slovenia), MSE (Malta), BELEXline (Serbia) and ISEQ (Ireland). The returns were collected for the period from January 1, 2015 to January 1, 2018. The calculated VaR and ES figures are for the one-day-ahead horizon for the period from January 1, 2017 to January 1, 2018, according to the Basel III standard. The VaR and ES estimates were calculated for 99% and 97.5% confidence levels, respectively. The remaining observations were used as the resample observations needed for the VaR and ES starting values.

As an example of the bootstrap HS approach, the bootstrap HS500 models were used. The bootstrap HS estimated shortfall is obtained as follows:

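  \mathrm{ES} \approx E(r \mid r > \mathrm{VaR}) = \frac{1}{n - [n\,cl]} \Bigl[ \sum_{i=[n\,cl]}^{n} \hat{r}_{n(i)} \Bigr],   (4.1)

where \hat{r}_{n(1)} \le \hat{r}_{n(2)} \le \cdots \le \hat{r}_{n(n)} are the order statistics from the bootstrapped series \hat{r}.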
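The tail average in (4.1) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes losses are expressed as positive numbers, reads [n cl] as the integer part of n times cl, and averages the tail ES over a user-chosen number of bootstrap resamples. The function names, the number of resamples and the seed are all hypothetical.

```python
import random


def hs_expected_shortfall(losses, cl):
    """Average of the order statistics beyond position [n*cl]."""
    ordered = sorted(losses)
    start = int(len(ordered) * cl)  # integer part of n*cl
    # The slice holds n - [n*cl] values, matching the divisor in (4.1).
    tail = ordered[start:]
    return sum(tail) / len(tail)


def bootstrap_hs_es(losses, cl, n_boot=1000, seed=0):
    """Mean tail average over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(losses)
    draws = [
        hs_expected_shortfall(rng.choices(losses, k=n), cl)
        for _ in range(n_boot)
    ]
    return sum(draws) / n_boot


# Toy usage: 500 simulated daily losses, mirroring the HS500 window length.
rng = random.Random(42)
window = [rng.gauss(0.0, 0.01) for _ in range(500)]
print(bootstrap_hs_es(window, 0.975))
```

With a 500-day window and cl = 0.975, each resample averages only the 13 largest losses, which is why the estimate is sensitive to how many extreme observations the window happens to contain.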

The FHS estimated shortfall is obtained as follows:

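  \mathrm{ES} \approx E(r \mid r > \mathrm{VaR}) = \frac{1}{n - [n\,cl]} \Bigl[ \sum_{i=[n\,cl]}^{n} \hat{r}^{*}_{n(i)} \Bigr],   (4.2)

where \hat{r}^{*}_{n(1)} \le \hat{r}^{*}_{n(2)} \le \cdots \le \hat{r}^{*}_{n(n)} are the order statistics from the volatility-adjusted bootstrapped series \hat{r}^{*}. The volatility-adjusted bootstrapped series \hat{r}^{*} was obtained according to the procedure proposed by Barone-Adesi et al (2002).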
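The filtering idea can be sketched as follows. This is a simplified illustration in the spirit of filtered historical simulation rather than a reproduction of the cited procedure: it assumes a plain GARCH(1,1) variance recursion whose parameters (omega, alpha, beta) are supplied by the caller instead of being estimated by maximum likelihood, standardizes the returns by the filtered volatility, resamples the standardized residuals and rescales them by the one-step-ahead volatility forecast. All names and parameter values are hypothetical.

```python
import math
import random


def garch_variances(returns, omega, alpha, beta):
    """GARCH(1,1) variances h_1..h_{n+1}; h_1 is the sample variance."""
    n = len(returns)
    mean = sum(returns) / n
    h = [sum((r - mean) ** 2 for r in returns) / n]
    for r in returns:
        h.append(omega + alpha * r * r + beta * h[-1])
    return h  # the last element is the one-step-ahead forecast


def fhs_expected_shortfall(returns, cl, omega, alpha, beta,
                           n_boot=500, seed=0):
    h = garch_variances(returns, omega, alpha, beta)
    # Standardized residuals: each return divided by its own volatility.
    z = [r / math.sqrt(v) for r, v in zip(returns, h)]
    sigma_next = math.sqrt(h[-1])
    rng = random.Random(seed)
    n = len(z)
    draws = []
    for _ in range(n_boot):
        # Rescale resampled residuals; a loss is the negative of a return.
        losses = sorted(-sigma_next * zi for zi in rng.choices(z, k=n))
        tail = losses[int(n * cl):]
        draws.append(sum(tail) / len(tail))
    return sum(draws) / n_boot


# Toy usage with simulated returns and illustrative parameter values.
rng = random.Random(7)
rets = [rng.gauss(0.0, 0.01) for _ in range(500)]
print(fhs_expected_shortfall(rets, 0.975, omega=1e-6, alpha=0.08, beta=0.90))
```

Because the resampled residuals are rescaled by the latest volatility forecast, the estimate responds to a change in market regime faster than the unfiltered tail average does.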

The HHS estimated shortfall is obtained as follows:

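  \mathrm{ES} \approx E(r \mid r > \mathrm{VaR}) = \frac{1}{n - [n\,cl]} \Bigl[ \sum_{i=[n\,cl]}^{n} \hat{Z}_{n(i)} \Bigr],   (4.3)

where Z_t are the standardized tail losses and \hat{Z}_{n(1)} \le \hat{Z}_{n(2)} \le \cdots \le \hat{Z}_{n(n)} are the order statistics from the volatility-scaled bootstrapped series \hat{Z}.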

The HHS based on the EVT estimated shortfall is obtained as

  \mathrm{ES\text{-}EVT} = \frac{\mathrm{VaR}_{cl}}{1 - \xi} + \frac{\sigma - \xi u}{1 - \xi},   (4.4)

noting that a VaRcl estimate can be calculated as

  \mathrm{VaR}_{cl} = x_{n-k} \Bigl( \frac{n}{k}(1 - cl) \Bigr)^{1/\hat{\alpha}_{H}},   (4.5)

where k is the number of values exceeding the threshold u, σ is the scale parameter and n is the number of observations. Because the approach is based on the assumption that extreme returns over a defined threshold u follow the GPD with the tail index ξ>0 (Radivojević et al 2016), the Hill estimator was used to estimate the ξ value of the extreme value distribution from the empirical data, as follows:

  \hat{\alpha}_{H} = \frac{1}{k} \sum_{i=1}^{k} \ln(x_{n-i+1}) - \ln(x_{n-k}).   (4.6)
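The two closed-form pieces of this approach, the Hill estimate in (4.6) and the EVT-based ES in (4.4), are simple enough to sketch directly. The code below is only an illustration under stated assumptions: the input values are positive losses, the value returned by the Hill function is used as the tail index ξ as described in the text, and the VaR, scale and threshold passed to the ES function are hypothetical numbers rather than estimates from this study.

```python
import math


def hill_estimate(losses, k):
    """Hill estimate from the k largest values, as written in (4.6).

    losses must be strictly positive; x_(n-k) acts as the threshold.
    """
    x = sorted(losses)
    n = len(x)
    threshold = x[n - k - 1]  # order statistic x_(n-k), 1-indexed
    mean_log = sum(math.log(x[n - i]) for i in range(1, k + 1)) / k
    return mean_log - math.log(threshold)


def evt_expected_shortfall(var_cl, xi, sigma, u):
    """ES from (4.4); only meaningful for a tail index xi below 1."""
    if xi >= 1.0:
        raise ValueError("the GPD mean is infinite for xi >= 1")
    return var_cl / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)


# Toy usage with made-up standardized losses and parameters.
sample = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.7, 3.4, 4.1, 6.0]
xi_hat = hill_estimate(sample, k=3)
print(xi_hat, evt_expected_shortfall(var_cl=3.0, xi=xi_hat, sigma=1.0, u=2.7))
```

The choice of k matters: a small k uses only the most extreme observations and gives a noisy estimate, while a large k pulls in observations from the center of the distribution and biases it.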

As the application of the bootstrap method requires that historical yields be independent and identically distributed (iid), ie, that autocorrelation and heteroscedasticity be removed from them, the historical yields used in the models described above were standardized by applying the volatility estimates obtained from the appropriate GARCH model, as recommended by the creators of these models. However, as the Southern European/Mediterranean markets are known for the presence of the leverage effect, and as the ARCH models that can co-opt this characteristic are more desirable, the volatility estimates obtained by applying the exponential GARCH (EGARCH), Glosten–Jagannathan–Runkle GARCH (GJR–GARCH) and APARCH models are used for the transformation of the historical yields. As Radivojević et al (2016) and Rossignolo et al (2012, 2013) highlight that the assumption about the distribution of the innovations is more important for these countries than the volatility model specification, we calculated the volatility estimates under the assumption that innovations follow the normal, Student t and generalized error distributions. The tables included in the paper and the online appendix show the parameters of the volatility models, with the appropriate distribution assumption, that fit best according to the loglikelihood criterion. The volatility models were selected for the VaR and ES estimations according to the same criterion. For example, risk estimation with the BELEXline was done by applying the EGARCH model with a normal distribution, whereas for the PSI20 index the APARCH model with a Student t distribution was used. (The parameters for all models as well as the estimations of all possible combinations of models tested in this paper are available from the authors upon request.)

At the beginning of our analysis, the characteristics of the selected markets for the entire observation period were analyzed. Table 1 summarizes the descriptive statistics and the normality test. The indexes show a relatively large difference between the minimum and maximum values of the returns. The standard deviations are also relatively high. Our analysis shows that the indexes have significantly fatter distribution tails than are assumed under normality. The excess kurtosis is significant at a 5% level for all the indexes considered. The coefficients of the excess kurtosis range from 1.1561 (MSE) to 13.2578 (IBEX35). This indicates that the indexes have a significant leptokurtosis. Also, the skewness is significant at a 5% level for all the indexes considered, which indicates that the indexes have asymmetric returns. A negative skewness was recorded for all the indexes except ISEQ. In order to formally examine whether the returns follow the normal distribution, the Jarque–Bera test was used. The Jarque–Bera statistics indicate that the null hypothesis of normality should be rejected for every index; ie, the return series are not normally distributed. The Engle test, which is based on the Lagrange multiplier for the ARCH(1) model, was used to analyze the presence of ARCH effects. The results of the Engle test are surprising: the ARCH effect was not present for the capital markets of Serbia, Bosnia and Herzegovina, Turkey, Malta, Slovenia and Spain, although the first two markets are less developed than the others in this study. The presence of a first-order ARCH effect was recorded on all the other markets.

Table 1: The descriptive statistics of selected indexes. [The p-values are reported in parentheses. Source: authors’ calculations.]
  BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
Mean 0.0003 0.0001 -0.0003 -0.0002 0.0000 -0.0004 0.0000 0.0000 0.0000 0.0002
Standard deviation 0.0057 0.0058 0.0078 0.0120 0.0133 0.0115 0.0224 0.0124 0.0067 0.0041
Ex. kurtosis 2.4518 5.0784 6.4783 3.6404 13.2578 13.1606 11.3323 2.5340 4.1536 1.1561
Skewness -0.0481 -0.6602 -0.6221 -0.6072 -1.2384 1.5816 -0.7229 -0.3105 -0.5182 -0.2599
Range 0.0569 0.0540 0.0821 0.1185 0.1808 0.1486 0.2894 0.1248 0.0733 0.0307
Min. values -0.0317 -0.0311 -0.0463 -0.0725 -0.1319 -0.0445 -0.1771 -0.0708 -0.0473 -0.0164
Max. values 0.0252 0.0229 0.0358 0.0460 0.0489 0.1042 0.1123 0.0540 0.0260 0.0143
No. of observations 757 749 753 768 768 761 712 757 749 621
Jarque–Bera test 186.19 845.84 1344.53 463.79 5740.71 5729.64 3812.72 214.69 571.943 41.581
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
ARCH effect 0.966 88.710 0.820 35.571 2.010 137.81 6.759 0.2484 0.5484 2.7941
  (0.325) (0.000) (0.365) (0.000) (0.156) (0.000) (0.009) (0.6182) (0.4589) (0.0946)
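The Jarque–Bera figures in Table 1 can be cross-checked from the reported skewness and excess kurtosis. The sketch below uses the textbook form of the statistic, JB = n/6 · (S² + K²/4) with K the excess kurtosis, and the fact that the chi-squared survival function with two degrees of freedom is exp(−JB/2); the function names are illustrative, and small differences from the table are expected because the inputs are rounded.

```python
import math


def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic from sample size, skewness, excess kurtosis."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def jarque_bera_p_value(jb):
    """Asymptotic p-value: chi-squared with 2 degrees of freedom."""
    return math.exp(-jb / 2.0)


# MSE column of Table 1: 621 observations, skewness -0.2599,
# excess kurtosis 1.1561; the table reports a statistic of 41.581.
jb_mse = jarque_bera(621, -0.2599, 1.1561)
print(round(jb_mse, 2), jarque_bera_p_value(jb_mse))
```

Even for MSE, the index with the mildest departure from normality in the table, the p-value is far below 5%.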

The maximum likelihood estimates of the parameters of the GARCH(p,q) models, the appropriate EGARCH(p,q), GJR(p,q)–GARCH(p,q) and APARCH(p,q) models, the autoregressive moving average–generalized autoregressive conditional heteroscedasticity (ARMA(p,q)–GARCH(p,q)) models, the ARMA(p,q)–EGARCH(p,q)/GJR(p,q)–GARCH(p,q)/APARCH(p,q) models and the GPD (based on the transformation of the yields by applying the volatility models co-opting the leverage effect) are shown in Table 2, parts (a)–(c) of Table 3 and Tables 4–6, respectively. All the estimated parameters are statistically significant.

Table 2: Estimates of the parameters of the GARCH(p,q) model. [The p-values are given in parentheses. The boldface entries here and throughout indicate that the parameter is not significant. Source: authors’ calculations.]
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
α 0.075 0.077 0.055 0.197 0.142 0.180 0.145 0.0205 0.0385 0.0905
  (0.007) (0.000) (0.063) (0.007) (0.024) (0.001) (0.000) (0.000) (0.399) (0.032)
β 0.842 0.825 0.897 0.770 0.817 0.742 0.849 0.9731 0.7057 0.738
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.079) (0.000)
ω 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.0008 0.000 0.000
  (0.085) (0.000) (0.000) (0.000) (0.063) (0.048) (0.056) (0.008) (0.484) (0.092)
Loglikelihood 2856.83 2856.48 2603.62 2392.43 2290.41 2424.11 1859.06 2259.05 N/A 2536.46

The results show that the majority of the markets are characterized by the presence of asymmetry. The results presented in Table 3(a) indicate that, in most of the countries, negative innovations (bad news) have a greater influence on the destabilization of the market than positive innovations. In view of this, the yields transformed by applying the ARMA–EGARCH/GJR–GARCH/APARCH models were used to fit the GPD in the HHS–EVT models.

The threshold value for each index is determined by applying the rule of thumb proposed by Christoffersen (2011). The values of the thresholds and the maximum likelihood estimates of the tail index and sigma for each stock index are presented in Table 6.

5 Backtesting results and validation

ES backtesting is significantly more complex than VaR backtesting. This is why the Basel III standard does not prescribe a way to backtest the validity of ES estimates. Numerous authors, such as Emmer et al (2013), have recommended different methods for ES backtesting, and most agree that the first step in testing the validity of risk models is VaR backtesting. Namely, ES may be widely accepted as a measure of risk due to its coherence, but it is obvious from the foregoing formulas that, practically speaking, a bad estimation of VaR implies a misestimation of ES (Giannopoulos and Tunaru 2005). For this reason, Kupiec’s unconditional coverage test (LRuc test) and Christoffersen’s conditional coverage test (LRcc test) were used in this paper. Both tests were used at a 5% significance level.
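The two coverage tests can be sketched from a series of daily VaR exceedance indicators (1 if the loss exceeded the VaR forecast, 0 otherwise). The code below follows the standard likelihood-ratio forms of these tests and is not taken from the authors' implementation; the function names are illustrative. At a 5% significance level the LRuc statistic is compared with the chi-squared critical value for one degree of freedom (about 3.84) and the LRcc statistic with that for two degrees of freedom (about 5.99).

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) equals 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_lr_uc(hits, p):
    """Unconditional coverage: does the hit rate equal p?"""
    n, x = len(hits), sum(hits)
    rate = x / n
    ll_null = _xlogy(n - x, 1 - p) + _xlogy(x, p)
    ll_alt = _xlogy(n - x, 1 - rate) + _xlogy(x, rate)
    return -2.0 * (ll_null - ll_alt)


def christoffersen_lr_ind(hits):
    """Independence: does a hit today depend on a hit yesterday?"""
    n00 = n01 = n10 = n11 = 0
    for prev, cur in zip(hits, hits[1:]):
        if prev == 0 and cur == 0:
            n00 += 1
        elif prev == 0 and cur == 1:
            n01 += 1
        elif prev == 1 and cur == 0:
            n10 += 1
        else:
            n11 += 1
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    return -2.0 * (ll_null - ll_alt)


def christoffersen_lr_cc(hits, p):
    """Conditional coverage: sum of the two statistics above."""
    return kupiec_lr_uc(hits, p) + christoffersen_lr_ind(hits)


# Toy usage: 250 trading days, two clustered exceedances of a 99% VaR.
hits = [0] * 100 + [1, 1] + [0] * 148
print(kupiec_lr_uc(hits, 0.01), christoffersen_lr_cc(hits, 0.01))
```

A model can pass the unconditional test with the right number of exceedances and still fail the conditional one if those exceedances arrive in clusters, which is why both tests are applied.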

Table 3: Estimates of the parameters of the appropriate leverage effects obtained by applying the different volatility models. [The p-values are given in parentheses. For part (a), if γ=0, then the EGARCH(p,q) model is symmetric. When γ<0, positive shocks (good news) generate less volatility than negative shocks (bad news). When γ>0, this implies that positive innovations are more destabilizing than negative innovations. N/A denotes that the value was impossible to calculate, or was not calculated as the model was not statistically significant. Source: authors’ calculations.]
(a) EGARCH(p,q)
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
α 0.165 0.205 0.160 0.167 0.069 0.221 0.216 0.0194 0.007 0.047
  (0.000) (0.000) (0.000) (0.000) (0.069) (0.000) (0.000) (0.590) (0.830) (0.015)
β 0.927 0.925 0.937 0.973 0.974 0.957 0.973 0.979 0.974 0.988
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
ω -0.878 -0.938 -0.713 -0.373 -0.285 -0.564 -0.379 -0.205 -0.263 -0.167
  (0.004) (0.005) (0.000) (0.007) (0.007) (0.006) (0.000) (0.085) (0.330) (0.098)
γ 0.053 -0.049 -0.042 -0.117 -0.109 -0.094 -0.063 0.073 -0.043 -0.040
  (0.033) (0.098) (0.017) (0.000) (0.000) (0.022) (0.042) (0.000) (0.042) (0.016)
η 4.926 6.73 5.402 8.01 4.912 6.333 6.127
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
Loglikelihood 2862.53 2891.55 2596.66 2419.33 2347.27 2441.2 1903.15 N/A N/A 2538.73
(b) GJR–GARCH(p,q)
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
α 0.064 0.059 0.058 0.108 0.078 0.112 0.147 0.010 0.010 0.094
  (0.003) (0.003) (0.000) (0.021) (0.037) (0.002) (0.000) (0.048) (0.362) (0.148)
β 0.866 0.821 0.895 0.844 0.851 0.810 0.845 0.961 0.929 0.731
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
ω 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  (0.022) (0.000) (0.000) (0.070) (0.032) (0.039) (0.000) (0.269) (0.271) (0.202)
γ -0.257 0.365 -0.099 0.436 0.556 0.274 0.098 1.026 0.974 0.169
  (0.073) (0.092) (0.245) (0.008) (0.041) (0.073) (0.077) (0.000) (0.283) (0.446)
η 6.393 5.099 7.864 1.338 6.117 1.285
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
Loglikelihood 2857.34 2857.15 N/A 2415.91 2342.92 2437.6 1859.10 N/A N/A N/A
(c) APARCH(p,q)
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
α 0.081 0.102 0.000 0.091 0.078 0.116 0.108 N/A 0.007 0.029
  (0.000) (0.002) (0.927) (0.001) (0.000) (0.000) (0.000) (0.678) (0.269)
β 0.869 0.821 0.929 0.908 0.921 0.838 0.898 N/A 0.918 0.935
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
ω 0.000 0.000 0.000 0.000 0.000 0.000 0.000 N/A 0.000 0.000
  (0.019) (0.002) (0.363) (0.042) (0.004) (0.077) (0.001) (0.023) (0.371)
γ -0.329 0.389 -0.293 0.726 0.680 0.312 0.529 N/A 0.946 0.381
  (0.046) (0.093) (0.000) (0.007) (0.003) (0.099) (0.001) (0.465) (0.451)
δ 0.680 1.085 13.102 0.994 0.784 1.590 0.482 N/A 2.333 2.063
  (0.012) (0.061) (0.154) (0.001) (0.000) (0.005) (0.015) (0.080) (0.074)
η 1.255 6.728 7.758 N/A 6.221 5.860
  (0.000) (0.000) (0.000) (0.000) (0.000)
Loglikelihood 2862.31 2887.85 N/A 2419.65 2299.07 2437.81 1871.91 N/A N/A N/A
Table 4: Estimates of the parameters of the ARMA(p,q)–GARCH(p,q) models. [The p-values are given in parentheses. ARMA, autoregressive moving average. Source: authors’ calculations.]
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
AR(p) 0.111 0.107 0.850 -0.538 -0.476 -0.349 -0.797
  (0.001) (0.003) (0.000) (0.000) (0.084) (0.023) (0.000)
MA(q) -0.917 0.666 0.519 0.472 0.067 0.772 -0.121
  (0.000) (0.000) (0.052) (0.000) (0.086) (0.000) (0.003)
α 0.105 0.103 0.058 0.176 0.146 0.180 0.141 0.0205 0.033 0.084
  (0.000) (0.000) (0.044) (0.046) (0.023) (0.001) (0.000) (0.000) (0.430) (0.036)
β 0.832 0.806 0.892 0.793 0.813 0.751 0.853 0.9731 0.765 0.742
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.053) (0.000)
ω 0.000 0.000 0.000 0.000 0.000 0.0008 0.000 0.000
  (0.083) (0.025) (0.063) (0.003) (0.062) (0.008) (0.576) (0.099)
Loglikelihood 2879.28 2898.68 2608.95 2395.41 2289.86 2423.24 1859.87 2259.05 N/A 2540.04
Table 5: Estimates of the parameters of the ARMA(p,q)–EGARCH(p,q)/GJR(p,q)–GARCH(p,q)/APARCH(p,q) models. [The p-values are reported in parentheses. Source: authors’ calculations]
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
AR(p) 0.111 0.107 0.850 -0.538 -0.476 -0.349 -0.797
  (0.001) (0.003) (0.000) (0.000) (0.084) (0.023) (0.000)
MA(q) -0.917 0.666 0.519 0.472 0.067 0.772 -0.121
  (0.000) (0.000) (0.052) (0.000) (0.086) (0.000) (0.003)
α 0.143 0.140 0.167 0.088 0.062 0.118 0.235 0.0194 N/A 0.047
  (0.000) (0.000) (0.000) (0.000) (0.096) (0.000) (0.000) (0.590) (0.012)
β 0.935 0.919 0.942 0.912 0.974 0.791 0.983 0.979 N/A 0.986
  (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
ω -0.789 -0.954 -0.673 0.000 -0.279 0.000 -0.311 -0.205 N/A -0.186
  (0.009) (0.000) (0.000) (0.014) (0.000) (0.010) (0.000) (0.085) (0.200)
γ 0.0499 0.094 -0.012 0.690 -0.113 0.392 -0.052 0.073 N/A -0.031
  (0.044) (0.000) (0.000) (0.594) (0.000) (0.002) (0.032) (0.000) (0.062)
δ 0.872 2.021 N/A
  (0.002) (0.002)
η 6.333 N/A
  (0.000)
Loglikelihood 2863.31 2860.5 N/A 2407.91 2347.79 2427.79 1866.6 N/A N/A N/A
Table 6: Estimates of the threshold u, tail index ξ and scale parameter σ. [Source: authors’ calculations.]
Parameter BELEXline CROBEX SAXS10 PSI20 IBEX35 ISEQ ATHEX XU100 SBITOP MSE
u -1.7023 -1.721 -1.705 -1.693 -1.8516 -1.720 -1.686 -1.560 -1.877 -3.114
ξ 0.1724 0.2891 0.341 0.269 0.2656 0.260 0.220 0.369 0.233 0.286
σ 0.794 1.123 1.092 0.990 1.330 0.652 1.533 0.769 0.399 1.077
Table 7: VaR backtesting. [If there is no cluster of VaR breaks, an alternative formula is used to calculate the first-order Markov likelihood (see Brandolini and Colucci 2013). The difference in the values of tests with the same number of breaks is a consequence of the difference in the number of backtesting days. The samples in which no test can be computed are omitted due to the lack of VaR breaks. “CV” denotes critical value, and the subscript “assy” signifies the models are based on volatility models that co-opt the leverage effect. *No volatility model co-opting the leverage effect was used for this market; the different values of the tests related to the market are the result of the different simulations, although the VaR and ES estimations are the same. This is the case with Slovenia’s capital market. Source: authors’ calculations.]
(a) The LRuc test for 99% VaR
    BootstrapHS BootstrapHSassy FHS500 FHS500assy
Stock Backtesting No. of CV of   No. of CV of   No. of CV of   No. of CV of
index days breaks LRuc p-value breaks LRuc p-value breaks LRuc p-value breaks LRuc p-value
BELEXline 252 2 0.129 0.719 0 3 0.075 0.782 3 0.087 0.768
CROBEX 247 2 0.097 0.756 2 0.097 0.756 2 0.104 0.746 5 2.018 0.155
SAXS 250 2 0.112 0.737 1 1.176 0.278 1 1.188 0.275 1 1.176 0.278
PSI20 262 1 1.386 0.239 1 1.324 0.250 4 1.634 0.201 2 0.161 0.688
IBEX35 265 0 1 1.361 0.243 2 0.185 0.666 1 1.250 0.264
ISEQ 261 3 0.056 0.812 3 0.056 0.813 0 1 1.311 0.252
ATHEX 211 0 1 0.733 0.392 0 1 0.733 0.392
XU100 257 0 0 0 0
SBITOP* 257 0 0 1 1.262 0.261 1 1.262 0.261
MSE 240 0 0 2 0.078 0.780 1 1.057 0.304
    HHS500 HHS500assy HHS–EVT HHS–EVTassy
Stock Backtesting No. of CV of   No. of CV of   No. of CV of   No. of CV of
index days breaks LRuc p-value breaks LRuc p-value breaks LRuc p-value breaks LRuc p-value
BELEXline 252 2 0.129 0.719 2 0.129 0.719 0 5 1.917 0.166
CROBEX 247 3 0.099 0.752 7 5.608 0.018 0 11 16.102 0.000
SAXS 250 2 0.112 0.737 2 0.112 0.737 0 2 0.108 0.742
PSI20 262 0 1 1.324 0.250 0 2 0.161 0.688
IBEX35 265 0 2 0.176 0.675 0 4 0.601 0.438
ISEQ 261 3 0.056 0.812 4 0.643 0.423 0 2 0.157 0.692
ATHEX 211 0 0 0 1 0.733 0.392
XU100 257 0 0 8 7.425 0.006 8 7.425 0.006
SBITOP* 257 0 0 0 0
MSE 240 0 0 0 5 2.168 0.141
(b) The LRcc test for 99% VaR
    BootstrapHS BootstrapHSassy FHS500 FHS500assy
Stock Backtesting No. of CV of   No. of CV of   No. of CV of   No. of CV of
index days breaks LRcc p-value breaks LRcc p-value breaks LRcc p-value breaks LRcc p-value
BELEXline 252 2 0.129 0.937 0 3 0.075 0.962 3 0.087 0.957
CROBEX 247 2 5.524 0.063 2 7.574 0.023 2 0.104 0.949 5 2.018 0.364
SAXS 250 2 0.112 0.945 1 1.178 0.555 1 1.190 0.551 1 1.178 0.555
PSI20 262 1 1.388 0.500 1 1.325 0.516 4 4.919 0.085 2 0.161 0.922
IBEX35 265 0 1 1.363 0.506 2 0.186 0.911 1 1.251 0.535
ISEQ 261 3 0.056 0.972 3 0.056 0.972 0 1 1.313 0.519
ATHEX 211 0 1 0.734 0.693 0 1 0.734 0.693
XU100 257 0 0 0
SBITOP* 257 0 1 1.263 0.532 1 1.263 0.532
MSE 240 0 0 2 0.078 0.962 1 1.059 0.589
    HHS500 HHS500assy HHS–EVT HHS–EVTassy
Stock Backtesting No. of CV of   No. of CV of   No. of CV of   No. of CV of
index days breaks LRcc p-value breaks LRcc p-value breaks LRcc p-value breaks LRcc p-value
BELEXline 252 2 0.129 0.937 2 0.129 0.937 0 5 1.917 0.384
CROBEX 247 3 5.524 0.063 7 5.608 0.061 0 11 16.102 0.000
SAXS 250 2 0.112 0.945 2 0.112 0.945 0 2 0.108 0.947
PSI20 262 0 1 1.325 0.516 0 2 0.161 0.922
IBEX35 265 0 2 0.176 0.916 0 4 0.601 0.741
ISEQ 261 3 0.056 0.972 4 0.643 0.725 0 2 0.157 0.925
ATHEX 211 0 0 0 1 0.734 0.693
XU100 257 0 0 8 8.857 0.012 8 8.857 0.012
SBITOP* 257 0 0 0 0
MSE 240 0 0 0 5 2.168 0.338

As can be seen from Table 7 and Table A1 in the online appendix, all the models except for HS500 satisfied both validity criteria of the VaR model. Note, however, that the models generated risk estimates that were too conservative, which is undesirable from the point of view of the bank capital charge: additional capital has to be set aside, which negatively impacts profitability. This can be ascribed to a known deficiency of the HS approach, ie, the absence of assumptions about the evolution of risk factors. As a result, VaR is underestimated if the sample covers an unusually calm period, and overestimated if it covers a period of exceptional volatility. The sample for the countries tested evidently covered too volatile a period, so that the extreme losses recorded during it drive the overly conservative VaR estimates. However, the questionable statistical power of both tests when applied to finite samples is known to be their main drawback (see, for example, Christoffersen and Pelletier 2004; Berkowitz et al 2008; Wied et al 2016). Both tests were developed using asymptotic arguments, which may create difficulties in finite samples. The LRuc test is asymptotically (ie, as the number of observations T goes to infinity) distributed as χ2 with one degree of freedom under the null hypothesis that the tail probability p is the true probability. The LRcc test is asymptotically distributed as χ2 with two degrees of freedom under the null hypothesis that the hit sequence is iid Bernoulli with mean equal to the tail probability p.
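The two likelihood ratio statistics described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the 0/1 hit-sequence input and the convention 0·ln 0 = 0 are assumptions made here, and the alternative first-order Markov formula of Brandolini and Colucci (2013) mentioned in the note to Table 7 is not reproduced.

```python
import math

def _xlogy(x, y):
    """Return x*ln(y), using the convention 0*ln(0) = 0 (an assumption of this sketch)."""
    return 0.0 if x == 0 else x * math.log(y)

def lr_uc(hits, p):
    """Unconditional coverage LR statistic for a 0/1 sequence of VaR breaks."""
    T, x = len(hits), sum(hits)
    ll_null = _xlogy(T - x, 1 - p) + _xlogy(x, p)
    ll_alt = _xlogy(T - x, 1 - x / T) + _xlogy(x, x / T)
    return 2.0 * (ll_alt - ll_null)

def lr_ind(hits):
    """Independence LR statistic against a first-order Markov alternative."""
    n = [[0, 0], [0, 0]]  # n[i][j]: transitions from state i to state j
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    return 2.0 * (ll_alt - ll_null)

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable; closed forms for df = 1, 2 only."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

# Hypothetical example: 250 backtesting days with two isolated breaks of the 99% VaR.
hits = [0] * 250
hits[50] = hits[150] = 1
stat_uc = lr_uc(hits, 0.01)
stat_cc = stat_uc + lr_ind(hits)
p_uc, p_cc = chi2_sf(stat_uc, 1), chi2_sf(stat_cc, 2)
```

The conditional coverage statistic is taken here as the sum of the unconditional coverage and independence statistics, so its asymptotic p-value uses two degrees of freedom.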
Radivojević et al (2016) showed that when the number of VaR breaks is small there are substantial differences between the asymptotic probability distributions of the tests considered and their finite sample analogs. For this reason, the backtesting results obtained by these tests must be verified. To do this, Monte Carlo testing is used, where the p-value is given by

  $$p\text{-value} = \frac{1}{10\,000}\Bigl\{1 + \sum_{i=1}^{9999} I\bigl(\widetilde{LR}_{uc}^{(i)} > LR_{uc}\bigr)\Bigr\}. \tag{5.1}$$

Here, each simulated sample has the same size as the actual sample, $LR_{uc}$ and $\widetilde{LR}_{uc}$ are the actual and the simulated values of the unconditional coverage test statistic, respectively, and $I(\cdot)$ takes the value 1 if its argument is true and 0 otherwise. The p-value for the conditional coverage test is calculated in the same way. The results of these simulations are shown in Table 8. The average feasible rate of the LRuc test is 0.733, whereas for the LRcc test it is 0.719.
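The Monte Carlo p-value in (5.1) can be illustrated with the short Python sketch below. It assumes, for illustration only, that the simulated hit sequences are iid Bernoulli draws with the nominal tail probability; the function names and the seed are hypothetical and not taken from the paper.

```python
import math
import random

def lr_uc_from_counts(x, T, p):
    """Unconditional coverage LR statistic from x breaks in T days (0*ln 0 taken as 0)."""
    def xlogy(a, b):
        return 0.0 if a == 0 else a * math.log(b)
    ll_null = xlogy(T - x, 1 - p) + xlogy(x, p)
    ll_alt = xlogy(T - x, 1 - x / T) + xlogy(x, x / T)
    return 2.0 * (ll_alt - ll_null)

def mc_p_value(x_obs, T, p=0.01, n_sim=9999, seed=0):
    """Monte Carlo p-value: (1 + #{simulated LR > observed LR}) / (n_sim + 1)."""
    rng = random.Random(seed)
    lr_obs = lr_uc_from_counts(x_obs, T, p)
    exceed = 0
    for _ in range(n_sim):
        # Simulate a sample of the same size as the actual one under the null.
        x_sim = sum(rng.random() < p for _ in range(T))
        if lr_uc_from_counts(x_sim, T, p) > lr_obs:
            exceed += 1
    return (1 + exceed) / (n_sim + 1)
```

With n_sim = 9999 the denominator is 10 000, as in (5.1). Because the number of breaks is discrete, many simulated statistics tie with the observed one; the strict inequality in (5.1) is kept here.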

The simulated p-values confirm the previous backtesting results. All the tested models except for HS500 for the CROBEX and MSE indexes, the HHS based on the EVT for the PSI20 index and HHSassy based on the EVT for the XU100 index met the validity criteria of the VaR model.

Bearing in mind the issue raised by Gneiting (2012), Berkowitz’s ES backtest, based on the Levy–Rosenblatt transformation, was used in this paper. Berkowitz (2001) proposed the following likelihood ratio test:

  $$LR_{BT} = 2\bigl[\ln L(\mu=\hat{\mu}_{ML},\ \sigma^2=\hat{\sigma}^2_{ML}) - \ln L(\mu=0,\ \sigma^2=1)\bigr]. \tag{5.2}$$

The LRBT test is asymptotically distributed as χ2 with two degrees of freedom. Using the Berkowitz test, we compared the shape of the forecasted tail of the density function with that of the observed tail. Any observations that did not fall within the tail were truncated, noting that the threshold was defined as follows:

  $$TH_{i,t} = \max\{ES_1, ES_2, \dots, ES_t\}. \tag{5.3}$$

The results of the Berkowitz test are presented in Table 9.
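As a rough illustration of the likelihood ratio in (5.2), the Python sketch below computes the statistic for a full (untruncated) sample. It assumes the input is a list of probability integral transforms of the realized returns under the forecast distribution, each strictly between 0 and 1; the function name is hypothetical, and the tail truncation at the threshold in (5.3) is deliberately left out.

```python
import math
from statistics import NormalDist

def berkowitz_lr(pit):
    """Return (LR statistic, asymptotic chi-squared(2) p-value) for PIT values in (0, 1)."""
    std_normal = NormalDist()
    # Map the PIT values to the normal scale; under a correct forecast z is iid N(0, 1).
    z = [std_normal.inv_cdf(u) for u in pit]
    n = len(z)
    mu_hat = sum(z) / n
    var_hat = sum((v - mu_hat) ** 2 for v in z) / n  # maximum likelihood variance

    def loglik(mu, var):
        return sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                   for v in z)

    lr = 2.0 * (loglik(mu_hat, var_hat) - loglik(0.0, 1.0))
    # The chi-squared(2) upper tail has the closed form exp(-x / 2).
    return lr, math.exp(-max(lr, 0.0) / 2.0)
```

A well-calibrated forecast yields PIT values that look uniform and hence a small statistic, whereas PIT values concentrated in the lower tail produce a large statistic and a p-value near zero.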

The bootstrap simulation was used to verify the validity of the ES model, where F is the unknown cumulative distribution function of the estimator $\hat{\theta}$. In practice, F was estimated for our ES estimates by repeating the simulations with the appropriate models several times. The number of bootstrap repetitions was determined by the Andrews and Buchinsky (1997) procedure. Determining this number is particularly important here because the sample of breaks used in obtaining a single ES estimate is a small fraction of the number of draws. The p-value is then calculated by analogy with the procedure described above. The results of the bootstrap Berkowitz test are presented in Table 10 and in Table A1 in the online appendix.

Table 8: The backtesting results based on the simulated p-values. [The significance level is 5%. Source: authors’ calculations.]
(a) The LRuc test for 99% VaR
  p-value
Stock
index BootstrapHS BootstrapHSassy FHS500 FHS500assy HHS500 HHS500assy HHS–EVT HHS–EVTassy
BELEXline 0.094 0.096 0.117 0.207 0.223 0.234 0.169 0.732
CROBEX 0.308 0.435 0.2 0.113 0.103 0.348 0.068 0.043
SAXS 0.157 0.287 0.197 0.642 0.217 0.119 0.289 0.623
PSI20 0.691 0.498 0.068 0.238 0.266 0.432 0.091 0.423
IBEX35 0.306 0.232 0.138 0.171 0.089 0.237 0.118 0.441
ISEQ 0.492 0.371 0.301 0.467 0.331 0.399 0.264 0.338
ATHEX 0.199 0.208 0.294 0.171 0.146 0.208 0.173 0.271
XU100 0.205 0.369 0.305 0.458 0.352 0.699 0.038 0.037
SBITOP 0.333 0.278 0.382 0.507 0.199 0.434 0.434 0.893
MSE 0.381 0.890 0.108 0.412 0.253 0.223 0.551 0.701
(b) The LRcc test for 99% VaR
  p-value
Stock
index BootstrapHS BootstrapHSassy FHS500 FHS500assy HHS500 HHS500assy HHS–EVT HHS–EVTassy
BELEXline 0.301 0.402 0.249 0.665 0.358 0.556 0.241 0.671
CROBEX 0.077 0.258 0.101 0.401 0.212 0.237 0.331 0.547
SAXS 0.196 0.114 0.067 0.359 0.397 0.656 0.214 0.256
PSI20 0.352 0.602 0.088 0.229 0.059 0.471 0.017 0.647
IBEX35 0.088 0.701 0.135 0.803 0.186 0.257 0.224 0.236
ISEQ 0.073 0.345 0.273 0.492 0.292 0.397 0.199 0.347
ATHEX 0.19 0.191 0.311 0.319 0.333 0.417 0.366 0.168
XU100 0.531 0.608 0.357 0.226 0.344 0.422 0.046 0.045
SBITOP 0.5 0.27 0.653 0.396 0.567 0.333 0.389 0.499
MSE 0.444 0.834 0.703 0.656 0.308 0.618 0.738 0.711
Table 9: ES backtesting. [N/A means that the test was not calculated because the model did not pass the validity criterion of the VaR model, or that no testing was possible since there were no ES exceedances. The significance level was 5%. Source: authors’ calculations.]
    BootstrapHS BootstrapHSassy FHS500 FHS500assy
Stock Backtesting
index days CV of LRBT p-value CV of LRBT p-value CV of LRBT p-value CV of LRBT p-value
BELEXline 252 7.251 0.023 23.208 0.000 9.018 0.011 6.132 0.047
CROBEX 247 N/A N/A 16.620 0.000 13.339 0.001 14.538 0.001
SAXS 250 3.881 0.143 4.302 0.116 5.400 0.067 3.876 0.144
PSI20 262 14.508 0.000 0.369 0.832 0.460 0.749 0.117 0.943
IBEX35 265 6.474 0.039 3.904 0.142 2.278 0.320 3.078 0.215
ISEQ 261 8.263 0.016 7.865 0.931 1.502 0.471 5.200 0.074
ATHEX 211 1.332 0.513 5.377 0.068 3.265 0.195 3.894 0.143
XU100 257 15.520 0.000 15.520 0.000 11.141 0.004 N/A N/A
SBITOP 257 0.276 0.871 N/A N/A 0.012 0.994 N/A N/A
MSE 240 N/A N/A 17.475 0.000 N/A N/A N/A N/A
    HHS500 HHS500assy HHS–EVT HHS–EVTassy
Stock Backtesting
index days CV of LRBT p-value CV of LRBT p-value CV of LRBT p-value CV of LRBT p-value
BELEXline 252 1.390 0.499 N/A N/A 0.291 0.896 11.786 0.003
CROBEX 247 17.668 0.000 24.246 0.000 0.222 0.894 17.246 0.000
SAXS 250 3.435 0.179 5.628 0.060 0.166 0.920 5.683 0.058
PSI20 262 0.202 0.903 0.353 0.838 N/A N/A 0.112 0.945
IBEX35 265 5.631 0.059 11.392 0.003 0.346 0.840 6.521 0.038
ISEQ 261 7.287 0.026 1.860 0.394 0.267 0.874 11.141 0.981
ATHEX 211 0.744 0.689 1.328 0.515 0.091 0.995 2.163 0.339
XU100 257 21.385 0.000 21.385 0.000 12.652 0.002 N/A N/A
SBITOP 257 0.819 0.664 0.819 0.664 1.631 0.442 1.631 0.442
MSE 240 N/A N/A 63.267 0.000 N/A N/A N/A N/A
Table 10: The results of the bootstrap Berkowitz test. [The average feasible LRBT is about 0.878. Source: authors’ calculations.]
  p-value
Stock
index BootstrapHS BootstrapHSassy FHS500 FHS500assy HHS500 HHS500assy HHS–EVT HHS–EVTassy
BELEXline 0.009 0.029 0.370 0.051 0.567 0.877 0.281 0.696
CROBEX 0.308 0.044 0.031 0.007 0.016 0.002 0.030 0.001
SAXS 0.294 0.237 0.01 0.678 0.375 0.241 0.551 0.335
PSI20 0.008 0.099 0.094 0.514 0.573 0.068 N/A 0.149
IBEX35 0.391 0.419 0.055 0.671 0.099 0.347 0.077 0.166
ISEQ 0.022 0.371 0.233 0.367 0.040 0.449 0.313 0.647
ATHEX 0.109 0.273 0.467 0.543 0.489 0.575 0.438 0.882
XU100 0.038 0.049 0.036 0.384 0.099 0.033 0.036 0.007
SBITOP 0.367 0.306 0.559 0.601 0.335 0.351 0.178 0.161
MSE N/A 0.190 N/A N/A N/A 0.106 N/A N/A

The results of the Berkowitz test show that the worst performer is the HS500 model, which failed the test on almost all the markets; it is followed by the BootstrapHS model, which nevertheless satisfied the test on more than half the tested markets. This result confirms Žiković’s claim that purely nonparametric ES estimation approaches, such as calculating the ES from the untransformed historical data set of tail losses, are certain to be unreactive to sudden shifts in market regimes and the occurrence of extreme events. The best performers are the FHSassy, FHS500, HHS500, HHSassy, HHS–EVTassy and HHS–EVT models, which satisfied the test on most of the markets. It is worth noting that the models based on volatility models capable of co-opting the leverage effect generally satisfied this test on the majority of the markets. Bearing these results in mind, it is possible to conclude that semi-parametric approaches to ES estimation that primarily use transformed historical data can be used reliably on the selected markets in the context of the Basel III standards, and that the use of volatility models capable of co-opting the leverage effect can improve their applicability to the selected markets. These findings are in line with the results presented by Žiković (2008) and Giannopoulos and Tunaru (2005).

As can be seen from Table 10, it was impossible to conduct the Berkowitz test in a large number of cases, primarily due to an insufficient number of exceedances and the fact that no test was carried out for models that did not satisfy the VaR models’ validity criteria. The Berkowitz test has two drawbacks: it requires parametric assumptions and it needs large samples. The need for parametric assumptions is not an issue if VaR is calculated using a parametric distribution. However, it is important to be able to distinguish a bad model from a bad parametric assumption. The major drawback of the test is the need for large samples, an unrealistic assumption in the backtesting of ES since there are always just a few losses at hand.

Bearing the above in mind, we also employed Acerbi and Székely’s first method in order to draw a reliable conclusion about the applicability of the tested ES models to the selected markets (Acerbi and Székely 2014). The method is nonparametric, but it resembles the Righi and Ceretta (2013) method in that it defines a test statistic and assesses significance by simulation, thus solving the problem of small samples. The advantage of the Acerbi–Székely test is that no parametric assumption is needed. This method was selected in light of the results of Wimmerstedt (2015), which show that, unlike other tests that address the problem of small samples (such as Wong’s saddle-point technique (2010), Righi and Ceretta’s truncated distribution (2013) and Emmer et al’s quantile approximation (2013)), Acerbi and Székely’s first method offers an optimal trade-off between its ability to accept valid ES models and its power to reject invalid ones. Namely, its acceptance rate rises as the number of exceedances increases, while its rejection rates are stable, showing little or no dependence on the number of exceedances.

Acerbi and Szekely defined the null hypothesis

  $$P_t^{[\alpha]} = F_t^{[\alpha]} \quad \text{for all } t \tag{$H_0$}$$

against the alternatives

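  $$\widehat{ES}_{\alpha,t}(R) \geq ES_{\alpha,t}(R) \quad \text{for all } t \text{ and } > \text{ for some } t, \tag{$H_1$}$$

  $$\widehat{VaR}_{\alpha,t}(R) = VaR_{\alpha,t}(R) \quad \text{for all } t,$$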

where $F_t$ is the realized distribution of returns, $P_t^{[\alpha]}$ is the tail of the distribution $P_t$ below its quantile α, and $\widehat{ES}_{\alpha,t}(R)$ and $\widehat{VaR}_{\alpha,t}(R)$ are the sample ES and VaR from the realized returns. Under the null hypothesis, the realized tail is assumed to be the same as the predicted tail of the return distribution. The alternative hypothesis rejects the ES without rejecting the VaR. In order to test the null hypothesis, Acerbi and Székely (2014) defined the following test statistic:

  $$Z_1(R) = \frac{\sum_{t=1}^{T} (R_t I_t / ES_{\alpha,t})}{N_t} + 1, \tag{5.4}$$

where R denotes the vector of realized returns $(R_1, R_2, \dots, R_T)$, $I_t = \mathbb{1}(R_t < VaR_{\alpha}(R))$ is the indicator function that flags a backtesting exceedance of VaR by the realized return $R_t$ in period t, and $N_t = \sum_{t=1}^{T} I_t$ is the number of exceedances.
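A small Python sketch may make the statistic in (5.4) concrete. The sign convention is an assumption of this illustration: the VaR forecast is entered as a return quantile (a negative number), while the ES forecast is entered as a positive loss magnitude, so that the statistic is centred at zero when the ES forecasts are correct. The function name and the numbers are hypothetical.

```python
def z1_statistic(returns, var_quantiles, es_forecasts):
    """Average of R_t / ES_t over the VaR exceedances, plus one."""
    ratios = [r / es
              for r, var_q, es in zip(returns, var_quantiles, es_forecasts)
              if r < var_q]  # exceedance: the return falls below the VaR quantile
    if not ratios:
        raise ValueError("Z1 is undefined when there are no VaR exceedances")
    return sum(ratios) / len(ratios) + 1.0

# Hypothetical data: two exceedances whose average loss equals the forecast ES.
returns = [0.01, -0.05, 0.002, -0.03]
z1 = z1_statistic(returns, [-0.02] * 4, [0.04] * 4)
```

In this example the two exceedances average to a loss of 0.04, which equals the ES forecast, so the statistic is zero up to rounding; losses beyond the forecast ES push it below zero.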

Table 11: The results of Acerbi and Szekely’s first method. [The p-values were obtained by applying 10 000 simulations. The average feasible test is about 0.836. Source: authors’ calculations.]
  p-value
Stock
index BootstrapHS BootstrapHSassy FHS500 FHS500assy HHS500 HHS500assy HHS–EVT HHS–EVTassy
BELEXline 0.011 0.056 0.324 0.114 0.13 0.078 0.178 0.353
CROBEX 0.257 0.245 0.009 0.070 0.257 0.331 0.000 0.001
SAXS 0.367 0.038 0.113 0.382 0.745 0.767 0.678 0.117
PSI20 0.019 0.455 0.364 0.149 0.288 0.068 0.221 0.786
IBEX35 0.478 0.317 0.397 0.218 0.772 0.138 0.034 0.227
ISEQ 0.033 0.273 0.781 0.381 0.117 0.327 0.714 0.473
ATHEX 0.647 0.669 0.116 0.223 0.0281 0.223 0.293 0.502
XU100 0.040 0.008 0.107 0.441 0.019 0.013 0.157 0.009
SBITOP 0.679 0.505 0.326 0.209 0.457 0.616 0.029 0.239
MSE 0.055 0.009 0.001 0.013 0.077 0.220 0.202 0.130

To test for significance with the above method, the simulations from the distribution under H0 were used. More precisely, we followed the steps below:

  (1) simulate $R_t^i$ from $P_t$ for all t and i = 1, 2, …, M;

  (2) for every i, compute $Z^i = Z(R^i)$, ie, compute the value of Z1 using the simulations from the first step;

  (3) estimate the p-value as

      $$p = \frac{1}{M}\sum_{i=1}^{M} \mathbb{1}\bigl(Z^i < Z(r)\bigr),$$

    where Z(r) is the observed value of Z1.
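The three steps above can be sketched in Python as follows. The predictive distribution is a pure assumption here: a zero-mean normal with a known daily volatility, chosen only so that VaR and ES have closed forms; the paper's own models are not reproduced. All names are hypothetical, and simulated paths without any VaR exceedance are dropped because Z1 is undefined for them, so the divisor is the number of usable paths rather than M.

```python
import random
from statistics import NormalDist

def z1(returns, var_q, es):
    """Z1 statistic; returns None when the path has no VaR exceedance."""
    ratios = [r / e for r, v, e in zip(returns, var_q, es) if r < v]
    return sum(ratios) / len(ratios) + 1.0 if ratios else None

def simulated_p_value(realized, sigmas, alpha=0.01, M=2000, seed=0):
    rng = random.Random(seed)
    nd = NormalDist()
    z_alpha = nd.inv_cdf(alpha)
    # Closed-form forecasts under the assumed N(0, sigma_t^2) predictive distribution:
    var_q = [s * z_alpha for s in sigmas]           # VaR as a (negative) return quantile
    es = [s * nd.pdf(z_alpha) / alpha for s in sigmas]  # ES as a positive magnitude
    z_obs = z1(realized, var_q, es)
    if z_obs is None:
        raise ValueError("no VaR exceedances in the realized returns")
    sims = []
    for _ in range(M):
        # Step (1): simulate one return path from the predictive distribution.
        path = [rng.gauss(0.0, s) for s in sigmas]
        # Step (2): compute Z1 on the simulated path.
        z_sim = z1(path, var_q, es)
        if z_sim is not None:
            sims.append(z_sim)
    # Step (3): share of simulated statistics below the observed one.
    return sum(z < z_obs for z in sims) / len(sims)
```

A small p-value indicates that the realized tail losses are larger, relative to the forecast ES, than the predictive distribution would typically produce.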

The results of Acerbi and Szekely’s first method shown in Table 11 and Table A1 in the online appendix confirm in principle the results of the Berkowitz test. According to this test, the HS500, based on untransformed yields, showed the weakest performance, followed by BootstrapHS and BootstrapHSassy. The best performance was achieved by FHS500assy, which only failed to satisfy the ES model validity criteria on Malta’s capital market, and HHSassy, which did not pass the test on Turkey’s market. It is interesting that the models did not pass the test on the markets where no presence of the ARCH effect was recorded. The rest of the models did not pass the test on at least two markets. A justification for the weaker performance of the HHS and HHS–EVT models in comparison with the FHS model may be found in the relatively small amount of data in the distribution tail, which might have resulted in less precise assessments of the distribution parameters of extreme yields. Interestingly, more models passed the Acerbi–Szekely test on Croatia’s capital market than passed the Berkowitz test, for which the historical simulation recorded the largest number of exceedances. This clearly indicates that the Acerbi–Szekely test depends only slightly on the number of VaR exceedances and that its acceptance rate increases as the number of exceedances increases, which is not the case with the Berkowitz test.

As most models satisfied the ES estimation validity criteria on a larger number of markets when the volatility models that co-opt the leverage effect were used than when the GARCH model was used, we can conclude that incorporating the volatility models that co-opt the leverage effect can improve the applicability of the semi-parametric ES models.

6 Conclusion

The applicability of semi-parametric approaches to the estimation of ES was investigated. More precisely, the applicability of five models based on the HS approach was examined – one based on untransformed historical data, and the others based on transformed historical data – noting that, apart from the GARCH model, those volatility models capable of co-opting the leverage effect were also used for data transformation. Bearing in mind previous studies stating that the distributional assumption about innovations is more important to these markets than the volatility model specification, we used volatility models with three kinds of distributional assumption for yield transformation and risk estimation. Due to restrictions on paper length, only the assessments of the parameters and the risk estimations of those models selected according to the loglikelihood information criteria are shown.

We made the following findings. First, our study confirms Žiković’s claims that purely nonparametric ES estimation approaches, such as the common HS model, are certain to be unreactive to sudden shifts in market regimes and the occurrence of extreme events, and thus they cannot reliably be used to estimate the ES (Žiković 2008).

Second, generally speaking, the HS approaches based on transformed historical data can plausibly be used on the selected markets. We also found that the FHSassy model proposed by Barone-Adesi et al (1998, 1999) was the best performer, noting that this model is based on the volatility model co-opting the leverage effect rather than the original one based on the GARCH model. We also confirmed that incorporating the volatility models co-opting the leverage effect improves the applicability of these models.

The best estimations of ES were obtained by applying the HHS500 model proposed by Žiković (2008). The HHS–EVT model was the worst performer, as it generated overly conservative estimations of market risk. In order to backtest the VaR estimations, we used Kupiec’s and Christoffersen’s tests; to test the ES estimations, we used the Berkowitz test. To validate the VaR backtesting results, we conducted Monte Carlo testing. The Berkowitz test results were verified by applying the bootstrap method. Since the sample of breaks used in obtaining a single ES estimate is a small fraction of the number of draws, it was important to choose the right number of bootstrap repetitions. To do this, we applied the Andrews–Buchinsky procedure. Bearing in mind the deficiencies of the Berkowitz test, the ES estimations were also tested by applying Acerbi and Székely’s first method. The results of this test in principle confirm our conclusions, but also lead us to question the prevailing attitude that if a model does not satisfy the VaR estimation validity criteria, it should not be considered for estimating ES.

Our findings indicate the need for significant discussion, primarily in the context of the prescription of the validity conditions for ES models by the Basel Committee.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

References

  • Acerbi, C. (2004). Coherent representations of subjective risk aversion. In Risk Measures for the 21st Century, Szegö, G. (ed), pp. 147–207. Wiley.
  • Acerbi, C. (2007). Coherent measures of risk in everyday market practice. Quantitative Finance 7(4), 359–364 (https://doi.org/10.1080/14697680701461590).
  • Acerbi, C., and Székely, B. (2014). Backtesting expected shortfall. Research Paper, December, MSCI. URL: https://bit.ly/2nfWc3k.
  • Acerbi, C., and Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and Finance 26, 1487–1503 (https://doi.org/10.1016/S0378-4266(02)00283-2).
  • Acerbi, C., Nordio, C., and Sirtori, C. (2008). Expected shortfall as a tool for financial risk management. Preprint (arXiv:cond-mat/0102304v1).
  • Andrews, D., and Buchinsky, M. (1997). On the number of bootstrap repetitions for bootstrap standard errors, confidence intervals and tests. Paper CFDP 1141R, Cowles Foundation for Research in Economics, Yale University, New Haven, CT.
  • Angelidis, T., and Degiannakis, S. (2007). Backtesting VaR models: an expected shortfall approach. Working Paper 0701, Department of Economics, University of Crete (https://doi.org/10.2139/ssrn.898473).
  • Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent measures of risk. Mathematical Finance 9, 203–228 (https://doi.org/10.1111/1467-9965.00068).
  • Barone-Adesi, G., Bourgoin, F., and Giannopoulos, K. (1998). Don’t look back. Risk 11(8), 100–104.
  • Barone-Adesi, G., Giannopoulos, K., and Vosper, L. (1999). VaR without correlations for portfolios of derivative securities. Journal of Futures Markets 19, 583–602 (https://doi.org/10.1002/(SICI)1096-9934(199908)19:5%3C583::AID-FUT5%3E3.0.CO;2-S).
  • Barone-Adesi, G., Giannopoulos, K. and Vosper, L. (2002). Backtesting derivative portfolios with FHS. European Financial Management 8, 31–58 (https://doi.org/10.1111/1468-036X.00175).
  • Basel Committee on Banking Supervision (2014). Fundamental review of the trading book: outstanding issues. Technical Report, Bank for International Settlements.
  • Basel Committee on Banking Supervision (2017). High-level summary of Basel III reforms. Report, December, Bank for International Settlements. URL: http://www.bis.org/bcbs/publ/d424_hlsummary.pdf.
  • Bellini, F., Klar, B., Müller, A., and Gianin, E. R. (2014). Generalized quantiles as risk measures. Insurance: Mathematics and Economics 54, 41–48 (https://doi.org/10.1016/j.insmatheco.2013.10.015).
  • Berkowitz, J. (2001). Testing density forecasts, with applications to risk management. Journal of Business and Economic Statistics 19(4), 465–474 (https://doi.org/10.1198/07350010152596718).
  • Berkowitz, J., Christoffersen, P. F., and Pelletier, D. (2008). Evaluating value-at-risk models with desk-level data. Management Science 57, 2213–2227 (https://doi.org/10.1287/mnsc.1080.0964).
  • Brandolini, D., and Colucci, S. (2013). Backtesting value-at-risk: a comparison between filtered bootstrap and historical simulation. The Journal of Risk Model Validation 6(4), 3–16 (https://doi.org/10.21314/JRMV.2012.094).
  • Chinhamu, K., Huang, C.-K., Huang, C.-S., and Hammujuddy, J. (2015). Empirical analyses of extreme value models for the South African mining index. South African Journal of Economics 83(1), 41–55 (https://doi.org/10.1111/saje.12051).
  • Christoffersen, P. F. (2011). Elements of Financial Risk Management. Academic Press, San Diego, CA (https://doi.org/10.1016/B978-0-12-374448-7.00011-7).
  • Christoffersen, P. F., and Pelletier, D. (2004). Backtesting value-at-risk: a duration-based approach. Journal of Financial Econometrics 2(1), 84–108 (https://doi.org/10.1093/jjfinec/nbh004).
  • Dhaene, J., Laeven, R. J. A., Vanduffel, S., Darkiewicz, G., and Goovaerts, M. J. (2008). Can a coherent risk measure be too subadditive? Journal of Risk and Insurance 75(2), 365–386 (https://doi.org/10.1111/j.1539-6975.2008.00264.x).
  • Emmer, S., Kratz, M., and Tasche, D. (2013). What is the best risk measure in practice? A comparison of standard measures. Preprint (arXiv:1312.1645).
  • Giannopoulos, K., and Tunaru, K. R. (2005). Coherent risk measures under filtered historical simulation. Journal of Banking and Finance 29, 979–996 (https://doi.org/10.1016/j.jbankfin.2004.08.009).
  • Gneiting, T. (2012). Making and evaluating point forecasts. Journal of the American Statistical Association 106(494), 746–762 (https://doi.org/10.1198/jasa.2011.r10138).
  • Harmantzis, F. C., Miao, L., and Chien, Y. (2006). Empirical study of value-at-risk and expected shortfall model with heavy tails. Journal of Risk Finance 7(2), 117–135 (https://doi.org/10.1108/15265940610648571).
  • Inui, K., and Kijima, M. (2005). On the significance of expected shortfall as a coherent risk measure. Journal of Banking and Finance 29(4), 853–864 (https://doi.org/10.1016/j.jbankfin.2004.08.005).
  • Kellner, R., and Rösch, D. (2016). Quantifying market risk with value-at-risk or expected shortfall? Consequences for capital requirements and model risk. Journal of Economic Dynamics and Control 68, 45–63 (https://doi.org/10.1108/15265940610648571).
  • Kondor, I., and Varga-Haszonits, I. (2008). Feasibility of portfolio optimization under coherent risk measures. Preprint (arXiv:0803.2283).
  • Martin, R. (2014). Expectiles behave as expected. Risk 27(6), 79–83.
  • Nieto, M. R., and Ruiz, E. (2008). Measuring financial risk: comparison of alternative procedures to estimate VaR and ES. Working Paper 08-73, Statistics and Econometrics Series 26, Universidad Carlos III de Madrid, Spain.
  • Radivojević, N., Cvijetković, M., and Stepanov, S. (2016). The new hybrid VaR approach based on EVT. Estudios de Economia 43(1), 29–52 (https://doi.org/10.4067/S0718-52862016000100002).
  • Righi, M. B., and Ceretta, P. S. (2013). Individual and flexible expected shortfall backtesting. The Journal of Risk Model Validation 7(3), 3–20 (https://doi.org/10.21314/JRMV.2013.108).
  • Rockafellar, R. T., and Uryasev, S. (2002). Conditional value-at-risk for general loss distributions. Journal of Banking and Finance 26, 1443–1472 (https://doi.org/10.1016/S0378-4266(02)00271-6).
  • Rossignolo, F. A., Fethi, M. D., and Shaban, M. (2012). Value-at-risk models and Basel capital charges: evidence from emerging and frontier stock markets. Journal of Financial Stability 8, 303–319 (https://doi.org/10.1016/j.jfs.2011.11.003).
  • Rossignolo, F. A., Fethi, M. D., and Shaban, M. (2013). Market crises and Basel capital requirements: could Basel III have been different? Evidence from Portugal, Ireland, Greece and Spain (PIGS). Journal of Banking and Finance 37, 1323–1339 (https://doi.org/10.1016/j.jbankfin.2012.08.021).
  • Rubia, A., and Sanchis-Marco, L. (2017). Measuring tail-risk cross-country exposures in the banking industry. Revista de Economía Aplicada 25(74), 27–74.
  • Tasche, D. (2013). Expected shortfall is not elicitable. So what? Presentation, Imperial College, London.
  • Voit, J. (2007). The Statistical Mechanics of Financial Markets, 3rd edn. Springer.
  • Wied, D., Weiß, G., and Ziggel, D. (2016). Evaluating value-at-risk forecasts: a new set of multivariate backtests. Journal of Banking and Finance 72, 121–132 (https://doi.org/10.1016/j.jbankfin.2016.07.014).
  • Wimmerstedt, L. (2015). Backtesting expected shortfall: the design and implementation of different backtests. Working Paper, KTH.
  • Yamai, Y., and Yoshiba, T. (2002a). Comparative analyses of expected shortfall and value-at-risk: their estimation error, decomposition, and optimization. Monetary and Economic Studies 20, 87–122.
  • Yamai, Y., and Yoshiba, T. (2002b). Comparative analyses of expected shortfall and value-at-risk: their validity under market stress. Monetary and Economic Studies 20, 181–237.
  • Žiković, S. (2008). Quantifying extreme risks in stock markets: a case of former Yugoslavian states. Zbornik radova Ekonomskog fakulteta u Rijeci 26(1), 41–68.
  • Žiković, S., and Filer, R. K. (2013). Ranking of VaR and ES models: performance in developed and emerging markets. Czech Journal of Economics and Finance 63(3), 327–359.
