Journal of Operational Risk
ISSN: 1744-6740 (print), 1755-2710 (online)
Editor-in-chief: Marcelo Cruz
Various approximations of the total aggregate loss quantile function with application to operational risk
Need to know
- We explain the mechanics of the empirical aggregate loss bootstrap distribution and develop an analytic approximation to its quantile function.
- We present various other approximations of the empirical bootstrap quantile function based on techniques in the literature.
- We study the impact of empirical moments and large losses in the context of the empirical bootstrap measure of operational risk capital.
Abstract
A compound Poisson distribution is the sum of independent and identically distributed random variables over a count variable that follows a Poisson distribution. Generally, this distribution is not tractable. However, it has many practical applications that require the estimation of the quantile function at a high percentile, eg, the 99.9th percentile. Without loss of generality, this paper focuses on the application to operational risk. We assume that the support of random variables is nonnegative, discrete and finite. We investigate the mechanics of the empirical aggregate loss bootstrap distribution and suggest different approximations of its quantile function. Furthermore, we study the impact of empirical moments and large losses on the empirical bootstrap capital at the 99.9% confidence level.
1 Introduction
A compound Poisson distribution is the sum of independent and identically distributed random variables over a count variable that follows a Poisson distribution. Generally, its distribution is not tractable. However, it has many practical applications that require the estimation of the quantile function at a high percentile, eg, the 99.9th percentile. Without loss of generality, this paper focuses on its application to operational risk.
Operational risk is defined by the Basel Committee on Banking Supervision as follows: “The risk of loss resulting from inadequate or failed internal processes, people or systems or from external events. This definition includes legal risk, but excludes strategic and reputational risk.” The Basel II framework introduced the advanced measurement approach (AMA) to calculate regulatory capital for operational risk. AMA is a risk-sensitive methodology based on sophisticated mathematical tools. It offers a flexible framework to measure risk exposure and, therefore, to implement risk measurement and reporting tools.
AMA-regulated financial institutions are expected to collect and analyze loss data at a level of granularity that captures the idiosyncratic risks reflecting the complexity of their business activities. The most popular modeling concept by far is the loss distribution approach (LDA). This is mainly based on two parametric distributions: a single loss (or severity) distribution, and an annual frequency distribution of loss events, from which an annual aggregate loss distribution is then inferred. As operational loss data usually contains outliers, heavy-tailed distributions are often chosen to model loss severity. Calibrating these distributions to estimate the percentiles sought for regulatory capital thus becomes a challenge (Cope et al 2009), necessitating the use of benchmarks.
The empirical bootstrap distribution is widely used to benchmark the continuous compound distribution. It approximates the single loss distribution by the empirical distribution function arising from a sample. We investigate the mechanics of the empirical aggregate loss bootstrap distribution and suggest different approximations of its quantile function. Furthermore, we study the impact of empirical moments and large losses in the context of the empirical bootstrap measure of operational risk capital. Historical data is plagued with large outliers, which dominate the upper percentiles of the aggregate loss distribution used to benchmark capital results. This paper offers a better understanding of the relationship between outlier losses and the empirical bootstrap distribution, particularly at high percentiles.
The paper is organized as follows. Section 2 defines the problem that we tackle, and introduces some notation. Section 3 presents a simple and intuitive analytical approximation of the quantile function of the empirical aggregate loss bootstrap distribution. Section 4 develops the multinomial representation of the total loss random variable. This representation is the key to computing various approximations (Poisson approximation, Cornish–Fisher expansion) that are developed in Section 5. We investigate their numerical efficiency in Section 6. Section 7 is devoted to the binomial approximation. Section 8 extends the analytic approximation developed in Section 3 to continuous distributions. Section 9 concludes.
2 Definition of the problem
Consider an ordered sample of single losses,
$$L=\{x_{1},x_{2},\dots,x_{n}=M\}\quad\text{such that } x_{i}\le x_{j} \text{ for all } i\le j.$$
The aggregate loss is defined as
$T={\displaystyle \sum _{i=1}^{f(t)}}{s}_{i},$  (2.1) 
where $s_{i}$ is randomly sampled with replacement from $L$. Each individual loss $x_{i}$ has a probability weight of being drawn equal to $1/n$. $f(t)$ is the frequency of loss over $t$ years. Throughout this paper, it is assumed to follow a Poisson distribution with intensity $\lambda$ (the time horizon, $t$, is assumed to be constant; hereafter, for clarity, we omit it). Without loss of generality, we normalize the loss data sample $L$ by its largest loss, $M$:
$$\tilde{L}=\{\tilde{x}_{1},\tilde{x}_{2},\dots,\tilde{x}_{n}=1\},\qquad \tilde{x}_{i}=\frac{x_{i}}{M}.$$
Define
${\stackrel{~}{\mu}}_{k}={\displaystyle \frac{1}{k}}{\displaystyle \sum _{i=1}^{k}}{\stackrel{~}{x}}_{i}.$ 
When $k=n$, ${\stackrel{~}{\mu}}_{n}$ is the sample mean of $\stackrel{~}{L}$. The normalized aggregate loss is
$\stackrel{~}{T}={\displaystyle \frac{T}{M}}={\displaystyle \sum _{i=1}^{f}}{\stackrel{~}{s}}_{i},$ 
where ${\stackrel{~}{s}}_{i}$ is randomly sampled with replacement from $\stackrel{~}{L}$.
Let us denote the cumulative distribution function (CDF) of $\tilde{T}$ by $F$. We are interested in the approximation of the quantile function $\tilde{T}_{\alpha}$ of $\tilde{T}$, with a focus on samples with a very large severity loss:
${\stackrel{~}{T}}_{\alpha}=\mathrm{min}\left\{x={\displaystyle \sum _{i=1}^{f}}{\stackrel{~}{s}}_{i}:F(x)\ge \alpha \right\}.$ 
Approximating the quantile function ${\stackrel{~}{T}}_{\alpha}$ is challenging for a high percentile, $\alpha $. In fact, $\stackrel{~}{T}$ is a discrete distribution that exhibits clustering around multiples of large losses at the tail. The clustering is perceptible in the presence of extremely large losses (see Figure 1).
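The bootstrap mechanics described above can be sketched directly. The following is a minimal Monte Carlo simulation of the empirical aggregate loss bootstrap (the function names are ours, not the paper's): it draws a Poisson frequency, sums that many losses drawn with replacement, and reads off an empirical quantile.

```python
import math
import random

def poisson_draw(rng, lam):
    """Inverse-transform sampling from Poisson(lam)."""
    u, k, term, cdf = rng.random(), 0, math.exp(-lam), math.exp(-lam)
    while u > cdf and k < 100 + 10 * int(lam):
        k += 1
        term *= lam / k
        cdf += term
    return k

def bootstrap_aggregate_quantile(losses, lam, alpha, n_sims=20000, seed=7):
    """Monte Carlo estimate of the alpha-quantile of T = sum_{i=1}^{f} s_i,
    where f ~ Poisson(lam) and each s_i is drawn with replacement from `losses`."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(losses) for _ in range(poisson_draw(rng, lam)))
        for _ in range(n_sims)
    )
    # smallest simulated value whose empirical CDF reaches alpha
    return totals[min(n_sims - 1, math.ceil(alpha * n_sims) - 1)]
```

For instance, with 99 unit losses and one loss of 100 and $\lambda = 10$, the simulated 99.9th percentile sits near twice the maximum loss plus the contribution of the small losses, illustrating the clustering around multiples of the largest loss.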
Figure 1: Histogram of an empirical aggregate loss bootstrap distribution. $M$ is the largest single loss. $\beta_{\alpha}$ is defined in Section 3.3. In this example, $\alpha$ is equal to 99.9%. As expected, the empirical bootstrap analytic (EBA) approximation (Section 3) is efficient for type I samples. The binomial approximation (Section 7) estimates the empirical bootstrap distribution given that $M$ appears precisely $\beta_{\alpha}$ times (with the remainder due to the contribution of smaller losses).
3 Empirical bootstrap analytic approximation
We present various theoretical descriptions of the empirical bootstrap, first exploring some special boundary cases for which a precise formula exists and then developing a more general framework leading to multiple approximation formulas.
3.1 All losses are identical
The first extreme case is when all losses are equal to the maximum, ie, ${\stackrel{~}{x}}_{1}=\mathrm{\cdots}={\stackrel{~}{x}}_{n}=1$. In this situation, the aggregate loss distribution, $\stackrel{~}{T}$, is identical to the frequency random variable, $f$. Indeed,
$$\stackrel{~}{T}=\sum _{i=1}^{f}{\stackrel{~}{s}}_{i}=\sum _{i=1}^{f}1=f.$$  (3.1) 
Therefore,
$$\tilde{T}_{\alpha}=f_{\alpha},$$
where
$$f_{\alpha}=F_{\lambda}^{-1}(\alpha)$$  (3.2)
and $F_{\lambda}^{-1}(\cdot)$ is the inverse CDF of the Poisson distribution with intensity $\lambda$.
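In this degenerate case the aggregate quantile reduces to a Poisson inverse CDF, which is easy to evaluate by accumulating the probability mass function; a small sketch (our own helper, not from the paper):

```python
import math

def poisson_inv_cdf(alpha, lam):
    """Smallest k such that the Poisson(lam) CDF at k is >= alpha."""
    k, term, cdf = 0, math.exp(-lam), math.exp(-lam)
    while cdf < alpha:
        k += 1
        term *= lam / k
        cdf += term
    return k
```

For example, with $\lambda = 10$ the 99.9th percentile of the frequency (and hence of $\tilde{T}$ in this boundary case) is 21.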
3.2 One (or more) extremely large loss(es)
We consider the case where there is a single, isolated largest loss, with all other losses being much smaller. The extreme example corresponds to $\tilde{x}_{1}=\tilde{x}_{2}=\cdots=\tilde{x}_{n-1}=0$ and $\tilde{x}_{n}=1$ (we are generally interested in the problem when all losses are strictly greater than zero; however, this can be thought of as the limiting case characterizing $M\gg x_{n-1}$). Let us generalize this situation slightly to cover the case of $l$ identical largest losses and $n-l$ trivial losses, ie, $\tilde{x}_{1}=\tilde{x}_{2}=\cdots=\tilde{x}_{n-l}=0$ and $\tilde{x}_{n-l+1}=\cdots=\tilde{x}_{n}=1$. Note that the variable $\tilde{T}$ becomes a counting variable for the number of times the maximal loss is sampled in each compound draw. This is similar to (3.1); however, the expected count will be adjusted by the reduced probability of choosing a largest loss. A candidate for the resulting counting variable is
$$\stackrel{~}{T}\sim \mathrm{Poisson}\left(\lambda \frac{l}{n}\right).$$  (3.3) 
We now show that this is precisely the case. First note that the $\tilde{s}_{i}$ are Bernoulli trials with probability of success equal to $l/n$. Therefore, we have
$$\tilde{T}=\sum_{i=1}^{f}\tilde{s}_{i}\quad\Longrightarrow\quad \tilde{T}\mid f\sim\mathrm{binom}\Bigl(f,\frac{l}{n}\Bigr).$$
The equivalence in (3.3) is due to the following lemma.
Lemma 3.1.
Let $f\sim\mathrm{Poisson}(\lambda)$ and suppose $\tilde{T}\in\{0,1,2,\dots\}$ is a discrete random variable such that the conditional distribution satisfies $(\tilde{T}\mid f=m)\sim\mathrm{binom}(m,\pi)$.
Then,
$$\stackrel{~}{T}\sim \mathrm{Poisson}(\lambda \pi ).$$  (3.4) 
Proof.
This is a well-known result that can be shown by simply computing the probability mass function of $\tilde{T}$ at $k\in\{0,1,2,\dots\}$:
$$\begin{aligned}
\Pr(\tilde{T}=k)&=\sum_{m=k}^{\infty}\Pr(\tilde{T}=k\mid f=m)\Pr(f=m)\\
&=\sum_{m=k}^{\infty}\binom{m}{k}\pi^{k}(1-\pi)^{m-k}\,\mathrm{e}^{-\lambda}\frac{\lambda^{m}}{m!}\\
&=\mathrm{e}^{-\lambda}\frac{(\lambda\pi)^{k}}{k!}\sum_{m=k}^{\infty}\frac{[\lambda(1-\pi)]^{m-k}}{(m-k)!}\\
&=\mathrm{e}^{-\lambda}\frac{(\lambda\pi)^{k}}{k!}\sum_{m=0}^{\infty}\frac{[\lambda(1-\pi)]^{m}}{m!}\\
&=\mathrm{e}^{-\lambda}\frac{(\lambda\pi)^{k}}{k!}\,\mathrm{e}^{\lambda(1-\pi)}\\
&=\mathrm{e}^{-\lambda\pi}\frac{(\lambda\pi)^{k}}{k!}.
\end{aligned}$$
∎
The result for the first example (ie, for $l=n$) can be seen as a consequence of this when $\pi =1$.
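Lemma 3.1 can also be checked numerically by summing the binomial–Poisson mixture directly and comparing it with the thinned Poisson probability mass function; a quick verification sketch (the helper names are ours):

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability mass function at k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, m, p):
    """Binomial(m, p) probability mass function at k."""
    return math.comb(m, k) * p**k * (1 - p)**(m - k)

def thinned_pmf(k, lam, pi, n_terms=100):
    """P(T = k) for T | f ~ binom(f, pi) with f ~ Poisson(lam),
    computed by direct summation of the mixture."""
    return sum(binom_pmf(k, m, pi) * poisson_pmf(m, lam) for m in range(k, n_terms))
```

The mixture agrees with $\mathrm{Poisson}(\lambda\pi)$ to within the truncation error of the series.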
3.3 Analytic approximation
A key observation from the previous examples is that in each of these special cases the mean of the normalized data equals the probability of drawing a largest loss in a single trial. That is,
$${\stackrel{~}{\mu}}_{n}=\frac{l}{n}=\mathrm{Pr}({\stackrel{~}{s}}_{i}=1).$$ 
In general, since $\tilde{x}_{n}=1$, $\tilde{\mu}_{n}$ can be written as
$$\tilde{\mu}_{n}=\frac{1}{n}\sum_{i=1}^{n}\tilde{x}_{i}=\frac{1}{n}\left(1+\sum_{i=1}^{n-1}\tilde{x}_{i}\right).$$
Thus, $\tilde{\mu}_{n}\in[1/n,1]$ and attains its minimum and maximum values in the extreme cases corresponding to $l=1$ and $l=n$, respectively. Therefore, $\tilde{\mu}_{n}$ is a generalization of the probability of selecting a maximal loss in a single random draw. Given this, a candidate distribution for $\tilde{T}$ in the general case, motivated by (3.4), is
$$\stackrel{~}{T}\sim \mathrm{Poisson}(\lambda {\stackrel{~}{\mu}}_{n}).$$ 
This approximation has a few shortcomings: the first is that it is integer valued; the second (as we shall see) is that, while it does capture the correct mean, it overestimates the variance of the true distribution. We discuss these issues in later sections, where we also propose solutions. Meanwhile, we suggest another approximation of $\tilde{T}_{\alpha}$, which is our main focus in the remainder of the paper.
For a fixed percentile level, $\alpha$, simulation experiments (see, for example, Figure 3) suggest that $\tilde{T}_{\alpha}$ depends approximately linearly on $\tilde{\mu}_{n}$. Consequently, we linearly interpolate between the extreme values of $\tilde{\mu}_{n}$ and the corresponding known values of $\tilde{T}_{\alpha}$ at the $l=1$ and $l=n$ boundary cases discussed above. For the $l=n$ case, the value of $\tilde{T}_{\alpha}$ is given by (3.2); at the other extreme, $l=1$, it is equal to the percentile, $\alpha$, of the variable
$$\beta \sim \mathrm{Poisson}\left(\frac{\lambda}{n}\right).$$ 
Then, the line between $(1/n,\beta_{\alpha})$ and $(1,f_{\alpha})$ is given by
$$\frac{n}{n-1}\left(\tilde{\mu}_{n}-\frac{1}{n}\right)(f_{\alpha}-\beta_{\alpha})+\beta_{\alpha}.$$
We define
$$\tilde{\mu}_{n-1}=\frac{n}{n-1}\left(\tilde{\mu}_{n}-\frac{1}{n}\right)=\frac{1}{n-1}\sum_{i=1}^{n-1}\tilde{x}_{i},$$
and then write the approximation to ${\stackrel{~}{T}}_{\alpha}$ as
$$\tilde{T}_{\alpha}\approx \tilde{\mu}_{n-1}(f_{\alpha}-\beta_{\alpha})+\beta_{\alpha}.$$
Rewriting in terms of the nonnormalized distribution, $L$, we obtain
$$T_{\alpha}\approx \mu_{n-1}(f_{\alpha}-\beta_{\alpha})+\beta_{\alpha}M,$$  (3.5)
where $\mu_{n-1}=\tilde{\mu}_{n-1}M$ (see Figure 1).
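Formula (3.5) is straightforward to implement; a minimal sketch (the function names and the Poisson inverse-CDF helper are ours):

```python
import math

def poisson_inv_cdf(alpha, lam):
    """Smallest k such that the Poisson(lam) CDF at k is >= alpha."""
    k, term, cdf = 0, math.exp(-lam), math.exp(-lam)
    while cdf < alpha:
        k += 1
        term *= lam / k
        cdf += term
    return k

def eba_quantile(losses, lam, alpha):
    """Empirical bootstrap analytic (EBA) approximation (3.5):
    T_alpha ~ mu_{n-1} (f_alpha - beta_alpha) + beta_alpha * M."""
    xs = sorted(losses)
    n, M = len(xs), xs[-1]
    mu_n_minus_1 = sum(xs[:-1]) / (n - 1)          # mean of the losses below the maximum
    f_alpha = poisson_inv_cdf(alpha, lam)          # frequency quantile, (3.2)
    beta_alpha = poisson_inv_cdf(alpha, lam / n)   # multiplicity of the largest loss, (3.7)
    return mu_n_minus_1 * (f_alpha - beta_alpha) + beta_alpha * M
```

For the sample of 99 unit losses and one loss of 100 with $\lambda = 10$ and $\alpha = 99.9$%, this gives $1\times(21-2)+2\times 100=219$.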
3.4 Remarks
Table 1: $\beta_{\alpha}$ for various values of $\alpha$ and $N$.

N      99%   99.9%
5       2     3
6       2     2
7       1     2
23      1     1
99      1     1
100     0     1
999     0     1
1000    0     0
The development of (3.5) implicitly relies on ${\beta}_{\alpha}>0$, corresponding to
$$\alpha \ge {F}_{\lambda /n}(1).$$  (3.6) 
This is generally true for high $\alpha$ and when the number of data points is less than $\lambda/(1-\alpha)$ (for example, when $\alpha=99.9$% and $\lambda=10$, $\beta_{\alpha}>0$ when the number of data points is less than $10\times 1000=10\,000$). Otherwise, it turns out that the maximum loss is actually too rare to contribute to the $\alpha$ percentile of the aggregate loss distribution. $\beta_{\alpha}$ appears as the coefficient of the maximum loss, $M$, in (3.5) and is defined as
$$\beta_{\alpha}=F_{\lambda/n}^{-1}(\alpha).$$  (3.7)
Note that if $\lambda =n/N$, where $N$ is the number of time periods over which the sample was collected, then we have
$$\beta_{\alpha}=F_{1/N}^{-1}(\alpha)$$
and so this can be directly linked to the number of data collection periods involved in obtaining the sample. ${\beta}_{\alpha}$ is given for various values of $\alpha $ and $N$ in Table 1.
Such information could be used to adjust empirical bootstrap estimates. For example, after collecting five periods of data, we might have reason to believe that the largest observed loss has an actual frequency of one in ten years rather than the observed one in five. From this, we conclude that a better empirical bootstrap analytic (EBA) estimate of the 99th percentile is obtained by adjusting the ${\beta}_{\alpha}$ parameter (ie, from $2$ to $1$).
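Table 1 can be reproduced directly from (3.7) with $\lambda=n/N$, ie, $\beta_{\alpha}=F_{1/N}^{-1}(\alpha)$; a short sketch (the helper name is ours):

```python
import math

def poisson_inv_cdf(alpha, lam):
    """Smallest k such that the Poisson(lam) CDF at k is >= alpha."""
    k, term, cdf = 0, math.exp(-lam), math.exp(-lam)
    while cdf < alpha:
        k += 1
        term *= lam / k
        cdf += term
    return k

# beta_alpha = F^{-1}_{1/N}(alpha): implied multiplicity of the largest loss
# after N data-collection periods
for N in (5, 6, 7, 23, 99, 100, 999, 1000):
    print(N, poisson_inv_cdf(0.99, 1 / N), poisson_inv_cdf(0.999, 1 / N))
```

The printed values match Table 1, eg, $\beta_{99.9\%}=3$ for $N=5$ and $\beta_{99.9\%}=0$ for $N=1000$.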
4 Multinomial representation of the total loss distribution
In this section, we look at the distribution that governs the count, ${f}^{i}$, of a particular point, ${\stackrel{~}{x}}_{i}$, conditional on the total annual frequency, $f$. We have
$$\sum_{i=1}^{n}f^{i}=f,\qquad f\sim\mathrm{Poisson}(\lambda).$$
The experiment can be seen as drawing $f$ independent trials from $n$ categories. Each trial consists of choosing a loss with probability $1/n$. $f^{i}$ is the number of trials in which $x_{i}$ is drawn. Therefore, $(f^{1},\dots,f^{n})$ has a multinomial distribution with index $f$ and parameter $(1/n,\dots,1/n)$. Steel (1953) shows that a multinomial distribution whose index follows a Poisson distribution is equivalent to the numbers of successes observed in $n$ independent Poisson distributions with intensities $\lambda_{i}$, $i=1,\dots,n$, where $\lambda_{i}$ is the probability weight assigned to $\tilde{x}_{i}$ multiplied by the total intensity, $\lambda$. Consequently,
$$f^{i}\sim\mathrm{Poisson}(\lambda_{i}),\qquad \lambda_{i}=\frac{\lambda}{n}.$$
The total loss random variable can be written as
$\stackrel{~}{T}={\displaystyle \sum _{i=1}^{n}}{f}^{i}{\stackrel{~}{x}}_{i}.$  (4.1) 
Corollary 4.1.
$\tilde{T}$ is a linear combination of $n$ independent Poisson random variables with intensities $\lambda_{i}=\lambda/n$, $i=1,\dots,n$. It has mean $\overline{\mu}_{1}$, variance $\overline{\mu}_{2}$ and third central moment $\overline{\mu}_{3}$, where
$$\overline{\mu}_{1}=\lambda\tilde{\mu}_{n},\qquad
\overline{\mu}_{2}=E[(\tilde{T}-\overline{\mu}_{1})^{2}]=\frac{\lambda}{n}\sum_{i=1}^{n}\tilde{x}_{i}^{2},\qquad
\overline{\mu}_{3}=E[(\tilde{T}-\overline{\mu}_{1})^{3}]=\frac{\lambda}{n}\sum_{i=1}^{n}\tilde{x}_{i}^{3}.$$
Corollary 4.1 uses the fact that a Poisson distribution has its first three central moments equal to its intensity. As $\tilde{x}_{i}\le 1$ for all $i$, we have
$${\overline{\mu}}_{2}\le {\overline{\mu}}_{1}.$$ 
5 Approximation of the total loss distribution
5.1 Poisson approximation
Corollary 4.1 shows that $\tilde{T}$ is a linear combination of independent Poisson distributions. $\tilde{T}$ follows a Poisson distribution if and only if $\tilde{x}_{i}=1$ for all $i=1,\dots,n$, ie, $\tilde{\mu}_{n}=1$. Otherwise, $\overline{\mu}_{1}>\overline{\mu}_{2}$, as there exists at least one normalized loss that is strictly less than 1. We suggest linearly mapping $\tilde{T}$ into another random variable, $Y$, which has equal first and second central moments. Define
$$Y=\frac{{\overline{\mu}}_{1}}{{\overline{\mu}}_{2}}\stackrel{~}{T}.$$ 
$Y$ has equal mean and variance, both given by
$${\lambda}^{\star}=\frac{{\overline{\mu}}_{1}^{2}}{{\overline{\mu}}_{2}}.$$ 
Figure 2: Extension of the Poisson distribution over the positive real line.
We approximate the $Y$ distribution by an extension over the positive real line of a Poisson distribution with parameter ${\lambda}^{\star}$. The Poisson distribution CDF is a stepwise function, which can be defined as
$$F_{\lambda^{\star}}(x)=\frac{\Gamma(\lfloor x+1\rfloor,\lambda^{\star})}{\lfloor x\rfloor!}\quad\text{for all } x\ge 0,$$  (5.1)
where $\Gamma(\cdot,\cdot)$ is the upper incomplete gamma function and $\lfloor\cdot\rfloor$ is the floor function. We extend $F_{\lambda^{\star}}$ by a smooth envelope that matches its values over the natural numbers (see Figure 2). A straightforward generalization is to remove the floor function from (5.1) and replace the factorial by the gamma function, $\Gamma(\cdot)$:
$$x\mapsto \frac{\Gamma(x+1,\lambda^{\star})}{\Gamma(x+1)}\quad\text{for all } x\ge 0.$$  (5.2)
However, CDF (5.2) can be numerically inaccurate at the corners. Therefore, we suggest the following generalization:
$$x\mapsto 1-\frac{\gamma(x+1,\lambda^{\star})}{\Gamma(x+1)},$$
where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function. In fact, CDF (5.1) can be written as a function of the $\chi^{2}$ distribution:
$$x\mapsto \Pr(\chi^{2}_{2(x+1)}>2\lambda^{\star})$$
(see Johnson et al (2005) for more details).
Proposition 5.1.
The quantile function of $\stackrel{\mathrm{~}}{T}$ can be approximated as follows:
$$\tilde{T}_{\alpha}\approx \frac{\overline{\mu}_{2}}{\overline{\mu}_{1}}\bigl(G_{\mathrm{P}}^{-1}(\alpha,\lambda^{\star})+1\bigr),$$
where $G_{\mathrm{P}}^{-1}(\cdot,\lambda^{\star})$ is the inverse in the first argument of
$$G_{\mathrm{P}}:\Re_{+}\times\Re_{+}\to[0,1],\qquad (x,y)\mapsto 1-\frac{\gamma(x+1,y)}{\Gamma(x+1)}.$$
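Proposition 5.1 requires the continuous extension $G_{\mathrm{P}}$ and its inverse in the first argument. Below is a sketch using the standard series expansion of the regularized lower incomplete gamma function and bisection for the inverse (all helper names are ours):

```python
import math

def reg_lower_gamma(s, x, tol=1e-12):
    """Regularized lower incomplete gamma P(s, x) = gamma(s, x) / Gamma(s),
    computed via the standard series expansion."""
    if x <= 0:
        return 0.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1))
    total, k = term, s
    while term > tol * total:
        k += 1
        term *= x / k
        total += term
    return total

def G_P(x, y):
    """Continuous extension of the Poisson(y) CDF: 1 - gamma(x+1, y)/Gamma(x+1)."""
    return 1.0 - reg_lower_gamma(x + 1, y)

def G_P_inv(alpha, y, hi=1e6):
    """Inverse of G_P in its first argument, by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if G_P(mid, y) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_approx_quantile(norm_losses, lam, alpha):
    """Proposition 5.1: T_alpha ~ (mu2/mu1) * (G_P^{-1}(alpha, lam*) + 1)."""
    n = len(norm_losses)
    mu1 = lam * sum(norm_losses) / n
    mu2 = lam / n * sum(x * x for x in norm_losses)
    lam_star = mu1 * mu1 / mu2
    return mu2 / mu1 * (G_P_inv(alpha, lam_star) + 1)
```

At integer arguments $G_{\mathrm{P}}(k,y)$ agrees with the Poisson($y$) CDF at $k$, and in the degenerate all-ones sample the approximation reduces to (a smooth interpolation of) the frequency quantile.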
5.2 Cornish–Fisher expansion
The Poisson approximation matches the first two moments of $\stackrel{~}{T}$. We suggest approximating the discrete quantile function of the aggregate distribution by a continuous function using the Cornish–Fisher expansion. The latter is based on the availability of cumulants. We present the formula using the first three moments provided in Corollary 4.1. The quantile function is approximated by
$$\tilde{T}_{\alpha}\approx\overline{\mu}_{1}+\sqrt{\overline{\mu}_{2}}\,\delta(\alpha),\qquad
\delta(\alpha)=\Phi^{-1}(\alpha)+\frac{\overline{\mu}_{3}}{\overline{\mu}_{2}^{3/2}}\,\frac{\Phi^{-1}(\alpha)^{2}-1}{6}.$$
We provide a higherorder approximation of the Cornish–Fisher expansion in Appendix A online.
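The three-moment expansion above is only a few lines of code; a sketch using the moments of Corollary 4.1 (the function name is ours):

```python
import math
from statistics import NormalDist

def cornish_fisher_quantile(norm_losses, lam, alpha):
    """Three-moment Cornish-Fisher approximation of the aggregate quantile
    (Section 5.2), using the moments of Corollary 4.1."""
    n = len(norm_losses)
    m1 = lam * sum(norm_losses) / n                  # mean
    m2 = lam / n * sum(x ** 2 for x in norm_losses)  # variance
    m3 = lam / n * sum(x ** 3 for x in norm_losses)  # third central moment
    z = NormalDist().inv_cdf(alpha)
    delta = z + (m3 / m2 ** 1.5) * (z ** 2 - 1) / 6
    return m1 + math.sqrt(m2) * delta
```

In the all-ones boundary case with $\lambda = 10$ and $\alpha = 99.9$%, this gives roughly 21.2, close to the exact Poisson quantile of 21.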
5.3 Panjer recursion
Panjer (1981) suggests an algorithm to compute the compound Poisson distribution using a recursive relationship between the probabilities of the number of losses that occur in the same time interval. The interval $[\tilde{x}_{1},\tilde{x}_{n}]$ is split into $m$ equispaced points. Let us denote the step size by $\Delta\tilde{x}$, and the grid points by $x_{i}^{\star}$, $i=1,\dots,m$, with $x_{1}^{\star}=\tilde{x}_{1}$, $x_{m}^{\star}=\tilde{x}_{n}$ and $x_{i}^{\star}=\tilde{x}_{1}+(i-1)\Delta\tilde{x}$. Each point $x_{i}^{\star}$ is assigned a probability weight, $p_{i}$. The normalized aggregate loss $\tilde{T}$ is approximated by
$$\tilde{T}_{n}\approx x_{0}^{\star}+D_{m}\Delta\tilde{x},\qquad D_{m}=\sum_{i=1}^{m}i\,f_{i}^{\star},\qquad x_{0}^{\star}=\tilde{x}_{1}-\Delta\tilde{x},$$
where ${f}_{i}^{\star}$ has a Poisson distribution with intensity $\lambda {p}_{i}$. A simple probability scheme is to assign
${p}_{1}$  $={\displaystyle \frac{1}{n}},$  
$$p_{i}=\frac{\#\{\tilde{x}_{j}\in(x_{i-1}^{\star},x_{i}^{\star}],\ j=1,\dots,n\}}{n},\qquad i=2,\dots,m.$$
The probability mass function $g(\cdot )$ of ${D}_{m}$ is computed using the following recursion:
$$g(0)=\mathrm{e}^{-\lambda},\qquad g(y)=\frac{\lambda}{y}\sum_{i=1}^{m}i\,p_{i}\,g(y-i)\quad\text{for all } y=1,2,\dots,$$
with the convention $g(j)=0$ for $j<0$.
The quantile function of $D_{m}$, $F_{D_{m}}^{-1}(\cdot)$, can be computed as
$$F_{D_{m}}^{-1}(p)=\min\Bigl\{x\in\{0,1,2,\dots\}:\sum_{j=0}^{x}g(j)\ge p\Bigr\}.$$
Therefore,
$$\tilde{T}_{\alpha}\approx x_{0}^{\star}+F_{D_{m}}^{-1}(\alpha)\,\Delta\tilde{x}.$$
Different probability schemes are proposed in the literature, eg, methods of matching moments (Gerber 1982). The choice of $\mathrm{\Delta}\stackrel{~}{x}$ and ${p}_{i}$, $i=1,\mathrm{\dots},m$, is of paramount importance to obtain an efficient Panjer approximation.
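As an illustration, here is a compact Panjer recursion for the compound Poisson. For simplicity this sketch rounds each loss to the nearest point of a grid starting at zero (a simpler discretization than the interval-counting scheme above, so results can differ slightly); the function name is ours.

```python
import math

def panjer_quantile(norm_losses, lam, alpha, m=200):
    """Panjer recursion for a compound Poisson whose severities are rounded
    onto the grid {0, dx, ..., m*dx}, with dx = max(norm_losses)/m."""
    dx = max(norm_losses) / m
    p = [0.0] * (m + 1)                       # severity weights on the grid
    for x in norm_losses:
        p[min(m, round(x / dx))] += 1.0 / len(norm_losses)
    g = [math.exp(-lam * (1.0 - p[0]))]       # P(D = 0)
    cdf, y = g[0], 0
    while cdf < alpha:                        # g(y) = (lam/y) sum_i i p_i g(y - i)
        y += 1
        gy = (lam / y) * sum(i * p[i] * g[y - i] for i in range(1, min(y, m) + 1))
        g.append(gy)
        cdf += gy
    return y * dx                             # alpha-quantile on the grid
```

In the degenerate case where all losses equal the maximum, the recursion reproduces the Poisson frequency quantile exactly (eg, 21 for $\lambda = 10$ at the 99.9th percentile), which is a useful parameterization check.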
6 Numerical results
We compare the efficiency of each approximation with respect to a Monte Carlo estimate based on 1 000 000 simulations. Even though the Monte Carlo estimate embeds statistical error, it remains a popular and reliable benchmark for the quantile function.
Table 2: average percentage deviation (standard deviation in parentheses) from the Monte Carlo estimates.

α (%)   EBA (%)    Poisson (%)   Fisher 3 (%)   Fisher 5 (%)   Panjer (%)
95      −1 (8)     3 (6)         1 (3)          0 (3)          5 (4)
97.5    −10 (7)    2 (7)         4 (6)          3 (6)          5 (4)
99.5    −13 (11)   −4 (7)        4 (8)          4 (13)         4 (3)
99.9    1 (7)      −9 (5)        4 (4)          4 (13)         3 (2)
We generate three sets of 5000 samples with sample sizes 100, 500 and 1000, respectively, from a lognormal distribution with an associated normal distribution of mean $0$ and standard deviation $2$. The losses are assumed to be observed in a five-year ($\lambda=n/5$) or ten-year ($\lambda=n/10$) time window.
We focus on estimating the 99.9th percentile for different frequencies and sample sizes. We classify samples into three categories.
- Type I: the sample has an extremely large loss (ie, $\tilde{\mu}_{n}\approx 1/n$).
- Type II: all losses have the same order of magnitude (ie, $\tilde{\mu}_{n}\approx 1$; see Section 3.1).
- Type III: otherwise.
Operational risk samples are usually of type I or III, as they generally contain at least one outlier. Therefore, we hereafter focus our numerical investigation on these two types.
Figure 3: Empirical bootstrap analytic (EBA) approximation compared with Monte Carlo simulation at the 99.9th percentile.
Figure 4: Poisson approximation compared with Monte Carlo simulation at the 99.9th percentile.
Figure 5: Cornish–Fisher approximation using three cumulants compared with Monte Carlo simulation at the 99.9th percentile.
Figure 6: Cornish–Fisher approximation using five cumulants compared with Monte Carlo simulation at the 99.9th percentile.
Figure 3 compares EBA to Monte Carlo. By construction, EBA is efficient for sample types I and II. It is close to Monte Carlo elsewhere. Figure 4 depicts the Poisson approximation results. It behaves poorly for category I. Otherwise, it provides relatively efficient estimates. Cornish–Fisher approximates the quantile function well using three cumulants (see Figure 5). Expansion to five cumulants overestimates the 99.9th percentile for sample type I (see Figure 6). The empirical aggregate loss distribution is discrete, and it is well approximated by a continuous distribution when matching the first three cumulants. Figure 7 depicts the Panjer recursion for $m=1000$. The Panjer recursion is a numerical procedure that provides efficient results if it is well parameterized.
Table 2 reports the average percentage deviation from the Monte Carlo estimates for the 15 000 samples as well as their standard deviations for the 95th, 97.5th, 99.5th and 99.9th percentiles. On average, EBA seems to slightly underestimate the quantile function for low percentiles. Otherwise, other approximations behave well on average, except for the Poisson when estimating the 99.9th percentile.
7 Binomial approximation for samples with a single large loss
We focus on the distribution of $\stackrel{~}{T}$ in the presence of an extremely large loss in the sample. We isolate the largest loss in (4.1):
$$\tilde{T}=\sum_{i=1}^{n-1}f^{i}\tilde{x}_{i}+f^{n}\tilde{x}_{n}.$$  (7.1)
At a high $\alpha$, the largest loss is expected to appear at least $\beta_{\alpha}$ times (see Figure 1). Therefore, $f^{n}$ is left-truncated to exclude $0,\dots,\beta_{\alpha}-1$ as possible outcomes. Let us denote its left-truncated distribution by $f^{n\star}$. The probability mass function is
$$\Pr(f^{n\star}=j)=h(j)\left(1-\sum_{i=0}^{\beta_{\alpha}-1}h(i)\right)^{-1},\qquad j\ge\beta_{\alpha},$$
where
$$h(i)=\frac{\mathrm{e}^{-\lambda/n}(\lambda/n)^{i}}{i!}.$$
We can show that it has mean
$$\theta_{1}=\left(\frac{\lambda}{n}-\sum_{i=0}^{\beta_{\alpha}-1}i\,h(i)\right)\left(1-\sum_{i=0}^{\beta_{\alpha}-1}h(i)\right)^{-1}$$
and variance
$$\theta_{2}=\left(\frac{\lambda}{n}\left(\frac{\lambda}{n}+1\right)-\sum_{i=0}^{\beta_{\alpha}-1}i^{2}h(i)\right)\left(1-\sum_{i=0}^{\beta_{\alpha}-1}h(i)\right)^{-1}-\theta_{1}^{2}.$$
Corollary 7.1.
Conditional on $M$ being drawn at least $\beta_{\alpha}$ times, $\tilde{T}$ has mean $\overline{\mu}_{1}^{\prime}$ and variance $\overline{\mu}_{2}^{\prime}$, where
$$\overline{\mu}_{1}^{\prime}=\frac{n-1}{n}\lambda\tilde{\mu}_{n-1}+\theta_{1}\tilde{x}_{n},\qquad
\overline{\mu}_{2}^{\prime}=\frac{\lambda}{n}\sum_{i=1}^{n-1}\tilde{x}_{i}^{2}+\theta_{2}\tilde{x}_{n}^{2}.$$
Figure 7: Panjer recursion compared with Monte Carlo simulation at the 99.9th percentile.
Left-truncation increases the mean and reduces the variance:
$${\theta}_{2}\le \frac{\lambda}{n}\le {\theta}_{1}.$$ 
Therefore, we have
$${\overline{\mu}}_{2}^{\prime}\le {\overline{\mu}}_{2}\le {\overline{\mu}}_{1}\le {\overline{\mu}}_{1}^{\prime}.$$ 
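The truncated moments $\theta_{1}$ and $\theta_{2}$ defined above are straightforward to compute; a sketch (the function name is ours), which can also be used to confirm numerically that $\theta_{2}\le\lambda/n\le\theta_{1}$:

```python
import math

def truncated_poisson_moments(intensity, beta):
    """Mean theta1 and variance theta2 of a Poisson(intensity) variable
    left-truncated to exclude the outcomes 0, ..., beta - 1 (Section 7)."""
    def h(i):
        return math.exp(-intensity) * intensity ** i / math.factorial(i)
    tail = 1.0 - sum(h(i) for i in range(beta))          # P(f >= beta)
    theta1 = (intensity - sum(i * h(i) for i in range(beta))) / tail
    ex2 = intensity * (intensity + 1) - sum(i * i * h(i) for i in range(beta))
    theta2 = ex2 / tail - theta1 ** 2
    return theta1, theta2
```

With `beta = 0` (no truncation) both moments reduce to the Poisson intensity, as expected.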
We approximate the distribution of $\stackrel{~}{T}$ by an extension over the positive real line of the binomial distribution with parameters $({r}^{\star},{p}^{\star})$,
$$r^{\star}=\left\lceil\frac{\overline{\mu}_{1}^{\prime}}{p^{\star}}\right\rceil,\qquad
p^{\star}=1-\frac{\overline{\mu}_{2}^{\prime}}{\overline{\mu}_{1}^{\prime}},$$
where $\lceil \cdot \rceil $ is the ceiling function. The Camp–Paulson approximation links the binomial distribution CDF, ${F}_{{p}^{\star},{r}^{\star}}$, to the standard normal distribution CDF, $\mathrm{\Phi}$:
$$F_{p^{\star},r^{\star}}(x)\approx\Phi(W),$$  (7.2)
$$W=\frac{Y}{3\sqrt{Z}},$$  (7.3)
$$Y=\left[\frac{(\eta-x)p^{\star}}{(x+1)(1-p^{\star})}\right]^{1/3}\left(9-\frac{p^{\star}}{(\eta-x)p^{\star}}\right)-9+\frac{1}{x+1},$$  (7.4)
$$Z=\left[\frac{(\eta-x)p^{\star}}{(x+1)(1-p^{\star})}\right]^{2/3}\left(\frac{p^{\star}}{(\eta-x)p^{\star}}\right)+\frac{1}{x+1},$$  (7.5)
$$\eta=\frac{\overline{\mu}_{1}^{\prime}}{p^{\star}}.$$  (7.6)
Based on Johnson et al (2005), its maximum absolute error is less than $0.007\,(\overline{\mu}_{1}^{\prime}(1-p^{\star}))^{-1/2}$.
Proposition 7.2.
The quantile function of $\tilde{T}$ can be approximated by inverting (7.2): $\tilde{T}_{\alpha}$ is the value of $x$ for which $W(x)=\Phi^{-1}(\alpha)$, where $W$ is given through (7.3)–(7.6) and $\Phi^{-1}(\cdot)$ is the standard normal inverse CDF.
Figure 8: Binomial approximation compared with Monte Carlo simulation at the 99.9th percentile when the sample $\tilde{L}$ contains a very large loss.
8 Extension to continuous distributions
The extension of the EBA to the continuous case is motivated by the fact that continuous distributions may be approximated by discretizations, to which the EBA is then applicable. However, better approximations correspond to a larger sample size, $n$, and for large enough $n$ the constraint on $\alpha$ given by (3.6) is violated; moreover, the largest loss is unbounded as $n$ increases. To remedy this, we replace the largest loss with the loss expected to occur at least once in $t_{\alpha}=1/(1-\alpha)$ years or, on average, once in every $\lambda t_{\alpha}=\lambda/(1-\alpha)$ losses. We define
$$M_{\alpha^{\star}}=\sup\Bigl\{x:\Pr(X>x)>\frac{1-\alpha}{\lambda}\Bigr\}=F^{-1}\Bigl(1-\frac{1-\alpha}{\lambda}\Bigr)=F^{-1}(\alpha^{\star}),$$
where
$$\alpha^{\star}=1-\frac{1-\alpha}{\lambda}.$$
In our context, ${M}_{{\alpha}^{\star}}$ is the analog of the largest loss $M$ in the discrete case. We propose the following formula as an approximation of the quantile function in the continuous case:
$$\mathrm{EBA}^{\mathrm{c}}=\mu_{\alpha^{\star}}(f_{\alpha}^{\mathrm{c}}-\beta_{\alpha}^{\mathrm{c}})+\beta_{\alpha}^{\mathrm{c}}M_{\alpha^{\star}},$$  (8.1)
where ${\mu}_{{\alpha}^{\star}}$ is the conditional mean below ${M}_{{\alpha}^{\star}}$,
${\mu}_{{\alpha}^{\star}}={\displaystyle \frac{1}{{\alpha}^{\star}}}{\displaystyle {\int}_{0}^{{M}_{{\alpha}^{\star}}}}x\mathrm{d}F(x).$ 
${f}_{\alpha}^{\mathrm{c}}$ (respectively, ${\beta}_{\alpha}^{\mathrm{c}}$) is the equivalent of ${f}_{\alpha}$ (respectively, ${\beta}_{\alpha}$) in (3.5) and is computed using the extension over the positive line of the Poisson distribution:
${f}_{\alpha}^{\mathrm{c}}={G}_{\mathrm{P}}^{-1}(\alpha ,\lambda )+1,$  (8.2)

${\beta}_{\alpha}^{\mathrm{c}}={G}_{\mathrm{P}}^{-1}(\alpha ,1-\alpha )+1,$  (8.3)
where ${G}_{\mathrm{P}}(\cdot ,\cdot )$ is given in Proposition 5.1. As previously mentioned, ${\beta}_{\alpha}^{\mathrm{c}}$ represents the number of times the large loss, in our case ${M}_{{\alpha}^{\star}}$, appears in ${t}_{\alpha}$ years. Thus, its intensity in (8.3) is $1-\alpha $.
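Putting (8.1)–(8.3) together, the following sketch computes ${\mathrm{EBA}}^{\mathrm{c}}$ for a lognormal severity whose associated normal has mean 0 and standard deviation $\sigma$. It assumes that $G_{\mathrm{P}}$ is the usual incomplete-gamma continuous extension of the Poisson CDF, $G_{\mathrm{P}}(x,\lambda)=Q(x+1,\lambda)$ with $Q$ the regularized upper incomplete gamma function (Proposition 5.1 is not reproduced in this section), and inverts it with a root finder; the closed-form lognormal partial expectation supplies $\mu_{\alpha^{\star}}$.

```python
# Illustrative sketch of EBA^c (8.1) for a lognormal(0, sigma) severity.
# Assumption: G_P(x, lam) = Q(x + 1, lam), the incomplete-gamma continuous
# extension of the Poisson CDF; it is inverted numerically below.
import numpy as np
from scipy.special import gammaincc          # regularized upper gamma Q(a, x)
from scipy.optimize import brentq
from scipy.stats import norm, lognorm

def gp_inv(alpha, lam):
    """Solve Q(x + 1, lam) = alpha for x (continuous Poisson quantile)."""
    hi = lam + 20.0 * np.sqrt(lam) + 20.0    # generous upper bracket
    return brentq(lambda x: gammaincc(x + 1.0, lam) - alpha, -1.0 + 1e-9, hi)

def eba_c(alpha, lam, sigma):
    a_star = 1.0 - (1.0 - alpha) / lam                 # alpha*
    m = lognorm.ppf(a_star, sigma)                     # M_{alpha*} = F^{-1}(alpha*)
    # Conditional mean below m: for a lognormal(0, sigma) severity,
    # E[X 1{X <= m}] = exp(sigma^2 / 2) * Phi((ln m - sigma^2) / sigma).
    mu_star = np.exp(sigma**2 / 2) * norm.cdf((np.log(m) - sigma**2) / sigma) / a_star
    f_c = gp_inv(alpha, lam) + 1.0                     # (8.2)
    b_c = gp_inv(alpha, 1.0 - alpha) + 1.0             # (8.3)
    return mu_star * (f_c - b_c) + b_c * m             # (8.1)

print(eba_c(0.999, 100, 2.0))
```

Note that with $\alpha=0.999$ the intensity in (8.3) is only $0.001$, so ${\beta}_{\alpha}^{\mathrm{c}}$ is close to 1: the large loss ${M}_{{\alpha}^{\star}}$ appears roughly once in ${t}_{\alpha}$ years.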
Table 3: Minimum, average and maximum percentage deviation of ${\mathrm{EBA}}^{\mathrm{c}}$ from the Monte Carlo estimate for $\lambda =100$ at various percentiles.

  $\alpha$:       95%    97.5%    99%    99.5%    99.9%
  Minimum (%)     2.0     1.2    -1.4    -3.4     -1.4
  Average (%)     3.4     2.1     0.7     0.0      2.0
  Maximum (%)     4.8     3.6     4.8     6.1      9.6
Table 4: Minimum, average and maximum percentage deviation of ${\mathrm{EBA}}^{\mathrm{c}}$ from the Monte Carlo estimate for $\lambda =1000$ at various percentiles.

  $\alpha$:       95%    97.5%    99%    99.5%    99.9%
  Minimum (%)    -1.5    -1.4    -1.2    -1.2     -5.0
  Average (%)     0.3    -0.2     0.1     0.8     -2.4
  Maximum (%)     2.7     1.3     1.1     2.0      1.0
We investigate the efficiency of this extension for the lognormal distribution whose associated normal distribution has mean $0$ and a standard deviation varying from 1 to 6 in steps of 0.05. Tables 3 and 4 report the minimum, average and maximum percentage deviations from Monte Carlo simulation estimates for frequencies $\lambda =100$ and $\lambda =1000$. The same statistics are reported in Tables 5 and 6 for the single loss approximation with mean correction (Böcker and Sprittulla 2006):
$\mathrm{SLA}(\alpha )={F}^{-1}({\alpha}^{\star})+\mu (\lambda -1),$  (8.4)
where $\mu $ is the unconditional mean. The extension is more accurate than the SLA at the 99th percentile and below, and its efficiency is comparable with that of the SLA at higher percentiles. Figure 9 compares the EBA extension with Monte Carlo simulation and the SLA for a lognormal distribution with mean $0$ and standard deviation $2$ at various percentiles. The ${\mathrm{EBA}}^{\mathrm{c}}$ is efficient at low percentiles, where it is comparable with Monte Carlo estimates. As expected, the SLA formula behaves poorly at low percentiles; we recall that the SLA is asymptotically efficient for subexponential distributions.
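As a rough numerical check of (8.4), the following sketch compares the SLA against a Monte Carlo estimate of the 99.9th percentile for the $\lambda =100$, lognormal$(0,2)$ case (the sample size and seed are arbitrary choices made for illustration):

```python
# Sketch: SLA formula (8.4) vs a Monte Carlo estimate of the 99.9th
# percentile of a compound Poisson-lognormal aggregate loss.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
lam, sigma, alpha = 100, 2.0, 0.999
a_star = 1.0 - (1.0 - alpha) / lam

# SLA(alpha) = F^{-1}(alpha*) + mu * (lambda - 1), with unconditional
# mean mu = exp(sigma^2 / 2) for a lognormal(0, sigma) severity.
sla = np.exp(sigma * norm.ppf(a_star)) + np.exp(sigma**2 / 2) * (lam - 1)

# Monte Carlo: simulate n annual totals. A zero annual count is
# vanishingly unlikely at lambda = 100, so reduceat is safe here.
n = 200_000
counts = rng.poisson(lam, n)
severities = rng.lognormal(0.0, sigma, counts.sum())
totals = np.add.reduceat(severities, np.r_[0, counts.cumsum()[:-1]])
mc = np.quantile(totals, alpha)

print(sla, mc)
```

With these parameters the two estimates typically agree to within roughly 10%, consistent with the 99.9% column of Table 5.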
Table 5: Minimum, average and maximum percentage deviation of SLA from the Monte Carlo estimate for $\lambda =100$ at various percentiles.

  $\alpha$:        95%     97.5%     99%    99.5%    99.9%
  Minimum (%)    -10.3    -11.7    -12.7   -13.2    -12.6
  Average (%)    155.7     52.8     11.8     2.2     -0.0
  Maximum (%)   1707.5    541.8    126.1    40.9      6.8
Table 6: Minimum, average and maximum percentage deviation of SLA from the Monte Carlo estimate for $\lambda =1000$ at various percentiles.
  $\alpha$:        95%     97.5%     99%    99.5%    99.9%
  Minimum (%)     -5.5     -6.6     -7.4    -7.8     -8.8
  Average (%)     52.1     20.2      5.5     2.2     -3.6
  Maximum (%)    449.0    168.5     49.0    20.9     -0.3
Figure 9: ${\text{EBA}}^{\text{c}}$ formula (8.1) compared with SLA and Monte Carlo simulation for a lognormal distribution with mean 0 and standard deviation 2 at various percentiles.
9 Conclusions
We investigated the mechanics of the empirical aggregate loss bootstrap distribution and showed that it can be represented as a linear combination of independent Poisson distributions. We suggested several approximations of its quantile function at high percentiles.
 (a)
Empirical bootstrap analytic (EBA): this is an intuitive analytical approximation derived based on extreme cases. By construction, it is efficient when the sample has an extremely large loss (type I) or all losses have the same order of magnitude (type II). It remains close to the Monte Carlo estimates for samples of type III.
 (b)
Poisson approximation: this matches the first and second central moments of the total aggregate loss distribution and smooths out the Poisson cumulative distribution function. It behaves poorly for type I samples, and provides relatively efficient estimates for the other samples.
 (c)
Cornish–Fisher expansion: this approximates the aggregate loss bootstrap distribution, which is discrete, by a continuous distribution using its cumulants. With three cumulants, it approximates the quantile function well in all three categories.
 (d)
Panjer recursion: this is a numerical procedure whose efficiency depends on its parameterization. An appropriate choice of equispaced points and their probability mass weights results in an efficient approximation.
 (e)
Binomial approximation: this approximates the aggregate loss bootstrap distribution in the presence of an extremely large loss in the sample (type I). The distribution is approximated by an extension over the real line of a binomial distribution. It provides an efficient estimate of the quantile function for a high percentile.
We also presented a possible extension of the EBA to the continuous case. At high percentiles, numerical investigation shows that its efficiency is comparable with the single loss approximation, when the severity distribution can be efficiently approximated by the empirical distribution of a finite sample. It is also efficient at approximating low percentiles (see Figure 9).
The empirical bootstrap distribution has been of great interest to operational risk practitioners for benchmarking capital results and stress testing exercises. A deeper understanding of how outlier losses and simple statistics of the loss sample affect the bootstrap distribution, particularly at high percentiles, enhances its utility as a benchmark.
Declaration of interest
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper. The views expressed here, if any, are those of the authors only and not necessarily of the institution with which they are affiliated. Any remaining inadequacies are the authors’ alone.
Acknowledgements
The authors thank Eric Cope for his useful discussions, and the participants of the ORX 2015 Analytics Working Group forum for their feedback.
References
Abramowitz, M., and Stegun, I. S. (1972). Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables. National Bureau of Standards, Applied Mathematics Series. Government Printing Office, Washington, DC.
Böcker, K., and Sprittulla, J. (2006). Operational VaR: meaningful means. Risk 19(12), 96–98.
Cope, E. W., Mignola, G., Antonini, G., and Ugoccioni, R. (2009). Challenges and pitfalls in measuring operational risk from loss data. The Journal of Operational Risk 4(4), 3–27 (http://doi.org/bpqx).
DasGupta, A. (1998). On a differential equation and one step recursion for Poisson moments. Technical Report 9802, Purdue University (http://bit.ly/2qSlkcC).
Gerber, H. U. (1982). On the numerical evaluation of the distribution of aggregate claims and its stoploss premiums. Insurance: Mathematics and Economics 1(1), 13–18 (http://doi.org/cq4kwn).
Johnson, N. L., Kemp, A. W., and Kotz, S. (2005). Univariate Discrete Distributions, 3rd edn. Wiley Interscience (http://doi.org/ct345v).
Panjer, H. H. (1981). Recursive evaluation of a family of compound distributions. ASTIN Bulletin 12(1), 22–26 (http://doi.org/b6t4).
Steel, R. G. D. (1953). Relation between Poisson and multinomial distributions. Biometrics Unit Technical Report BU39M, Cornell University (http://bit.ly/2qJcYaY).