Operational Risk and Insurance: Quantitative and Qualitative Aspects

Silke Brandts [a]

Preliminary version - do not quote - This version: April 30, 2004

Abstract
This paper incorporates insurance contracts into an operational risk model based on idiosyncratic and common shocks. A key feature of the approach is the explicit modelling of residual risk inherent in insurance contracts, such as counterparty default, payment uncertainty and liquidity risk due to delayed payments. Compared to the standard haircut approach, the net loss distribution exhibits a larger weight on the tail. Thereby an underestimation of extreme losses and loss clusters is avoided. The difference between the models is statistically significant for the means and the 99.9%-quantiles of the distribution.

Keywords: Operational risk, risk management, insurance, simulation
JEL Classification: C16, C69, G18, G21, G22

[a] Graduate Program Finance & Monetary Economics, Goethe University Frankfurt, Mertonstr. 17-21, Uni-PF 77, 60054 Frankfurt am Main, phone: +49 (0)69-798-28943, e-mail: brandts@stud.uni-frankfurt.de
1 Introduction

Operational risk and its quantification have become increasingly important topics for financial institutions over the past couple of years. Officially defined as the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events,[1] some of the best known operational risk incidents are the $9 billion loss of Banco Nacional due to credit fraud in 1995, the $2.6 billion loss of Sumitomo Corporation due to unauthorized trading activity in 1996, the $1.7 billion loss and subsequent bankruptcy of Orange County due to unauthorized trading activity in 1994, and the $1.2 billion trading loss by Nick Leeson causing the collapse of Barings Bank in 1995. Considering the size of these events and their unsettling impact on the financial community, as well as the growing likelihood of operational risk losses due to the ever-growing complexity of products and processes, a sound monitoring and quantification of operational risk losses becomes increasingly necessary.

In this paper, we develop a comprehensive model to recognize the risk mitigating impact of insurance contracts within an operational risk model. The key feature of the model is the explicit modelling of the residual risk inherent in insurance contracts, such as the default of a counterparty, zero or partial recoveries due to payment uncertainty, and liquidity risk due to delayed reimbursement. Compared to the standard haircut approach, the net loss distribution exhibits a larger weight on the tail. Thereby an underestimation of extreme losses and loss clusters is avoided. The difference between the models is statistically significant for the means and the 99.9%-quantiles of the distribution. When positive dependence between the residual risk and the loss severities is assumed, the difference increases further.

In section 2, we give an overview of the existing literature on operational risk modelling and discuss fundamental building blocks of operational risk quantification in a statistical model. One key determinant will be the choice of the dependence structure, discussed in detail in section 2.2. Our choice will be the common shock model by McNeil and Lindskog (2001). As benchmarks to our algorithm, two models are presented in section 3.1, which value the impact of insurance separately and incorporate the residual risk of insurance contracts via haircuts. In section 3.2, we propose an algorithm based on common shocks and insurance recognition on an individual loss level. The model is extended in section 3.3 to include the explicit modelling of residual risk such as (1) the default of the insurer, (2) zero or partial compensation due to litigation, exclusion clauses etc. and (3) liquidity risk due to delayed compensation of claims. In section 4.2, we simulate our algorithm

[1] Basel Committee on Banking Supervision (2001).
and a haircut model, show their impact compared to the original gross losses as well as to an artificially constructed world, and test whether the differences are statistically significant. Section 5 concludes.

2 From loss data to operational risk models

Due to insufficient loss histories and inaccurate data collection, there is so far only limited experience with the empirical modelling of operational risk.[2] Additionally, no final framework for the quantitative model has been communicated by the regulators, leaving the banks with a high degree of initial freedom as well as uncertainty. In this section, we provide a short overview of the existing literature on operational risk modelling. Afterwards, we discuss the fundamental steps and building blocks of a quantitative operational risk model. Special emphasis is placed on the question of dependence, as the choice of the dependence structure needs to allow for the easy integration of insurance contracts into the model.

So far, three different classes of operational risk models have been put forward by various authors. The first class is based on statistical models and historical data as the main input for calibration. The best known example - also known as the Loss Distribution Approach (LDA) - is the parametric estimation of a frequency and a severity distribution for individual loss types and their subsequent aggregation, which can incorporate dependencies through the use of copulas.[3] This general approach also includes the modelling of operational risk via extreme value theory (EVT), which essentially specifies the use of certain distributions and simulation/fitting techniques.[4] An alternative approach - proposed by Ebnöther, Vanini, McNeil and Antolinez-Fehr (2001, 2002) - is based on common Poisson shock models such as McNeil and Lindskog (2001). In this context, the frequencies of individual loss events are determined through underlying common and idiosyncratic shocks. Closely related are the models by Embrechts and Samorodnitsky (2001, 2002), which rest upon insights from ruin theory. The second class of operational risk models also employs

[2] According to the Basel guidelines, data on operational risk losses should be collected and categorized according to eight business lines and seven event types. The seven event types specified by the Committee are 1) internal fraud, 2) external fraud, 3) employment practices and workplace safety, 4) clients, products and business practices, 5) damage to physical assets, 6) business interruption and system failures and 7) execution, delivery and process management. See Risk Management Group (2002).
[3] A thorough treatment can be found for example in Frachot, Georges, and Roncalli (2001).
[4] For a discussion of the use of EVT in operational risk management see for example Medova and Kyriacou (2001) and Medova (2000).
statistical models to quantify operational risk, but uses mainly qualitative measures to calibrate the model. Key features of these models are scenario analyses and scorecards, as for example proposed by Lawrence (2000). The third class focuses on the functional modelling of operational risk processes. Functional processes are defined and dependencies are modelled via the interdependence of the individual processes. While these models allow for a rather realistic, detailed and potentially forward-looking modelling of operational risk, they are highly demanding in terms of data collection and of the involvement of bank staff and experts to validate the model set-up. Representatives are the model by Kühn and Neu (2002), which allows for catastrophic avalanches of process failures, and the model by Alexander (2000) based on Bayesian belief networks. Though all three classes have their advantages and disadvantages, the statistical models are the most flexible, so that additional extensions, such as the recognition of operational risk insurance, can be incorporated more easily. In this paper, we therefore focus on the first class of models.

2.1 Modelling basics

The loss distribution for a certain loss type is characterized by frequency and severity. The frequency distribution describes the number of losses up to time t and is represented by a counting process N(t). The most popular distribution is the Poisson distribution.[5] For the choice of the severity distribution describing the size of individual losses, x_l, several candidates are viable. The most commonly used are heavy-tailed distributions like extreme value distributions and log-normal distributions.[6] In the simplest case the aggregate loss up to time t simply follows a compound Poisson process of the form

Y(t) = \sum_{l=1}^{L} \sum_{\tau=1}^{N_l(t)} x_{l,\tau}    (1)

and is generated by adding up the severities x_{l,\tau} of all loss types l = 1, ..., L over time \tau up to t.[7] As ignoring potential dependence underestimates the variance of the aggregate losses and the occurrence of loss clusters - as we will show below - we will discuss three dependence concepts to refine our risk process before turning to the integration of insurance.

[5] The Poisson distribution is used for modelling the frequency of operational risk losses by most authors, see for example Ebnöther, Vanini, McNeil and Antolinez-Fehr (2001, 2002), Kühn and Neu (2002) and Frachot, Georges, and Roncalli (2001). An alternative choice is the negative binomial distribution.
[6] See Ebnöther, Vanini, McNeil and Antolinez-Fehr (2001, 2002), Kühn and Neu (2002), Frachot, Georges, and Roncalli (2001), and Embrechts and Samorodnitsky (2002) for reference.
[7] Several pitfalls stemming from the special characteristics of operational risk data can occur when trying to fit a parametric distribution to the empirical loss data. Examples are the treatment of truncation, data collection biases and the choice between one integrated or two separate distributions for low and high impact data.
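To make the compound Poisson aggregation of equation (1) concrete, the following minimal sketch (Python) draws the frequencies N_l(t) from Poisson distributions and sums lognormal severities x_{l,\tau} for one year. All intensities and severity parameters below are illustrative assumptions, not values taken from this paper.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # hypothetical parameters: one Poisson intensity and one lognormal
    # severity specification per loss type l = 1, ..., L
    lambdas = [12.0, 7.5, 3.0]                            # expected losses per year
    sev_params = [(11.0, 1.8), (12.0, 2.0), (13.0, 2.2)]  # (mu, sigma) of log-severity

    def aggregate_loss_one_year():
        """Draw N_l ~ Poisson(lambda_l) and sum lognormal severities (equation (1))."""
        total = 0.0
        for lam, (mu, sigma) in zip(lambdas, sev_params):
            n_l = rng.poisson(lam)                        # frequency of loss type l
            total += rng.lognormal(mu, sigma, n_l).sum()  # sum of severities x_{l,tau}
        return total

    losses = np.array([aggregate_loss_one_year() for _ in range(10_000)])
    print("mean annual loss:", losses.mean())
    print("99.9% quantile  :", np.quantile(losses, 0.999))

Repeating the one-year simulation many times, as in the last lines, already yields an aggregate loss distribution from which quantile-based capital figures can be read off.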
2.2 Copulas, common shocks and common factors

Though empirical evidence of dependence within operational risk losses is currently sparse due to the lack of sufficient data on a segmented level, it is highly sensible to at least assume the existence of dependence on the frequency level.[8] This dependence can either be causal - a fire in a headquarters building causing damage to physical assets as well as, potentially, internal fraud if employees take advantage of lower security standards during the reconstruction phase - or be driven by common underlying factors or shocks, such as an earthquake causing property damage, system downtimes and increased process failures due to manual execution of previously automated processes. To incorporate dependence in the context of stochastic models, three different concepts are viable. The first is to introduce dependence via copulas, where the correlation structure stems from the parametrization and the choice of the specific copula. The second is the use of a common shock model in which the frequencies of different loss types depend on common and idiosyncratic shocks. The third is an intensity-based factor model which models the intensity of the loss frequency distributions as a linear combination of the intensities of underlying factors.

2.2.1 Copula-based dependence

The most popular method of incorporating dependence between different risk types in recent years has been the application of copulas.[9] A copula describes the dependence structure of a multivariate random variable. The simplest form commonly used in risk management applications is the Gaussian or normal copula with Gaussian univariate margins.[10] The specific copula function and its parametrization are then extracted from

[8] Since there are no intuitive reasons to assume dependence between the severity of different loss types or a dependence between the frequency and the severity of a loss type, we will ignore these issues for now. It should be noted, though, that any of these dependence structures could be incorporated via the mechanisms described below.
[9] See for example the discussions in Embrechts, Lindskog, and McNeil (2001), Embrechts, McNeil, and Straumann (1998) and Romano (2002). Additional background on the application of copulas can be found in Genest and MacKay (1986) and Nelsen, Krickeberg, and Nelson (1999).
[10] To ensure upper tail dependence, student-t copulas or Gumbel copulas are more appropriate for the modelling of operational risks due to their ability to simulate events like stock market crashes and catastrophic events. Embrechts, McNeil, and Straumann (1998) show the impact of the dependence structure by simulating random variates from identical marginal distributions and correlations but different copula functions: using a Gumbel copula instead of a Gaussian one, extreme events tend to be more clustered; using a Gaussian copula in a Gumbel-dependence world would therefore underestimate the loss in extreme events.
historical data or from scenario analysis and simulation. While the copula approach appears very straightforward, it relies solely on historical data, and especially in the case of operational risk there is barely sufficient data to estimate the structure and parametrization of a specific copula. A dependence structure that rests to a greater extent on economic reasoning would - at least currently - be more appropriate, as it can be calibrated more easily using expert opinion. Additionally, the following two models allow greater flexibility to incorporate the additional prerequisites required by the Committee for the recognition of insurance.

2.2.2 Common shock model

The relevance of common Poisson shock models for the fields of finance and insurance has been emphasized by McNeil and Lindskog (2001); the approach has been applied to operational risk by Ebnöther, Vanini, McNeil and Antolinez-Fehr (2001, 2002). In this model, the occurrence of an operational risk loss l is driven by a set of S shocks, the first L of which are independently distributed idiosyncratic shocks while the remaining ones are common shocks. A shock can be interpreted as an event which potentially causes an operational risk loss, such as a system downtime or a fire. The counting process N^{(s)}(t) is a Poisson process with intensity \lambda^{(s)}, counting the number of occurrences of a shock of type s until time t. At every occurrence of shock s, the loss l occurs with a given probability p^{(s)}_l. Following McNeil and Lindskog (2001), at the \tau-th occurrence of a shock, a Bernoulli variable I^{(s)}_{l,\tau} determines whether a loss of type l occurs. The dependence of individual losses is thus modelled by a multivariate Bernoulli distribution. The total number of losses is given by

N(t) = \sum_{l=1}^{L} \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau},    (2)

and the expected number of losses by E(N(t)) = \sum_{l=1}^{L} \sum_{s=1}^{S} \lambda^{(s)} p^{(s)}_l. Obviously, the dependence introduced through common shocks does not influence the expected number of total losses. The effect of common shocks becomes evident by investigating the variance of the total loss count. N_k(t) and N_l(t) are dependent if p^{(s)}_{k,l}(1,1) > 0 for some s, that is, if a shock s causes losses of type k and l simultaneously with positive probability. Here, p^{(s)}_{k,l}(1,1) denotes
the probability that the indicator variable for a given occurrence of shock s is 1 for both loss k and loss l. McNeil and Lindskog (2001) state that the variance is given by

var(N(t)) = \sum_{l=1}^{L} \sum_{k=1}^{L} cov(N_l(t), N_k(t)) > E(N(t))    (3)

for common shocks, whereas for an independent Poisson process var(N(t)) = E(N(t)).[11] Consequently, ignoring the existence of common shocks underestimates the variance of the total number of losses and thus potentially the clustering of high impact losses/extreme events. Figure 1 illustrates this effect: it shows the distribution of the total number of losses for 100,000 simulations in the presence of only idiosyncratic shocks, denoted by Shock I, and in the presence of only common shocks, denoted by Shock II. The total frequency of the losses remains constant, while the variance clearly increases when common shocks are taken into account.

Figure 1: Impact of common shocks
[Histogram of the total number of losses N(t), roughly between 400 and 600, for the scenarios Shock I and Shock II.]
Notes: This figure shows the impact of common shocks on the variance of the number of total losses in the shock model. Shock I denotes the common shock model in the presence of only idiosyncratic shocks, Shock II in the presence of only common shocks. The total number of shocks is kept constant. The average number of losses remains in the order of 500, while the variance increases from 500 to about 625 in the case of only common shocks.

Due to the distinct characteristics of losses of certain event types, it is sensible to assume that the severities x_l of loss type l are drawn from a joint distribution function F_l. For now, no dependence across the severities of different loss types is assumed. The aggregate loss process in the common shock model is then given by

Y(t) = \sum_{l=1}^{L} \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau} x^{(s)}_{l,\tau}.    (4)

[11] For a proof of equation (3), see appendix A.
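A minimal simulation sketch of the frequency part of the common shock model (equations (2) and (3)): shock counts are Poisson, each shock occurrence triggers losses of the individual types via Bernoulli indicators, and the sample variance of the total loss count exceeds its mean as soon as a common shock is present. The intensities and trigger probabilities below are illustrative assumptions (chosen close to the values later used in section 4.2), not part of the model derivation.

    import numpy as np

    rng = np.random.default_rng(seed=2)

    L = 2                                # number of loss types
    shock_lambdas = [40.0, 40.0, 25.0]   # two idiosyncratic shocks, one common shock
    # p[s][l]: probability that an occurrence of shock s triggers a loss of type l
    p = [[0.5, 0.0],                     # idiosyncratic shock for type 1
         [0.0, 0.5],                     # idiosyncratic shock for type 2
         [0.5, 0.5]]                     # common shock can hit both types

    def total_losses_one_year():
        n = np.zeros(L, dtype=int)
        for s, lam in enumerate(shock_lambdas):
            occurrences = rng.poisson(lam)                 # N^(s)(t)
            for l in range(L):
                # Bernoulli indicators I^(s)_{l,tau}, aggregated per shock
                n[l] += rng.binomial(occurrences, p[s][l])
        return n.sum()

    sims = np.array([total_losses_one_year() for _ in range(100_000)])
    print("mean N(t)    :", sims.mean())   # unaffected by the common shock
    print("variance N(t):", sims.var())    # exceeds the mean: overdispersion

In this sketch the indicators of different loss types are drawn independently given the shock count; the overdispersion of the total loss count arises solely from the shared common shock count, which is the effect shown in figure 1.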
2.2.3 Intensity-based factor model

The third approach to modelling operational risk discussed in this paper is the class of intensity-based factor models commonly used to model credit risk.[12] The default of a counterparty - or, in our case, the occurrence of an operational risk loss l - is modelled by a Poisson process with intensity \lambda_l.[13] As in the case of common shocks, the occurrence of operational risk losses is driven by a set of L idiosyncratic factors or shocks and S - L sectoral or economy-wide shocks. The underlying shocks are modelled by independent Poisson processes N^{(1)}, ..., N^{(S)} with constant intensity parameters \lambda^{(1)}, ..., \lambda^{(S)}. Whenever there is a jump in a shock process N^{(s)}, a loss of type l occurs with probability p^{(s)}_l. Contrary to the common shock model, the frequencies of the shocks are not explicitly simulated and no Bernoulli trials based on the realized frequencies are conducted. Instead, the intensity of loss type l, \lambda_l, is modelled as a linear combination of the intensities of the underlying shocks,[14]

\lambda_l = \sum_{s=1}^{S} p^{(s)}_l \lambda^{(s)}.    (5)

The counting process of loss l up to time t is thus Poisson distributed with intensity \lambda_l. Aggregating over all loss types, the intensity of the total number of operational risk losses, and thus the expected total number of losses, is given by

\lambda_{N^{(Factor)}(t)} = E(N^{(Factor)}(t)) = \sum_{l=1}^{L} \sum_{s=1}^{S} p^{(s)}_l \lambda^{(s)}.    (6)

The total number of losses is then simulated using the above defined intensity \lambda_{N^{(Factor)}(t)}.[15] As the total number of losses is Poisson distributed with intensity \lambda_{N^{(Factor)}(t)}, the variance is given by var(N^{(Factor)}(t)) = E(N^{(Factor)}(t)) = \lambda_{N^{(Factor)}(t)}.

[12] See for example Duffie (1998), Duffie and Singleton (1999), Lando (1998), Jarrow and Yu (2001), Yu (2002), Jarrow and Turnbull (1995), Schönbucher (2000) and Giesecke (2002a, 2002b).
[13] The stopping or default times \tau_1, ..., \tau_n for the n counterparties would be given by the first jumps of the Poisson processes, \tau_i = \inf\{t \ge 0 : N_i(t) > 0\}. The stopping times are assumed to fulfil the following properties: (1) the intensity process (or hazard rate) \lambda_t for the stopping time \tau is a nonnegative and predictable process with \int_0^t \lambda_s \, ds < \infty, and (2) it is characterized by the property that, for N(t) = 1_{\{\tau \le t\}}, a martingale is defined by N_t - \int_0^t (1 - N_s) \lambda_s \, ds, t \ge 0.
[14] See for example Duffie and Singleton (1999) for a detailed treatment of these credit models.
[15] Alternatively, one could simulate the number of total losses for each loss type l using the intensity \lambda_l and then aggregate over the realized loss numbers of each type. The expected number of total losses would again be equal to equation (6).
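For comparison, a sketch of the intensity-based factor model of equations (5) and (6) under the same illustrative shock intensities and trigger probabilities as in the previous sketch: the shock intensities are first aggregated into one intensity per loss type, so the simulated total loss count is Poisson and its variance equals its mean by construction.

    import numpy as np

    rng = np.random.default_rng(seed=3)

    shock_lambdas = np.array([40.0, 40.0, 25.0])   # illustrative shock intensities
    p = np.array([[0.5, 0.0],                      # p^(s)_l: shock s triggers loss type l
                  [0.0, 0.5],
                  [0.5, 0.5]])

    # equation (5): lambda_l = sum_s p^(s)_l * lambda^(s)
    lambda_l = shock_lambdas @ p
    # equation (6): intensity of the total loss count
    lambda_total = lambda_l.sum()

    sims = rng.poisson(lambda_total, size=100_000)
    print("lambda per loss type:", lambda_l)
    print("mean N(t)    :", sims.mean())
    print("variance N(t):", sims.var())            # approximately equal to the mean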
2.2.4 Shocks versus factors

The difference between the factor model described above and the common shock model of section 2.2.2 is the following: in the shock model, the frequencies of the S shocks are explicitly simulated, and the realized numbers of losses for each type are determined via Bernoulli trials. To obtain the total number of losses for a given loss type l, the realized numbers of losses caused by each shock are aggregated across all idiosyncratic and common shocks. In contrast, in the factor model, the intensities of the shocks, \lambda^{(1)}, ..., \lambda^{(S)}, are aggregated into one intensity \lambda_l per loss type l. The actual simulation is then based on this aggregated intensity. The effect of aggregating prior to simulation is that the variance is underestimated in the presence of common shocks. As shown above, the variance var(N^{(Factor)}(t)) of the Poisson distributed process N^{(Factor)}(t) equals the expected number of losses E(N^{(Factor)}(t)). The variance in the shock model is in contrast given by equation (3) and is higher than var(N^{(Factor)}(t)) in the presence of common shocks. The factor model may therefore underestimate the occurrence of loss clusters.[16] Figure 2 shows the difference between the shock and the factor model for the case of only idiosyncratic and the case of only common shocks.

Figure 2: Comparison of shock and factor model
[Histogram of the total number of losses N(t), roughly between 400 and 600, for the scenarios Shock I, Shock II, Factor I and Factor II.]
Notes: This figure shows the impact of common shocks on the variance of the number of total losses in the shock and the factor model. As before, Shock I denotes the common shock model in the presence of only idiosyncratic shocks, Shock II in the presence of only common shocks. Factor I denotes the factor-based intensity model in the presence of only idiosyncratic shocks, Factor II in the presence of only common shocks. The total number of shocks is kept constant. While the variance increases in Shock II to 625, it stays at 500 for Factor II.

[16] If the loss process N_l(t) is explicitly modelled via a multivariate Bernoulli structure as in the common shock model, N_l(t) = \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau}, the aggregate loss process is overdispersed as in the case of common shocks. For an example of such a model, see Giesecke (2002a). The usually proposed algorithms based on directly calculated loss frequencies therefore need to be extended by a prior step in which the loss frequency is given as a series of Bernoulli trials for a given loss history. Duffie and Singleton (1999), for example, propose the following algorithm: in step one, the first event time \tau is simulated. At time \tau, the event is simulated to be of type s and will by definition be either joint, with probability p_{joint} = \lambda^{(s)} / \sum_{s=1}^{m} \lambda^{(s)}, or not joint, with probability 1 - p_{joint}.
Factor I denotes the case of only idiosyncratic shocks, whereas Factor II denotes the case with only common shocks. While the variance of the total number of losses increases in the shock model with the proportion of common shocks, it stays constant at \lambda_{N^{(Factor)}(t)} for the factor model.[17]

Summing up the alternative methodologies, the approaches based on common shocks or intensity-based factor models provide an economically and statistically sound foundation for quantifying and simulating operational risk losses. Their advantage over the copula approach is the intuitive and economic calibration of parameters in the case of insufficient historical data. While the factor model provides at first sight an easier and computationally faster solution, it underestimates the variance of the total number of losses. For the remainder of the paper, we will therefore base our calculations and simulations on the common shock model presented in section 2.2.2.

2.3 Regulatory capital and risk measures

Once an aggregate loss distribution for the entire institution is obtained, an appropriate risk measure and the Economic Capital can be calculated.[18] Value at Risk (VaR) has in the past been by far the most popular risk measure. According to the Committee (2001), the bank must be able to demonstrate that the risk measure used for regulatory capital purposes reflects a holding period of one year and a confidence level of 99.9 percent. The Committee proposes to define the Capital-at-Risk (CaR) as the unexpected loss, given by

CaR_1(\alpha) = \inf\{x \in R : F(x) \ge \alpha\} - \underbrace{\int_0^{\infty} x f(x) \, dx}_{\text{expected loss}}

with f(x) denoting the density function of the aggregate loss process. Many institutions, though, compute the CaR as the sum of the expected loss and the unexpected loss, which is equivalent to the Value-at-Risk measure,

CaR_2(\alpha) = \int_0^{\infty} x f(x) \, dx + \left[ \inf\{x \in R : F(x) \ge \alpha\} - \int_0^{\infty} x f(x) \, dx \right] = F^{-1}(\alpha).

[17] In the credit literature, the impact of common factors and of simulated, respectively correlated, defaults has been treated in a series of models, e.g. based on multivariate exponential models. Candidate model set-ups are proposed by Duffie and Garleanu (2002), Giesecke (2002a, 2002b) and Duffie and Singleton (1999).
[18] For a discussion of the characteristics of coherent risk measures see for example Artzner, Delbaen, Eber, and Heath (1999) and Tasche (2002).
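Given a vector of simulated aggregate annual losses, both capital definitions can be evaluated directly. A short sketch follows, in which the loss vector is a hypothetical placeholder rather than output of the model described above.

    import numpy as np

    rng = np.random.default_rng(seed=4)
    # placeholder for the simulated aggregate annual loss distribution
    annual_losses = rng.lognormal(mean=18.0, sigma=1.0, size=100_000)

    alpha = 0.999
    quantile = np.quantile(annual_losses, alpha)   # inf{x : F(x) >= alpha}
    expected_loss = annual_losses.mean()           # empirical counterpart of int x f(x) dx

    car_1 = quantile - expected_loss               # unexpected loss only
    car_2 = quantile                               # expected + unexpected loss = VaR

    print(f"expected loss: {expected_loss:,.0f}")
    print(f"CaR_1(99.9%) : {car_1:,.0f}")
    print(f"CaR_2(99.9%) : {car_2:,.0f}")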
3 Incorporating insurance

Insurance contracts can be incorporated into an operational risk model as described above in two different ways. The first is to evaluate the insurance separately and then to deduct it from the aggregate gross loss. The second is to consider the effects of insurance contracts on each individual loss and then to aggregate the net losses. The latter approach yields a much more realistic model of the impact of insurance coverage on the loss history for several reasons. The first is that it correctly accounts for potential overlaps and gaps between the individual insurance contracts as well as mismatches of risk exposure and compensation.[19] The second is that it explicitly allows for large losses to remain uncovered by insurance if they occur at a later time when the coverage is already used up. The third is that it also allows for the explicit stochastic incorporation of extensions such as counterparty default, payment uncertainty, and liquidity risk. Naturally, this comes at the cost of computational complexity compared to the models in which the insurance is valued separately, e.g. the premium- and limit-based approaches described in section 3.1.

In terms of embedding the insurance algorithm into the overall operational risk methodology, the most natural choice is to apply insurance contracts at the very end of the process, right before the capital calculation. In this case, the insurance is applied to the actual risk profile of the bank with all structural changes already considered. The complete framework - from the raw loss data to the final capital calculation - should be set up as follows: in a first step, the internal and external historical data are fitted to a parametric severity distribution. In the second step, the loss frequencies are generated by the common shock model described above. In the third and fourth step (which are not the subject of this paper), scenario analysis and internal control and business environment factors are applied.[20] Once these adjustments have been made, the simulation of operational risk losses can begin. In the fifth step, a generic loss history is simulated and the insurance contracts are applied to each individual loss. The aggregate net loss history then serves as basis for

[19] If two insurance contracts cover the same kind of loss event, only one of the contracts is used to cover the claim. If some insurance coverage therefore remains unused, it should not be regarded and quantified as a risk mitigant because it does not lower the overall gross loss.
[20] These model amendments correct for the fact that structural changes in the internal control processes or in the business environment may substantially alter the risk profile of a business unit or even of the entire bank. Prominent examples are the sale or closure of an entire business unit or the introduction of highly sophisticated or automated control processes.
the capital calculations, which concludes the simulation process. An overview of these five steps is given in figure 3.

Figure 3: Stylized AMA model for operational risk

In the following sections, we first briefly present two models based on the separate valuation of the effect of operational risk insurance, as they have been suggested by a working group of the European Commission.[21] In section 3.2, we develop an algorithm to incorporate insurance contracts into the general framework, which allows the explicit recognition of the risk mitigating impact of insurance on an individual loss level. Model extensions such as the recognition of counterparty risk, payment uncertainty, and liquidity risk follow in section 3.3.

3.1 Premium- and limit-based approaches

The premium- and the limit-based approaches approximate the risk transferred from the bank to the insurer by the insurance premiums paid and by the difference between the overall policy limit and the premiums paid, respectively. In both cases, the value of the insurance is determined separately and then deducted from the aggregate loss.[22] The premium-based approach rests on the intuition that the insurance premiums paid by a bank can serve as a proxy for the degree of transfer of operational risk

[21] See European Commission (2002).
[22] For details on these approaches see European Commission (2002).
from the bank to the insurer. The total premiums P paid are corrected by (1) the factor \gamma, determined by the Committee, to transform the expected loss relief through insurance into the capital relief by insurance, (2) the factor \zeta, denoting the loss reduction per invested unit of premium, and (3) the haircut factor h, 0 < h < 1, to account for the residual risk the bank is exposed to.[23] The resulting figure is then deducted from the bank's gross economic capital EC_{gross}:

EC_{gross} - EC_{net} = \gamma \cdot \zeta \cdot h \cdot P.

The limit-based approach in contrast allows recognition of insurance for the unexpected loss only: insurance limits are used to proxy the maximum amount of risk transferred to the insurer. The premiums represent the risk stemming from the expected loss, which should already be incorporated in the internal pricing of the bank. Therefore, the limit less the premium should proxy the amount of the unexpected loss that is transferred via the insurance policy. The capital relief can then be calculated as

EC_{gross} - EC_{net} = h \cdot (C - P)

with C denoting the limit or cap of the insurance contract. In general, these approaches are not suited to adequately model operational risk insurance due to their lack of economic foundation and their neglect of the stochastic nature of operational risk losses, both with respect to the time of occurrence and the severity. Additionally, further extensions concerning the remaining systemic risk can only be incorporated into the model as haircuts, which overestimate the impact of insurance in the presence of large loss clusters. In the following section, we therefore develop a model which explicitly accounts for the stochastic nature of operational risk losses and which gives a much more realistic picture of the risk mitigating impact of insurance contracts.

3.2 The algorithm

In the following section, we propose a model to recognize the risk mitigating effects of operational risk insurance within an Advanced Measurement Approach (AMA) based on common shocks. The model consists of two steps. The first is to simulate the loss history in a way that allows the application of insurance contracts and of the model extensions discussed in section 3.3. The second step is the actual application of the insurance contract on an individual loss basis.

[23] The factor \zeta basically approximates the correlation of premiums paid with the risk transferred for a specific bank, while the haircut factor h accounts for residual risks such as counterparty risk, payment uncertainty and liquidity risk.
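Before turning to the simulation details, a minimal numerical illustration of the premium- and limit-based formulas of section 3.1 may help fix ideas; all input values (\gamma, \zeta, h, P, C) below are hypothetical.

    # premium- and limit-based capital relief, section 3.1 (hypothetical inputs)
    gamma = 1.5        # regulatory factor: expected loss relief -> capital relief
    zeta = 0.8         # loss reduction per invested unit of premium
    h = 0.7            # haircut for residual risk, 0 < h < 1
    P = 20e6           # total insurance premiums paid
    C = 500e6          # overall policy limit (cap)

    relief_premium_based = gamma * zeta * h * P      # EC_gross - EC_net
    relief_limit_based = h * (C - P)                 # EC_gross - EC_net

    print(f"premium-based capital relief: {relief_premium_based:,.0f}")
    print(f"limit-based capital relief  : {relief_limit_based:,.0f}")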
Loss occurrences are driven by s = 1, ..., L independently distributed specific (idiosyncratic) shocks and s = L+1, ..., S common shocks. In the current model, we assume the intensities of the events to be deterministic, as the occurrence of loss-causing events depends on a slowly changing environment - such as the system infrastructure of a bank or a client portfolio - and not on random processes.[24]

First, we simulate the loss history: given the exogenous intensities of the shocks, \lambda^{(s)}, we simulate the realized number of shocks via independent Poisson distributions for each shock s. The results are S vectors of dimension N^{(s)}(t) \times 1, denoting the realized number N^{(s)}(t) of shock occurrences for shock s in the sample. For each individual shock occurrence, a Bernoulli trial then determines whether the shock actually causes a loss or not. The success probability of loss type l given the occurrence of shock s is given exogenously by p^{(s)}_l. For each idiosyncratic shock s, we then have a vector I^{(s)}_l of dimension N^{(s)}(t) \times 1 containing indicator variables I^{(s)}_{l,\tau} with value 1 if the \tau-th occurrence of shock s causes a loss of type l and 0 otherwise,

I^{(s)}_l = \left( I^{(s)}_{l,1}, \ldots, I^{(s)}_{l,N^{(s)}(t)} \right).

For each common shock s_c, an equivalent matrix I^{(s_c)} of dimension L \times N^{(s_c)}(t) denotes whether the \tau-th occurrence of the common shock s_c causes any of the losses,

I^{(s_c)} = \begin{pmatrix} I^{(s_c)}_{1,1} & \cdots & I^{(s_c)}_{1,N^{(s_c)}(t)} \\ \vdots & & \vdots \\ I^{(s_c)}_{L,1} & \cdots & I^{(s_c)}_{L,N^{(s_c)}(t)} \end{pmatrix}.

To obtain the total number of losses of type l, we sum over the elements of each vector and matrix containing the indicator variables for loss l given shock s and then aggregate the number of losses over all shocks,

N_l(t) = \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau}.

Assuming independence between the frequency and the severity of the losses, for each loss of type l a severity x_{l,\tau} is drawn from the severity distribution F_l. The aggregate gross loss for loss type l is given by

Y_l(t) = \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau} x^{(s)}_{l,\tau}

[24] An extension to stochastic intensities as in the credit literature is possible though. For examples, see Duffie and Singleton (1999).
and the aggregate gross loss across all loss types by

Y(t) = \sum_{l=1}^{L} \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{l,\tau} x^{(s)}_{l,\tau}.

In order to accurately apply the insurance contracts to each loss, we need to determine a chronological order of all losses across all loss types. As the number of losses is already determined by the Bernoulli trials, we generate appropriate loss dates as follows: we simulate a series of inter-arrival times \upsilon_i for each loss type based on the realized intensity of the loss process, \lambda_l = N_l(t).[25] To obtain actual loss dates, we cumulate these inter-arrival times. The date of the \tau-th loss is given by

\Upsilon_{\tau} = \sum_{i=1}^{\tau} \upsilon_i.

The dates are assigned to the losses in order of occurrence, yielding a simulated loss history for each loss type l:

H_l = \left( (x_{l,1}, \Upsilon_1), \ldots, (x_{l,\tau}, \Upsilon_{\tau}), \ldots, (x_{l,N_l(t)}, \Upsilon_{N_l(t)}) \right).

The chronological order of the losses is important in the case of less than full insurance. As the claims are filed in the order of occurrence, it may happen that a large number of small losses uses up the full limit of the insurance policy. If a catastrophic loss occurs at a later date, it then remains completely uncompensated and constitutes a rather large tail event.

To quantify the risk mitigating impact of insurance contracts, we now have to apply the insurance contracts to the individual losses of the generated loss history. An insurance contract for loss type l is specified by the deductible d_l, which is to be paid by the bank itself, the cap c_l, which denotes the maximum coverage of an individual loss, and the overall limit lim_l of the policy. As long as the overall limit of the policy is not exhausted, the gross loss is reduced by an appropriate compensation which increases in the size of the loss until the cap is reached. The insurance thus covers a specified intermediate risk layer which depends on the value of the deductible and the cap, as is shown in figure 4.

[25] The inter-arrival times of a Poisson process with intensity \lambda are independently exponentially distributed with parameter \lambda.
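The date-assignment step can be sketched as follows: given the realized severities of one loss type, exponential inter-arrival times are drawn with the realized intensity \lambda_l = N_l(t), cumulated into loss dates \Upsilon_{\tau} and attached to the severities. The severity parameters in the example are illustrative.

    import numpy as np

    rng = np.random.default_rng(seed=5)

    def simulated_loss_history(severities, horizon_days=365.0):
        """Attach chronological dates to a vector of realized loss severities.

        Inter-arrival times are exponential with the realized intensity
        lambda_l = N_l(t); cumulating them yields the loss dates Upsilon_tau.
        """
        n_l = len(severities)
        if n_l == 0:
            return []
        inter_arrival = rng.exponential(scale=horizon_days / n_l, size=n_l)
        dates = np.cumsum(inter_arrival)
        return list(zip(dates, severities))       # loss history H_l, ordered in time

    # example: ten lognormal severities for one loss type (hypothetical parameters)
    history = simulated_loss_history(rng.lognormal(12.0, 1.5, size=10))
    for date, x in history:
        print(f"day {date:6.1f}: loss {x:,.0f}")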
Figure 4: Net losses after insurance recognition

The algorithm for applying the insurance contract to a given loss is displayed in figure 5. We first have to check whether the overall limit of the policy is already exceeded or not. Given that the overall insurance limit is not yet exceeded, we have to consider three different cases. In the first, the loss is smaller than the deductible and no compensation is paid by the insurer; the net loss equals the gross loss. In the second case, the loss severity lies between the deductible and the deductible plus the cap. Here, the loss is reduced to the value of the deductible and the amount paid by the insurer equals x_{l,\tau} - d_l \le c_l; the net loss equals the value of the deductible. In the third case, the loss exceeds the value of the cap plus the deductible. The loss is reduced by the amount of the cap, yielding a net loss of x_{l,\tau} - c_l. The resulting net losses for all cases are given by

x^{net}_{l,\tau} = \begin{cases} x_{l,\tau} & \text{for } x_{l,\tau} < d_l \\ d_l & \text{for } d_l \le x_{l,\tau} < c_l + d_l \\ x_{l,\tau} - c_l & \text{for } c_l + d_l \le x_{l,\tau} \end{cases}    (7)

If the overall limit of the insurance policy is almost exhausted and the resulting claim exceeds the remaining overall amount available for compensation, the loss is only reduced by this remaining amount. Once the policy limit is exhausted, no more compensation is granted and all remaining net losses equal the gross losses. The resulting net loss history represents the net operational risk losses at the bank for a representative sample.
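A sketch of the per-loss application of a policy with deductible d_l, cap c_l and overall limit lim_l, implementing the three cases of equation (7) together with the bookkeeping of the remaining limit; the contract figures in the usage example are hypothetical.

    def apply_insurance(gross_losses, deductible, cap, limit):
        """Apply one insurance contract to a chronologically ordered loss history.

        Implements the three cases of equation (7) and reduces the remaining
        overall limit by every compensation paid, as described in the text.
        """
        net_losses, remaining = [], limit
        for x in gross_losses:
            if x <= deductible:
                compensation = 0.0                       # case 1: below deductible
            elif x <= deductible + cap:
                compensation = x - deductible            # case 2: intermediate layer
            else:
                compensation = cap                       # case 3: capped compensation
            compensation = min(compensation, remaining)  # overall limit almost exhausted
            remaining -= compensation
            net_losses.append(x - compensation)
        return net_losses

    # hypothetical contract: d = 0.5m, c = 500m, overall limit 1bn
    print(apply_insurance([0.3e6, 2.0e6, 700e6, 900e6],
                          deductible=0.5e6, cap=500e6, limit=1e9))

In the example, the third loss consumes most of the overall limit, so the fourth, catastrophic loss is only partially compensated - exactly the tail effect the chronological ordering is meant to capture.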
Figure 5: Algorithm of insurance application

To obtain an accurate estimate of the regulatory capital, the above algorithm has to be repeated for a sufficiently large sample size. Each sample denotes a representative loss history of the bank for one year. For each sample, the net losses are aggregated into one total net loss figure representing the total net operational risk loss for the sample year. Combining these total loss figures into a distribution allows us to calculate the appropriate Capital at Risk figure by taking the respective quantile of the aggregate distribution, e.g. the 99.9% quantile. This figure is the total net loss of the bank from operational risk that will not be exceeded with a probability of 99.9 percent in a given year.

3.3 Further issues and extensions

To be allowed to recognize the risk mitigating impact of insurance, banks have to consider certain limitations of the insurance contracts and their coverage in their operational risk models.[26] The impact of insurance is reduced in the case of the default of the insurer,

[26] In addition to the discussed amendments to the quantification methodology, there are several restrictions concerning the type of insurer and the contract specifications: the insurance provider needs to be licensed as an insurance company under EU legislation and shall not belong to the same group as the insured. Insurance provided through captives and affiliates may only be recognized to the extent to which the exposure has been laid off to an independent third party entity that meets the eligibility criteria. The insurance provider needs to have a minimum credit quality, and the recognition of insurance has to be based on an explicit insurance contract with a maturity of no less than one year and a notice period for cancellation of no less than three months.
the potential shortfall or reduction of claims due to litigation, and additional costs due to delayed insurance payments. The insurance model therefore has to include the following elements:[27]

- the recognition of counterparty risk inherent in the creditworthiness of the insurers and potential concentration risks with respect to insurance providers,
- the uncertainty of payment due to litigation etc. as well as mismatches in the coverage of insurance policies and measured operational risk exposures,
- potential liquidity risks associated with mismatches in the timing of payments on third party settlements and claims paid by insurers.

In the following sections, these extensions are incorporated into the model. Generally, two different methodologies are viable: the first is to discount all insurance payments by a haircut, as in the premium- and limit-based approaches. An appropriate haircut needs to be chosen to approximate actual defaults of insurers and payment shortfalls. The second is to explicitly model the default of the insurer, the shortfall of claims associated with payment uncertainty as well as the occurrence of liquidity risk due to delayed payment. As our aim is to give as accurate a picture as possible of the risk mitigating effect of insurance, we propose an explicit approach to model these events in the following sections. We will show later that approaches relying only on haircuts bias the risk profile towards the body of the distribution and underestimate the occurrence of tail events due to default of the counterparty or compensation shortfalls. In the following three subsections, we extend the above algorithm to include the effects of counterparty default, compensation shortfalls and liquidity risk. Simulations of the effect of these limitations and a comparison to the simpler haircut approach are shown in section 4.2.

3.3.1 Counterparty risk

Counterparty risk in the context of operational risk insurance can be defined as the risk that the insurer defaults or has severe liquidity constraints and is not able to fulfill its payment obligations. Even though this seems unlikely at first sight, external events such as Hurricane Andrew have caused the default of at least 10 insurers in the U.S., and any major operational risk event that impacts multiple institutions is likely to cause

[27] See Risk Management Group (2001b).
clusters of losses and of claims filed with individual insurers.[28] As mentioned before, there are two ways to model the impact of a potential default of a counterparty: the first is to discount all insurance payments with a factor that reflects the probability of default of the counterparty. The second, which we propose, is to explicitly model the default given an exogenous default intensity \lambda_{default}. As mentioned before, default in a credit-risk environment can be modelled by shock models with the underlying shocks determining the default intensity of the counterparty. Intuitively, it is likely that the exogenous shocks triggering operational risk losses at a banking institution are not entirely independent of the shocks triggering losses - and eventually the default - of an insurer. The dependence will nevertheless be limited to shocks influencing the entire financial services industry or the economy as a whole, such as the circulation of an extremely harmful computer virus, catastrophic events affecting a financial center such as New York or London, a severe economic downturn, or a large case of external fraud affecting several counterparties.[29] Due to this dependence, it is not sufficient to model the default process of the insurer as an independent Poisson process with a given intensity determined by its rating; instead, a failure rate has to be generated for each counterparty from the same shock history used to generate the operational risk loss history of the bank.

The model derivation is analogous to that in section 2.2.2. The default frequency is driven by C specific shocks and by the S - L common shocks which also govern the loss occurrences, with C being the number of counterparties. Analogous to the algorithm for loss occurrences described above, a vector I^{(s)}_c is generated for each shock by Bernoulli trials with a predetermined default probability of counterparty c given the occurrence of shock s. The vectors contain indicator variables I^{(s)}_{c,\tau} with value 1 if the \tau-th occurrence of shock s causes the default of counterparty c and 0 otherwise. Common shocks are treated analogously. Calibration of the default probabilities of the insurers will not be difficult for large banking institutions, as credit ratings and appropriate estimates for hazard rates will be available in the credit risk departments of the institutions. The realized default frequency then determines the number of samples in which a default of counterparty c occurs:

N^{(default)}_c(t) = \sum_{s=1}^{C+(S-L)} \sum_{\tau=1}^{N^{(s)}(t)} I^{(s)}_{c,\tau}.

[28] See Cummins, Lewis, and Phillips (1998).
[29] This also includes shocks which affect several banks that are insured by the same provider.
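A sketch of the default simulation described in this subsection: the insurer's default indicators are driven by the common shock counts already drawn for the bank's loss history plus an insurer-specific shock. The intensities and default probabilities used in the usage example are the illustrative values later employed in section 4.2.

    import numpy as np

    rng = np.random.default_rng(seed=6)

    def simulate_default(common_shock_counts, p_default_common,
                         lambda_idiosyncratic, p_default_idiosyncratic):
        """Return True if the insurer defaults in the sample year.

        common_shock_counts are the realized counts N^(s)(t) of the common shocks
        already drawn for the bank's loss history; the insurer-specific shock is
        drawn independently.
        """
        defaults = 0
        for n_s, p_s in zip(common_shock_counts, p_default_common):
            defaults += rng.binomial(n_s, p_s)           # Bernoulli trials I^(s)_{c,tau}
        n_own = rng.poisson(lambda_idiosyncratic)        # insurer-specific shock
        defaults += rng.binomial(n_own, p_default_idiosyncratic)
        return defaults > 0

    # example: one common shock with 25 expected occurrences per year
    common_counts = [rng.poisson(25)]
    print(simulate_default(common_counts, p_default_common=[0.002],
                           lambda_idiosyncratic=25, p_default_idiosyncratic=0.002))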
Once the default frequency is determined, the first-passage time of default can be simulated; it denotes the exact date at which the default occurs in the sample year.[30] All losses occurring after this date have a zero recovery and enter the net loss history unchanged.

3.3.2 Payment uncertainty

The second extension is the incorporation of payment uncertainties and of mismatches in coverage and measured exposure. Mismatches in coverage and operational risk exposure are already captured by applying the insurance contracts individually to individual operational risk losses, which allows, for example, for excess coverage. The methodology must only be amended for the general uncertainty in payments. We define this uncertainty as the risk that the insurer does not pay the compensation specified in the contract or grants only partial compensation. Economically, this can have different reasons: examples are disagreements between the bank and the insurer about the specific cause of a loss or about the fulfillment of necessary precautions, and therefore about the applicability of the insurance contract. A second source of disagreement is inherent in the unobservable nature of operational risk losses for outsiders: an insurer may not be able to verify the exact severity of an operational risk loss. Since the bank has an incentive to overstate its loss when reporting it to the insurer in order to collect a higher compensation, the insurer could be inclined to challenge the reported figures and to reduce the compensation in individual cases. Quite often, the actual size of the reimbursement is subject to litigation, also resulting in a deviation of the actual compensation from the claim filed. Empirically, 80% of losses filed for insurance claims received a nonzero recovery, and the overall recovery rate as a percentage of the loss amount is 73.3% of the claim size.[31] Technically, payment uncertainty can again be recognized in two different ways: first, via one single haircut applied to all insurance payments to capture both effects, and second, via explicit modelling of a certain number of zero recoveries and discounting of the remaining payments. Again, we model the occurrence of zero recoveries in the same way we do for

[30] The time to default is exponentially distributed with parameter \lambda_c. As the exact number of defaults is determined by the default frequency, any date exceeding the range of 365 days needs to be scaled to actually cause a default in the given sample. For simplicity, we just draw a random number from {1, ..., 365} in this case.
[31] See Risk Management Group (2003) for details. The proportion of nonzero recoveries differs across loss types, ranging from 55% for Execution, Delivery & Process Management to 88% for Damage to Physical Assets and Business Disruption & System Failures. For individual loss types, the average recovery rate as a percentage of the loss amount ranges from 51% for Clients, Products & Business Practices to 88% for Employment Practices & Workplace Safety.
the default of a counterparty: for each insurance contract, a zero recovery occurs with probability p^{(s)}_{(PU)} given the occurrence of shock s.[32] As before, the realized frequency of zero recoveries is determined via the underlying shocks and Bernoulli trials and for a certain loss type l is given by

N_{(PU),l}(t) = \sum_{s=1}^{S} \sum_{\tau=1}^{N^{(s)}_{(PU)}(t)} I^{(s)}_{PU,\tau}.

Since the dates of the zero recoveries have to coincide with actual loss dates, it is sensible to simply select N_{(PU),l}(t) actual loss dates, spread evenly over the loss history, as the dates on which a zero recovery occurs. All remaining insurance payments are multiplied by 0.733 to reflect the partial overall reimbursement observed in reality.

3.3.3 Liquidity risk

Liquidity risk in the context of operational risk and Basel II can be defined as the risk of liquidity shortages and increased cost due to delayed insurance payments.[33] In this context, liquidity risk will only occur if the cumulative loss for which payment is delayed is high enough to trigger the need for external capital, that is, if it exceeds earmarked internal capital reserves. Liquidity risks can be quantitatively captured by determining the probability of liquidity shortages and the cost of (external) capital needed to cover these shortages. Depending on the size and the type of the loss, a settlement period is determined. During this time, capital of the size of the loss is bound and cannot be used to cover any other losses or be used in other business activities. If at any point in time the accumulated losses exceed a critical threshold, external capital is needed and the costs have to be included as incurred losses. In general, a liquidity shortage would translate into additional costs in the form of external financing, but in the extreme case in which external financing is not viable, the bank would default.

[32] In contrast to the case of counterparty default, the impact of common shocks is less intuitive in the setting of payment uncertainty.
[33] Payments can be delayed substantially if it takes some time for the insurance company to accurately determine the size of the compensation or to verify the applicability of a certain policy.
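To make the payment-uncertainty treatment of section 3.3.2 concrete, the following simplified sketch sets a share of compensations to zero and scales all remaining compensations by the empirical average recovery rate of 73.3%; drawing the zero recoveries as a plain binomial is a stand-in for the shock-driven Bernoulli trials described above.

    import numpy as np

    rng = np.random.default_rng(seed=7)

    def apply_payment_uncertainty(compensations, zero_recovery_prob=0.2,
                                  recovery_rate=0.733):
        """Set a random share of compensations to zero and haircut the rest."""
        compensations = np.asarray(compensations, dtype=float)
        n_zero = rng.binomial(len(compensations), zero_recovery_prob)
        zero_idx = rng.choice(len(compensations), size=n_zero, replace=False)
        adjusted = compensations * recovery_rate      # partial reimbursement
        adjusted[zero_idx] = 0.0                      # litigation, exclusion clauses etc.
        return adjusted

    print(apply_payment_uncertainty([1.5e6, 3.0e6, 250e6, 10e6]))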
4 Empirical results

4.1 Data and distribution fitting

The data used in this paper is taken from external databases based on publicly available data sources, as they are typically used in a globally operating universal bank.[34] It is limited to operational losses occurring at banks only and ranges back to 1972, though for the first years only very few data points are available. Two different databases were merged. Of a total of 3892 data points, 690 data points were deleted due to double entries.[35] The resulting database used for estimation in this paper contains 3202 operational risk losses. All losses were adjusted for inflation using the U.S. Consumer Price Index (CPI)[36] to reflect the current value of the losses.

In general, operational loss data exhibits a large number of small losses and very few high impact losses. This results in a distributional form with extremely fat tails. Table 5 in the appendix shows some of the characteristics of the data. When fitting a parametric distribution to the data, the estimation methodology has to account for the truncated nature of the data, since public databases only include losses of more than $1 million. Additionally, the data of public databases suffers from a selection bias which overestimates the probability of very high impact events: as catastrophic events are more likely to become publicly known, they are much more easily picked up in a literature search than small losses. Fontnouvelle et al. (2003) propose a weighting factor which gives high impact losses a high likelihood of being reported and low impact losses a lower one, in order to correct for this data selection bias.[37] A second way to approximate the true distribution is to combine as many databases as possible and to eliminate all multiple entries. Since in our database the bias will already be reduced compared to Fontnouvelle et al. (2003), we fit the data

[34] For a detailed description and statistical analysis of two such databases see Fontnouvelle, DeJesus-Rueff, Jordan, and Rosengreen (2003).
[35] Due to the reliance on public information and mostly second-hand sources (e.g. news services, press articles etc.), the specification and monetary amount of identical events can differ across databases. To ensure the maximum possible data quality, identical events were defined as those which had a difference in severity of less than 5%, happened within the same calendar year and were apparently identical according to the detailed description of the individual events. Obviously identical events with a difference of more than 5% in the amount reported were additionally verified by a literature search. In some cases, the difference was due to a lack of currency conversion and similar problems. If no verification could be obtained, the average value of the data sources was taken as the new data point.
[36] Source: U.S. Bureau of Labor Statistics.
[37] Frachot and Roncalli (2002) alternatively suggest to estimate internal as well as external data together and to allow for a stochastic threshold.
with and without the weighting factor for comparison.

Table 1: Estimated parameters for operational risk losses

Exponential           b             beta      tau
Aggregate             0.91031       0.62008   137.771
Internal fraud        0.91546       0.64189   94.2859
External fraud        0.90198       0.58478   109.163
Execution             0.90198       0.58478   109.163
Employment            0.89208       0.54287   120.083
Asset damage          0.94263       0.75701   64.2997
Clients               0.91546       0.64189   94.2859
Business disruption   0.91760       0.65095   91.928

Lognormal             mu            sigma     beta      tau
Aggregate             0.01469       0.62364   0.24729   9.34765
Internal fraud        0.02453       0.62740   0.16685   4.58363
External fraud        0.08945       0.57009   0.15587   11.0981
Execution             0.10070       0.57540   0.10070   13.459
Employment            0.04324       0.54828   0.26911   7.55767
Asset damage          0.01157       0.78502   0.16812   8.94502
Clients               0.03234       0.63039   0.17352   7.632431
Business disruption   1.36191e-07   0.65194   0.17715   7.657972

Due to the fat-tailed nature of operational risk losses, candidate choices are the Generalized Pareto Distribution and the lognormal distribution, which is extensively used in the risk management of banks. Following Fontnouvelle et al. (2003), we assume that the truncation point of loss reporting has a logistic distribution

G(x) = (1 + \exp(-\beta(x - \tau)))^{-1},

where \tau represents the point at which half of the operational risk losses are on average reported and \beta represents the adjustment speed. Further, we assume that the distribution of the loss data can be approximated by the Generalized Pareto Distribution or by the lognormal distribution.[38] The parameters are then obtained by estimating the conditional

[38] Assuming that the distribution of operational risk losses belongs to the heavy-tailed class, Fontnouvelle et al. (2003) argue that the log of these losses belongs to the light-tailed class, which converges to the exponential distribution with \xi = 0. Taking the log of operational risk losses instead of the absolute values becomes necessary for the optimization procedure in the truncated case, as otherwise computational problems due to the extremely small value of the integral arise. Therefore we run all optimization procedures using the log value of the losses and the regular likelihood function, as done in Fontnouvelle et al. (2003).
likelihood function for the log losses adjusted by the weighting factor:

L(b, \beta, \tau) = \prod_{i=1}^{n} \left[ \frac{\exp(-x_i/b)/b}{1 + \exp(-\beta(x_i - \tau))} \Big/ \int_u^{\infty} \frac{\exp(-x/b)/b}{1 + \exp(-\beta(x - \tau))} \, dx \right]    (8)

Extending the analysis of Fontnouvelle et al. (2003), we run the same likelihood estimation for the lognormal distribution:

L(\mu, \sigma, \beta, \tau) = \prod_{i=1}^{n} \left[ \frac{\frac{1}{x_i \sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log x_i - \mu}{\sigma}\right)^2\right)}{1 + \exp(-\beta(x_i - \tau))} \Big/ \int_u^{\infty} \frac{\frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2\right)}{1 + \exp(-\beta(x - \tau))} \, dx \right]    (9)

The results for both distributions are reported in Tables 1 and 2.

Table 2: Estimated parameters for operational risk losses without weighting

                      Exponential   Lognormal
                      b             mu        sigma
Aggregate             2.12152       0.03234   0.63038
Internal fraud        2.25389       0.04998   0.63712
External fraud        1.90733       0.12194   0.58543
Execution             1.90733       0.12512   0.58694
Employment            1.69314       0.06089   0.55661
Asset damage          3.24303       0.01019   0.78472
Clients               2.25389       0.06089   0.64129
Business disruption   2.30956       0.00254   0.65283

Since no unbiased data are available to test the goodness of fit, the usual non-parametric adequacy tests like the Anderson-Darling, Cramer-von-Mises and Kolmogorov-Smirnov statistics cannot be applied. Due to the existence of deductibles, we are for our purpose mostly interested in the tail behavior and the relative size of the distribution's tail compared to the body. An appropriate selection criterion is then given by certain stylized facts such as the ratio of losses exceeding certain thresholds, or the maximum losses generated within an appropriate sample. When simulating loss histories according to the above estimated parameters and comparing them to stylized facts such as the reported Economic Capital of large banks, the exponential distribution comes closest to the actual data.[39]
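A sketch of the truncated-likelihood estimation in equation (8): the conditional likelihood of the log losses under an exponential severity and the logistic reporting function G(x) is maximized numerically, with the normalizing integral computed by quadrature. The synthetic data, the threshold u and the starting values are purely illustrative, and scipy is assumed to be available.

    import numpy as np
    from scipy import integrate, optimize

    rng = np.random.default_rng(seed=8)

    u = np.log(1e6)                                # reporting threshold: $1 million
    x = rng.exponential(scale=0.9, size=500) + u   # synthetic log losses above u

    def neg_log_likelihood(params):
        b, beta, tau = params
        if b <= 0 or beta <= 0:
            return np.inf
        def density(z):                            # exponential density times reporting prob.
            return (np.exp(-z / b) / b) / (1.0 + np.exp(-beta * (z - tau)))
        norm, _ = integrate.quad(density, u, np.inf)   # denominator of equation (8)
        return -(np.sum(np.log(density(x))) - len(x) * np.log(norm))

    result = optimize.minimize(neg_log_likelihood, x0=[1.0, 0.5, u],
                               method="Nelder-Mead")
    print("estimated (b, beta, tau):", result.x)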
In the following, we will therefore use the above shown specification of the exponential distribution to simulate our artificial loss history.[40]

4.2 Simulation and test of the model

In this section, the model is simulated using generic loss and insurance data to compare the algorithm presented above with a simple haircut approach. For now, we abstract from including liquidity risk, since no data is publicly available for assumptions concerning thresholds and internal capital costs. In contrast, data for assumptions concerning counterparty default and payment uncertainty is either publicly available from the 2002 Loss Data Collection Exercise or can easily be generated by the bank-internal credit risk departments. The two main results from the simulation are the following: first, the stochastic model based on shocks and Bernoulli trials mirrors the shape of the original net loss distribution more closely, keeping a greater weight on the tail events compared to the haircut model. Second, the difference between the two models is statistically significant.

The simulation is done in the following way. First, we simulate an artificial loss history that is used as input for the two different insurance models we want to apply. To have a common basis concerning the gross loss history and the assumptions for the model extensions, we simulate an artificial world with certain characteristics concerning counterparty default and payment uncertainty. If one used the net loss data of the artificial world as basis for calibration, one would arrive at the data specifications that we use as inputs into our two approaches:

1. the stochastic model, which explicitly models counterparty risk and payment uncertainty, and
2. the haircut model, in which the model extensions are approximated via a haircut that is equally applied to all insurance payments.

[39] While the exponential distribution tends to underestimate the percentage of total losses exceeding $1 million, the lognormal distribution drastically overestimates it as well as the EC figures of approximately $2 billion to $7 billion reported by large banks such as Deutsche Bank and JPMorgan Chase and the EC figure estimates of the Federal Reserve Bank of Boston. For details see Fontnouvelle et al. (2003) and the annual reports of Deutsche Bank and JPMorgan Chase.
[40] A more sophisticated approximation would exceed the scope of the paper and is not the focus of the work presented here. For practical application, the algorithm has to be applied to the bank-internal data anyway, as the external databases provide only a crude approximation of the true distribution.
The gross loss history will be the same for both models. In this way, we isolate the effects of the insurance application and abstract from any influence that different loss histories may have. The data for the severity and frequency of the gross losses are calibrated to reflect the empirical data of the 2002 Loss Data Collection Exercise (LDCE) by the Risk Management Group (2003) and the empirical data described in section 4.1.

Two types of operational risk losses are considered, l ∈ {1, 2}. As shown in section 4.1, the log losses are assumed to be exponentially distributed with parameters b_1 = b_2 = 0.91031. Due to the existence of deductibles, only the region of the net loss distribution above the deductible will display any difference between the two approaches. As empirically approximately 86% of the total value of the losses stems from losses larger than $100,000, and the deductible of an operational risk insurance policy will typically be higher than this threshold, we only simulate the region above losses of $100,000. 41

The frequencies of the losses are driven by two idiosyncratic shocks and one common shock, which also drives counterparty defaults and payment shortfalls. The frequencies of the shocks and the success probabilities p_l^(s) are given exogenously and are chosen to replicate the empirical frequencies reported in the 2002 LDCE. Each idiosyncratic shock is assumed to have an intensity of λ^(1) = λ^(2) = 40 and the common shock an intensity of λ^(3) = 25. The success probabilities of the idiosyncratic and the common shocks are given by p_1^(1) = p_2^(2) = 0.5 and p_1^(3) = p_2^(3) = 0.5. The expected total number of losses per sample is therefore E[N(t)] = p_1^(1) λ^(1) + p_2^(2) λ^(2) + p_1^(3) λ^(3) + p_2^(3) λ^(3) = 65. This equals the number of losses that empirically exceed the threshold of $100,000 according to the 2002 LDCE.

The insurance policies are modelled as follows. We consider two insurance contracts, each applicable to one loss type. The overall coverage of the insurance policies is set at lim_1 = lim_2 = $1,000,000,000, the deductible at d_1 = d_2 = $500,000 and the cap at c_1 = c_2 = $500,000,000. 42 The default probability of the insurer is set at 10% per sample year, resulting in 10% of all samples experiencing a counterparty default. 43 The parameters concerning payment uncertainty are taken from the 2002 LDCE. Empirically, 20% of all claims filed are not compensated due to litigation, exclusion clauses etc.

41 See the 2002 LDCE for details.
42 These insurance specifications are roughly equivalent to a 45% coverage of the losses above $100,000 and thus correspond to a total coverage of approximately 40% for all losses. For robustness checks, different specifications for the insurance cover were used as inputs for the two models.
43 At first sight, this number may seem high, but in consequence of the losses caused by hurricane Andrew in 1992, at least 10 insurers in the U.S. defaulted. In the light of the increasing frequency and severity of catastrophic losses caused by natural phenomena or terrorist attacks, a frequency of one insurer defaulting every ten years seems reasonable to us.
For the reimbursed claims, the average recovery rate is 73.3%. The sample size for the simulation is 10,000. The gross loss history for each sample is simulated according to the algorithm described in section 3.2; it consists of one vector per loss type containing the severities of the losses, the dates on which the losses occur and the type of the losses.

The artificial world is constructed as follows. Of the 10,000 samples, 1,000 equally distributed samples are selected to serve as default samples. For each of them, a random number τ ∈ {1, ..., 365} is drawn to denote the date of default. Similarly, for each sample and each loss type, 0.2 · N_l(t) equally distributed losses are selected to denote the cases of payment shortfalls. Finally, the insurance algorithm is applied to the loss histories and the net loss vectors are obtained. When calculating the resulting net losses, all insurance payments are multiplied by the factor 0.733 to account for the average recovery rate. The net losses of each sample are aggregated and form the aggregate net loss distribution.

For the stochastic model, the defaults of the counterparties are driven by the underlying shocks and are simulated as described in section 3.3.1. The same applies to the payment shortfalls due to payment uncertainty. The intensities of the shocks driving the counterparty defaults, λ^(default), and the payment shortfalls, λ^(payment uncertainty), as well as the success probabilities of these idiosyncratic shocks and of the common shock, are calibrated to match the data of the artificial world in expectation. The intensity of the shock driving counterparty default is given by λ^(default) = 25, and the success probabilities of the idiosyncratic and the common shock are given by p^(default) = p_default^(3) = 0.002. In expectation, these specifications result in a 10% probability of the counterparty defaulting. The corresponding specifications for payment uncertainty are λ^(payment uncertainty) = 20, p^(payment uncertainty) = 0.2 and p_payment uncertainty^(3) = 0.1, resulting in an expected probability of a complete payment shortfall of 20%. As in the artificial world, the insurance algorithm is applied and the resulting net losses are aggregated.

The difference between the stochastic model and the artificial world is that the number of default samples is determined stochastically via the underlying shocks and will equal the number of default samples of the artificial world only in expectation. The same is true for the payment shortfalls. Therefore, the samples in which a counterparty defaults will differ from those of the artificial world, as will the losses that are not compensated. This leads to differences in the overall realized risk mitigation.

The main difficulty in calibrating the haircut model is to choose an appropriate haircut. If sufficient historical data are available, an aggregate historical haircut factor across all insurance claims can be estimated. If no data are available, external sources need to be employed, just as in the construction of the stochastic model above.
To focus on the difference of the models' impact on the form of the net loss distribution, and not on potential level effects due to the size of the haircut, we assume that the data from the artificial world can be used as historical data to approximate an appropriate haircut. This ensures that the overall compensation of the insurance policies will be approximately the same for all models.

The haircut model is constructed in the following way. To ensure that the haircut used to account for the default of the insurer is of the same magnitude as in the artificial world, it is given by

\[
h_{\mathrm{default}} = \frac{1 - Y^{(\mathrm{net,\,default})}_{\mathrm{artificial}}(t)\,/\,Y^{(\mathrm{gross})}_{\mathrm{artificial}}(t)}{1 - Y^{(\mathrm{net})}_{\mathrm{artificial}}(t)\,/\,Y^{(\mathrm{gross})}_{\mathrm{artificial}}(t)}
\]

The proportion of the aggregate net loss after default, Y^(net, default)_artificial(t), to the aggregate gross loss, Y^(gross)_artificial(t), thus remains the same for the haircut model. Assuming that the sizes of the losses that are compensated and of those that are not compensated are identically distributed, the total compensation rate in the haircut model is obtained by multiplying the fraction of claims that are reimbursed with the average recovery rate, h_payment uncertainty = 0.8 · 0.733 = 0.5864. To correct for the fact that in the artificial world and in the shock model some of the payment shortfalls are simultaneously subject to counterparty default, we reduce the total haircut by corr = 0.5 · 0.1 · 0.2 = 0.01, assuming that the payment shortfalls and the default dates are equally distributed across the respective samples. Thus, in the haircut model, all insurance payments are discounted with h = h_default · h_payment uncertainty + corr (a small numerical illustration of this calculation follows Table 3).

Table 3: Overview Table

                     Gross losses      Net losses (art. world)   Shock model      Haircut model
Median               5.306 · 10^7      3.265 · 10^7              3.266 · 10^7     3.283 · 10^7
Mean                 8.670 · 10^7      4.956 · 10^7              4.965 · 10^7     4.750 · 10^7
75%-quantile         9.009 · 10^7      4.859 · 10^7              4.867 · 10^7     4.848 · 10^7
90%-quantile         1.707 · 10^8      8.493 · 10^7              8.517 · 10^7     8.217 · 10^7
95%-quantile         2.638 · 10^8      1.289 · 10^8              1.291 · 10^8     1.208 · 10^8
99%-quantile         5.962 · 10^8      3.429 · 10^8              3.448 · 10^8     2.724 · 10^8
99.5%-quantile       7.288 · 10^8      4.950 · 10^8              4.954 · 10^8     3.925 · 10^8
99.9%-quantile       9.455 · 10^8      7.277 · 10^8              7.306 · 10^8     6.118 · 10^8
99.95%-quantile      1.008 · 10^9      8.731 · 10^8              8.739 · 10^8     6.347 · 10^8
99.99%-quantile      1.235 · 10^9      9.588 · 10^8              9.588 · 10^8     7.129 · 10^8

Notes: All values are in $. The quantiles are taken from the aggregate gross loss distribution and the net loss distributions of the artificial world, the shock model and the haircut model.
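The combined haircut amounts to a one-line calculation; the sketch below reproduces it with the numbers given above. Since h_default is estimated from the artificial-world loss data, the value used here is a placeholder only.

```python
# Combined haircut under the stated assumptions; h_default is a placeholder
# because its true value is estimated from the artificial-world loss data.
h_default = 0.90                       # assumed for illustration only
h_payment_uncertainty = 0.8 * 0.733    # 80% of claims compensated x 73.3% recovery = 0.5864
corr = 0.5 * 0.1 * 0.2                 # correction for overlapping default and shortfall = 0.01
h = h_default * h_payment_uncertainty + corr
print(h)                               # ~0.538 with the placeholder h_default
```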
Table 3 summarizes various quantiles of the aggregate gross loss distribution and of the aggregate net loss distributions of the artificial world, the shock model and the haircut model. Except for the median, the quantiles of the haircut model always remain below the respective quantiles of the artificial world and the shock model. The difference becomes more pronounced with the size of the quantiles. The quantiles of the net loss distribution of the shock model slightly exceed those of the artificial world, except for the highest quantile. For all observations, the difference between the artificial world and the haircut model is much more pronounced than the difference between the artificial world and the shock model.

Figure 6: Aggregate net distributions up to the 75% quantiles
Notes: In this figure, the distributions of the aggregate net losses for the artificial world, the shock model and the haircut model are depicted up to the 75% quantiles of the respective distributions.

To better illustrate the effect, we look separately at the body and the tail of the resulting net loss distributions. Figure 6 displays the distributions of the aggregate net losses up to the 75% quantiles. It shows that the stochastic model mirrors the artificial world more closely, while the haircut model puts greater weight on the center of the distribution. This is intuitively clear, as all insurance payments are reduced by the haircut, so losses are never fully reimbursed as they can be in the shock model. For the part of the distribution up to the 75% quantile, the haircut model predicts a higher likelihood of higher net losses compared to the stochastic model, resulting in a less skewed net distribution.

Figure 7 looks at the tail of the net loss distributions. For the shock and the haircut model, the aggregate net distributions above the 99% quantiles are depicted. Here, we clearly see that the haircut model predicts a lower probability of extreme events and that its quantiles remain below the respective quantiles of the shock model. This is intuitively understandable, as in the stochastic model some losses remain completely uncompensated and therefore constitute tail events of the net distribution.
Figure 7: Top 1% quantile of the aggregate net distributions
Notes: In this figure, the aggregate net distributions for the shock and the haircut model are depicted above the 99% quantile. The 99% quantile for the haircut model lies clearly below the 99% quantile for the shock model.

One has to keep in mind that the total reduction is approximately the same for both models; the impact of the insurance contract merely affects the aggregate distribution differently. While both models predict approximately the same overall loss reduction by insurance in terms of the share of the gross loss that is compensated, the haircut model underestimates the probability of tail events and loss clusters. The effect would be even more pronounced if we assumed a positive dependence between the severity of the losses and the likelihood of default or payment uncertainty.

Figure 8: Difference of quantiles compared to the artificial world
Notes: In this figure, the differences of the 99.0% to 99.9% quantiles of the stochastic model and of the haircut model, respectively, compared to the artificial world are depicted. The values in $ correspond to the differences between the quantiles of the two approaches and the quantiles of the artificial world.
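A minimal sketch of the quantile comparison behind Figure 8, assuming the 10,000 aggregate net losses of the artificial world and of the two models are held in three arrays; the lognormal draws below are placeholder data only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder aggregate net losses per sample; in the simulation these come
# from the artificial world, the stochastic model and the haircut model.
net_artificial = rng.lognormal(17.6, 0.8, 10_000)
net_stochastic = rng.lognormal(17.6, 0.8, 10_000)
net_haircut = rng.lognormal(17.5, 0.75, 10_000)

qs = np.linspace(0.990, 0.999, 10)    # the 99.0% to 99.9% quantiles
# Figure 8 convention: the quantiles of the artificial world are subtracted
# from those of the simulated models, so negative values mean underestimation.
diff_stochastic = np.quantile(net_stochastic, qs) - np.quantile(net_artificial, qs)
diff_haircut = np.quantile(net_haircut, qs) - np.quantile(net_artificial, qs)
print(diff_stochastic.round(-5))
print(diff_haircut.round(-5))
```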
Figure 8 shows the differences of the 99.0% to 99.9% quantiles between the artificial world on the one side and the stochastic model and the haircut model on the other. For the graphical presentation, the quantiles of the artificial world are subtracted from the equivalent quantiles of the two simulated models. As we can see, the haircut approach always underestimates the quantiles of the artificial world and always remains below the quantiles of the stochastic model. Additionally, for all quantiles the difference between the artificial world and the stochastic model is smaller in absolute value than the difference between the artificial world and the haircut model.

To test for the statistical significance of the difference between the models, a t-test for the difference of the means is performed, as well as a Wilcoxon signed rank test for the differences of the means and the 99.9% quantiles. 44 Keeping the above specifications, the n for the test is set at 100 and the test statistics are reported in Table 4. The differences are significant in both cases at the 1% significance level.

Table 4: t-statistics and Wilcoxon signed rank statistics

                              mean      99.9% quantile
t-statistic                76.6157                   -
signed rank statistic      -8.6818             -6.2234

Notes: t-statistics and Wilcoxon signed rank statistics for the mean and the 99.9% quantile of the two simulated models. The n for the test sample is 100. The differences are significant in both cases at the 1% significance level.

44 The t-test for differences assumes that the differences of the means and quantiles are independent and normally distributed. While this may be a sensible assumption for the means, for the quantiles one should use a non-parametric test.

For robustness checks, three variations of the model specifications are simulated. The first is a change in the overall insurance policy, reducing the total coverage from approximately 40% to 25%. The difference between the models remains significant at the 1% significance level, though the values of the Wilcoxon signed rank statistics decrease to 3.9199 for the means and to 3.8826 for the 99.9% quantiles. Preliminary calculations suggest the same effect for a reduction in the probability of counterparty default. In a third step, a positive dependence of the defaults and zero-recoveries, respectively, on the severities of the losses is assumed. This is intuitively sensible, as especially large losses, or a cluster of medium to large losses, may cause an insurer to default. Additionally, insurance companies will try to minimize the compensation especially for large losses and will exert more effort to challenge and reduce the level of the compensation. Empirically, this can be seen
in the 2002 Loss Data Collection Exercise by the Committee: for losses exceeding $1 million, the recovery rate is only 66%, compared to 73.3% for all losses. In terms of modelling, the N_c(t) samples with the highest aggregate gross losses are chosen to be the samples in which the default occurs, and the 0.2 · N_l(t) highest losses are taken to be the cases in which a zero-recovery occurs. 45 For the haircut model, the overall haircut is corrected such that the total proportion of insured losses to gross losses is the same as in the artificial world. Under the assumption of positive dependence, the difference between both models remains significant for the means and the quantiles at the 1% level.

45 This is a very extreme but simple way of modelling positive dependence and thus marks the other extreme compared to the assumption of independence. A more moderate way would be the employment of copulas for various degrees of correlation.

To conclude, we have seen that the stochastic model depicts a more accurate picture of the real net operational risk losses. The difference between the simpler haircut model and the stochastic model is statistically significant for the means as well as for the 99.9% quantiles. When quantifying the risk mitigating impact of operational risk insurance, a simpler model could thus underestimate the probability of extreme losses and loss clusters due to counterparty risk and payment uncertainty.

5 Conclusion

In this paper we presented an algorithm to incorporate the risk mitigating impact of insurance into a quantitative model of operational risk. Concerning the limitations of operational risk insurance due to residual risk, we motivated the explicit modelling of counterparty default, zero-recoveries and liquidity costs due to delayed payments. In our simulation, we showed that our algorithm puts a larger weight on the tails of the loss distribution, while the simple haircut approach tends to underestimate the probability of large losses. The difference between the models was shown to be statistically significant for the means and the 99.9% quantiles. The difference decreased with lower total insurance coverage and a lower default probability of the counterparty. In the presence of positive correlation between compensation shortfalls (due to counterparty default and zero recoveries) and loss severities, the significance, on the contrary, increased substantially.

The present model is a first step towards the development of an integrated framework for the quantitative treatment of operational risk and it is, at least to our knowledge, the first to incorporate a more sophisticated model of operational risk insurance than a simple haircut factor. Possible extensions of this paper are twofold. First, the model needs
to be checked for its robustness, e.g. towards larger variations in the insurance structure and a more sophisticated structure for the dependence of compensation shortfalls and loss severities. The second extension could explore further the question of the valuation of operational risk insurance. In this context, one could quantify the actual value that an insurance contract has through the value of the realized compensation on the one hand and the reduced capital requirement on the other. In terms of practical application, such a model could enable a bank to reassess and optimize its operational risk insurance profile under consideration of the impact on capital requirements.
References

Alexander, Carol, 2000, Bayesian methods for measuring operational risk, Discussion Papers in Finance, ISMA Centre.
Artzner, Philippe, Freddy Delbaen, Jean-Marc Eber, and David Heath, 1999, Coherent measures of risk, Mathematical Finance 9.
Cummins, J., Christopher Lewis, and Richard Phillips, 1998, Pricing excess-of-loss reinsurance contracts against catastrophic loss, Wharton Financial Institutions Center Working Paper.
Duffie, Darrell, 1998, First-to-default valuation, Stanford University Working Paper.
Duffie, Darrell, and Nicolae Garleanu, 2002, Risk and valuation of collateralized debt obligations, Stanford University Working Paper.
Duffie, Darrell, and Kenneth Singleton, 1999, Simulating correlated defaults, Stanford University Working Paper.
Ebnöther, S., P. Vanini, A. McNeil, and P. Antolinez-Fehr, 2001, Modelling operational risk, ETH Zürich Working Paper.
Ebnöther, S., P. Vanini, A. McNeil, and P. Antolinez-Fehr, 2002, Operational risk: A practitioner's view, ETH Zürich Working Paper.
Embrechts, P., F. Lindskog, and A. McNeil, 2001, Modelling dependence with copulas and applications to risk management, ETH Zürich Working Paper.
Embrechts, P., A. McNeil, and D. Straumann, 1998, Correlation and dependency in risk management: Properties and pitfalls, ETH Zürich Working Paper.
Embrechts, Paul, and Gennady Samorodnitsky, 2001, Ruin problem, operational risk and how fast stochastic processes mix, ETH Zürich Working Paper.
Embrechts, P., and G. Samorodnitsky, 2002, Ruin theory revisited: Stochastic models for operational risk, ETH Zürich Working Paper.
European Commission, 2002, Working document of the commission services on capital requirements for credit institutions and investment firms (European Commission, Brussels).
Fontnouvelle, Patrick, Virginia DeJesus-Rueff, John Jordan, and Eric Rosengren, 2003, Using loss data to quantify operational risk, Federal Reserve Bank of Boston Working Paper.
Frachot, A., P. Georges, and T. Roncalli, 2001, Loss distribution approach for operational risk, Groupe de Recherche Opérationnelle, Crédit Lyonnais Working Paper.
Frachot, A., and T. Roncalli, 2002, Mixing internal and external data for managing operational risk, Groupe de Recherche Opérationnelle, Crédit Lyonnais Working Paper.
Genest, C., and J. MacKay, 1986, The joy of copulas: Bivariate distributions with uniform marginals, The American Statistician 40.
Giesecke, Kay, 2002a, Credit risk modelling and valuation: An introduction, Working Paper.
Giesecke, Kay, 2002b, An exponential model for dependent defaults, Working Paper.
Jarrow, Robert, and Stuart Turnbull, 1995, Pricing derivatives on financial securities subject to credit risk, Journal of Finance 50.
Jarrow, Robert, and Fan Yu, 2001, Counterparty risk and the pricing of defaultable securities, Journal of Finance 56.
Kühn, Reimer, and Peter Neu, 2002, Functional correlation approach to operational risk in banking organizations, Working Paper.
Lando, David, 1998, On Cox processes and credit risky securities, Working Paper.
Lawrence, Mark, 2000, Marking the cards at ANZ, Operational Risk - Risk special edition.
McNeil, A., and F. Lindskog, 2001, Common Poisson shock models: Applications to insurance and credit risk modelling, RiskLab/ETH Zürich Working Paper.
Medova, Elena, 2000, Measuring risk by extreme values, Operational risk special report.
Medova, E., and M. Kyriacou, 2001, Extremes in operational risk management, Working Paper.
Nelsen, Roger B., 1999, An introduction to copulas (Springer, New York).
Risk Management Group, 2001a, Consultative Document Operational Risk (Basel Committee on Banking Supervision, Basel).
Risk Management Group, 2001b, Working paper on the regulatory treatment of operational risk (Basel Committee on Banking Supervision, Basel).
Risk Management Group, 2002, Sound practices for the management and supervision of operational risk (Basel Committee on Banking Supervision, Basel).
Risk Management Group, 2003, The 2002 Loss Data Collection Exercise for Operational Risk: Summary of the Data Collected (Basel Committee on Banking Supervision, Basel).
Romano, C., 2002, Applying copula function to risk management, Working Paper.
Schönbucher, Philipp, 2000, Factor models for portfolio credit risk, Working Paper.
Tasche, Dirk, 2002, Expected shortfall and beyond, Journal of Banking & Finance 26.
Yu, Fan, 2002, Correlated defaults in reduced-form models, Working Paper.
A Proof of Equation 3

\begin{align*}
\operatorname{var}(N(t)) &= \operatorname{var}\Big(\sum_{j=1}^{n} N_j(t)\Big) = \sum_{j=1}^{n}\sum_{k=1}^{n} \operatorname{cov}\big(N_j(t), N_k(t)\big) \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{e=1}^{m} \operatorname{cov}\big(N_j^{(e)}(t), N_k^{(e)}(t)\big) \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{e=1}^{m} \Bigg[ E\bigg(\sum_{r=1}^{N^{(e)}(t)} I_{j,r}^{(e)} I_{k,r}^{(e)}\bigg) + E\bigg(\sum_{r=1}^{N^{(e)}(t)} \sum_{\substack{\tau=1 \\ \tau \neq r}}^{N^{(e)}(t)} I_{j,r}^{(e)} I_{k,\tau}^{(e)}\bigg) - E\bigg(\sum_{r=1}^{N^{(e)}(t)} I_{j,r}^{(e)}\bigg)\, E\bigg(\sum_{\tau=1}^{N^{(e)}(t)} I_{k,\tau}^{(e)}\bigg) \Bigg] \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{e=1}^{m} \Big[ \lambda^{(e)} p_{j,k}^{(e)}(1,1) + \big(\lambda^{(e)}\big)^2 p_j^{(e)} p_k^{(e)} - \lambda^{(e)} p_j^{(e)}\, \lambda^{(e)} p_k^{(e)} \Big] \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{e=1}^{m} \lambda^{(e)} p_{j,k}^{(e)}(1,1) \\
&= \sum_{j=1}^{n}\sum_{e=1}^{m} \lambda^{(e)} \Big( p_j^{(e)} + \sum_{k \neq j} p_{j,k}^{(e)}(1,1) \Big) \\
&= \sum_{j=1}^{n}\sum_{e=1}^{m} \lambda^{(e)} p_j^{(e)} + \sum_{j=1}^{n}\sum_{k \neq j}\sum_{e=1}^{m} \lambda^{(e)} p_{j,k}^{(e)}(1,1) \\
&> \sum_{j=1}^{n}\sum_{e=1}^{m} \lambda^{(e)} p_j^{(e)} = E\Big(\sum_{j=1}^{n} N_j(t)\Big) = E(N(t))
\end{align*}

The fourth line uses E[N^(e)(t)(N^(e)(t) - 1)] = (λ^(e))^2 for the Poisson distributed shock counters, and the sixth line uses p_{j,j}^(e)(1,1) = p_j^(e). The final inequality is strict whenever at least one shock can trigger two loss types simultaneously, i.e. p_{j,k}^(e)(1,1) > 0 for some e and some j ≠ k.
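This overdispersion is easy to confirm numerically. The sketch below simulates the common shock frequency model with the intensities and success probabilities used in section 4.2, assuming independent loss indicators within each common shock (the model allows a general joint indicator distribution p_{j,k}^(e)(1,1); independence is simply the easiest choice for a sketch). The sample variance of the total loss count comes out clearly above its mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 200_000

lam_idio = np.array([40.0, 40.0])   # intensities of the two idiosyncratic shocks
lam_common = 25.0                   # intensity of the common shock
p_idio = np.array([0.5, 0.5])       # success probabilities of the idiosyncratic shocks
p_common = np.array([0.5, 0.5])     # success probabilities of the common shock

# Idiosyncratic shocks: each one can only generate losses of its own type.
n_idio = rng.binomial(rng.poisson(lam_idio, (n_samples, 2)), p_idio)

# Common shock: one Poisson count per sample, thinned separately for each loss
# type; sharing the count is what creates the positive covariance between types.
common_counts = rng.poisson(lam_common, n_samples)
n_common = np.column_stack([rng.binomial(common_counts, p) for p in p_common])

total = (n_idio + n_common).sum(axis=1)
print(total.mean(), total.var())    # roughly 65 and 77.5: var(N(t)) > E(N(t))
```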
B Tables

Table 5: Descriptive statistics of operational risk loss data

Loss type             Mean      Median    St. deviation   Skewness    Kurtosis
Aggregate             41.5769    6.5329       236.4308     30.3488   1271.8100
Internal fraud        61.2614    6.5000       389.0508     21.5114    561.1984
External fraud        20.8947    5.0429        50.5469      9.9928    137.0378
Execution             29.7025    5.2489        87.6357      6.9436     59.3648
Employment            14.0230    4.1031        32.5137      6.0605     43.7911
Asset damage          79.3731   53.1347        85.3793      0.7058     -1.0136
Clients               38.0702    7.4712       135.5509     10.0171    127.1541
Business disruption   42.5109    7.3699        98.1753      3.2539     10.9635

Notes: Values for mean and median are in million $. Except for the category asset damage, all loss types display high skewness and kurtosis, with the median lying significantly below the mean.
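For completeness, a short sketch of how moments of this kind can be computed from a loss vector. The data below are placeholders, and the kurtosis is reported as excess kurtosis, which the negative value for asset damage in Table 5 suggests is also the convention used there.

```python
import numpy as np
from scipy import stats

# Placeholder loss data in million $; replace with the actual loss vector.
losses = np.random.default_rng(3).lognormal(mean=2.0, sigma=1.3, size=500)

print(losses.mean(), np.median(losses), losses.std(ddof=1),
      stats.skew(losses), stats.kurtosis(losses))  # stats.kurtosis is excess kurtosis by default
```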