Extreme Value Theory with Applications in Quantitative Risk Management
Henrik Skaarup Andersen and David Sloth Pedersen
Master's Thesis, Master of Science in Finance
Supervisor: David Skovmand
Department of Business Studies, Aarhus School of Business, Aarhus University, 2010
...or how we learned to stop worrying and love 'em fat tails
Abstract

In this thesis we investigate extreme value theory and its potential in financial risk management. In the first part of the thesis, we provide a thorough and rigorous exposition of extreme value theory (EVT). We describe the theoretical foundation of the theory, covering the fundamental theorems and results. In relation to this, we explicitly emphasize the statistical issues and limitations of the theory with applications in financial risk management in mind. Moreover, we discuss how the theory may be applied to financial data and the specific issues that may arise in such applications. We also approach the issue of working with multivariate risk factors using copula theory and discuss some copula results in multivariate extreme value theory.

In the second part of the thesis, we conduct an empirical study of the performance of EVT-based risk measurement methods based on an equally weighted portfolio composed of three Danish stocks. The performance of the methods is evaluated by their ability to accurately estimate well-known risk measures such as Value at Risk (VaR) and Expected Shortfall (ES). Treating the portfolio value as a single risk factor, we consider a univariate EVT method, HS-CONDEVT, which combines GARCH-type modeling of volatility with the fitting of a generalized Pareto distribution to the tails of the underlying distribution. The empirical results demonstrate that HS-CONDEVT outperforms alternative univariate methods such as historical simulation (HS) and HS combined with a GARCH-type model assuming normally distributed innovations. Moreover, HS-CONDEVT is found to be a viable alternative to filtered HS and to HS combined with a GARCH-type model assuming t distributed innovations. Treating the three stocks in the portfolio as risk factors, we consider a multivariate EVT-based method, MCONDEVT, which combines a copula with margins based on GARCH-type modeling and GPD fitting. MCONDEVT is implemented in three variants using three different copulas: a Gaussian, a t, and a Gumbel copula. Comparatively, we find that the variants of the MCONDEVT method outperform other multivariate methods such as variance-covariance (VC), VC combined with a multivariate EWMA model, multivariate GARCH based on a constant conditional correlation structure, and multivariate GARCH based on a dynamic conditional correlation structure. Finally, comparing the performance of the univariate and multivariate methods altogether, we find that the implemented variants of the MCONDEVT method are among the top performing methods. In particular, MCONDEVT based on a t copula appears to have the best overall performance among the competing methods.
Contents

1 Introduction
  1.1 Purpose and Research Questions
  1.2 Delimitations
  1.3 Structure
2 Theoretical Framework
  2.1 Quantitative Risk Modeling and Measurement: Essential Concepts and Methods
    2.1.1 Empirical Properties of Financial Return Data
    2.1.2 Risk Factors and Loss Distribution
    2.1.3 Risk Measures
    2.1.4 Quantitative Methods for Risk Modeling
  2.2 Extreme Value Theory
    2.2.1 Modeling of Extremal Events I: Block Maxima Models
    2.2.2 Modeling of Extremal Events II: Peaks over Threshold Models
  2.3 Copula Theory and Multivariate Extremes
    2.3.1 Copulas and Dependence Modeling
    2.3.2 Copula Results in Multivariate Extreme Value Theory
3 Methodology
  3.1 Data Selection
  3.2 Method Selection and Implementation
    3.2.1 Univariate Methods
    3.2.2 Multivariate Methods
    3.2.3 Dynamic Models for Changing Volatility
  3.3 Backtesting Methodology
    3.3.1 Backtesting Value at Risk
    3.3.2 Backtesting Expected Shortfall
4 Empirical Results
  4.1 Univariate Risk Measurement Methods
    4.1.1 Preliminary Data Analysis and Descriptive Statistics
    4.1.2 Dynamic Model Selection
    4.1.3 Relative Performance of the Methods
  4.2 Multivariate Risk Measurement Methods
    4.2.1 Preliminary Data Analysis and Descriptive Statistics
    4.2.2 Dynamic Model Selection
    4.2.3 Relative Performance of the Methods
  4.3 Overall Comparison of Backtest Results
5 Reflections
  5.1 Implications and Limitations of the Results
  5.2 Applicability in Practice
  5.3 Ideas for Further Research
6 Conclusion
Bibliography
A Derivations
  A.1 Derivation of Equation 2.26
  A.2 Derivation of Equation 2.36
  A.3 Derivation of Equation 2.37
  A.4 Derivation of Equation 2.40
  A.5 Derivation of Equation 2.44
  A.6 Derivation of Equation 2.45
  A.7 Derivation of Equation 2.55
  A.8 Derivation of Equation 2.57
  A.9 Derivation of Equation 2.62
B Tables
List of Tables

4.1 Descriptive statistics for the portfolio losses
4.2 Parameter estimates and descriptive statistics for the standardized residuals
4.3 1-day VaR-based backtest results: Right tail
4.4 1-day VaR-based backtest results: Left tail
4.5 10-day VaR-based backtest results
4.6 1-day ES-based backtest results
4.7 10-day ES-based backtest results
4.8 Sensitivity analysis: VaR-based backtest results
4.9 Sensitivity analysis: ES-based backtest results
4.10 Descriptive statistics for the risk factor return series
4.11 Parameter estimates and descriptive statistics for the standardized residuals
4.12 1-day VaR-based backtest results: Right tail
4.13 1-day VaR-based backtest results: Left tail
4.14 10-day VaR-based backtest results
4.15 1-day ES-based backtest results
4.16 10-day ES-based backtest results
4.17 Sensitivity analysis: VaR-based backtest results
4.18 Sensitivity analysis: ES-based backtest results
B.1 Information criteria values for the fitted dynamic models
B.2 1-day VaR-based backtest results: Right tail
B.3 1-day VaR-based backtest results: Left tail
B.4 1-day ES-based backtest results
B.5 10-day VaR-based backtest results
B.6 10-day ES-based backtest results
List of Figures

4.1 Time series of portfolio losses
4.2 Correlograms for the in-sample raw portfolio losses (A) and their squared values (B), as well as for the total sample raw portfolio losses (C) and their squared values (D)
4.3 Information criteria values for the fitted models. AR1 denotes a first-order autoregressive mean specification, t and n denote the distribution assumption, and GPQ denotes a GARCH-type variance specification with order numbers P and Q
4.4 Correlograms for the in-sample raw standardized residuals (A) and their squared values (B), as well as for the total sample raw standardized residuals (C) and their squared values (D), extracted from the AR(1)-GJR(1,1) model fitted with the QML estimation method
4.5 Correlograms for the in-sample raw standardized residuals (A) and their squared values (B), as well as for the total sample raw standardized residuals (C) and their squared values (D), extracted from the AR(1)-GJR(1,1) model fitted with ML under the assumption of t distributed innovations
4.6 QQ-plots for the in-sample standardized residuals versus a normal distribution (A) and a t distribution (B), as well as for the total sample standardized residuals versus a normal distribution (C) and a t distribution (D)
4.7 1-day VaR (α = 99%) estimates plotted against the portfolio losses
4.8 10-day VaR (α = 99%) estimates plotted against the 10-day portfolio losses
4.9 Time series of risk-factor returns. These are log-returns on (A) NOVO B, (B) CARLS B, and (C) DANSKE
4.10 1-day VaR (α = 99%) estimates plotted against the portfolio losses
4.11 10-day VaR (α = 99%) estimates plotted against the 10-day portfolio losses
Chapter 1
Introduction

"In the last fifty years, the ten most extreme days in the financial markets represent half the returns." [Taleb, 2007, p. 275]

Lately, the financial crisis has exposed major shortcomings of the traditional risk assessment methodologies in terms of capturing the risk of rare but damaging events, which has made the search for better approaches to risk modeling and measurement more crucial than ever. The above quote by Taleb from his famous book The Black Swan captures the essence of what we are up against. By its very nature, the risk of extreme events (e.g. very large losses) is related to the tails of the distribution of the underlying data generating process. Thus, a crucial challenge in obtaining good risk measure estimates is to estimate the tail of the underlying distribution as accurately as possible.

Since the pioneering works of Mandelbrot [1963] and Fama [1965], several studies have documented that financial return series have more mass in the tail areas than would be predicted by a normal distribution. In other words, the return distributions have fat tails, causing the probability of extreme values to be higher than under a normal distribution. To capture this phenomenon, early studies tried to model the distribution of financial returns using stable distributions like the Cauchy distribution. However, because financial theory almost always requires finite second moments of returns, and often higher moments as well, these distributions have lost their popularity [Campbell et al., 1997]. Instead, more recent studies have resorted to some kind of mixture distribution, e.g. the Normal-Inverse-Gaussian or the
Variance-Gamma distributions, which are more tractable as moments of all orders exist. For the purpose of measuring financial risk, however, our practical interest is concentrated on the tails. So, instead of forcing a single distribution onto the entire return series, one might just investigate the tails of the returns using some kind of limit laws. This is where extreme value theory may become the star of the show by providing statistically well-grounded results on the limiting behavior of the underlying distribution.

Pioneered by Fisher and Tippett [1928], Gnedenko [1943], and Gumbel [1958], and later by Balkema and de Haan [1974] and Pickands [1975], extreme value theory has been around for quite some time as a discipline within probability and statistics. Applications of the theory have since appeared in diverse areas such as hydrology and wind engineering. Only recently, though, has extreme value theory seen the light of day within the realm of finance, with the first comprehensive volume on the theory devoted entirely to finance and insurance being Embrechts et al. [1997]. However, since its introduction to finance, the body of research on financial applications of extreme value theory has grown considerably.

With respect to tail estimation and risk measurement, two crucial properties make extreme value theory particularly attractive. First, it is based on well-established and sound statistical theory. Second, it offers a parametric form for the tail, allowing us to model rare and extreme phenomena that lie outside the range of available observations. Thus, extreme value theory may provide the means to obtain more accurate risk measure estimates that are true to the extremal or fat-tailed behavior of the underlying distribution. In this thesis, we wish to investigate this possibility further.

1.1 Purpose and Research Questions

The purpose of this thesis is to investigate extreme value theory and its potential in financial risk management. We will provide a thorough and rigorous exposition of the theoretical foundation of extreme value theory. In relation to this, we will explicitly emphasize the statistical issues and limitations of the theory with applications in financial risk management in mind. Moreover, we will discuss how the theory may be applied to financial data and the specific issues that may arise in such applications.

Using return data on three Danish stocks, we conduct an empirical study of the performance of risk measurement methods based on extreme value theory. The methods are evaluated, relative to various alternative methods for risk measurement, by their ability to accurately estimate well-known risk measures such as Value at Risk (VaR) and Expected Shortfall (ES). Furthermore, most studies on extreme value theory in finance have focused solely on a univariate setting where only one risk factor is
accounted for; this thesis will also investigate the performance of extreme value theory based methods in a more realistic, multivariate setting where we deal with more than one risk factor.

In conclusion, the thesis seeks to provide answers to the following three research questions:

1. What is the theoretical foundation of extreme value theory?
2. How can extreme value theory be applied to financial risk measurement, and what kind of issues arise?
3. Compared to alternative risk measurement methods, how do methods based on extreme value theory perform with respect to estimating Value at Risk and Expected Shortfall?

1.2 Delimitations

In our discussions of financial risk measurement as well as in the empirical study, we concentrate on market risk. Market risk is the risk of a movement in the value of a financial position due to changes in the value of the underlying components on which the position depends, such as stock, bond, and commodity prices, exchange rates, etc. However, banks and other financial institutions are exposed to other categories of risk. One such category is credit risk, which is the risk of not receiving promised repayments on outstanding investments such as loans and bonds because of the default of the borrower. Another category is operational risk, the risk of losses resulting from inadequate or failed internal processes, people and systems, or from external events [McNeil et al., 2005]. Even though we consider these kinds of risks equally important, they will not be covered further in this thesis. Furthermore, in the empirical study, we only consider the market risk associated with a financial position in stocks, specifically three stocks listed on the Danish C20 index (OMXC20).

We also refrain from discussing or applying risk measures other than Value at Risk and Expected Shortfall. We acknowledge that other risk measures may have merit. However, since Value at Risk was sanctioned by the Basel Committee in 1996 for market risk capital requirements, it has become the standard measure of financial market risk [Wong, 2007]. Expected Shortfall is closely related to Value at Risk but is so far less used in practice. Nonetheless, it addresses some of the deficiencies of Value at Risk that critics have pointed out. For these reasons, we concentrate on these two measures of risk. Further delimitations will be made throughout the thesis when appropriate.
1.3 Structure

The main part of the thesis is structured in three chapters covering the theoretical framework of the thesis, the methodology of the empirical study, and ultimately a presentation of the empirical results. This is followed by a short chapter on the implications and limitations of our study, leading up to a conclusion in the final chapter. Thus, the structure of the thesis is as follows.

Chapter 2 presents the theoretical framework of the thesis. We first review some essential concepts and methods within quantitative risk management. Following this, we turn to the primary topic of the thesis, extreme value theory. We provide a thorough exposition of the theory and its main results with applications in financial risk management in mind. Finally, we approach the issue of working with multivariate risk factors using copula theory and discuss some copula results in multivariate extreme value theory.

Chapter 3 outlines the methodology of the empirical study. We first discuss the selection of financial data for our empirical investigation. We then turn to the selection and implementation of the different risk measurement methods, giving special emphasis to the statistical issues involved. Lastly, we describe the methodology used for backtesting and performance evaluation of the implemented risk measurement methods.

Chapter 4 presents the results of our empirical study. We evaluate the relative performance of the risk measurement methods, those based on extreme value theory as well as alternative methods, with respect to estimating Value at Risk and Expected Shortfall. To this end, we use both statistical tests and more qualitative assessments.

Chapter 5 discusses the implications and limitations of our study and its main results. In this connection, ideas for further research within extreme value theory and its applications in financial risk management are proposed.

Chapter 6 summarizes our main findings and concludes the study.
Chapter 2
Theoretical Framework

In this chapter we present the theoretical framework of the thesis. The concepts, theories, and results discussed in the following sections constitute the foundation of the empirical study. In Section 2.1, we discuss some essential concepts and methods within quantitative risk modeling and measurement, which will be used throughout the thesis. After this, we turn to the primary topic of the thesis in Section 2.2, namely extreme value theory, giving a thorough and rigorous account and discussion of the theory and its central results with applications in financial risk management in mind. Finally, in Section 2.3, we approach the issue of working with multivariate risk factors using copula theory and we discuss some copula results in multivariate extreme value theory.

2.1 Quantitative Risk Modeling and Measurement: Essential Concepts and Methods

In this section we introduce a series of concepts and definitions within the discipline of quantitative finance, which will be used throughout the thesis. We start by describing some general empirical properties of financial return series data in Section 2.1.1. This is followed by a discussion of the concept of financial loss distributions and risk factors in Section 2.1.2, where we especially dwell on the difference between unconditional and conditional loss distributions. In Section 2.1.3 we discuss well-known measures of financial risk such as Value at Risk and Expected Shortfall. Finally, we outline the three main quantitative methods for modeling financial risk and their limitations in Section 2.1.4.

2.1.1 Empirical Properties of Financial Return Data

Since stock prices are mostly non-stationary (usually integrated of order 1), it is common to model relative changes of prices, i.e. the log-return series
[Cont, 2001]. In this section, we give an overview of some typical properties of daily financial return data, which have become known as stylized facts. These properties often also extend to series over both longer (weekly or monthly) and shorter (intra-day) time intervals [McNeil et al., 2005].

Financial return series are not independently and identically distributed (iid). They tend to exhibit temporal dependence in the second moment. In other words, while return series seem to show little serial correlation, absolute or squared returns seem to be highly serially correlated, causing time-varying volatility and volatility clustering. Volatility clustering is the tendency for large returns (of either sign) to be followed by more large returns (of either sign) [Campbell et al., 1997, McNeil et al., 2005]. Also, Black [1976] found that negative innovations to stock returns tend to increase volatility more than positive innovations of similar magnitude. This phenomenon has become known as the leverage effect. Furthermore, Fama [1965] found that financial returns appear to have heavy-tailed or leptokurtic distributions. Compared to the normal or Gaussian distribution, return distributions tend to exhibit excess kurtosis (i.e. kurtosis larger than 3), indicating that returns have more mass in the tail areas than predicted by the normal distribution [Campbell et al., 1997]. In mathematical terms, the tails seem to display a slow, power-law type of decay, different from the faster, exponential type of decay displayed by the normal distribution [Cont, 2001].

When we deal with multivariate return series, we have similar stylized facts. While multivariate return series show little evidence of cross-correlation (except for contemporaneous returns), absolute returns of such series tend to exhibit cross-correlation. In addition, the contemporaneous correlation between returns appears to be time varying. Also, multivariate return series tend to exhibit tail or extremal dependence, i.e. extreme returns in different return series often tend to coincide. Moreover, extremal dependence seems to be asymmetric; joint negative returns show more tail dependence than joint positive returns [McNeil et al., 2005]. The last two stylized facts correspond to the phenomenon that correlations observed in calm periods differ from correlations observed during financial turmoil.

2.1.2 Risk Factors and Loss Distribution

Consider a portfolio of financial assets and let $V_t$ denote its current value. The portfolio value is assumed to be observable at time $t$. The portfolio loss over the time interval from $t$ to $t+1$ is written as

$$L_{t+1} = -(V_{t+1} - V_t). \qquad (2.1)$$

Because $V_{t+1}$ is unknown to us, $L_{t+1}$ is random from the perspective of time $t$. The distribution of $L_{t+1}$ will be referred to as the loss distribution. Note that the definition of a loss presented here implicitly assumes that the portfolio
composition is constant over the considered time interval.

The portfolio value $V_t$ will be modeled as a function of time and a set of $d$ underlying risk factors. We write

$$V_t = f(t, Z_t), \qquad (2.2)$$

for some measurable function $f: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$, where $Z_t = (Z_{t,1}, \ldots, Z_{t,d})'$ denotes a $d$-dimensional vector of risk factors. We define the time series process of risk factor changes $\{X_t\}_{t \in \mathbb{N}}$, where $X_t := Z_t - Z_{t-1}$. Using the function $f$ we can relate the risk factor changes to the changes in the portfolio value as

$$L_{t+1} = -\big(f(t+1, Z_t + X_{t+1}) - f(t, Z_t)\big). \qquad (2.3)$$

Given a realization $z_t$ of $Z_t$, we define the loss operator $l_{[t]}: \mathbb{R}^d \to \mathbb{R}$ at time $t$ as

$$l_{[t]}(x) := -\big(f(t+1, z_t + x) - f(t, z_t)\big), \quad x \in \mathbb{R}^d, \qquad (2.4)$$

and we can write $L_{t+1} = l_{[t]}(X_{t+1})$ as shorthand notation for the portfolio loss.

In practice, it is often convenient to work with the so-called delta approximation. Assuming that the mapping $f$ is differentiable, we may use a first-order approximation of the loss operator instead of the true operator. We define the linearized loss operator as

$$l^{\Delta}_{[t]}(x) := -\Big(f_t(t, z_t) + \sum_{i=1}^{d} f_{z_i}(t, z_t)\, x_i\Big), \qquad (2.5)$$

where the terms $f_t$ and $f_{z_i}$ are the partial derivatives of $f$ with respect to time and risk factor $i$. The linear approximation makes the problem of modeling $l_{[t]}$ simpler to handle analytically by representing it as a linear function of the risk-factor changes. The quality of the approximation is influenced by the length of the time interval and the size of the second-order derivatives. It works best for short time horizons and if the portfolio value is approximately linear in the risk factors.

As $Z_t$ is observable from the perspective of time $t$, the loss distribution is determined entirely by the distribution of $X_{t+1}$. If we assume that $\{X_t\}_{t \in \mathbb{N}}$ follows a stationary time series, we have to make a distinction between the conditional and the unconditional loss distribution. If we assume instead that $\{X_t\}_{t \in \mathbb{N}}$ is an iid series, the two distributions coincide. Let $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$ be the Borel $\sigma$-field representing all information on the risk factor developments up to the present. This leads us to the formal definitions below.

Definition 1 (Unconditional Loss Distribution). The unconditional loss distribution $F_{L_{t+1}}$ is the distribution of $l_{[t]}(\cdot)$ under the stationary distribution $F_X$, where $F_X$ denotes the unconditional distribution of $X$ assuming stationarity.
Definition 2 (Conditional Loss Distribution). The conditional loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$ is the distribution of $l_{[t]}(\cdot)$ under $F_{X_{t+1} \mid \mathcal{F}_t}$, where $F_{X_{t+1} \mid \mathcal{F}_t}$ denotes the conditional distribution of $X_{t+1}$ given $\mathcal{F}_t$.

Conditional risk measurement focuses on modeling the dynamics of $\{X_t\}_{t \in \mathbb{N}}$ in order to make risk forecasts. If we do not model any dynamics, we basically assume that $X$ forms a stationary time series with a stationary distribution $F_X$ on $\mathbb{R}^d$. We will mainly consider conditional risk measurement methods, as they appear most suitable for market risk management and shorter time intervals. The worries of market risk managers center on the possible size of the short-term (e.g. one-day or two-week) loss caused by unfavorable shifts in market values. Thus, they are concerned about the tails of the conditional loss distribution, given the current volatility background [McNeil and Frey, 2000]. The unconditional loss distribution is more relevant when interest centers on possible worst-case scenarios over longer periods (e.g. one year or more) and is more frequently used in credit risk management [McNeil et al., 2005].

2.1.3 Risk Measures

In this section we discuss statistical summaries of the loss distribution that quantify the portfolio risk. We call these summaries risk measures. First, we introduce the so-called axioms of coherence, which are properties deemed desirable for measures of risk. Hereafter, we discuss two widely used measures of financial risk: Value at Risk and Expected Shortfall. Both risk measures consider only the downside risk, i.e. the right tail of the loss distribution.

Artzner et al. [1999] argue that a good measure of risk should satisfy a set of properties termed the axioms of coherence. Let financial risks be represented by a set $\mathcal{M}$, interpreted here as portfolio losses, i.e. $L \in \mathcal{M}$. Risk measures are real-valued functions $\varrho: \mathcal{M} \to \mathbb{R}$. The amount $\varrho(L)$ represents the capital required to cover a position facing a loss $L$. The risk measure $\varrho$ is coherent if it satisfies the following four axioms:

1. Monotonicity: $L_1 \le L_2 \Rightarrow \varrho(L_1) \le \varrho(L_2)$.
2. Positive homogeneity: $\varrho(\lambda L) = \lambda \varrho(L)$, $\lambda > 0$.
3. Translation invariance: $\varrho(L + l) = \varrho(L) + l$, $l \in \mathbb{R}$.
4. Subadditivity: $\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$.

Monotonicity states that positions which lead to higher losses in every state of the world require more risk capital. Positive homogeneity implies that the capital required to cover a position is proportional to the size of that position. Translation invariance states that if a deterministic amount $l$ is added to the position, the capital needed to cover $L$ is changed by precisely
that amount. Subadditivity reflects the intuitive property that risk should be reduced, or at least not increased, by diversification; i.e. the amount of capital needed to cover two combined portfolios should not be greater than the capital needed to cover the portfolios evaluated separately.

In the following discussion of Value at Risk and Expected Shortfall, we put aside the distinction between $l_{[t]}$ and $l^{\Delta}_{[t]}$ and also between unconditional and conditional loss distributions, assuming that the choice of focus has been made from the outset of the analysis. Also, we denote the distribution function of the loss $L_{t+1} := L$ by $F_L$, so that $F_L(x) = P(L \le x)$, $x \in \mathbb{R}$.

Value at Risk

Value at Risk (VaR) is the maximum loss over a given period that is not exceeded with a high probability. We begin with a formal definition of the concept.

Definition 3 (Value at Risk). The Value at Risk (VaR) at confidence level $\alpha \in (0, 1)$ is defined as the smallest value $x$ such that the probability of $L$ exceeding $x$ is no larger than $(1 - \alpha)$:

$$\mathrm{VaR}_{\alpha} := \inf\{x \in \mathbb{R} : P(L > x) \le 1 - \alpha\} = \inf\{x \in \mathbb{R} : F_L(x) \ge \alpha\}. \qquad (2.6)$$

Using the concepts of generalized inverse and quantile functions given in Definition 4, it is clear that VaR is simply the $\alpha$-quantile of the loss distribution $F_L$. Consequently, (2.6) can be written as

$$\mathrm{VaR}_{\alpha} := q_{\alpha}(F_L) = F_L^{\leftarrow}(\alpha). \qquad (2.7)$$

Definition 4 (Generalized Inverse and Quantile Function).
1. Given some increasing function $F: \mathbb{R} \to \mathbb{R}$, the generalized inverse of $F$ is defined as $F^{\leftarrow}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}$, where we set $\inf \emptyset = \infty$.
2. At any confidence level $\alpha \in (0, 1)$, the $\alpha$-quantile of a distribution function $F$ is defined as $q_{\alpha}(F) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\} = F^{\leftarrow}(\alpha)$.

$\mathrm{VaR}_{\alpha}$ has been adopted into the regulatory Basel framework for banks as the major determinant of the risk capital required for covering potential losses arising from market risks [Basel Committee on Banking Supervision, 2004]. A major advantage of $\mathrm{VaR}_{\alpha}$ is that it does not depend on a specific kind of distribution and therefore, in theory, can be applied to any kind of financial asset [Danielsson, 2007]. In addition, $\mathrm{VaR}_{\alpha}$ is intuitively appealing because of its ability to describe the financial risk of a portfolio in a single figure. Its simplicity makes it an attractive risk measure because it is easily comprehended and communicated to the quantitative novice compared to other risk measures.
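As a minimal illustration of Definitions 3 and 4, $\mathrm{VaR}_{\alpha}$ can be computed as the $\alpha$-quantile of a sample of losses, i.e. the generalized inverse of the empirical distribution function. The Python sketch below does this for a hypothetical sample of t-distributed losses; the sample, the degrees of freedom, and the 99% level are illustrative assumptions, not part of the thesis implementation.

    import numpy as np

    def empirical_var(losses, alpha=0.99):
        # Smallest x such that the empirical distribution function F_L(x) >= alpha,
        # i.e. the generalized inverse of Definition 4 applied to the empirical df.
        losses = np.sort(np.asarray(losses))
        k = int(np.ceil(alpha * len(losses))) - 1
        return losses[k]

    rng = np.random.default_rng(0)
    sample = rng.standard_t(df=4, size=10_000)   # hypothetical fat-tailed loss sample
    print(empirical_var(sample, alpha=0.99))

Replacing the empirical quantile with the quantile of a fitted parametric distribution gives the parametric analogue used by the methods discussed in Section 2.1.4.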
However, by definition $\mathrm{VaR}_{\alpha}$ gives no information about the size of the losses which occur with probability smaller than $1 - \alpha$, i.e. the measure does not tell how bad it gets if things go wrong. Moreover, Artzner et al. [1999] make the observation that $\mathrm{VaR}_{\alpha}$ fails to satisfy the axiom of subadditivity in all cases, implying that the $\mathrm{VaR}_{\alpha}$ of a portfolio is not necessarily bounded above by the $\mathrm{VaR}_{\alpha}$ of the individual portfolio components added together.¹ This is very unfortunate, as non-subadditive risk measures can lead to misleading conclusions and wrong incentives, e.g. to avoid portfolio diversification and to split entire companies up into separate legal entities to reduce regulatory capital requirements. This conceptual deficiency has led to much debate and criticism of $\mathrm{VaR}_{\alpha}$ as a risk measure. Given these problems with $\mathrm{VaR}_{\alpha}$, we seek an alternative measure which satisfies the axioms of coherence.

¹ McNeil et al. [2005] demonstrate that the non-subadditivity of $\mathrm{VaR}_{\alpha}$ can occur when the dependence structure is of a highly asymmetric form or when the portfolio components have highly asymmetric loss distributions.

Expected Shortfall

The second risk measure we consider is Expected Shortfall (ES). Again, we begin with a formal definition of the concept.

Definition 5 (Expected Shortfall). For a loss $L$ with $E(|L|) < \infty$ and distribution function $F_L$, the Expected Shortfall (ES) at confidence level $\alpha \in (0, 1)$ is defined as

$$\mathrm{ES}_{\alpha} := \frac{1}{1-\alpha} \int_{\alpha}^{1} q_{\varphi}(F_L)\, d\varphi = \frac{1}{1-\alpha} \int_{\alpha}^{1} \mathrm{VaR}_{\varphi}(L)\, d\varphi, \qquad (2.8)$$

where $q_{\varphi}(F_L) = F_L^{\leftarrow}(\varphi)$ is the quantile function of $F_L$.

If the loss distribution $F_L$ is continuous, $\mathrm{ES}_{\alpha}$ can be thought of as the average loss given that $\mathrm{VaR}_{\alpha}$ is exceeded. That is,

$$\mathrm{ES}_{\alpha} := \frac{E\big(L\, I_{\{L \ge \mathrm{VaR}_{\alpha}\}}\big)}{1-\alpha} = E(L \mid L \ge \mathrm{VaR}_{\alpha}), \qquad (2.9)$$

where $I_{\{L \ge \mathrm{VaR}_{\alpha}\}}$ is a binary violation indicator. $\mathrm{ES}_{\alpha}$ may be considered superior to $\mathrm{VaR}_{\alpha}$ for two reasons. First, in contrast to $\mathrm{VaR}_{\alpha}$, $\mathrm{ES}_{\alpha}$ gives an idea of how bad things can get, i.e. it informs about the probable size of the losses which occur with probability smaller than $1 - \alpha$. Second, Artzner et al. [1999] find that $\mathrm{ES}_{\alpha}$ satisfies the axioms of coherence, including subadditivity (for a formal proof of ES being a coherent risk measure, see McNeil et al. [2005], p. 243).

2.1.4 Quantitative Methods for Risk Modeling

Statistical methods for modeling the distribution of a loss $L_{t+1} = l_{[t]}(X_{t+1})$ can be divided into three main methods: Variance-Covariance, Historical
Simulation, and Monte Carlo Simulation. We present the basics of each method, discuss its limitations, and suggest possible extensions.

Variance-Covariance Method

We begin by presenting the unconditional version of the variance-covariance (VC) method. In contrast to historical simulation and Monte Carlo methods, VC provides an analytical solution to the risk measure estimation problem which requires no simulation. The method is based on the following two assumptions:

1. The vector of risk factor changes $X_{t+1}$ has an (unconditional) multivariate normal distribution, denoted by $X_{t+1} \sim N_d(\mu, \Sigma)$, where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix.
2. The linearized loss in terms of the risk factors, $L^{\Delta}_{t+1} := l^{\Delta}_{[t]}(X_{t+1})$, is a sufficiently accurate approximation of $L_{t+1}$.

The second assumption allows us to estimate risk measures based on the distribution of $L^{\Delta}_{t+1}$ instead of $L_{t+1}$, which makes the estimation problem analytically tractable. Taken together, the assumptions ensure that the loss distribution is linear in the risk factor changes and univariate normal. Specifically, the linearized loss operator has the form $l^{\Delta}_{[t]}(x) = -(c_t + b_t' x)$ and, since the multivariate normal distribution is stable under affine transformations, we have

$$l^{\Delta}_{[t]}(X_{t+1}) \sim N\big(-c_t - b_t' \mu,\; b_t' \Sigma b_t\big), \qquad (2.10)$$

where $c_t$ and $b_t$ denote a constant and a constant vector known at time $t$, respectively. The mean vector $\mu$ and the covariance matrix $\Sigma$ are estimated from the risk factor change data $X_{t-n+1}, \ldots, X_t$. Estimates of the risk measures VaR and ES are then calculated from the estimated moments of the distribution.

The assumptions underlying the method may have some undesirable consequences. The linearized loss can be a poor approximation for portfolios with nonlinear instruments such as options, or if risk is measured over long time horizons, as the first-order approximation only works well for small risk factor changes. However, the most serious disadvantage is the unconditional normality assumption, which may lead to underestimation of the risk exposure due to the small probability assigned to large losses [Hull and White, 1998].

A conditional version of the VC method is obtained if we alter the first assumption and instead assume that the vector $X_{t+1}$ follows a conditional multivariate normal distribution, i.e. $X_{t+1} \mid \mathcal{F}_t \sim N_d(\mu_{t+1}, \Sigma_{t+1})$. In consequence, the conditional loss distribution has conditional mean $E(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = -(c_t + b_t' \mu_{t+1})$ and conditional variance $\mathrm{Var}(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = b_t' \Sigma_{t+1} b_t$. The conditional moments can be estimated based on a multivariate dynamic model, and risk measure estimates can then be calculated from these estimated moments.
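As a minimal sketch of the unconditional VC calculation, the Python snippet below estimates $\mu$ and $\Sigma$ from a window of risk factor changes and computes $\mathrm{VaR}_{\alpha}$ and $\mathrm{ES}_{\alpha}$ of the linearized loss $l^{\Delta}_{[t]}(x) = -(c_t + b_t' x)$, using the standard closed-form expressions under normality, $\mathrm{VaR}_{\alpha} = m + s\,\Phi^{-1}(\alpha)$ and $\mathrm{ES}_{\alpha} = m + s\,\phi(\Phi^{-1}(\alpha))/(1-\alpha)$ for a loss with mean $m$ and standard deviation $s$. The simulated data, the sensitivity vector $b$, and the use of scipy are illustrative assumptions rather than the thesis's own implementation.

    import numpy as np
    from scipy.stats import norm

    def vc_var_es(X, b, c=0.0, alpha=0.99):
        # Unconditional variance-covariance method for the linearized loss
        # l(x) = -(c + b'x) with X ~ N_d(mu, Sigma): the loss is univariate normal.
        X = np.asarray(X)                       # n x d window of risk factor changes
        mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
        m = -(c + b @ mu)                       # mean of the linearized loss
        s = np.sqrt(b @ Sigma @ b)              # standard deviation of the loss
        z = norm.ppf(alpha)
        return m + s * z, m + s * norm.pdf(z) / (1 - alpha)   # (VaR_alpha, ES_alpha)

    # hypothetical example: three risk factors with equal sensitivities
    rng = np.random.default_rng(1)
    X = rng.multivariate_normal(np.zeros(3), 1e-4 * np.eye(3), size=1000)
    print(vc_var_es(X, b=np.ones(3) / 3))

The conditional version would simply replace the sample moments by forecasts $\mu_{t+1}$ and $\Sigma_{t+1}$ from a multivariate dynamic model.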
Historical Simulation

In the Historical Simulation (HS) method, the loss distribution of $L_{t+1}$ is estimated under the empirical distribution of the historical data $X_{t-n+1}, \ldots, X_t$. Thus, the method does not rely on any parametric model assumptions. It does, however, rely on stationarity of $X$ to ensure convergence of the empirical loss distribution to the true loss distribution. The historically simulated loss series is generated by applying the loss operator to recent historical risk factor changes. The univariate dataset of historically simulated losses is given by

$$\{\tilde{L}_s = l_{[t]}(X_s) : s = t-n+1, \ldots, t\}, \qquad (2.11)$$

where the values $\tilde{L}_s$ represent the losses that would occur if the historical risk-factor returns on day $s$ reoccurred at time $t+1$. Statistical inference about the loss distribution and risk measures can be made using the historically simulated data $\tilde{L}_{t-n+1}, \ldots, \tilde{L}_t$.

To ensure sufficient estimation precision, HS requires large amounts of relevant and synchronized data for all risk factors. However, it is not always practically feasible to obtain such large appropriate samples of data. Even if data is available, the history of appropriate data may only contain a few, if any, extreme observations. Additionally, the unconditional nature of the method makes it likely to miss periods of temporarily elevated volatility, which can result in clusters of VaR violations [Jorion, 2001]. We can combine HS with a univariate time series model calibrated to the historical simulation data and thereby estimate a conditional loss distribution. In principle, we are then not estimating the conditional loss distribution defined in Section 2.1.2; we are estimating the conditional loss distribution $F_{L_{t+1} \mid \mathcal{G}_t}$, where $\mathcal{G}_t = \sigma(\{\tilde{L}_s : s \le t\})$. Even though we are working with a reduced information set, this simple method may work well in practice [McNeil et al., 2005].

Monte Carlo Simulation

The main idea of the Monte Carlo (MC) method is to estimate the distribution of $L_{t+1} = l_{[t]}(X_{t+1})$ under some explicit parametric model for $X_{t+1}$. Unlike VC, we make no use of the linearized loss operator to make the estimation problem analytically tractable. Instead, we make inference about the loss distribution by simulating new risk factor change data. MC is essentially a three-step method. First, we set up a data generating process (DGP) by calibrating the parameters of a suitable parametric model to the historical risk factor change data $X_{t-n+1}, \ldots, X_t$. Second, we simulate a large set of $m$ independent future realizations of $\{X_t\}_{t \in \mathbb{N}}$, denoted by $X^{(1)}_{t+1}, \ldots, X^{(m)}_{t+1}$. Third, we construct the Monte Carlo simulated loss data by applying the loss operator to each realization:

$$\{\tilde{L}^{(i)}_{t+1} = l_{[t]}(X^{(i)}_{t+1}) : i = 1, \ldots, m\}. \qquad (2.12)$$
Statistical inference about the loss distribution and risk measures is made using the simulated losses $\tilde{L}^{(1)}_{t+1}, \ldots, \tilde{L}^{(m)}_{t+1}$. In contrast to HS, the method avoids the problem of having insufficient synchronized historical data. Also, we can address heavy tails and extreme scenarios through the pre-specified stochastic risk factor change process. For large portfolios the computational costs of using MC can be large, especially if the loss operator is difficult to evaluate. This is the case when the portfolio holds complex instruments, e.g. derivatives for which no closed-form price solution is available. The same critique applies to HS, but to a smaller degree, since the sample size $n$ representing the number of historical simulations is usually smaller than the number of simulations $m$ in MC.

An alternative or supplement to MC is to use bootstrapping.² Where MC simulates new data by setting up a DGP and generating random numbers from a hypothetical distribution, bootstrapping simulates new data by vector-wise random sampling from $X = (X_{t-n+1}, \ldots, X_t)$ with replacement as many times as needed [Efron and Tibshirani, 1993]. However, a large sample size is needed to ensure that the bootstrapped distribution is a good approximation of the true one. A further drawback is that any pattern of time variation in $X$ is broken by the random sampling. This can be circumvented by combining MC and bootstrapping. In this case, we would set up a DGP without assuming a theoretical innovation distribution but instead apply bootstrapping to the standardized residuals.

² For an introduction to bootstrap methods, see Efron and Tibshirani [1993].

2.2 Extreme Value Theory

The purpose of this section is to give a thorough and self-contained account and discussion of extreme value theory (EVT) with applications in financial risk management in mind, but without losing mathematical rigor. Within the context of EVT, there are roughly two approaches to modeling extremal events. One of them is the direct modeling of the distribution of maximum (or minimum) realizations. These kinds of models are known as block maxima models. The other approach is the modeling of exceedances of a particular threshold. Models based on this approach are known as peaks over threshold models. Today, it is generally acknowledged that the latter approach uses data more efficiently, and it is therefore considered the most useful for practical applications [McNeil et al., 2005].

EVT rests on the assumption of independently and identically distributed (iid) data. In this thesis, however, we are concerned with financial time series data, and a stylized fact of financial log-returns is that they tend to exhibit dependence in the second moment, i.e. while they are seemingly uncorrelated, the autocorrelation of the squared (or absolute) log-returns is significant. Consequently, when EVT is applied to financial time series data
we need to take temporal dependence into account. If not, we will produce estimators with non-optimal performance [Brodin and Klüppelberg, 2006].

Sections 2.2.1 and 2.2.2 describe the block maxima and the peaks over threshold models, respectively, and are organized as follows. First, we present the mathematical concepts and results that constitute the theoretical foundation of the two extreme value modeling approaches. Second, we present the models, their assumptions, limitations, and statistical estimation based on maximum likelihood (ML). Third, we discuss how the models can be generalized and applied to financial time series data and the subtleties this involves. Finally, we discuss how to estimate quantiles and risk measures.

2.2.1 Modeling of Extremal Events I: Block Maxima Models

Suppose that $\{X_i\}_{i \in \mathbb{N}}$ is a sequence of iid non-degenerate random variables representing financial losses with common distribution function $F(x) = P(X_i \le x)$.

The Generalized Extreme Value Distribution

Let $M_n = \bigvee_{i=1}^{n} X_i = \max(X_1, \ldots, X_n)$ define the sample maximum of the iid random variables. In classical EVT we are interested in the limiting distribution of affinely transformed (normalized) maxima. The mathematical foundation is the class of extreme value limit laws originally derived by Fisher and Tippett [1928] and summarized in Theorem 1.

Theorem 1 (Fisher and Tippett [1928], Gnedenko [1943]). If there exist norming constants $c_n > 0$ and $d_n \in \mathbb{R}$ such that

$$c_n^{-1}(M_n - d_n) \xrightarrow{d} H \qquad (2.13)$$

for some non-degenerate³ distribution function $H$, then $H$ belongs to one of the following three families of distributions:

Gumbel: $\Lambda(x) = \exp\{-e^{-x}\}$, $x \in \mathbb{R}$.

Fréchet: $\Phi_{\alpha}(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0, \end{cases} \quad \alpha > 0.$

Weibull: $\Psi_{\alpha}(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0, \\ 1, & x > 0, \end{cases} \quad \alpha > 0.$

A rigorous proof of the theorem can be found in Gnedenko [1943].

³ A non-degenerate distribution function is a limiting distribution function that is not concentrated on a single point [McNeil et al., 2005].
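A quick numerical illustration of Theorem 1, under stated assumptions not drawn from the thesis itself: for iid standard exponential variables one may take $c_n = 1$ and $d_n = \ln n$, and the normalized maxima are then approximately Gumbel ($\Lambda$) distributed for large $n$. The Python sketch below checks this with a Kolmogorov-Smirnov test; the block size and number of blocks are arbitrary.

    import numpy as np
    from scipy.stats import gumbel_r, kstest

    # Maxima of iid Exp(1) variables, normalized with c_n = 1 and d_n = ln(n),
    # should be approximately Gumbel (Lambda) distributed for large n (Theorem 1).
    rng = np.random.default_rng(42)
    n, m = 500, 2000                               # block size and number of blocks
    maxima = rng.exponential(size=(m, n)).max(axis=1)
    normalized = maxima - np.log(n)                # c_n^{-1}(M_n - d_n) with c_n = 1
    print(kstest(normalized, gumbel_r.cdf))        # small statistic => close to Gumbel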
The $\Lambda$, $\Phi_{\alpha}$ and $\Psi_{\alpha}$ distribution functions are called the standard extreme value distributions. In accordance with von Mises [1936] and Jenkinson [1955], we can obtain a one-parameter representation of the three standard distributions. This representation is known as the standard generalized extreme value (GEV) distribution.

Definition 6 (Generalized Extreme Value Distribution). The distribution function of the standard GEV distribution is given by

$$H_{\xi}(x) = \begin{cases} \exp\{-(1 + \xi x)^{-1/\xi}\}, & \xi \neq 0, \\ \exp\{-e^{-x}\}, & \xi = 0, \end{cases} \qquad (2.14)$$

where $1 + \xi x > 0$ and $\xi$ is the shape parameter.

The related location-scale family $H_{\xi,\mu,\sigma}$ can be introduced by replacing the argument $x$ above by $(x - \mu)/\sigma$ for $\mu \in \mathbb{R}$, $\sigma > 0$; that is, $H_{\xi,\mu,\sigma}(x) := H_{\xi}\big(\frac{x-\mu}{\sigma}\big)$. The support has to be adjusted accordingly. Moreover, due to its crucial role in determining the likelihood function when fitting the GEV distribution, we calculate the density function of the three-parameter GEV distribution, obtained by differentiating $H_{\xi,\mu,\sigma}(x)$ with respect to $x$:

$$h_{\xi,\mu,\sigma}(x) = \begin{cases} \frac{1}{\sigma}\big(1 + \xi\frac{x-\mu}{\sigma}\big)^{-1/\xi - 1} \exp\Big\{-\big(1 + \xi\frac{x-\mu}{\sigma}\big)^{-1/\xi}\Big\}, & \xi \neq 0, \\ \frac{1}{\sigma}\exp\big\{-\frac{x-\mu}{\sigma}\big\}\exp\big\{-e^{-(x-\mu)/\sigma}\big\}, & \xi = 0. \end{cases} \qquad (2.15)$$

Theorem 1 shows that affinely transformed maxima converge in distribution to the GEV distribution $H_{\xi}$, and convergence of types (see Embrechts et al. [1997], p. 121 and p. 554) ensures that the limiting distribution is uniquely determined up to affine transformations.⁴

Under the iid assumption, the exact distribution function of the maximum $M_n$ is

$$P(M_n \le x) = P(X_1 \le x, \ldots, X_n \le x) = F^n(x), \quad x \in \mathbb{R},\ n \in \mathbb{N}. \qquad (2.16)$$

As a result of (2.16) and the fact that the extreme value distribution functions are continuous on $\mathbb{R}$, $c_n^{-1}(M_n - d_n) \xrightarrow{d} H$ is equivalent to

$$\lim_{n \to \infty} P(M_n \le c_n x + d_n) = \lim_{n \to \infty} F^n(c_n x + d_n) = H(x), \qquad (2.17)$$

or equivalently, by taking logarithms and using $-\ln(1 - y) \sim y$ as $y \to 0$, we have

$$\lim_{n \to \infty} n(1 - F(c_n x + d_n)) = \lim_{n \to \infty} n\bar{F}(c_n x + d_n) = -\ln H(x). \qquad (2.18)$$

⁴ Using the identity $\min(X_1, \ldots, X_n) = -\max(-X_1, \ldots, -X_n)$, it can be shown that the appropriate limits for minima are distributions of type $1 - H_{\xi}(-x)$; see McNeil et al. [2005], p. 267.
In fact, we have the more general equivalence: for $0 \le \tau \le \infty$ and any sequence $u_n$ of real numbers,

$$\lim_{n \to \infty} n\bar{F}(u_n) = \tau \iff \lim_{n \to \infty} P(M_n \le u_n) = e^{-\tau}, \qquad (2.19)$$

where $\bar{F}$ is defined by $\bar{F} = 1 - F$ and denotes the tail of $F$.

Definition 7 (Maximum Domain of Attraction). If (2.17) holds for some norming constants $c_n > 0$, $d_n \in \mathbb{R}$ and non-degenerate distribution function $H$, we say that the distribution function $F$ belongs to the maximum domain of attraction of the extreme value distribution $H$, and we write $F \in \mathrm{MDA}(H)$.

Consequently, we can restate Theorem 1:

Theorem 2 (Fisher-Tippett-Gnedenko Theorem Restated). If $F$ is in the maximum domain of attraction of some non-degenerate distribution function $H$ ($F \in \mathrm{MDA}(H)$), then $H$ must be a GEV distribution, i.e. $H$ belongs to the distribution family $H_{\xi}$.

The Fisher-Tippett-Gnedenko Theorem essentially says that the GEV distribution is the only possible limiting distribution for normalized maxima. If $\xi = \alpha^{-1} > 0$, $F$ is said to be in the maximum domain of attraction of the Fréchet distribution $\Phi_{\alpha}$. Distributions in this class include the Pareto, t, Burr, log-gamma, and Cauchy distributions. If $\xi = 0$, $F$ is in the maximum domain of attraction of the Gumbel distribution $\Lambda$, which includes distributions such as the normal, log-normal, and gamma distributions. Finally, if $\xi = -\alpha^{-1} < 0$, $F$ is in the maximum domain of attraction of the Weibull distribution $\Psi_{\alpha}$, which includes distributions such as the uniform and beta distributions [McNeil et al., 2005, Embrechts et al., 1997].

Maximum Domain of Attraction

In the following we investigate what kinds of underlying distributions $F$ give rise to which limit laws by characterizing the maximum domain of attraction of each of the three extreme value distributions.

For the Fréchet distribution, the maximum domain of attraction consists of distribution functions $F$ whose tails are regularly varying⁵ with negative index of variation. Regularly varying functions are functions which can be represented by power functions multiplied by slowly varying⁶ functions.

⁵ A Lebesgue-measurable function $\psi: \mathbb{R}_+ \to \mathbb{R}_+$ is regularly varying at $\infty$ with index $\rho \in \mathbb{R}$ if $\lim_{x \to \infty} \psi(tx)/\psi(x) = t^{\rho}$, $t > 0$, and we write $\psi \in \mathrm{RV}_{\rho}$. See Resnick [2007], ch. 2.

⁶ A Lebesgue-measurable function $L: \mathbb{R}_+ \to \mathbb{R}_+$ is slowly varying at $\infty$ if $\lim_{x \to \infty} L(tx)/L(x) = 1$, $t > 0$, and we write $L \in \mathrm{RV}_0$. See Resnick [2007], ch. 2.
Thus, the distribution function $F$ belongs to the maximum domain of attraction of $\Phi_{\alpha}$, $\alpha > 0$, if and only if $\bar{F}(x) = x^{-\alpha}L(x)$ for some slowly varying function $L$. That is,

$$F \in \mathrm{MDA}(\Phi_{\alpha}) \iff \bar{F} \in \mathrm{RV}_{-\alpha},$$

where $\alpha = 1/\xi$ is called the tail index of the distribution. This class of distribution functions contains very heavy-tailed distributions in the sense that $E[X^k] = \infty$ for $k > \alpha$ for a non-negative stochastic variable $X$ with distribution function $F \in \mathrm{MDA}(\Phi_{\alpha})$, which makes these distributions particularly attractive for modeling large fluctuations in log-returns and other financial applications.

The maximum domain of attraction of the Weibull distribution consists of distribution functions $F$ with support bounded to the right, i.e. they have a finite right endpoint, $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\} < \infty$. The distribution function $F$ belongs to the maximum domain of attraction of $\Psi_{\alpha}$, $\alpha > 0$, if and only if $x_F < \infty$ and $\bar{F}(x_F - x^{-1}) = x^{-\alpha}L(x)$ for some slowly varying function $L$. That is,

$$F \in \mathrm{MDA}(\Psi_{\alpha}) \iff x_F < \infty,\ \bar{F}(x_F - x^{-1}) \in \mathrm{RV}_{-\alpha}.$$

The fact that $x_F < \infty$ renders this class of distributions the least appropriate for modeling extremal events in finance. In practice, financial losses clearly have an upper limit, but distributions with $x_F = \infty$ are often favored, since they allow for arbitrarily large losses in a sample.

The maximum domain of attraction of the Gumbel distribution consists of the so-called von Mises distribution functions and their tail-equivalent distribution functions. A distribution function $F$ is called a von Mises function if there exists $z < x_F$ such that $\bar{F}$ has the representation

$$\bar{F}(x) = c \exp\Big\{-\int_{z}^{x} \frac{1}{a(t)}\, dt\Big\}, \quad z < x < x_F,$$

where $c$ is a positive constant and $a(\cdot)$ is a positive and absolutely continuous function with density $a'$ satisfying $\lim_{x \to x_F} a'(x) = 0$. Furthermore, two distribution functions $F$ and $G$ are called tail-equivalent if they have the same right endpoint, i.e. $x_F = x_G$, and $\lim_{x \to x_F} \bar{F}(x)/\bar{G}(x) = c$ for some constant $0 < c < \infty$ [Embrechts et al., 1997].

$\mathrm{MDA}(\Lambda)$ contains a large variety of distributions with very different tails, ranging from light-tailed distributions (e.g. the exponential or Gaussian distributions) to moderately heavy-tailed distributions (e.g. the log-normal or heavy-tailed Weibull distributions), which makes the Gumbel class interesting for financial applications alongside the Fréchet class [McNeil et al., 2005]. However, the tails of the distributions in the Gumbel class decrease to zero much faster than any power law, and thus faster than the tails of the regularly varying power-tailed distributions of the Fréchet class.
A non-negative stochastic variable $X$ with distribution function $F \in \mathrm{MDA}(\Lambda)$ has finite moments of any positive order, i.e. $E[X^k] < \infty$ for every $k > 0$. Also, the distributions in the Gumbel class can have both finite and infinite right endpoints $x_F$ [Embrechts et al., 1997].

Method for Block Maxima Modeling

Based on the theoretical results presented in the previous sections, we are now ready to present the practical and statistical application of the block maxima model. Assume that we have data from an underlying distribution with distribution function $F \in \mathrm{MDA}(H_{\xi})$ and that these data are iid. We know from the previous sections, and Theorem 1 in particular, that the true distribution of the maximum $M_n$ can be approximated by a GEV distribution for large $n$. In practice, we do not know the true distribution of the losses and can therefore not determine the norming constants $c_n$ and $d_n$; thus we use the three-parameter specification $H_{\xi,\mu,\sigma}$, where we have replaced $c_n$ and $d_n$ by $\sigma > 0$ and $\mu$ [McNeil et al., 2005].

The implementation of the method is relatively straightforward. First, we divide the data into $m$ blocks of size $n$ and collect the maximum value in each block, denoting the block maximum of the $j$th block by $M_n^{(j)}$. This, of course, requires that the data can be divided in some natural way. Assuming that we are dealing with daily return data (or similarly, daily losses), we could e.g. divide the data into monthly, quarterly or yearly blocks.⁷ However, to avoid seasonality, it might be preferable to choose yearly periods [Gilli and Kellezi, 2006].

Next, we fit the three-parameter GEV distribution to the $m$ block maximum observations $M_n^{(1)}, \ldots, M_n^{(m)}$. One estimation procedure is the theoretically well-established maximum likelihood (ML) method [Prescott and Walden, 1983, Hosking, 1985], which allows us to give estimates of the statistical error of the parameter estimates. However, alternative methods do exist; e.g. Hosking et al. [1985] propose the method of probability-weighted moments (PWM), but the theoretical justification for this method is less well established [Embrechts et al., 1997]. Assuming that the block size $n$ is large enough so that the $m$ block maximum observations can be assumed independent, regardless of whether the underlying data are dependent, the likelihood function based on the data $M_n^{(1)}, \ldots, M_n^{(m)}$ is given by

$$L(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) = \prod_{i=1}^{m} h_{\xi,\mu,\sigma}(M_n^{(i)}),$$

where $h_{\xi,\mu,\sigma}$ is the density function of the GEV distribution given in (2.15).

⁷ We ignore that the exact number of days in each block will differ slightly.
By taking logarithms we obtain the log-likelihood function

$$l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) = \sum_{i=1}^{m} \ln h_{\xi,\mu,\sigma}(M_n^{(i)}) = -m \ln \sigma - \Big(1 + \frac{1}{\xi}\Big) \sum_{i=1}^{m} \ln\Big(1 + \xi \frac{M_n^{(i)} - \mu}{\sigma}\Big) - \sum_{i=1}^{m} \Big(1 + \xi \frac{M_n^{(i)} - \mu}{\sigma}\Big)^{-1/\xi}.$$

The maximum likelihood estimators (MLEs) of $\xi$, $\mu$ and $\sigma$ are given by

$$(\hat{\xi}, \hat{\mu}, \hat{\sigma}) = \arg\max_{\xi,\mu,\sigma}\ l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) \qquad (2.20)$$

subject to $\sigma > 0$ and $1 + \xi(M_n^{(i)} - \mu)/\sigma > 0$ for all $i$. That is, $\hat{\xi}$, $\hat{\mu}$ and $\hat{\sigma}$ maximize the log-likelihood function $l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)})$ over the appropriate parameter space. In the so-called regular cases, maximum likelihood estimation yields consistent, efficient, and asymptotically normal estimators [Heij et al., 2004]. However, the maximum likelihood problem (2.20) poses a non-regular case because the parameter space depends on the values of the data, or, put equivalently, the support of the underlying distribution function depends on the unknown parameters. Fortunately, Smith [1985] shows that even in the non-regular case the resulting MLEs are consistent and asymptotically efficient whenever $\xi > -1/2$.

Generalization to Financial Time Series Data

In the previous sections we have restricted ourselves to iid series. However, extremal events often tend to occur in clusters, caused by local dependence in financial data. If a large value occurs in a financial time series, we can usually observe a cluster of large values over a short period afterwards. In this section we give the conditions on the stationary process $\{X_i\}_{i \in \mathbb{N}}$ which ensure that its sample maxima $M_n$ and the corresponding maxima $\tilde{M}_n$ of an iid sequence $\{\tilde{X}_i\}_{i \in \mathbb{N}}$ with common distribution function $F$ exhibit similar limit behavior, i.e. that the same type of limiting distribution applies.

Leadbetter et al. [1983] show that under two technical conditions the classes of limit laws for the normalized sequences $M_n$ and $\tilde{M}_n$ are exactly the same. The first condition is a distributional mixing condition under which the stationary series shows only weak long-range dependence. The second condition is an anti-clustering condition under which the stationary series shows no tendency to form clusters of large values (for details of these results, please refer to Leadbetter et al. [1983] or Embrechts et al. [1997]). The two conditions ensure that the stationary sequence $\{X_i\}_{i \in \mathbb{N}}$ has the same asymptotic extremal behavior as an associated iid sequence.
Unfortunately, while the first condition is often a tenable assumption for financial time series, the anti-clustering condition is not. Financial time series often exhibit volatility clustering, which in turn causes clusters of extremal observations [McNeil, 1998]. The standard measure for describing clustering of extreme values of a process is the so-called extremal index of the process. The extremal index allows one to characterize the relationship between the dependence structure of the data and their extremal behavior. Formally, let $\{X_i\}_{i \in \mathbb{N}}$ be a stationary process and $\theta \in (0, 1]$. Assume that for every $\tau > 0$ there exists a sequence $u_n$ such that

$$\lim_{n \to \infty} n\bar{F}(u_n) = \tau \qquad (2.21)$$

and

$$\lim_{n \to \infty} P(M_n \le u_n) = e^{-\theta\tau}; \qquad (2.22)$$

then the process $\{X_i\}_{i \in \mathbb{N}}$ has extremal index $\theta$. Observe that we have the same equivalence as in (2.19), except for the extremal index introduced in the limit of (2.22). For $\theta < 1$ there is a tendency of extreme values to cluster, while for $\theta = 1$ there is no such tendency.⁸

⁸ Strict white noise processes have extremal index $\theta = 1$.

Now, for any sequence of real numbers $u_n$, it can be shown that (2.21), (2.22) and $P(\tilde{M}_n \le u_n) \to \exp\{-\tau\}$ (cf. relation (2.19)) are equivalent [Leadbetter, 1983]. From this, we infer

$$P(M_n \le u_n) \approx P^{\theta}(\tilde{M}_n \le u_n) = F^{n\theta}(u_n), \qquad (2.23)$$

for large enough $n$. Thus, in the limit, the maximum of $n$ observations from a stationary series with extremal index $\theta$ behaves like the maximum of $n\theta$ observations from the associated iid series. Consequently, we have the following result for stationary time series.

Theorem 3. If $\{X_i\}_{i \in \mathbb{N}}$ is strictly stationary with extremal index $\theta \in (0, 1]$, then

$$\lim_{n \to \infty} P\{(\tilde{M}_n - d_n)/c_n \le x\} = H(x) \qquad (2.24)$$

for a non-degenerate $H(x)$ if and only if

$$\lim_{n \to \infty} P\{(M_n - d_n)/c_n \le x\} = H^{\theta}(x), \qquad (2.25)$$

with $H^{\theta}(x)$ also non-degenerate.

⁹ A non-degenerate random variable $X$ (and its distribution) is called max-stable if it satisfies $\max(X_1, \ldots, X_n) \stackrel{d}{=} c_n X + d_n$ for appropriate constants $c_n > 0$, $d_n \in \mathbb{R}$ and every $n \ge 2$.

Since the extreme value distribution $H$ is max-stable⁹, $H^{\theta}$ is of the same type as $H$, which means there exist constants $c > 0$ and $d \in \mathbb{R}$ such that
H^θ(x) = H(cx + d). This implies that the limits in (2.24) and (2.25) can be chosen to be identical after a single change of norming constants; raising the distribution function to the power θ only affects the location and scaling parameters [Embrechts et al., 1997]. Thus, provided F ∈ MDA(H_ξ) for some ξ, the asymptotic distribution of normalized maxima of the stationary series {X_i}_{i∈N} with extremal index θ is also an extreme value distribution with exactly the same shape parameter ξ as in the iid case. However, the dependence in {X_i}_{i∈N} has the effect that convergence to the GEV distribution happens more slowly, because the effective sample size nθ is smaller than n, which means that we have to choose larger blocks when fitting a GEV distribution to the maxima than in the iid case [McNeil, 1998]. See Embrechts et al. [1997], pp. 418-425, for approaches to estimating the extremal index.

Quantiles and Measures of Risk

Despite the fact that the block maxima model is considered less useful than the threshold models discussed in the next section, the method is not without practical relevance and could be used to provide estimates of stress losses [McNeil, 1998, McNeil et al., 2005]. The fitted GEV distribution of the block maxima allows for the determination of the so-called return level, which can be considered a kind of unconditional quantile estimate for the unknown underlying distribution. Assuming that the maxima in blocks of length n follow the GEV distribution with distribution function H_{ξ,µ,σ}, the k n-block return level is defined as the (1 − 1/k)-quantile of H,

R_{n,k} = q_{1-1/k}(H).

This is the level we expect to be exceeded in one n-block out of every k n-blocks, on average. That is, assuming a model for annual (252 trading days per year) maxima, the 15-year return level R_{252,15} is on average only exceeded in one year out of every 15 years. We shall call an n-block in which the return level is exceeded a return or stress period. Using the parameter estimates of the fitted GEV distribution, we can estimate the return level as

\hat R_{n,k} = H^{-1}_{\hat\xi,\hat\mu,\hat\sigma}\!\left(1 - \frac{1}{k}\right) = \hat\mu + \frac{\hat\sigma}{\hat\xi}\left(\left(-\ln\!\left(1 - \frac{1}{k}\right)\right)^{-\hat\xi} - 1\right). (2.26)

The derivation of this expression can be found in Appendix A.1.

2.2.2 Modeling of Extremal Events II: Peaks over Threshold Models

The peaks over threshold (POT) models are concerned with modeling exceedances over a certain threshold, where any such exceedance is referred to as an extreme event.
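As a brief computational aside, the block maxima workflow just described, i.e. the maximum likelihood fit in (2.20) and the return level estimate (2.26), can be sketched as follows. This is an illustration only (the data below are simulated, not the portfolio data of the empirical study), and note that SciPy's genextreme parameterizes the GEV shape as c = −ξ.

import numpy as np
from scipy.stats import genextreme

# maxima: array of m block maxima (e.g. annual maxima of daily losses); simulated here for illustration
rng = np.random.default_rng(0)
maxima = genextreme.rvs(c=-0.3, loc=0.02, scale=0.01, size=50, random_state=rng)  # true ξ = 0.3

# ML fit of the GEV; SciPy's shape parameter c equals -ξ in the notation of the text
c_hat, mu_hat, sigma_hat = genextreme.fit(maxima)
xi_hat = -c_hat

def return_level(k, xi, mu, sigma):
    """k n-block return level, eq. (2.26), for xi != 0."""
    return mu + (sigma / xi) * ((-np.log(1.0 - 1.0 / k)) ** (-xi) - 1.0)

# level exceeded on average once every 15 blocks (e.g. once in 15 years for annual maxima);
# cross-check: genextreme.ppf(1 - 1/15, c_hat, loc=mu_hat, scale=sigma_hat) gives the same value
print(xi_hat, mu_hat, sigma_hat, return_level(15, xi_hat, mu_hat, sigma_hat))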
The block maxima models that we discussed in the previous section are quite wasteful of data due to the trade-off between the size of the blocks and the number of blocks that can be constructed from a given dataset. In contrast, POT models are more efficient in their use of the (often limited) data on extreme values, as they retain all observations that are extreme in the sense that they exceed some defined threshold. Consequently, these models are generally considered the most useful for practical applications [McNeil et al., 2005]. Also, the block maxima models do not allow for the estimation of popular risk measures such as VaR and ES. In this section we present the theory and statistical aspects of the POT models.

Within the class of threshold exceedance models one can further distinguish between two competing analysis approaches. First, we have the semi-parametric models based on upper order statistics, such as the Hill estimator [Hill, 1975], the Pickands estimator [Pickands, 1975], and the DEdH or moment estimator [Dekkers et al., 1989]. Embrechts et al. [1997], ch. 4, and Resnick [2007], ch. 4, provide excellent overviews of the theoretical foundation and the statistical properties of these estimators. For empirical applications in a financial context, consult Koedijk et al. [1990], Jansen and De Vries [1991], Lux [1996], Longin [1996], and Danielsson and De Vries [1997]. We refrain from considering these semi-parametric models in this thesis. Secondly, we have the fully parametric models based on the approximation of excess losses by the generalized Pareto distribution. These models provide parametric estimation of the tails and will be the focus of this section. In the following we concentrate on the case of iid data, but we note that the results also carry over to dependent processes with extremal index θ = 1, i.e. processes that show no tendency to cluster.

Generalized Pareto Distribution

The generalized Pareto distribution (GPD) is the pivotal distribution for modeling the magnitudes of exceedances over a high threshold, i.e. excess amounts [Embrechts et al., 1997, McNeil et al., 2005]. In an EVT context, the GPD is usually expressed as a two-parameter distribution.

Definition 8 (Generalized Pareto Distribution). The distribution function of the GPD is given by

G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp\{-x/\beta\}, & \xi = 0, \end{cases} (2.27)

where ξ ∈ R, β > 0, and the support is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. ξ is the shape parameter of the distribution and β is an additional scaling parameter.
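For concreteness, the GPD of Definition 8 is available in standard statistical libraries; the following minimal sketch (an illustration only, with arbitrarily chosen parameter values) evaluates (2.27) with SciPy, whose genpareto uses the same shape convention c = ξ, and checks the exponential limit ξ = 0.

import numpy as np
from scipy.stats import genpareto, expon

xi, beta = 0.25, 1.5
x = np.linspace(0.0, 10.0, 5)

# G_{xi,beta}(x) = 1 - (1 + xi*x/beta)^(-1/xi) for xi != 0
print(genpareto.cdf(x, c=xi, scale=beta))
print(1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi))   # same values, direct formula

# xi = 0 reduces to the exponential distribution with scale beta
print(genpareto.cdf(x, c=0.0, scale=beta))
print(expon.cdf(x, scale=beta))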
The GPD subsumes a number of specific distributions under its parameterization. When ξ > 0, G_{ξ,β} is a re-parameterized version of a heavy-tailed, ordinary Pareto distribution and the kth moment E[X^k] is infinite for k ≥ 1/ξ; when ξ = 0 we have a light-tailed, exponential distribution; and, finally, ξ < 0 corresponds to a bounded (i.e. short-tailed), Pareto type II distribution. Moreover, G_{ξ,β} ∈ MDA(H_ξ) for any ξ ∈ R. Finally, we can extend the distribution family by adding a location parameter µ ∈ R, i.e. G_{ξ,µ,β}(x) := G_{ξ,β}(x − µ), with the support adjusted accordingly. When µ = 0 and β = 1, the representation is known as the standard GPD.

Because of its important role in the likelihood functions in the following sections, we take the first derivative of the distribution function to get the density function

g_{\xi,\beta}(x) = \begin{cases} \beta^{-1}(1 + \xi x/\beta)^{-1/\xi - 1}, & \xi \neq 0, \\ \beta^{-1}\exp\{-x/\beta\}, & \xi = 0. \end{cases} (2.28)

Now, consider a series {X_i}_{i∈N} of iid random variables representing financial losses with common distribution function F ∈ MDA(H_ξ) and upper endpoint x_F. Let u be a certain threshold and denote by N_u = card{i : X_i > u, i = 1,..., n} the (random) number of exceedances of u by X_1,..., X_n. We denote the losses exceeding u by \tilde X_1,..., \tilde X_{N_u} and the corresponding excess amounts by Y_1,..., Y_{N_u}, where Y_j := \tilde X_j − u. The conditional excess distribution of X over the threshold u is given by

F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, (2.29)

for 0 ≤ y < x_F − u. The excess distribution represents the probability that a loss exceeds the threshold u by at most an amount y, given that the loss exceeds the threshold. A famous limit result by Pickands [1975] and Balkema and de Haan [1974], captured in Theorem 4, shows that the GPD is the natural limiting distribution for excesses over a high threshold.

Theorem 4 (Pickands [1975], Balkema and de Haan [1974]). For every ξ ∈ R there exists a positive measurable function β(·) such that

\lim_{u \to x_F} \; \sup_{0 \le y < x_F - u} \left| F_u(y) - G_{\xi,\beta(u)}(y) \right| = 0 (2.30)

if and only if F ∈ MDA(H_ξ).

Thus, for any distribution F belonging to the maximum domain of attraction of an extreme value distribution, the excess distribution F_u converges
(uniformly) to a generalized Pareto distribution (GPD) as the threshold u is raised. In other words, distributions for which the affinely transformed maxima converge to a GEV distribution constitute the set of distributions for which the excess distribution over a threshold converges to a GPD. In addition, the shape parameter ξ of the limiting GPD of excesses is exactly the same as that of the limiting GEV distribution of normalized maxima. Consequently, in the limit, the distribution of excess amounts is generalized Pareto. Hence, utilizing (2.27) and (2.30), the excess distribution function F u may be approximated by the GPD for large enough u F u (y) G ξ,β(u) (y), 0 y < x F u. (2.31) Point Process of Exceedances In the previous section we solely discussed the limit behavior of excess amounts and we found that the distribution of excess amounts can be approximated by a generalized Pareto distribution. In contrast, this section focuses on the occurrence of exceedances in the limit. To this end, we need to clarify a few concepts and results from the theory of point processes. Definition 9 (Point Process). Assume the state space E, i.e. the space where points live, is a complete separable metric space 10 (c.s.m.s.) equipped with a σ-field E of Borel subsets of E generated by open sets, then a point process N( ) on the state space E is a measurable map from the probability space (Ω, F, P) into the Borel-measurable space (M p (E), M p (E)), where M p (E) is the space of all point measures on E and M p (E) is the σ-algebra of Borel subsets of M p (E) generated by open sets. In the following, we downplay the measure-theoretical notation somewhat to avoid disturbing the focus of the text. However, interested readers might want to consult Embrechts et al. [1997] Chapter 5 or Resnick [2007] Chapter 5 for short, accessible introductions to the notion of point processes with an explicit focus on extreme value theory. A more advanced and rigorous treatment can be found in Daley and Vere-Jones [2003]. Roughly, one can think of a point process N( ) on E simply as a random distribution of points W i in state space E. Consider a sequence W 1,..., W n of random variables or vectors taking values in E and define for any set A E n N(A) = card {i : W i A} = ɛ Wi (A), where ɛ Wi i=1 is the Dirac measure for W i E defined by { 1, W i A, ɛ Wi (A) = 0, W i / A. 10 For our purposes, however, it is safe to assume a finite-dimensional Euclidean space [Embrechts et al., 1997]. 24
The point process N(A) counts the number of points W i falling into the subset A of the state space E. One point process closely related to extreme value theory is the Poisson point process. Definition 10 (Poisson Point Process). N( ) is called a Poisson point process on E with mean measure µ, or synonymously, a Poisson random measure PRM(µ) under the following two conditions: 1. For A E and k 0, P(N(A) = k) = {e µ(a) (µ(a)) k k!, µ(a) <, 0, µ(a) =. 2. For any m 1, if A 1,..., A m are mutually disjoint sets of E in E, then N(A 1 ),..., N(A m ) are independent random variables. After this short detour into the realms of point processes, let us return to the issue of modeling the occurrence of exceedances. As in the previous section, we are still considering the series {X i } i N of iid random variables representing financial losses with common distribution function F MDA(H ξ ) and right endpoint x F. If we let u n denote a sequence of real thresholds, then for n N and 1 i n the point process of exceedances N n ( ) with state space E = (0, 1] N n (A) = n ɛ n 1 i(a)i {Xi >u n}, n = 1, 2,... (2.32) i=1 counts the number of exceedances of the threshold u n by the sequence X 1,..., X n with time of occurrence in the set A, where A E. Note that the point process of exceedances is time-normalized, i.e. an observation X i exceeding threshold u n is plotted at n 1 i on (0, 1] and not at i on (0, n]. Also, note that (2.32) is considered an element in a sequence of point processes indexed by n. Recall from (2.19) that the relation P(M n u n ) exp{ τ} holds if and only if ( n ) n F (u n ) = E τ, n, (2.33) i=1 I {Xi >u n} for any τ [0, ). This implies that the sequence of point processes of exceedances N n converges in distribution to a homogeneous Possion process N on E = (0, 1] with intensity τ (see Embrechts et al. [1997] Section 5.3.1 and Theorem 5.3.2., in particular). In fact, letting the sequence of thresholds u n be defined by u n (y) := c n y + d n for some fixed value y and combining relation (2.18) and (2.33), we can write the intensity as τ(y) = ln H ξ (y) = ln H ξ ((u d n )/c n ). 25
Furthermore, replacing the norming constants d n and c n by µ and σ > 0, respectively, it is clear that, in the limit, exceedances of the level x u occur according to a homogeneous Poisson process with intensity τ(x) = ln H ξ,µ,σ (x). We can understand the intensity as expressing the instantaneous risk of a new exceedance of the threshold u at time t. Clearly, this intensity does not depend on time and takes the constant value τ := τ(x). Method for Peaks over Threshold Modeling Based on the theoretical exposition of the asymptotic behavior of threshold exceedances in the previous two sections, we can now formulate the peaks-over-threshold (POT) model for iid data. The model rests on the following two assumptions Assumptions 1. Exceedances occur in time according to a homogeneous Poisson process with constant intensity. 2. Excess amounts are independently and identically distributed according to the generalized Pareto distribution, particularly they are independent of their location in time. Under these assumptions, we can model extreme events as either a marked Poisson point process [Chavez-Demoulin et al., 2005] or a two-dimensional Poisson point process [McNeil et al., 2005]. In the marked Poisson process, the marks represent the excess losses over the threshold u and the times of exceedance constitute the points. However, in this section, we use the bivariate representation which is a Poisson point process on two-dimensional space with points (t, x) representing times and magnitudes of extreme events, i.e. losses exceeding threshold u. Bivariate Poisson point process representation. Let X 1,..., X n be an iid series of random variables representing financial losses. Assuming the POT assumptions are satisfied, then the point process given by N u ( ) = n ɛ n 1 i,x i ( ) (2.34) i=1 is a (non-homogeneous) Poisson process on the two-dimensional state space E = (0, 1] (u, ) with intensity { σ 1 (1 + ξ(x µ)/σ) 1/ξ 1, if 1 + ξ(x µ)/σ > 0, λ(t, x) = (2.35) 0, otherwise, at point (t, x). The Poisson process is non-homogeneous due to the fact that this intensity only depends on the loss magnitude x and not on the exceedance time t. 26
Obtained through backward engineering, this representation ensures that the tails are generalized Pareto distributed, i.e. the excess amounts are iid GPD, and that exceedances occur in time according to a homogeneous Poisson process with constant intensity τ. To see this, we first calculate the mean measure of the process (2.34) for any subset Ω = (t, T) × (x, ∞) of the state space E. We get

\mu(\Omega) = \int_{\Omega} \lambda(s, w)\, dw\, ds = \int_t^T \tau(x)\, ds = (T - t)\,\tau(x) = -(T - t)\ln H_{\xi,\mu,\sigma}(x); (2.36)

see the calculations in Appendix A.2. From this, we see that for x ≥ u the implied one-dimensional point process of exceedances of the level x is a homogeneous Poisson process with intensity τ(x) = −ln H_{ξ,µ,σ}(x), i.e. exceedances of the level x follow a Poisson process in time and the instantaneous risk of incurring a loss exceeding the level x at any point in time t is the constant rate τ := τ(x). Moreover, calculating the tail of the excess distribution over the threshold u as the ratio of the intensities of exceeding the loss levels u + y and u at any point in time t, we obtain

\bar F_u(y) = \frac{\tau(u + y)}{\tau(u)} = \left(1 + \frac{\xi y}{\sigma + \xi(u - \mu)}\right)^{-1/\xi}; (2.37)

see Appendix A.3. Defining β := β(u) = σ + ξ(u − µ) > 0, we see that (2.37) is equal to the tail of the generalized Pareto distribution with shape parameter ξ and scale parameter β. That is, we have \bar F_u(y) = \bar G_{\xi,\beta}(y) for 0 ≤ y < x_F − u, which is exactly the limiting result we obtained for the excess distribution over a high threshold u in (2.31).

Fitting the bivariate Poisson point process. Given a sample of losses X_1,..., X_n from a loss distribution F ∈ MDA(H_ξ), a random number N_u > 0 of them will exceed the threshold u. Recall that we denote the losses exceeding u by \tilde X_1,..., \tilde X_{N_u} and the corresponding excess amounts by Y_1,..., Y_{N_u}. Given the exceedance data \tilde X_1,..., \tilde X_{N_u}, the likelihood function for the entire point process can be written as

L(\xi, \mu, \sigma; \tilde X_1,..., \tilde X_{N_u}) = \exp(-\tau(u)) \prod_{j=1}^{N_u} \lambda(\tilde X_j) (2.38)

[McNeil et al., 2005]. The derivation of likelihoods of point processes is somewhat tedious and beyond the scope of this thesis, but tractable likelihood functions are obtainable using Janossy densities. We refer to Daley and Vere-Jones [2003], Chapter 7, Sections 7.1-7.3, for the necessary theoretical concepts and tools. The maximum likelihood estimators of ξ, µ, and σ are

(\hat\xi, \hat\mu, \hat\sigma) = \arg\max_{\xi,\mu,\sigma} \ln L(\xi, \mu, \sigma; \tilde X_1,..., \tilde X_{N_u}). (2.39)
However, if we define β := β(u) = σ + ξ(u − µ) and τ := τ(u) = −ln H_{ξ,µ,σ}(u), which are the scaling parameter of the GPD for the excess loss amounts and the intensity of the one-dimensional homogeneous Poisson process of exceedances, respectively, we can re-parameterize the intensity (2.35) of the two-dimensional Poisson point process as

\lambda(x) := \lambda(t, x) = \frac{\tau}{\beta}\left(1 + \xi\,\frac{x - u}{\beta}\right)^{-1/\xi - 1}, (2.40)

where ξ ∈ R, τ > 0, and β > 0. The derivation of this equation can be found in Appendix A.4. Substituting for λ in (2.38) and taking logarithms, we can write the log-likelihood function as

l(\xi, \mu, \sigma; \tilde X_1,..., \tilde X_{N_u}) = \ln L(\xi, \mu, \sigma; \tilde X_1,..., \tilde X_{N_u}) = \ln\!\left( \exp(-\tau) \prod_{j=1}^{N_u} \frac{\tau}{\beta}\left(1 + \xi\,\frac{\tilde X_j - u}{\beta}\right)^{-1/\xi - 1} \right) = -\tau + N_u \ln\tau - N_u \ln\beta - \left(1 + \frac{1}{\xi}\right)\sum_{j=1}^{N_u} \ln\!\left(1 + \xi\,\frac{\tilde X_j - u}{\beta}\right).

This parameterization allows the log-likelihood to be written as

l(\xi, \mu, \sigma; \tilde X_1,..., \tilde X_{N_u}) = l_1(\tau; N_u) + l_2(\xi, \beta; Y_1,..., Y_{N_u}), (2.41)

where l_1 is the log-likelihood of a one-dimensional homogeneous Poisson point process with intensity τ and l_2 is the log-likelihood for fitting the GPD, which we will see shortly. Thus, inference can be performed separately for the frequency of exceedances and their sizes (excess amounts). In other words, estimates of ξ and β can be obtained by fitting a GPD, and an estimate of τ can be obtained by fitting a one-dimensional Poisson process (Footnote 11: The ML estimator of τ is \hat\tau = N_u, which is obtained from the first-order condition ∂l_1/∂τ = −1 + τ^{-1} N_u = 0.). These estimates can then be used to infer estimates of µ and σ [McNeil et al., 2005].

Fitting the GPD. If our main interest is modeling the distribution of excess losses over a high threshold u, it is clear from (2.41) that we can simply fit a GPD directly to the excess loss data Y_1,..., Y_{N_u}. As was the case when we fitted the GEV distribution, we have several statistical methods at our disposal for estimating the parameters of the GPD fitted to the excess loss data. Among the more prominent methods are maximum likelihood (ML) and probability-weighted moments (PWM).
If we take Assumption 2 to hold exactly, i.e. the excess amounts are exactly generalized Pareto distributed, Smith [1987] has shown that the usual maximum likelihood properties such as consistency and asymptotic normality hold for the MLEs ˆξ and ˆβ as N_u → ∞, provided ξ > −1/2. Under the more realistic assumption that Assumption 2 only holds approximately, i.e. the excesses are approximately generalized Pareto distributed, asymptotic normality of ˆξ and ˆβ still holds. Furthermore, it can be shown that ˆξ and ˆβ are asymptotically unbiased when we let u = u_n → x_F and N_u → ∞ as the total sample size n → ∞, provided that u → x_F sufficiently fast [Smith, 1987, McNeil and Frey, 2000]. The speed with which u should approach x_F depends on the rate of convergence of F_u towards G_{ξ,β(u)} (see (2.30)), i.e. how fast d(u) := \sup_{x \in [0, x_F - u)} |F_u(x) - G_{\xi,\beta(u)}(x)| converges to 0 as u → x_F. (Footnote 12: If the true underlying distribution F is known, which is not the case in practice, Raoult and Worms [2003] provide the necessary theoretical results to determine the rate of convergence; see their Theorem 1 and Corollary 1, in particular.) Thus, for practical purposes, the choice of threshold u is a trade-off between (a) choosing u sufficiently high so that (2.31) can be taken to be essentially exact (i.e. reducing the risk of bias), and (b) choosing u low enough so that we have sufficient observations in the tail to obtain reliable parameter estimates (i.e. controlling the variance of the parameter estimates).

Under the assumption of independence, the likelihood function is the product of marginal GPD densities

L(\xi, \beta; Y_1,..., Y_{N_u}) = \prod_{j=1}^{N_u} g_{\xi,\beta}(Y_j),

where g_{ξ,β} is the density function given in (2.28). The corresponding log-likelihood function is then given by

l(\xi, \beta; Y_1,..., Y_{N_u}) = \sum_{j=1}^{N_u} \ln g_{\xi,\beta}(Y_j) = -N_u \ln\beta - \left(1 + \frac{1}{\xi}\right)\sum_{j=1}^{N_u} \ln\!\left(1 + \xi\,\frac{Y_j}{\beta}\right). (2.42)

The MLEs ˆξ and ˆβ are obtained by maximizing (2.42) subject to the parameter constraints β > 0 and 1 + ξY_j/β > 0 for all j. In the case ξ ≥ 0, Hosking and Wallis [1987] show that PWM provides a viable alternative to ML. However, the ML approach allows us to fit more complicated models in which time dependence of the parameters and other effects may be present [Embrechts et al., 1997].
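The maximization of (2.42) is a small numerical optimization; the sketch below is an illustration only (the loss sample and the 95% threshold choice are hypothetical), not the exact implementation used in the empirical part of the thesis.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import genpareto

rng = np.random.default_rng(1)
losses = rng.standard_t(df=4, size=4000)           # hypothetical iid loss sample
u = np.quantile(losses, 0.95)                      # hypothetical threshold choice
excesses = losses[losses > u] - u                  # Y_1, ..., Y_{N_u}

def neg_loglik(params, y):
    """Negative GPD log-likelihood, eq. (2.42) (xi != 0 branch)."""
    xi, beta = params
    if beta <= 0 or np.any(1.0 + xi * y / beta <= 0):
        return np.inf
    return len(y) * np.log(beta) + (1.0 + 1.0 / xi) * np.sum(np.log1p(xi * y / beta))

res = minimize(neg_loglik, x0=[0.1, np.mean(excesses)], args=(excesses,), method="Nelder-Mead")
xi_hat, beta_hat = res.x

# cross-check with SciPy's built-in ML fit (location fixed at 0, since we model excesses)
xi_chk, _, beta_chk = genpareto.fit(excesses, floc=0)
print(xi_hat, beta_hat, xi_chk, beta_chk)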
Generalization to Financial Time Series Data The theory outlined so far assumes that the underlying series {X i } i N representing financial losses is independent. As discussed in Section 2.1.1, most financial return series exhibit temporal dependence; volatility bursts cause extremes to cluster together. Thus, in contrast to iid data, threshold exceedances for daily return series do not necessarily follow a homogeneous Poisson process in time [Mc- Neil et al., 2005]. If we just apply the threshold model to the raw return data, we can estimate unconditional risk measures but the asymptotics of these estimators are not well-known for dependent data. Research has offered various attempts to deal with the issue of dependent data. Leadbetter [1991] shows that for stationary processes with extremal index θ < 1 the extremal clusters themselves occur according to a homogeneous Poisson process in time under weak mixing conditions. Thus, one can apply the POT model to cluster maxima only. This approach is known as declustering. Unfortunately, this leads to difficulties of identifying clusters of exceedances in the return data; there is no clear way to identify when a cluster begins and ends [Chavez-Demoulin et al., 2005, McNeil et al., 2005]. However, there do exist two more tenable approaches. The first approach is to modify the POT model by incorporating a self-exciting process structure, while the latter approach suggests pre-whitening the raw data using some kind of volatility model before applying the EVT methods suggested in the previous sections. Both approaches make it possible to calculate conditional risk measures. POT with self-exciting structure. This approach to deal with the problem of dependent data has recently been proposed by Chavez-Demoulin et al. [2005] and further extended in McNeil et al. [2005]. They suggest modifying the standard POT model based on the homogeneous Poisson assumption by incorporating a model for the dependence of frequencies and sizes of exceedances of high thresholds. Specifically, they consider self-exciting processes also known as Hawkes processes (see the original papers by Hawkes [1971] and Hawkes and Oakes [1974]). In these models, recent threshold exceedances cause the instantaneous risk of a new exceedance of the threshold at the present point in time to be higher. Technically speaking, each previous exceedance contributes to the conditional intensity of the process and the amount with which it contributes depends on the time elapsed since that exceedance occurred and the size of the excess loss over the threshold. Furthermore, McNeil et al. [2005] generalize the standard POT model by constructing two marked self-exciting process models; one with unpredictable marks (i.e. excess losses are iid GPD) and one with predictable marks (i.e. losses are conditionally GPD) given the the history of exceedances up to the present exceedance and with scaling parameter dependent on the exceedance history. Thus, in the latter model, both the temporal intensity of occurrence and the size of exceedances increase in times of high market excitement. 30
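To give a flavor of the self-exciting mechanism, without reproducing the exact specifications of Chavez-Demoulin et al. [2005] or McNeil et al. [2005], the sketch below evaluates one common exponential-decay parameterization of a marked Hawkes-type conditional intensity. The functional form, parameter names, and numbers here are illustrative assumptions only.

import numpy as np

def hawkes_intensity(t, event_times, excesses, tau, psi, delta, gamma):
    """Illustrative marked self-exciting (Hawkes-type) intensity:
    lambda*(t) = tau + sum over past exceedances j of (psi + delta * Y_j) * exp(-gamma * (t - t_j)).
    Each past exceedance raises the instantaneous exceedance risk; the boost decays
    with elapsed time and grows with the size of the excess loss."""
    past = event_times < t
    dt = t - event_times[past]
    return tau + np.sum((psi + delta * excesses[past]) * np.exp(-gamma * dt))

# hypothetical exceedance times (in trading days) and excess sizes
event_times = np.array([10.0, 12.0, 13.0, 40.0])
excesses = np.array([0.8, 0.3, 1.5, 0.4])
for t in (14.0, 20.0, 41.0, 80.0):
    print(t, hawkes_intensity(t, event_times, excesses, tau=0.02, psi=0.05, delta=0.03, gamma=0.5))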
A major advantage of incorporating a self-exciting structure in the POT model is that the model can be fitted directly to (dependent) raw return data in a single step. McNeil et al. [2005] show that both self-exciting models perform better than the standard POT model, but the model with predictable marks performs the best. However, little empirical research has so far been done on these models besides the original contributions of Chavez-Demoulin et al. [2005] and McNeil et al. [2005]. Thus, empirical evidence of the performance of the models applied to financial data remains sparse. Also, to the authors knowledge, no generalization to a setting with multivariate risk factors is available at present. POT based on pre-whitened data. This approach was originally proposed by McNeil and Frey [2000] but has since then been the preferred approach to deal with dependent data in a number of studies in EVT, see e.g. Wang et al. [2010], Ghorbel and Trabelsi [2009], Maghyereh and Al-Zoubi [2008], Byström [2005], Wagner and Marsh [2005], and Lauridsen [2000]. We will thoroughly explain the technicalities of the approach in Section 3.2.1. It is a two-stage procedure where we first fit a GARCH-type model 13 to the raw data using the quasi-maximum likelihood (QML) estimation procedure. Bollerslev and Wooldridge [1992] demonstrate that even when normality is inappropriately assumed, the QML estimators (QMLE) are consistent and asymptotically normally distributed, provided the dynamics are correctly modeled. In the second stage, we perform EVT analysis on the standardized residuals, which are in principle iid, using the methods discussed in the previous sections. One crucial advantage of this approach is its ease of implementation in practice. Also, it is straightforward to use the approach in a setting with multivariate return series; we can simply fit univariate GARCH-type models to the single return series and then use the pre-whitened residual series in multivariate EVT analysis afterwards [Ghorbel and Trabelsi, 2009]. One drawback of the approach is, however, that it is a two-stage procedure. The results of the EVT analysis in the second stage will be sensitive to the fitting of the GARCH-type model to the raw data series in the first stage. In other words, the error from the time series modeling in the first stage propagates through to the GPD fitting in the second stage, making the overall error hard to quantify. Quantiles and Measures of Risk By setting x = u + y, we can rewrite (2.29) which gives us the following expression for the tail of F F (x) = F (u) F u (x u), 13 Alternatively, one might use EWMA volatility forecasting. 31
for losses x > u. By estimating F̄_u(x − u) and F̄(u) separately, we can construct an estimator of this tail for losses larger than u. We estimate the first factor on the right-hand side, F̄(u), by its empirical estimator, which is given by

\hat{\bar F}(u) = n^{-1}\sum_{i=1}^{n} I_{\{X_i > u\}} = n^{-1} N_u.

Note that we here assume that there are enough observations above u (i.e. N_u is sufficiently large) to estimate F̄(u) reliably. For the second factor, F̄_u(x − u), Theorem 4 motivates the estimator

\hat{\bar F}_u(x - u) = \bar G_{\hat\xi,\hat\beta}(x - u), \quad x > u,

where ˆξ = ˆξ_{N_u} and ˆβ = ˆβ_{N_u} are the maximum likelihood estimators. The resulting tail estimator is

\hat{\bar F}(x) = \frac{N_u}{n}\left(1 + \hat\xi\,\frac{x - u}{\hat\beta}\right)^{-1/\hat\xi}, (2.43)

for x > u. Inverting (2.43), which is done in Appendix A.5, we obtain the following estimator of Value at Risk (VaR_α), defined as the α-quantile of F (cf. Definition 3), for a given probability α ≥ F(u):

\widehat{\mathrm{VaR}}_\alpha = \hat q_\alpha(F) = u + \frac{\hat\beta}{\hat\xi}\left(\left(\frac{n}{N_u}(1 - \alpha)\right)^{-\hat\xi} - 1\right). (2.44)

From Definition 5 and (2.44), we obtain the following estimator of Expected Shortfall (ES_α) for α ≥ F(u):

\widehat{\mathrm{ES}}_\alpha = \frac{1}{1 - \alpha}\int_\alpha^1 \hat q_\varphi(F)\, d\varphi = \frac{\widehat{\mathrm{VaR}}_\alpha}{1 - \hat\xi} + \frac{\hat\beta - \hat\xi u}{1 - \hat\xi}, (2.45)

assuming that ξ < 1. For the derivation of this expression, see Appendix A.6.

2.3 Copula Theory and Multivariate Extremes

The theory of extremes discussed so far has been univariate in the sense that it only deals with one risk factor. In practice, however, we usually have to account for several risk factors. So, in order to work with multivariate risk factors, we turn our attention to copula theory in this section. Copula theory offers a very attractive solution to the problem of modeling the joint distribution of multivariate risk factors by separating it into two subproblems: we model the marginal distributions and the dependence structure separately.
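As a short computational aside, the univariate POT risk measure estimators (2.43)-(2.45) derived above are easy to implement once ξ̂ and β̂ have been obtained from a GPD fit to the excesses over a threshold u. The sketch below is illustrative only; all numerical inputs are hypothetical.

import numpy as np

def gpd_tail(x, u, xi, beta, n, n_u):
    """Tail estimator (2.43) for x > u."""
    return (n_u / n) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gpd_var(alpha, u, xi, beta, n, n_u):
    """VaR estimator (2.44), valid for alpha >= F(u)."""
    return u + (beta / xi) * (((n / n_u) * (1.0 - alpha)) ** (-xi) - 1.0)

def gpd_es(alpha, u, xi, beta, n, n_u):
    """ES estimator (2.45), requires xi < 1."""
    var = gpd_var(alpha, u, xi, beta, n, n_u)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

# hypothetical values: n = 1000 losses, 50 exceedances over u, fitted xi and beta
u, xi_hat, beta_hat, n, n_u = 0.021, 0.25, 0.009, 1000, 50
for alpha in (0.95, 0.99, 0.995):
    print(alpha, gpd_var(alpha, u, xi_hat, beta_hat, n, n_u), gpd_es(alpha, u, xi_hat, beta_hat, n, n_u))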
In Section 2.3.1 we outline some basic results in copula theory and discuss three specific copulas; the Gaussian, t, and Gumbel. In Section 2.3.2 we discuss some copula results in multivariate extreme value theory (MEVT), but we stress already now that MEVT is still very much a work in progress and there exists no coherent theory like that in the univariate case. 2.3.1 Copulas and Dependence Modeling Let X = (X 1,..., X d ) be a random vector of risk factors with joint distribution function F (x 1,..., x d ) = P(X 1 x 1,..., X d x d ) (2.46) and marginal distribution functions F i (x i ) = P(X i x i ), 1 i d. (2.47) The Concept of Copulas A d-dimensional copula C is a d-dimensional distribution function defined on the unit cube [0, 1] d with standard uniform marginal distributions. Formally, we have Definition 11 (d-dimensional copula). A d-dimensional copula is any function C : [0, 1] d [0, 1], which satisfies the following three properties 1. C(u 1,..., u d ) is increasing in each component u i. 2. C(1,..., 1, u i, 1,..., 1) = u i, i {1,..., d}, u i [0, 1]. 3. For all (a 1,..., a d ), (b 1,..., b d ) [0, 1] d with a i b i, we have 2 i 1 =1 2 ( 1) i 1+ +i d C(u 1i1,..., u did ) 0, i d =1 where u j1 = a j and u j2 = b j, j {1,..., d}. As mentioned, it is possible to separate the joint distribution function F into (1) a part which describes the marginal behavior and (2) a part which describes the dependence structure, using copula theory. The latter can be represented by a copula as formalized in the following theorem known as Sklar s Theorem. Theorem 5 (Sklar [1959]). Let F be a d-dimensional distribution function with margins F 1,..., F d. Then there exists a copula C : [0, 1] d [0, 1] such that F (x 1,..., x d ) = C(F 1 (x 1 ),..., F d (x d )), (2.48) for any x = (x 1,..., x d ) in R d, where R = R {± } denotes the extended real line. If F 1,..., F d are all continuous, then C is unique; otherwise, C 33
is uniquely determined on Range(F 1 ) Range(F d ). Conversely, for a copula C and continuous margins F 1,..., F d, the function F defined in (2.48) is a joint, d-dimensional distribution function with margins F 1,..., F d. For a proof, see Nelsen [1999]. Sklar s Theorem is perhaps the most important result in copula theory. It shows that all multivariate distribution functions have a copula representing the dependence structure and that multivariate distribution functions can be constructed by coupling together marginal distributions with copulas. In addition, we are only dealing with continuous and strictly increasing marginal distributions F i in this thesis, such as e.g. the GPD, so we know from Theorem 5 that the joint distribution function F has a unique copula. Consequently, for our purposes, C will always be unique. Remember from Definition 4 that the generalized inverse of the (univariate) marginal distribution F i is defined as Fi (y) = inf {x R F i (x) y} for all y in [0, 1]. Given that F i is strictly increasing and continuous, we have Fi = Fi 1, where Fi 1 denotes the ordinary inverse of F i. Using this and Theorem 5, we obtain the following explicit representation of copula C. C(u) := C(u 1,..., u d ) = F (F 1 1 (u 1 ),..., F 1 1 (u d )), (2.49) for any u = (u 1,..., u d ) in [0, 1] d. The copula C can be thought of as the distribution function of the component-wise probability-integral transformed random vector (F 1 (X 1 ),..., F d (X d )). 14 Furthermore, a very useful property of the copula of F is that it remains invariant under standardization of the marginal distributions. In fact, it remains invariant under any strictly increasing transformations of the components of the random vector X [McNeil et al., 2005, Demarta and McNeil, 2005]. As a result, if the random vector X has copula C, then (h 1 (X 1 ),..., h d (X d )) also has copula C for any strictly increasing functions h 1,..., h d. Finally, the parametric copulas that we consider have densities given by c(u 1,..., u d ) = d C(u 1,..., u d ) u 1 u d, (2.50) which we have to calculate when fitting copulas using the method of maximum likelihood. Coefficients of Tail Dependence As should be apparent by now, this thesis concerns the statistical modeling of extremal events. Thus, when 14 Let F be a univariate distribution function. 1. If U U(0, 1), then F (U) F. 2. If Y F and F is continuous, then F (Y ) U(0, 1). 34
it comes to the use of copulas, we are most interested in their ability to capture extremal dependence. To evaluate this we use the coefficients of tail dependence, which provide asymptotic measures of the dependence in the tails of the bivariate distribution of (X_1, X_2), where X_1 and X_2 are random variables with continuous distribution functions F_1 and F_2. The coefficient of upper tail dependence of X_1 and X_2 is

\lambda_u := \lambda_u(X_1, X_2) = \lim_{q \to 1^-} P(X_2 > F_2^{-1}(q) \mid X_1 > F_1^{-1}(q)), (2.51)

provided a limit λ_u ∈ [0, 1] exists. Similarly, the coefficient of lower tail dependence of X_1 and X_2 is

\lambda_l := \lambda_l(X_1, X_2) = \lim_{q \to 0^+} P(X_2 \le F_2^{-1}(q) \mid X_1 \le F_1^{-1}(q)), (2.52)

provided a limit λ_l ∈ [0, 1] exists. Thus, these dependence measures are the limiting probabilities that one margin exceeds (respectively falls below) its q-quantile given that the other margin does. In contrast to linear correlation, these are copula-based dependence measures in the sense that they are functions of the copula C of (X_1, X_2) only. Using (2.48) and the definition of conditional probability, one can derive the copula-based expressions of (2.51) and (2.52) given in Joe [1997],

\lambda_u = \lim_{q \to 0^+} \frac{\hat C(q, q)}{q} \quad \text{and} \quad \lambda_l = \lim_{q \to 0^+} \frac{C(q, q)}{q}, (2.53)

where Ĉ denotes the survival copula of C (Footnote 15: In general, the survival copula Ĉ of a copula C is the distribution function of 1 − U, when the random vector U of uniform variates has distribution function C; in the bivariate case Ĉ(u_1, u_2) = u_1 + u_2 − 1 + C(1 − u_1, 1 − u_2).). If these coefficients are strictly greater than zero, this indicates a tendency of the copula to generate joint extreme events. For instance, if λ_u ∈ (0, 1], we have tail dependence in the upper tail, and if λ_u = 0, we have asymptotic independence in the upper tail. For a radially symmetric copula C (Footnote 16: A random vector X is radially symmetric about a if X − a =_d a − X.), we have C = Ĉ. As a result, the two measures λ_u and λ_l coincide and we write λ := λ_u = λ_l. In the following sections we describe two distinct families of copulas and their basic properties.

The Family of Elliptical Copulas

Elliptical copulas are copulas extracted from elliptical multivariate distributions using Sklar's Theorem. Here we concentrate on the so-called Gaussian copula and t copula, which can be thought of as representing the dependence structure implicit in the multivariate normal distribution and the multivariate t distribution, respectively.
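The coefficients in (2.51)-(2.52) can be approximated empirically at a finite quantile level q. The following sketch (an illustration with simulated data and arbitrarily chosen ρ and ν) compares joint tail exceedance frequencies for bivariate normal and bivariate t samples with the same correlation, anticipating the tail (in)dependence results for the corresponding copulas discussed below.

import numpy as np

rng = np.random.default_rng(2)
n, rho, nu = 200_000, 0.5, 4
cov = np.array([[1.0, rho], [rho, 1.0]])

z_norm = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
# bivariate t_nu sample: divide bivariate normal by an independent sqrt(chi2_nu / nu)
w = rng.chisquare(nu, size=n) / nu
z_t = z_norm / np.sqrt(w)[:, None]

def lambda_u_hat(x, q):
    """Empirical estimate of P(X2 > its q-quantile | X1 > its q-quantile)."""
    q1, q2 = np.quantile(x[:, 0], q), np.quantile(x[:, 1], q)
    exc1 = x[:, 0] > q1
    return np.mean(x[exc1, 1] > q2)

for q in (0.95, 0.99, 0.999):
    print(q, lambda_u_hat(z_norm, q), lambda_u_hat(z_t, q))
# the Gaussian estimates decay towards 0 as q -> 1, while the t estimates level off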
The Gaussian copula and its properties. Assume that the d-dimensional random vector X = (X 1,..., X d ) has a multivariate normal distribution with mean vector µ and positive-definite covariance matrix Σ, i.e. X N d (µ, Σ). In this case, the so-called Gaussian copula may be extracted from (2.48) by evaluating (2.49). Remember that copulas are invariant under series of strictly increasing transformations of the margins. So, the copula of the N d (µ, Σ)-distribution is identical to that of a N d (0, P)-distribution, where P is the correlation matrix implied by Σ. Thus, the (unique) Gaussian copula is given by C Ga P (u) = Φ P (Φ 1 (u 1 ),..., Φ 1 (u d )) (2.54) Φ 1 (u 1 ) Φ 1 (u d ) 1 = { (2π) d P exp 1 } 2 x P 1 x dx, where Φ 1 denotes the quantile of the standard univariate normal distribution function. We calculate the corresponding copula density using the relation in (2.50). This gives us c Ga P (u) = 1 P exp { 1 } 2 ψ (P 1 I)ψ, (2.55) where ψ = (Φ 1 (u 1 ),..., Φ 1 (u d )). See derivation of the density in Appendix A.7. Copulas of elliptical distributions like the normal distribution (and the t distribution which we consider in the following) are radially symmetric. Hence, the coefficients of upper tail dependence and lower tail dependence are equal. Consider a bivariate Gaussian copula, the coefficient of tail dependence is zero ( λ = 2 lim Φ x 1 ρ/ ) 1 + ρ = 0, x provided that ρ < 1, where ρ is the off-diagonal element of P (see a sketchy proof in Embrechts et al. [2003]). Consequently, the Gaussian copula has no tail dependence. In other words, if we go far enough out in the tail, extreme events occur independently in the tails. This result extends to the d-dimensional case, but here we talk about pairwise tail independence of the Gaussian copula since the coefficients of tail dependence are bivariate concepts. The t copula and its properties. Now, we assume that the d-dimensional random vector X = (X 1,..., X d ) has a multivariate t distribution with ν degrees of freedom, mean vector µ and positive-definite dispersion matrix Σ, i.e. X t d (ν, µ, Σ). The t copula may be extracted from (2.48) by evaluating (2.49). Again, remembering the invariance property of copulas, we have that the copula of 36
the t_d(ν, µ, Σ)-distribution is identical to that of a t_d(ν, 0, P)-distribution, where P is the correlation matrix implied by the dispersion matrix Σ. Thus, the (unique) t copula is given by

C^t_{\nu,P}(u) = t_{\nu,P}(t_\nu^{-1}(u_1),..., t_\nu^{-1}(u_d)) = \int_{-\infty}^{t_\nu^{-1}(u_1)} \!\!\cdots \int_{-\infty}^{t_\nu^{-1}(u_d)} \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{(\pi\nu)^d |P|}} \left(1 + \frac{x' P^{-1} x}{\nu}\right)^{-\frac{\nu+d}{2}} dx, (2.56)

where t_ν^{-1} denotes the quantile function of the standard univariate t_ν distribution. Using (2.50) we can calculate the density of the t copula,

c^t_{\nu,P}(u) = \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)\left(\Gamma\!\left(\frac{\nu}{2}\right)\right)^{d}}{\sqrt{|P|}\,\Gamma\!\left(\frac{\nu}{2}\right)\left(\Gamma\!\left(\frac{\nu+1}{2}\right)\right)^{d}} \cdot \frac{\left(1 + \frac{\psi' P^{-1} \psi}{\nu}\right)^{-\frac{\nu+d}{2}}}{\prod_{j=1}^{d}\left(1 + \frac{\psi_j^2}{\nu}\right)^{-\frac{\nu+1}{2}}}, (2.57)

where ψ = (t_ν^{-1}(u_1),..., t_ν^{-1}(u_d))' and ψ_j = t_ν^{-1}(u_j). See the derivation of the density in Appendix A.8. Again, let us consider the coefficient of tail dependence of the bivariate t copula. Noting that the t copula is radially symmetric, it suffices to calculate λ, which is given by

\lambda = 2\,\bar t_{\nu+1}\!\left(\sqrt{\nu + 1}\,\frac{\sqrt{1 - \rho}}{\sqrt{1 + \rho}}\right), (2.58)

provided that ρ > −1, where ρ is the off-diagonal element of P and \bar t_{\nu+1} denotes the survival function of the standard univariate t distribution with ν + 1 degrees of freedom (see the proof in Demarta and McNeil [2005]). Thus, the t copula exhibits asymptotic dependence in both the upper and the lower tail. Notice that the coefficient is increasing in ρ and decreasing in the degrees of freedom ν. In fact, even under zero or negative correlation, the t copula shows dependence in the tails. This extends to the d-dimensional case, where we talk about pairwise tail dependence of the t copula.

The Family of Archimedean Copulas

Elliptical copulas have the advantage of being easy to simulate from. However, as we saw, elliptical copulas do not have closed-form expressions and they are restricted to radial symmetry. A copula class whose members do have explicit, simple closed-form expressions is the family of Archimedean copulas. This class also allows for a great variety of different dependence structures, not restricted by radial symmetry. As mentioned, one of the stylized facts of multivariate financial return series is that there seems to be stronger dependence between large losses than between large gains. Such dependence asymmetries cannot be modeled with the elliptical copulas we have considered so far [Embrechts et al., 2003].
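As a brief numerical aside, formula (2.58) is straightforward to evaluate; the sketch below (purely illustrative, with arbitrary choices of ρ and ν) tabulates λ using the univariate t distribution from SciPy and shows that λ shrinks towards the Gaussian-copula value of zero as ν grows.

import numpy as np
from scipy.stats import t as student_t

def t_copula_lambda(rho, nu):
    """Coefficient of tail dependence of the bivariate t copula, eq. (2.58)."""
    arg = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * student_t.sf(arg, df=nu + 1)   # sf = survival function 1 - cdf

for nu in (3, 10, 50, 500):
    print(nu, [round(t_copula_lambda(rho, nu), 4) for rho in (-0.5, 0.0, 0.5, 0.9)])
# lambda increases in rho, decreases in nu, and tends to 0 (the Gaussian case) as nu grows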
We start out with the case of bivariate Archimedean copulas and then show how these can be extended to higher dimensions under certain conditions. Specifically, we focus on the Gumbel copula which allows for asymmetry in the dependence structure. First, however, we need a couple of definitions. Definition 12 (Copula generator). A continuous, strictly decreasing, convex function ϕ : [0, 1] [0, ] which satisfies ϕ(1) = 0 is known as an Archimedean copula generator. If ϕ(0) =, ϕ is a strict generator. Definition 13 (Pseudo-inverse generator). The pseudo-inverse of ϕ is the function ϕ [ 1] : [0, ] [0, 1] given by { ϕ [ 1] ϕ 1, 0 t ϕ(0), (t) = 0, ϕ(0) t. Theorem 6 (Bivariate Archimedean copula). Let ϕ be a continuous, strictly decreasing function from [0, 1] to [0, ] such that ϕ(1) = 0, and let ϕ [ 1] be the pseudo-inverse of ϕ. Then the function C : [0, 1] 2 [0, 1] given by C(u 1, u 2 ) = ϕ [ 1] (ϕ(u 1 ) + ϕ(u 2 )) (2.59) is a copula if and only if ϕ is a convex function, i.e. ϕ is the copula generator function in Definition 12. A proof is given in Nelsen [1999]. Copulas of the form (2.59) are called Archimedean copulas. If the generator function is strict, i.e. ϕ(0) =, we have ϕ [ 1] = ϕ 1 and the copula constructed according to (2.59) is called a strict Archimedean copula. Furthermore, we have that an Archimedean copula C with generator ϕ is symmetric 17 and associative 18. A proof of these properties can be found in Embrechts et al. [2003]. The Gumbel copula and its properties. If we let ϕ(t) = ( ln t) θ, where θ 1, then the generator is clearly continuous and ϕ(1) = 0. Moreover, ϕ (t) = θt 1 ( ln t) θ 1 and ϕ (t) = θt 2 ( (1 θ)( ln t) θ 2 + ( ln t) θ 1) 0, so ϕ is a strictly decreasing, convex function from [0, 1] to [0, ]. Also, ϕ is a strict generator as ϕ(0) =. From (2.59) we get the following closed-form expression for a bivariate Gumbel copula Cθ Gu (u 1, u 2 ) = ϕ 1 (ϕ(u 1 ) + ϕ(u 2 )) = exp { [ ( ln(u 1 ) θ + ( ln u 2 ) θ] } 1/θ. Before we discuss how we can construct a multivariate Gumbel copula, let us look at its dependence structure in the tails. Generally, Joe [1997] 17 C(u 1, u 2) = C(u 2, u 1) for all u 1, u 2 in [0, 1]. 18 C(C(u 1, u 2), u 3) = C(u 1, C(u 2, u 3)) for all u 1, u 2, u 3 in [0, 1]. 38
shows that a bivariate Archimedean copula C has upper tail dependence provided that ϕ 1 (0) = and the coefficient of upper tail dependence is λ u = 2 2 lim s 0 (ϕ 1 (2s)/ϕ 1 (s)). Similarly, the coefficient of lower tail dependence is equal to λ l = 2 lim s (ϕ 1 (2s)/ϕ 1 (s)). For the bivariate Gumbel copula 19, we have λ u = 2 2 lim s 0 (ϕ 1 (2s)/ϕ 1 (s)) = 2 2 1/θ lim s 0 { } exp( (2s) 1/θ ) exp( s 1/θ = 2 2 1/θ. ) Correspondingly, we can calculate the coefficient of lower tail dependence which is zero. Hence, the Gumbel copula causes the distribution to have asymptotic dependence in the upper tail while having asymptotic independence in the lower tail. Now, let us return to the issue of generalizing the bivariate Gumbel copula to dimensions higher than two. In the strict case, Theorem 7 gives necessary and sufficient conditions for the following generalization of (2.59) C d (u) = ϕ 1 (ϕ(u 1 ) + + ϕ(u d )) (2.60) to be a d-dimensional Archimedean copula. Theorem 7 (Kimberling [1974]). Let ϕ denote the generator given in Definition 12 and let ϕ 1 denote its inverse. If C d is the function from [0, 1] d to [0, 1] given by (2.60), then C d is a d-dimensional copula for all d 2 if and only if ϕ 1 is completely monotone 20 on [0, ). For the Gumbel copula we have the strict generator ϕ(t) = ( ln t) θ, so ϕ 1 (t) = exp( t 1/θ ). Since e x is completely monotone and t 1/θ is a positive function with a completely monotonic derivative, ϕ 1 is completely monotone; thus we can generalize the bivariate Gumbel copula to higher dimensions from (2.60). We obtain the following expression for a d-dimensional Gumbel copula { [ Cθ Gu (u) = exp ( ln(u 1 ) θ + ( ln u 2 ) θ + + ( ln u d ) θ] } 1/θ. (2.61) for θ 1 and d 2. The d-dimensional Gumbel copula exhibits pairwise upper tail dependence, but lower tail independence. We will later return to the issue of fitting this copula using maximum likelihood, so we need the 19 The bivariate Gumbel copula has the strict generator ϕ(t) = ( ln t) θ, hence ϕ 1 (s) = exp( s 1/θ ) and ϕ 1 (s) = s 1/θ 1 exp( s 1/θ )/θ, where θ 1. 20 A function f(x) is completely monotone on the interval I if it satisfies for all x in the interior of I and k N. ( 1) k d k dx k f(x) 0 39
corresponding copula density. In this thesis, we are specifically interested in fitting the trivariate Gumbel copula. Writing A := (−ln u_1)^θ + (−ln u_2)^θ + (−ln u_3)^θ for brevity and using (2.50), we obtain the following density for the Gumbel copula in three dimensions:

c^{Gu}_\theta(u_1, u_2, u_3) = \frac{(-\ln u_1)^{\theta-1}(-\ln u_2)^{\theta-1}(-\ln u_3)^{\theta-1}}{u_1 u_2 u_3}\, \exp\!\left\{-A^{1/\theta}\right\} A^{1/\theta - 3}\left((\theta - 1)(2\theta - 1) + A^{2/\theta} + 3(\theta - 1)A^{1/\theta}\right). (2.62)

See the derivation in Appendix A.9.

We could also describe the d-dimensional Archimedean copula of the form (2.60) in terms of Laplace-Stieltjes transforms of distribution functions on R_+. Let G be a distribution function on R_+ satisfying G(0) = 0 with Laplace-Stieltjes transform

\hat G(t) = \int_0^\infty e^{-tx}\, dG(x), \quad t \ge 0; (2.63)

then (2.60) is an Archimedean copula with generator ϕ equal to the inverse of Ĝ, i.e. ϕ = Ĝ^{-1}. This result is of importance when we want to simulate from Archimedean copulas, and we will return to this issue in Section 3.2.2. Finally, note that in our discussion of both elliptical and Archimedean copulas we have restricted our attention to exchangeable copulas, i.e. copulas that are distribution functions of an exchangeable random vector of uniform variates U. (Footnote 21: For an exchangeable copula C, we have C(u_1,..., u_d) = C(u_{Π(1)},..., u_{Π(d)}) for all permutations (Π(1),..., Π(d)) of (1,..., d).) Embrechts et al. [2003] discuss how multivariate, non-exchangeable Archimedean copulas may be obtained.

2.3.2 Copula Results in Multivariate Extreme Value Theory

In this section we present and discuss some copula results in multivariate extreme value theory (MEVT). It should be stressed that, compared to univariate EVT, the theory of multivariate extremes is less developed and does not constitute well-established theory. Thus, the intention of this section is to give the reader a brief overview of some central results in MEVT based on copula theory.
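Because the trivariate Gumbel density (2.62) is used later for maximum likelihood fitting, it is worth verifying the expression numerically. The sketch below (an illustration only, at an arbitrary point u and θ) compares (2.62) with a finite-difference approximation of the third mixed partial derivative of (2.61), and evaluates the upper tail dependence coefficient 2 − 2^{1/θ}.

import numpy as np
from itertools import product

def gumbel_cdf(u, theta):
    """d-dimensional Gumbel copula, eq. (2.61)."""
    return np.exp(-np.sum((-np.log(u)) ** theta) ** (1.0 / theta))

def gumbel_density_3d(u, theta):
    """Trivariate Gumbel copula density, eq. (2.62)."""
    s = (-np.log(u)) ** theta
    A = np.sum(s)
    b = np.prod((-np.log(u)) ** (theta - 1.0) / u)
    return (b * np.exp(-A ** (1.0 / theta)) * A ** (1.0 / theta - 3.0)
            * ((theta - 1.0) * (2.0 * theta - 1.0) + A ** (2.0 / theta)
               + 3.0 * (theta - 1.0) * A ** (1.0 / theta)))

theta, u, h = 1.7, np.array([0.3, 0.6, 0.8]), 1e-3

# central finite-difference approximation of d^3 C / (du1 du2 du3)
fd = sum(np.prod(s) * gumbel_cdf(u + h * np.array(s), theta)
         for s in product((1.0, -1.0), repeat=3)) / (2.0 * h) ** 3

print(gumbel_density_3d(u, theta), fd)    # the two values agree closely
print(2.0 - 2.0 ** (1.0 / theta))         # upper tail dependence coefficient of the Gumbel copula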
Limit Copulas for Multivariate Maxima We first investigate the case of the limiting dependence structure of multivariate, component-wise maxima. Extreme Value Copulas. Let X 1,..., X n be multivariate iid random vectors in R d representing financial losses on d risk factors. Also, let F be the joint distribution function of the losses on the risk factors and F 1,..., F d the marginal distribution functions. We define M n = (M n,1,..., M n,d ) to be the component-wise block maxima, where the jth component of M n is the maximum of the n loss observations of the jth risk factor, i.e. M n,j = n i=1 X i,j, j = 1,..., d. We stress that all relations and operations should be taken as component-wise from now on. The affinely transformed maxima is given by ( M n d n Mn,1 d n,1 =,, M ) n,d d n,d, (2.64) c n c n,1 c n,d where d n = (d n,1,..., d n,d ) and c n = (c n,1,..., c n,d ) are vectors of norming constants, satisfying c n > 0, d n R d. As in the univariate case, we are interested in the limiting distribution of (2.64) as n. Definition 14 (Multivariate Extreme Value Distribution). If there exists sequences of vectors c n > 0 and d n R d, such that ( ) lim P Mn d n x = lim F n (c n x + d n ) = H(x) (2.65) n n c n for some joint distribution function F and some non-degenerate distribution function H, then H is said to be a multivariate extreme value (MEV) distribution and F is said to be in the maximum domain of attraction of H. In accordance with Galambos [1987], Lemma 5.2.1., convergence in distribution of (M n d n )/c n to a random vector with joint distribution function H implies (M n,j d n,j )/c n,j d Hj for each j = 1,..., d. Thus, if (2.65) holds and the margins H 1,..., H d of H are non-degenerate, then the Fisher- Tippett Theorem (see Theorem 1) tells us that these univariate marginal distributions must be extreme value distributions of either the Fréchet, Gumbel or Weibull class. Furthermore, since the three types of extreme value distributions are all continuous, we know from Sklar s Theorem (see Theorem 5) that the joint distribution function H has a unique copula C 0. This unique copula of H must satisfy the scaling property as is shown in Joe [1997]. C 0 (u t ) = C t 0(u), t > 0, (2.66) 41
Definition 15 (Extreme value copula). Any copula satisfying property (2.66) can be the copula of the limiting distribution H, i.e. a MEV distribution, and is a so-called extreme value (EV) copula 22. Consequently, any distribution function H with an extreme value copula and univariate margins H j of the type H ξ for some ξ R, given by H(x) = C 0 (H 1 (x 1 ),..., H d (x d )), is a multivariate extreme value distribution. Of the copulas that we have considered so far, the Gumbel copula provides an example of an EV copula family. It can easily be shown that the d-variate Gumbel copula given in (2.61) satisfies property (2.66). For d 2, we have { C0(u t 1,..., u d ) = exp t [( ln u 1 ) θ + + ( ln u d ) θ] } 1/θ { = exp [( t ln u 1 ) θ + + ( t ln u d ) θ] } 1/θ { = exp [( } ln u t 1) θ + + ( ln u t d )θ] 1/θ = C 0 (u t 1,..., u t d ). Besides the Gumbel copula, we have various other EV copulas to our disposal, for instance the Galambos copula [Galambos, 1975] and the EV-t copula [Demarta and McNeil, 2005], to name a few, and a number of other possibilities given in the literature. However, the functional form of these copulas is essentially quite similar, so for practical purposes it makes sense to work with the Gumbel or Galambos copulas which have fairly simple expressions permitting easy calculation of the copula density to be used in maximum likelihood estimation [McNeil et al., 2005]. Copula Domain of Attraction. Writing the underlying joint distribution function in copula form, we have F (x) = C(F 1 (x 1 ),..., F d (x d )). If the convergence in distribution happens as in (2.65), then the margins of F determine the margins of the limiting distribution function H, but they have 22 Various characterizations of EV copulas exist, see e.g. Pickands [1981] or Joe [1997]. In particular, a bivariate copula is an EV copula if and only if it takes the form { ( )} ln u1 C 0(u 1, u 2) = exp ln(u 1u 2)A, ln u 1u 2 for some function A : [0, 1] [0, 1] defined as A(w) = 1 max((1 x)w, x(1 w))dh(x), 0 where H is a finite measure on [0, 1]. A is known as the dependence function and must be differentiable, convex and satisfy max(w, 1 w) A(w) 1 for 0 w 1 [McNeil et al., 2005]. 42
no influence on the limit copula C 0 of H. C 0 is solely determined by the dependence structure of F, i.e. the copula C. This is formalized in the following theorem. Theorem 8. Let F be a multivariate distribution function with continuous marginal distribution functions F 1,..., F d and some copula C. Let H be a multivariate extreme value distribution with EV copula C 0. Then F MDA(H) if and only if 1. F j MDA(H j ), 1 j d, 2. lim t C t (u 1/t ) = C 0 (u), u [0, 1] d. For a proof, see Galambos [1987] The fact that only the copula C of F is relevant to the limiting copula C 0 motivates the introduction of some kind of characterization of which underlying copulas C are attracted to which EV copulas C 0. Thus, we use the following workable definition of the so-called copula domain of attraction Definition 16 (Copula Domain of Attraction). Let C be a copula and C 0 an extreme value copula. Then we say C is in the domain of attraction of C 0 if 1 C(1 sx 1,..., 1 sx d ) lim = ln C 0 (e x 1,..., e x d ), (2.67) s 0 + s for all x [0, ) d, and we write C CDA(C 0 ). Various other, but similar characterizations are listed in Kotz and Nadarajah [2000]. Note that (2.67) is just an equivalent way of writing lim t C t (u 1/t ) = C 0 (u) in Theorem 8 (see e.g. McNeil et al. [2005]), but it is more convenient to work with. Demarta and McNeil [2005] show how (2.67) can be used to derive the extreme value copula if the underlying distribution is a multivariate t distribution. In general, if we know the underlying copula C, we can derive its limit copula C 0 from (2.67). However, in practice we do not know the underlying copula (or distribution), so the approach is generally to work with any tractable EV copula. As mentioned, the Gumbel copula proves to be quite preferable in practical applications due to its functional simplicity. Limit Copulas for Multivariate Threshold Exceedances In this section we turn our attention to the limiting dependence structure of multivariate threshold exceedances. A recent development in this area investigates the specific kinds of copulas that emerge in the limit when losses are above (or below) some threshold. Originally described by Juri and Wüthrich [2002], these classes of copulas have been termed limiting threshold copulas or tail 43
dependence copulas and can be considered as natural limiting models for the dependence structure of multivariate threshold exceedances, just as we have the GPD as the natural limiting model for threshold exceedances in the univariate case [McNeil et al., 2005]. So far, however, limiting threshold copulas have not been extensively investigated in dimensions higher than two, thus we will limit our discussion to the bivariate, exchangeable copula case. Limits for lower and upper threshold copulas. Let X = (X 1, X 2 ) be a random vector with bivariate distribution function F (x) = C(F 1 (x 1 ), F 2 (x 2 )), where F 1 and F 2 are continuous margins and C is an exchangeable copula. Let us denote the event that both X 1 and X 2 are below some quantile level v by A v = { X 1 F1 1 (v), X 2 F2 1 (v) } for all 0 < v 1. If we assume that P(A v ) = C(v, v) 0, then the probability of the event that X 1 lies below its x 1 -quantile and X 2 lies below its x 2 -quantile given A v is P(X 1 F 1 1 (x 1 ), X 2 F 1 2 (x 2 ) A v ) = C(x 1, x 2 ) C(v, v), x 1, x 2 [0, v]. (2.68) If we consider this only as a function of x 1 and x 2, it defines the joint distribution function of (X 1, X 2 ) conditional on both X 1 and X 2 being below their v-quantile, i.e. a bivariate distribution function on the domain [0, v] 2 with (conditional) continuous marginal distribution functions F v (x) = P(X i F 1 i (x) A v ) = C(x, v), 0 x v. (2.69) C(v, v) Thus, from Theorem 5 it is known that the conditional joint distribution in (2.68) has a unique copula and we can write the bivariate distribution function as C(x 1, x 2 ) C(v, v) = C l v(f v (x 1 ), F v (x 2 )), x 1, x 2 [0, v]. (2.70) The unique copula C l v can be written as C l v(u 1, u 2 ) = C(F 1 v (u 1 ), Fv 1 (u 2 )). (2.71) C(v, v) In the original paper by Juri and Wüthrich [2002] this copula is referred to as a lower tail dependence copula (LTDC). Here, however, we refer to it as the lower threshold copula of C. If we let the threshold level v 0 in (2.71), the limiting copula will be known as a limiting lower threshold copula and denoted by C l 0. We can define the upper threshold copula C u v of C analogously following the same strains of reasoning if we condition on X 1 and X 2 being above their v-quantiles for 0 v < 1, see e.g. McNeil et al. [2005]. Similarly, the 44
limiting upper threshold copula is obtained when we let v 1 and is denoted by C1 u. In fact, Lemma 7.55 in McNeil et al. [2005] shows that it is sufficient to study either lower or upper threshold copulas because the results for one follow easily from the other. In consequence, the survival copulas of limiting lower threshold copulas are limiting upper threshold copulas. However, these limiting upper or lower threshold copulas must be stable under the conditioning operations that we used above. For instance, the limiting lower threshold copula must posses a stability property under the operation of calculating lower threshold copulas as in (2.71) [McNeil et al., 2005]. A copula C is a limiting lower threshold copula if it satisfies C l v(u 1, u 2 ) = C(u 1, u 2 ), (2.72) for any threshold 0 < v 1. Let us briefly turn to the application of these limiting threshold copulas. Assume that we have a vector of high thresholds u = (u 1, u 2 ), so that for small probabilities P(X 1 > u 1 ) P(X 2 > u 2 ). For the conditional distribution of X = (X 1, X 2 ) over the threshold u, we could then assume a model of the form P(X x X > u) Ĉ(G ξ 1,β 1 (x 1 u 1 ), G ξ2,β 2 (x 2 u 1 )), where Ĉ is the survival copula of some tractable limiting lower threshold copula. Breymann et al. [2003] show that the Clayton copula, which belongs to the family of Archimedean copulas, and its survival copula prove to very useful for modeling the dependence in the tails of bivariate financial return data. Statistical inference in this model would be based on the exceedance data. Analogously, we would model lower threshold exceedances using a tractable limiting lower threshold copula and fitting a GPD tail function Ḡξ,β to the margins [McNeil et al., 2005]. As was stressed in the introduction of this section, the theory on multivariate threshold exceedances is still in a rather early stage of development. The copula results that we have just discussed have not been thoroughly investigated in higher dimensional cases. This naturally limits the direct applicability of the results in practice. It should be noted, however, that research has offered other approaches to multivariate threshold exceedances. One strain of research is based on the point process theory of multivariate extremes developed in de Haan and Resnick [1977], de Haan [1985], and Resnick [2007, 2008]. Based on this theory, Coles and Tawn [1991], Joe et al. [1992], and [Coles, 2001] have constructed statistical models for multivariate threshold exceedances. Unfortunately, these models suffer from the same issues when dealing with financial return data that we found in the case of the univariate POT model. Also, the theory is rather mathematically involved making the work with these models quite cumbersome in practice. 45
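To close the chapter with a small numerical sanity check, the scaling property (2.66) that characterizes extreme value copulas is easy to verify for the Gumbel copula (2.61); the sketch below is purely illustrative, with an arbitrary θ and an arbitrary point in the unit cube.

import numpy as np

def gumbel_cdf(u, theta):
    """d-dimensional Gumbel copula, eq. (2.61)."""
    return np.exp(-np.sum((-np.log(u)) ** theta) ** (1.0 / theta))

rng = np.random.default_rng(3)
theta = 2.3
u = rng.uniform(0.05, 0.95, size=4)        # an arbitrary point in (0,1)^4

for t in (0.5, 2.0, 7.5):
    lhs = gumbel_cdf(u ** t, theta)        # C0(u^t)
    rhs = gumbel_cdf(u, theta) ** t        # C0(u)^t
    print(t, lhs, rhs, np.isclose(lhs, rhs))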
Chapter 3
Methodology

In this chapter we outline the methodology of our empirical study. First, in Section 3.1, we briefly discuss the financial data used in the empirical analysis. In Section 3.2 we outline the different risk measurement methods we have implemented, with emphasis on the statistical and implementation issues of each method. Finally, in Section 3.3 we describe the methodology used for backtesting and evaluating the performance of the implemented risk measurement methods.

3.1 Data Selection

We consider a hypothetical portfolio of financial assets held by a Danish institutional investor. The portfolio comprises three equity securities selected from the Danish OMX C20 Index. The selected stocks are Novo Nordisk B (NOVO B), Carlsberg B (CARLS B), and Danske Bank (DANSKE). The components of the portfolio are chosen based on the following criteria: (1) they are stocks of large Danish companies, (2) they are stocks of companies in completely different industry sectors (biopharmaceutical, brewing, and banking), making the equity portfolio somewhat diversified with respect to industry risk exposure, and (3) return data on the stocks are readily available. We use daily total return indices measured in DKK, extracted from DataStream for each of the three stocks for the period February 6, 1995 to June 4, 2010. Following standard practice in finance and risk management, we use continuously compounded or logarithmic returns for each asset. Typically, stock price series are integrated of order one, i.e. they are non-stationary, while log-returns have desirable statistical properties such as stationarity and ergodicity [Campbell et al., 1997]. On local holidays we record an artificial zero return. This leaves us with a total of 4,000 daily observations on each of the three Danish stocks. Finally, it should be noted that although we only consider equity securities, the methods we
describe in the next section can be used for a much wider range of asset types.

3.2 Method Selection and Implementation

In this section we describe the selected risk measurement methods and, especially, how they are implemented. For each method, we discuss how to obtain 1-day and 10-day estimates of VaR and ES at the 95%, 97.5%, 99%, and 99.5% confidence levels. In our empirical analysis we consider the risk of both a long and a short position in the portfolio, but the following discussion of the methods is framed in terms of a long position; a short position is handled simply by changing the sign of the return series and then applying the methods as described. Finally, we note that the collection of methods we consider is by no means exhaustive and is merely meant to indicate the range of possible strategies. The methods that we implement are chosen based on the following three criteria: (1) the methods are used in practice, (2) the methods are based on parsimonious models and/or can be estimated in stages, making them tractable for multivariate risk factor data, and (3) the methods are reasonably faithful to the stylized facts of financial return series. Naturally, not all criteria are satisfied by every method.

In Section 3.2.1 we describe the univariate methods that we implement. These methods are univariate in the sense that we deal with only one risk factor, the logarithmic portfolio value. Analogously, in Section 3.2.2 we describe the multivariate methods that we implement. These methods are multivariate in the sense that we deal with three risk factors, the logarithmic dividend-adjusted prices of the three Danish stocks. Finally, in Section 3.2.3 we discuss how we deal with time-varying volatility using GARCH-type models.

Notational conventions: Please note the following conventions used throughout these sections: (1) we have three assets in our portfolio, so $d = 3$; (2) out of a total of $n_T$ = 4,000 daily observations on each of the three assets, we use a rolling estimation window of $n$ = 1,000 daily observations (see Section 3.3); (3) we let $t$ denote the most recent time point in the estimation window, so that the data points included in the estimation window are $t-n+1$ to $t$; (4) we let $h$ denote the forecast horizon measured in days and calculate 1-day and 10-day VaR and ES estimates, i.e. we set $h = 1$ when estimating one-day risk measures and $h = 10$ for 10-day risk measures; (5) when we conduct Monte Carlo simulations, we simulate $m$ = 20,000 paths; and finally (6) we denote the loss from time $t$ over the next $h$ periods by $L^{(h)}_{t+h}$.
3.2.1 Univariate Methods

In this section we consider the following methods: Historical Simulation (HS), HS with a GARCH-type model (HS-GARCH and HS-GARCH-t), Filtered Historical Simulation (FHS), and HS with Conditional EVT (HS-CONDEVT). Before discussing the methods, we explain how we obtain the historically simulated losses of an equally-weighted portfolio of the three Danish stocks, which constitute the input data in all of these methods.

Mapping of Risks to Portfolio Losses. Recall that we have $d = 3$ stocks in our portfolio. Let the dividend-adjusted price process of stock $i$ be denoted by $\{P_{t,i}\}_{t \in \mathbb{Z}}$. We consider the logarithmic dividend-adjusted stock prices as risk factors, i.e. $Z_{t,i} := \ln P_{t,i}$ for $i = 1, \ldots, d$. We then have risk factor changes $X_{t+1,i} = \ln P_{t+1,i} - \ln P_{t,i}$, corresponding to the (dividend-adjusted) log-returns of the stocks in our portfolio. The value of the portfolio at time $t$ is $V_t = f(t, Z_t) = \sum_{i=1}^{d} \lambda_{t,i} \exp(Z_{t,i})$, where $\lambda_{t,i}$ denotes the number of shares invested in stock $i$ at time $t$. Hence, the loss at time $t+1$ is given by

$$L_{t+1} = -(V_{t+1} - V_t) = -\sum_{i=1}^{d} \lambda_{t,i} P_{t,i}\big(\exp(X_{t+1,i}) - 1\big) = -V_t \sum_{i=1}^{d} w_i \big(\exp(X_{t+1,i}) - 1\big), \qquad (3.1)$$

where we hold the portfolio weights $w_i := (\lambda_{t,i} P_{t,i})/V_t$ constant through time. Setting the portfolio value $V_t = 1$ and $w_i = 1/d$ at any time $t$, we have

$$L_{t+1} = -(V_{t+1} - V_t) = -\frac{1}{d}\sum_{i=1}^{d} \big(\exp(X_{t+1,i}) - 1\big) \qquad (3.2)$$

and the corresponding loss operator $l_{[t]}$ then takes the form

$$l_{[t]}(x) = 1 - \frac{1}{d}\sum_{i=1}^{d} \exp(x_i), \qquad (3.3)$$

where $x = (x_1, \ldots, x_d)'$ is a vector of log-returns on the stocks in the portfolio. From (2.5) we can derive the linearized loss operator $l^{\Delta}_{[t]}$. Note that for stocks there is no explicit time dependence, i.e. $f_t(t, Z_t) = 0$, so the linearized loss operator becomes

$$l^{\Delta}_{[t]}(x) = -\frac{1}{d}\sum_{i=1}^{d} x_i. \qquad (3.4)$$
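To make the mapping concrete, the short Python sketch below evaluates the loss operator (3.3) and its linearized version (3.4) for an equally weighted portfolio with $V_t = 1$. It is our own illustration; the function names and the example return vector are hypothetical and not part of the thesis implementation.

import numpy as np

def loss_operator(x):
    """Full loss operator l_[t](x) of (3.3) for an equally weighted
    portfolio with V_t = 1: loss = 1 - mean(exp(x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(x).mean()

def linearized_loss_operator(x):
    """Linearized loss operator (3.4): loss is approximately -mean(x)."""
    x = np.asarray(x, dtype=float)
    return -x.mean()

# illustrative one-day log-returns for the d = 3 stocks
x = np.array([-0.012, 0.004, -0.021])
print(loss_operator(x))             # exact portfolio loss
print(linearized_loss_operator(x))  # first-order approximation

For daily horizons the two operators are nearly identical, which is why the linearized version (3.4) is used to construct the simulated loss sample below.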
Based on (3.4) we construct historically simulated losses for the equally-weighted portfolio using the $n_T$ = 4,000 observations of daily log-returns on each of the three Danish stocks. This gives us the following sample of portfolio losses

$$\{\tilde{L}_s = l^{\Delta}_{[t]}(X_s) : s = t - n_T + 1, \ldots, t\}, \qquad (3.5)$$

where $X_s$ denotes the vector of risk factor changes observed at time $s$. Finally, we convert the simulated loss data back to logarithmic form. In the following we refer to these mapped portfolio-level losses simply as portfolio losses.

Unconditional Univariate Methods

HS. The first univariate method we consider is the standard unconditional historical simulation method as described in Section 2.1.4. We obtain estimates of 1-period VaR and ES in a two-step empirical estimation procedure, which we will also use in several of the other methods. First, we order the simulated portfolio losses by size, $\tilde{L}_{n,n} \le \cdots \le \tilde{L}_{1,n}$, and define them as the $(0.5/n), (1.5/n), \ldots, ([n-0.5]/n)$ quantiles, where $n$ denotes the number of observations in the estimation window. Then, using linear interpolation, we estimate the quantile at the chosen confidence level, which is our VaR estimate. An estimate of ES is obtained as the average of the losses that are larger than this VaR estimate.

For $h \ge 2$, we use a bootstrap procedure to obtain VaR and ES estimates of the $h$-period loss distribution, i.e. the distribution of $L^{(h)}_{t+h}$. First, we generate a large set of independent future paths of pseudo-losses $\tilde{L}^{(j)}_{t+1}, \ldots, \tilde{L}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ by repeated sampling with replacement from the portfolio losses $\tilde{L}_{t-n+1}, \ldots, \tilde{L}_t$. Then we add up each of the $m$ paths, $\{\tilde{L}^{(h)(j)}_{t+h} = \tilde{L}^{(j)}_{t+1} + \cdots + \tilde{L}^{(j)}_{t+h} : j = 1, \ldots, m\}$. Treating these $h$-period pseudo-losses as realizations of $L^{(h)}_{t+h}$, we apply the same empirical estimation procedure for estimating $h$-period VaR and ES that we used for the one-period loss distribution; a code sketch of this procedure is given below.

Conditional Univariate Methods

To calculate conditional estimates of VaR and ES, we assume that the time series process of portfolio losses $\{L_t\}_{t \in \mathbb{Z}}$ is adapted to the filtration $\{\mathcal{F}_t\}_{t \in \mathbb{Z}}$ and follows a stationary model of the form $L_t = \mu_t + \sigma_t Z_t$, where $\mu_t$ and $\sigma_t$ are measurable with respect to the $\sigma$-field $\mathcal{F}_{t-1}$ and $\{Z_t\}_{t \in \mathbb{Z}}$ is a strict white noise innovation process with zero mean and unit variance, i.e. $Z_t \sim \mathrm{SWN}(0,1)$. In the following methods, we fit a univariate dynamic time series model to the portfolio loss series to account for time-varying volatility.
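Before turning to the conditional methods, the sketch below illustrates the empirical quantile/ES procedure and the h-day bootstrap introduced above under the HS method, both of which are reused by several methods. It is a minimal reading of the text under our own naming and data assumptions, not the thesis code.

import numpy as np

def empirical_var_es(losses, alpha):
    """Empirical VaR/ES: the sorted losses are treated as the
    (0.5/n), (1.5/n), ..., ((n-0.5)/n) quantiles, the alpha-quantile is
    obtained by linear interpolation, and ES is the mean loss beyond VaR."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = losses.size
    probs = (np.arange(n) + 0.5) / n
    var = np.interp(alpha, probs, losses)
    tail = losses[losses > var]
    es = tail.mean() if tail.size else var
    return var, es

def hs_h_day(losses, h=10, m=20_000, alpha=0.99, rng=None):
    """Bootstrap h-day pseudo-losses by summing h draws with replacement
    from the 1-day portfolio losses, then apply the empirical estimator."""
    rng = np.random.default_rng(rng)
    paths = rng.choice(losses, size=(m, h), replace=True)
    return empirical_var_es(paths.sum(axis=1), alpha)

# usage with simulated stand-in data for the n = 1,000 portfolio losses
rng = np.random.default_rng(0)
losses = rng.standard_t(df=5, size=1_000) * 1.3
print(empirical_var_es(losses, 0.99))
print(hs_h_day(losses, h=10, alpha=0.99, rng=1))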
We consider three dynamic GARCH-type models, GARCH, EGARCH, and GJR-GARCH, which are discussed in Section 3.2.3. The specific dynamic model to be used in the univariate methods is chosen in Section 4.1.2; in the following we refer to the chosen dynamic model simply as a GARCH-type model.

HS-GARCH and HS-GARCH-t. This is a conditional version of the historical simulation method in which we fit a GARCH-type model to the portfolio losses using ML. When estimating the GARCH-type model, we first assume that the innovations are conditionally normally distributed (HS-GARCH). However, in empirical applications the standardized residuals often appear to have fatter tails than the normal distribution. Therefore, to better capture this phenomenon, we also estimate the GARCH-type model under the assumption of $t$ distributed innovations (HS-GARCH-t).

Under the assumption of normally distributed innovations, we calculate conditional 1-period estimates of VaR and ES as

$$\widehat{\mathrm{VaR}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\Phi^{-1}(\alpha), \qquad \widehat{\mathrm{ES}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\frac{\phi\big(\Phi^{-1}(\alpha)\big)}{1-\alpha}, \qquad (3.6)$$

where $\Phi^{-1}$ denotes the inverse distribution function of the standard normal distribution, $\phi$ denotes the corresponding density function, and $\hat{\mu}_{t+1}$ and $\hat{\sigma}_{t+1}$ are the estimated mean and volatility of the next period's loss obtained from the GARCH-type model fitted with normally distributed innovations. Similarly, under the assumption of $t$ distributed innovations, we calculate conditional 1-period estimates of VaR as

$$\widehat{\mathrm{VaR}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\sqrt{(\hat{\nu}-2)/\hat{\nu}}\; t^{-1}_{\hat{\nu}}(\alpha), \qquad (3.7)$$

and conditional 1-period estimates of ES as

$$\widehat{\mathrm{ES}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\sqrt{(\hat{\nu}-2)/\hat{\nu}}\; \frac{g_{\hat{\nu}}\big(t^{-1}_{\hat{\nu}}(\alpha)\big)}{1-\alpha}\left(\frac{\hat{\nu} + \big(t^{-1}_{\hat{\nu}}(\alpha)\big)^2}{\hat{\nu}-1}\right), \qquad (3.8)$$

where $t^{-1}_{\nu}$ denotes the inverse distribution function of a standard $t$ distribution with $\nu$ degrees of freedom, $g_{\nu}$ denotes the corresponding density function, and $\hat{\mu}_{t+1}$ and $\hat{\sigma}_{t+1}$ are obtained from the GARCH-type model fitted with $t$ distributed innovations.

To obtain $h$-period risk measure estimates for $h \ge 2$, we use Monte Carlo simulation. First, we generate $m$ independent future paths of innovations $\tilde{Z}^{(j)}_{t+1}, \ldots, \tilde{Z}^{(j)}_{t+h}$ from the assumed innovation distribution, i.e. either a standard normal or a $t$ distribution with $\nu$ degrees of freedom. Second, we use these pseudo-innovations to simulate $m$ paths of pseudo-losses $\tilde{L}^{(j)}_{t+1}, \ldots, \tilde{L}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ based on the assumed model of $\{L_t\}_{t \in \mathbb{Z}}$ and the fitted dynamic GARCH-type model. After adding up each of the $m$ paths of pseudo-losses over $h$, we treat the results as realizations of the loss distribution of $L^{(h)}_{t+h}$. Finally, we again apply the empirical estimation procedure for estimating $h$-period VaR and ES that we used under the HS method.
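The closed-form expressions (3.6)-(3.8) are straightforward to evaluate; the following sketch (our own illustration using scipy's normal and t distributions) computes 1-day VaR and ES from given conditional mean and volatility forecasts. The forecast values shown are purely hypothetical inputs.

import numpy as np
from scipy.stats import norm, t as student_t

def var_es_normal(mu, sigma, alpha):
    """Conditional VaR/ES under normal innovations, eq. (3.6)."""
    z = norm.ppf(alpha)
    var = mu + sigma * z
    es = mu + sigma * norm.pdf(z) / (1.0 - alpha)
    return var, es

def var_es_t(mu, sigma, nu, alpha):
    """Conditional VaR/ES under standardized t innovations, eqs. (3.7)-(3.8).
    The factor sqrt((nu-2)/nu) rescales the t quantile to unit variance."""
    scale = np.sqrt((nu - 2.0) / nu)
    q = student_t.ppf(alpha, nu)
    var = mu + sigma * scale * q
    es_std = student_t.pdf(q, nu) / (1.0 - alpha) * (nu + q**2) / (nu - 1.0)
    es = mu + sigma * scale * es_std
    return var, es

# hypothetical one-day-ahead GARCH forecasts (loss scale in percent)
mu_hat, sigma_hat = 0.05, 1.8
print(var_es_normal(mu_hat, sigma_hat, 0.99))
print(var_es_t(mu_hat, sigma_hat, nu=6.0, alpha=0.99))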
FHS. Originally developed by Hull and White [1998], the filtered historical simulation (FHS) method is closely related to the HS-GARCH method except that no parametric distributional assumption is made about the innovations. Instead, the method uses the empirical innovation distribution and can therefore incorporate heavy-tailed and skewed innovation distributions in a natural way.

First, a GARCH-type model is fitted to the portfolio losses using quasi-maximum likelihood (QML). Using the conditional mean and volatility estimates, we obtain the standardized residuals $\{\hat{Z}_s = \hat{\sigma}_s^{-1}(\tilde{L}_s - \hat{\mu}_s) : s = t-n+1, \ldots, t\}$. Treating these standardized residuals as observations from the unobserved innovation distribution $F_Z$, we estimate the conditional 1-period VaR and ES as

$$\widehat{\mathrm{VaR}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\hat{q}_{\alpha}(Z), \qquad \widehat{\mathrm{ES}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\widehat{\mathrm{ES}}_{\alpha}(Z), \qquad (3.9)$$

where $\hat{q}_{\alpha}(Z)$ and $\widehat{\mathrm{ES}}_{\alpha}(Z)$ denote the estimated quantile and ES of the empirical distribution of $Z$, respectively, obtained using the empirical estimation procedure described under the unconditional HS method.

In order to obtain VaR and ES estimates of the $h$-period loss distribution of $L^{(h)}_{t+h}$ for $h \ge 2$, we use a combination of bootstrapping and Monte Carlo simulation. We first generate a large set of independent future paths of pseudo-innovations $\tilde{Z}^{(j)}_{t+1}, \ldots, \tilde{Z}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ by repeated sampling with replacement from the standardized residuals. Then we use these $m$ paths of pseudo-innovations to simulate $m$ paths of pseudo-losses based on the assumed model of $\{L_t\}_{t \in \mathbb{Z}}$ and the dynamic GARCH-type model. After adding up each of the $m$ paths of pseudo-losses over $h$, we treat the results as realizations of the loss distribution of $L^{(h)}_{t+h}$. Finally, we again apply the empirical estimation procedure for estimating $h$-period VaR and ES that we used under the HS method.

HS-CONDEVT. This is a conditional method based on the POT model of extreme value theory. In this method we use historical simulation to model the central part of the conditional loss distribution and the generalized Pareto distribution (GPD) to model the tails. The POT model is chosen because it uses data more efficiently than the block maxima model and allows for estimation of both VaR and ES. Recall that EVT is based on the assumption of iid data. This assumption is usually not satisfied by return series, which tend to exhibit conditional heteroskedasticity. However, we mitigate this problem by pre-whitening the portfolio loss data with a GARCH-type model before applying the methods of EVT, in accordance with McNeil and Frey [2000].

The method is implemented as follows: First, a GARCH-type model is fitted to the portfolio losses using QML. Based on the fitted conditional mean and volatility estimates, we obtain the estimated standardized
residuals $\{\hat{Z}_s = \hat{\sigma}_s^{-1}(\tilde{L}_s - \hat{\mu}_s) : s = t-n+1, \ldots, t\}$, which we consider as observations from the unobserved iid innovation distribution $F_Z$.

Second, we need to determine the threshold level $u$. Unfortunately, no unique way of doing this exists. Embrechts et al. [1997] discuss various graphical techniques that can be used to determine an appropriate threshold level, but for our purposes it would be practically infeasible to examine all the estimation windows and select an appropriate threshold value in each. Instead, we follow the approach used by McNeil and Frey [2000]: we fix the number of threshold exceedances $N_u$ and set $u$ equal to the empirical $q$-quantile of the distribution of the standardized residuals, where $q = 1 - N_u/n$. We choose to set $N_u$ equal to 100, a choice that is supported by various simulation studies. McNeil et al. [2005] show that, for a sample size of 1,000, fixing the number of threshold exceedances to 100 yields good estimates of VaR and ES. Similar simulation studies conducted by McNeil and Frey [2000] and Kuester et al. [2006] arrive at a suitable interval of 80 to 150 exceedances for iid $t$ distributed data with different degrees of freedom. In addition, McNeil and Frey [2000] find that the GPD method is robust to the choice of $N_u$. To investigate the effects of varying $N_u$, we conduct sensitivity analyses.

Third, we fit a GPD to the excess standardized residuals over the threshold $u$ by maximizing the log-likelihood function in (2.42) with respect to the parameters $\beta$ and $\xi$. Conditional 1-period VaR and ES estimates are then calculated as

$$\widehat{\mathrm{VaR}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\hat{q}_{\alpha}(Z), \qquad \widehat{\mathrm{ES}}^{t+1}_{\alpha} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\,\widehat{\mathrm{ES}}_{\alpha}(Z), \qquad (3.10)$$

where $\hat{q}_{\alpha}(Z)$ and $\widehat{\mathrm{ES}}_{\alpha}(Z)$ are the GPD-based VaR and ES estimators in (2.44) and (2.45), respectively.

To obtain $h$-period estimates of VaR and ES for $h \ge 2$, we follow a somewhat different procedure. Again, we start by obtaining the standardized residuals $\hat{Z}_{t-n+1}, \ldots, \hat{Z}_t$ in the same way as before. Then we determine a lower threshold $u^{(l)}$ and an upper threshold $u^{(u)}$ as the $(N_u^{(l)}/n)$-quantile and the $(1 - N_u^{(u)}/n)$-quantile of the distribution of the standardized residuals, where the number of lower order statistics $N_u^{(l)}$ and upper order statistics $N_u^{(u)}$ are both set to 100. Hereafter a GPD is fitted separately to the 100 lower-tail excess standardized residuals and the 100 upper-tail excess standardized residuals. The center of $F_Z$ is estimated by the empirical distribution, smoothed using linear interpolation between data points.

In the next step, a large set of $m$ independent future paths of pseudo-innovations $\tilde{Z}^{(j)}_{t+1}, \ldots, \tilde{Z}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ from the innovation distribution $F_Z$ is generated using a standard uniform random number generator combined with the inverse of the piecewise-fitted distribution. If the random number exceeds the upper tail fraction $1 - N_u^{(u)}/n$, a pseudo-innovation $\tilde{Z}$ is generated by adding a random number simulated from the GPD fitted to the upper tail to the upper threshold $u^{(u)}$. Generalized Pareto distributed random numbers $V$ are generated from standard uniform variates $U$ and the inverse distribution function of the GPD, i.e. $V = \beta(U^{-\xi} - 1)/\xi \sim \mathrm{GPD}(\xi, \beta)$. If the standard uniform random number is below the lower tail fraction $N_u^{(l)}/n$, a pseudo-innovation is generated by subtracting a GPD-distributed random number, simulated with the estimated lower-tail parameters, from the lower threshold $u^{(l)}$. If the standard uniform random number lies between the lower and upper tail fractions, a pseudo-innovation is generated via the inverse empirical distribution function. Finally, we use these $m$ paths of pseudo-innovations to generate $m$ paths of pseudo-losses based on the assumed model of $\{L_t\}_{t \in \mathbb{Z}}$ and the dynamic GARCH-type model. After adding up each of the $m$ paths of pseudo-losses over $h$, we treat the results as realizations of the loss distribution of $L^{(h)}_{t+h}$, and we again apply the empirical estimation procedure for estimating $h$-period VaR and ES that we used under the HS method.
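The semi-parametric sampling scheme just described can be summarized in a few lines of code. The sketch below is our own illustration under simplifying assumptions: the GPD tail parameters are treated as given (rather than re-estimated in every window), and numpy's interpolation stands in for the smoothed empirical body; none of the names are from the thesis.

import numpy as np

def gpd_inverse(u, xi, beta):
    """Inverse GPD distribution function: V = beta * (u**(-xi) - 1) / xi."""
    return beta * (u ** (-xi) - 1.0) / xi

def sample_innovations(resid, xi_lo, beta_lo, xi_up, beta_up,
                       n_tail=100, size=20_000, rng=None):
    """Draw pseudo-innovations from the piecewise distribution:
    GPD beyond the upper/lower thresholds, empirical (interpolated) body."""
    rng = np.random.default_rng(rng)
    resid = np.sort(np.asarray(resid, dtype=float))
    n = resid.size
    lo_frac, up_frac = n_tail / n, 1.0 - n_tail / n
    u_lo = np.quantile(resid, lo_frac)        # lower threshold u^(l)
    u_up = np.quantile(resid, up_frac)        # upper threshold u^(u)
    u = rng.uniform(size=size)
    z = np.empty(size)
    upper = u > up_frac
    lower = u < lo_frac
    body = ~(upper | lower)
    # 1 - U keeps the GPD inverse away from the u = 0 singularity
    z[upper] = u_up + gpd_inverse(1.0 - rng.uniform(size=upper.sum()), xi_up, beta_up)
    z[lower] = u_lo - gpd_inverse(1.0 - rng.uniform(size=lower.sum()), xi_lo, beta_lo)
    probs = (np.arange(n) + 0.5) / n          # inverse empirical distribution
    z[body] = np.interp(u[body], probs, resid)
    return z

# usage with stand-in residuals and hypothetical GPD tail parameters
rng = np.random.default_rng(0)
resid = rng.standard_t(df=5, size=1_000)
z = sample_innovations(resid, 0.2, 0.6, 0.2, 0.6, rng=1)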
3.2.2 Multivariate Methods

In this section we consider the following methods: Variance-Covariance (VC), Variance-Covariance with EWMA (VC-EWMA), the Constant Conditional Correlation model (CCC-GARCH), the Dynamic Conditional Correlation model (DCC-GARCH), and Multivariate Conditional EVT using Gaussian, t, and Gumbel copulas (MCONDEVT).

Unconditional Multivariate Methods

VC. This is the standard unconditional variance-covariance (VC) method assuming multivariate normal risk factor changes, as described in Section 2.1.4. Though it has been exposed to much criticism, the VC method is still widely used in practice because it offers high-speed risk measure calculations compared to simulation-based methods and because of its simple time-scaling property, which makes it easy to estimate $h$-period risk measures. However, these advantages come at the cost of lower accuracy [Jorion, 2001]. We implement the method by first estimating the mean vector $\mu$ and covariance matrix $\Sigma$ of the $d$ return series using the sample estimators

$$\hat{\mu} = \frac{1}{n}\sum_{s=t-n+1}^{t} X_s, \qquad \hat{\Sigma} = \frac{1}{n-1}\sum_{s=t-n+1}^{t} (X_s - \hat{\mu})(X_s - \hat{\mu})'. \qquad (3.11)$$

We then calculate 1-period and $h$-period VaR and ES estimates using

$$\widehat{\mathrm{VaR}}^{t+h}_{\alpha} = -h\, w'\hat{\mu} + \sqrt{h}\,\sqrt{w'\hat{\Sigma} w}\;\Phi^{-1}(\alpha) \qquad (3.12)$$

and

$$\widehat{\mathrm{ES}}^{t+h}_{\alpha} = -h\, w'\hat{\mu} + \sqrt{h}\,\sqrt{w'\hat{\Sigma} w}\;\frac{\phi\big(\Phi^{-1}(\alpha)\big)}{1-\alpha}, \qquad (3.13)$$
where $h \ge 1$ is the length of the time period considered, $w$ is the portfolio weight vector, $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of the standard normal distribution, and $\phi$ is the density function of the standard normal distribution. Note that we obtain multi-period risk measure estimates via the so-called square-root-of-time rule and not by MC simulation as in the other methods.

Conditional Multivariate Methods

To calculate conditional VaR and ES estimates, we assume in the following that the time series process of risk factor changes $\{X_t\}_{t \in \mathbb{Z}}$ is adapted to the filtration $\{\mathcal{F}_t\}_{t \in \mathbb{Z}}$ and follows a stationary model of the form

$$X_t - \mu_t = \Sigma_t^{1/2} Z_t = \Delta_t P_t^{1/2} Z_t, \qquad t \in \mathbb{Z}, \qquad (3.14)$$

where $\{Z_t\}_{t \in \mathbb{Z}}$ is a multivariate strict white noise process with mean vector zero and $\mathrm{Cov}(Z_t) = I_d$, $\Sigma_t^{1/2} \in \mathbb{R}^{d \times d}$ is the Cholesky factor of the (positive-definite) conditional covariance matrix $\Sigma_t$, measurable with respect to $\mathcal{F}_{t-1} = \sigma(\{X_s : s \le t-1\})$, $\Delta_t = \Delta(\Sigma_t) = \mathrm{diag}(\sigma_{t,1}, \ldots, \sigma_{t,d})$ is the conditional volatility matrix containing the volatilities of the component series $\{X_{t,k}\}_{t \in \mathbb{Z}}$ for $k = 1, \ldots, d$, and $P_t^{1/2}$ is the Cholesky factor of the conditional correlation matrix $P_t$.

In several of the following methods we fit dynamic models to the $d$ univariate component series of the multivariate return series process $\{X_t\}_{t \in \mathbb{Z}}$. For this job, we again consider the three GARCH-type models used in the conditional univariate methods, i.e. the GARCH, EGARCH, and GJR-GARCH models. In Section 4.2.2 we discuss the choice of the specific model(s) that we use.

VC-EWMA. This method is a conditional version of the variance-covariance method in which a multivariate exponentially weighted moving-average (EWMA) model is used to estimate the conditional covariance matrix $\Sigma_{t+h}$ for $h \ge 1$. By specifying a dynamic structure for the conditional covariance matrix, we take the time-varying correlations observed in multivariate return series into account in a way that requires limited estimation effort. We make the simplifying assumption that $X_{t+1}$ has a conditional multivariate normal distribution with constant mean. Thus, we assume

$$X_t - \mu = \Sigma_t^{1/2} Z_t \qquad (3.15)$$

with the following multivariate EWMA updating equation for $\Sigma_t$:

$$\Sigma_t = \alpha (X_{t-1} - \mu)(X_{t-1} - \mu)' + (1 - \alpha)\Sigma_{t-1}, \qquad (3.16)$$

where the value of the weight parameter $\alpha$ is set to 0.04, which gives the best results for our data. To estimate the mean vector $\mu$ we use the sample mean estimator in (3.11), and $\Sigma_{t+1}$ is estimated recursively from (3.16) using the sample (unconditional) covariance matrix of the demeaned data as the initial estimate $\hat{\Sigma}_1$. To estimate 1-period VaR and ES we use (3.12) and (3.13), where the conditional covariance matrix $\hat{\Sigma}_{t+1}$ is used in place of the unconditional covariance matrix $\hat{\Sigma}$ and $h$ is set equal to one.

To obtain $h$-period estimates of VaR and ES for $h \ge 2$, we use Monte Carlo simulation and not the square-root-of-time rule that we used in the unconditional VC method. First, we generate $m$ independent future paths of innovations $\tilde{Z}^{(j)}_{t+1}, \ldots, \tilde{Z}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ from the multivariate normal distribution $N_d(0, I_d)$. Next, we use a two-step iterative procedure to simulate $m$ replications of future paths of the process, $\tilde{X}^{(j)}_{t+1}, \ldots, \tilde{X}^{(j)}_{t+h}$. The first step is to generate $m$ replications of $X_{t+1}$ using $\tilde{Z}_{t+1}$, $\hat{\Sigma}_{t+1}$, and $\hat{\mu}$. Second, for each replication $\tilde{X}_{t+1}$ we use (3.16) to update the covariance estimate, i.e. we estimate a new $\hat{\Sigma}_{t+2}$ for each $\tilde{X}_{t+1}$. This procedure is iterated until we reach $\tilde{X}_{t+h}$. We then apply the $h$-period loss operator to the simulated data to obtain Monte Carlo simulated losses $\{\tilde{L}^{(j)(h)}_{t+h} = l^{(h)}_{[t]}\big(\sum_{i=1}^{h} \tilde{X}^{(j)}_{t+i}\big) : j = 1, \ldots, m\}$. We obtain $h$-period estimates of VaR and ES using the empirical estimation procedure described under the HS method applied to the simulated losses.
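As an illustration of the EWMA recursion (3.16) and its use for a one-step covariance forecast, consider the following sketch. It assumes the returns are stored in an (n x d) numpy array and uses alpha = 0.04 as in the text; the function name and the stand-in data are our own and hypothetical.

import numpy as np

def ewma_covariance(X, alpha=0.04):
    """Run the multivariate EWMA recursion (3.16) through the sample and
    return the sample mean and the one-step-ahead covariance forecast."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                      # sample mean, as in (3.11)
    demeaned = X - mu
    sigma = np.cov(demeaned, rowvar=False)   # initial estimate Sigma_1
    for x in demeaned:
        sigma = alpha * np.outer(x, x) + (1.0 - alpha) * sigma
    return mu, sigma

# usage with stand-in return data for d = 3 assets
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[1.0, 0.5, 0.3],
                             [0.5, 1.2, 0.4],
                             [0.3, 0.4, 0.8]], size=1_000)
mu_hat, sigma_hat = ewma_covariance(X)
w = np.full(3, 1.0 / 3.0)
portfolio_variance = w @ sigma_hat @ w       # feeds into (3.12)-(3.13)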
CCC-GARCH. This is a multivariate GARCH (MGARCH) method based on the constant conditional correlation (CCC) model proposed by Bollerslev [1990]. The method is chosen because of the simple way it combines univariate GARCH-type models without additional numerical optimization and because it lends itself to estimation in stages. The model has time-varying conditional covariances but assumes that the conditional correlations are constant over time, i.e. $P_t := P_c$ for all $t$.

To implement the method, we first fit univariate GARCH-type models by QML to each of the $d$ component series of $X_{t-n+1}, \ldots, X_t$, providing us with volatility estimates $\{\hat{\Delta}_s : s = t-n+1, \ldots, t\}$ and mean vector estimates $\{\hat{\mu}_s : s = t-n+1, \ldots, t\}$. Define by $Y_s = \Delta_s^{-1}(X_s - \mu_s)$ the devolatized process. We refer to realizations of this process as standardized residuals and estimate them by $\{\hat{Y}_s = \hat{\Delta}_s^{-1}(X_s - \hat{\mu}_s) : s = t-n+1, \ldots, t\}$. Assuming the adequacy of the model, the standardized residuals should behave like realizations of a $\mathrm{SWN}(0, P_c)$ process, and the correlation matrix $P_c$ is estimated as

$$\hat{P}_c = \wp(\hat{Q}) = \big(\Delta(\hat{Q})\big)^{-1} \hat{Q} \big(\Delta(\hat{Q})\big)^{-1}, \qquad (3.17)$$

where $\hat{Q}$ is the estimated covariance matrix of the standardized residuals. The normalizing operator $\wp(\cdot)$ ensures that $\hat{P}_c$ is a real, symmetric and positive-definite matrix with ones on the diagonal and off-diagonal values in the interval $[-1, 1]$. This, in turn, ensures that $\hat{\Sigma}_t = \hat{\Delta}_t \hat{P}_c \hat{\Delta}_t$ is positive-definite
since the individual volatility processes are strictly positive. The positive-definiteness of $\hat{\Sigma}_t$ is required because it guarantees a non-negative portfolio variance regardless of the portfolio weights.

Using $\hat{P}_c$, we generate $m$ independent future paths of the devolatized process $\tilde{Y}^{(j)}_{t+1}, \ldots, \tilde{Y}^{(j)}_{t+h}$ for $j = 1, \ldots, m$ from a multivariate normal distribution $N_d(0, \hat{P}_c)$. Next, we simulate $m$ replications of future paths of the process, $\tilde{X}^{(j)}_{t+1}, \ldots, \tilde{X}^{(j)}_{t+h}$, using the simulated innovations and the fitted univariate GARCH-type models. We then apply the $h$-period loss operator to the simulated data to obtain Monte Carlo simulated losses $\{\tilde{L}^{(j)(h)}_{t+h} = l^{(h)}_{[t]}\big(\sum_{i=1}^{h} \tilde{X}^{(j)}_{t+i}\big) : j = 1, \ldots, m\}$. We obtain $h$-period estimates of VaR and ES using the empirical estimation procedure described under the HS method applied to the $h$-period simulated losses.

DCC-GARCH. This is an MGARCH method based on the dynamic conditional correlation (DCC) model proposed by Engle and Sheppard [2001]. In contrast to the CCC model, this model allows the conditional correlations to evolve dynamically over time and thus addresses the time-varying correlations often exhibited by multivariate return series. However, it is constructed in a way that still allows for estimation in stages using univariate GARCH-type models. In accordance with Engle [2002], we assume the following dynamic first-order process for the conditional correlation matrix $P_t$:

$$P_t = \wp(Q_t), \qquad Q_t = (1 - \alpha - \beta)P_c + \alpha Y_{t-1} Y_{t-1}' + \beta Q_{t-1}, \qquad (3.18)$$

where $P_c$ is the unconditional correlation matrix of the standardized residuals and can be thought of as representing the long-run correlation structure, $Y_t = \Delta_t^{-1}(X_t - \mu_t)$, and the coefficients $\alpha$ and $\beta$ are non-negative scalar parameters such that $\alpha + \beta < 1$. Assuming $Q_0$ is positive-definite, this dynamic model preserves the positive-definiteness of $P_t$.

We implement the method as follows: First, we fit univariate GARCH-type models to the component series using QML to estimate the volatility matrix $\Delta_t$ and mean vector $\mu_t$. From these, we estimate the standardized residuals $\{\hat{Y}_s = \hat{\Delta}_s^{-1}(X_s - \hat{\mu}_s) : s = t-n+1, \ldots, t\}$. Then we estimate $P_c$ from (3.17) and estimate the correlation persistence parameters $\alpha$ and $\beta$ in (3.18) using QML. As the starting value of $Q_t$ we use the unconditional covariance matrix of the standardized residuals. The conditional log-likelihood function can be written as

$$\ln L(\alpha, \beta; Y_{t-n+1}, \ldots, Y_t) = -\frac{1}{2}\sum_{s=t-n+1}^{t} \big( \ln |P_s| + Y_s' P_s^{-1} Y_s \big), \qquad (3.19)$$

where $|P_s|$ denotes the determinant of the conditional correlation matrix. The log-likelihood function (3.19) is maximized with respect to the parameters $\alpha$ and $\beta$. Based on the QML estimates of $\alpha$ and $\beta$, we estimate $P_{t+1}$ from (3.18) and use this estimate, together with the fitted univariate GARCH-type models, to simulate $m$ replications of future paths of the process, $\tilde{X}^{(j)}_{t+1}, \ldots, \tilde{X}^{(j)}_{t+h}$, assuming for simplicity that the correlations remain constant over the forecast horizon, i.e. $P_{t+1} = \cdots = P_{t+h}$. We then apply the $h$-period loss operator to the simulated data to obtain Monte Carlo simulated losses $\{\tilde{L}^{(j)(h)}_{t+h} = l^{(h)}_{[t]}\big(\sum_{i=1}^{h} \tilde{X}^{(j)}_{t+i}\big) : j = 1, \ldots, m\}$. We obtain $h$-period estimates of VaR and ES using the empirical estimation procedure applied to the $h$-period simulated losses.
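A compact sketch of the DCC correlation recursion (3.18) applied to a sample of standardized residuals is given below. The QML estimation of the persistence parameters is not reproduced; the values of alpha and beta shown are purely hypothetical, and the helper names are our own.

import numpy as np

def to_correlation(Q):
    """Normalizing operator of (3.17): rescale a covariance-like matrix
    to a correlation matrix with unit diagonal."""
    d = np.sqrt(np.diag(Q))
    return Q / np.outer(d, d)

def dcc_forecast(Y, alpha, beta):
    """Iterate Q_t = (1-a-b) P_c + a Y_{t-1} Y_{t-1}' + b Q_{t-1} over the
    standardized residuals Y (n, d) and return the one-step-ahead
    conditional correlation matrix P_{t+1}."""
    Y = np.asarray(Y, dtype=float)
    P_c = np.corrcoef(Y, rowvar=False)       # long-run correlation
    Q = np.cov(Y, rowvar=False)              # starting value Q_0
    for y in Y:
        Q = (1 - alpha - beta) * P_c + alpha * np.outer(y, y) + beta * Q
    return to_correlation(Q)

# usage with stand-in standardized residuals and hypothetical parameters
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(3),
                            [[1.0, 0.6, 0.4],
                             [0.6, 1.0, 0.5],
                             [0.4, 0.5, 1.0]], size=1_000)
P_next = dcc_forecast(Y, alpha=0.03, beta=0.95)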
MCONDEVT. This is a conditional method using Gaussian, t, or Gumbel copulas with EVT margins. Given multivariate return series data, we use the conditional POT model: we pre-whiten the component series using GARCH-type models and fit a GPD to the tails of the distributions of the pre-whitened return series, while the bodies of the marginal distributions are modeled empirically. This is followed by copula fitting to obtain a joint distribution of the $d$ risk factor return series.

As discussed in Section 2.3.2, the theory of multivariate extremes is in many respects still at an early stage of development. The theory on limiting copulas for threshold exceedances has so far only been investigated in the bivariate case, which is not sufficient in our context. Thus, we choose to use a single copula to model the entire dependence structure in the joint distribution and refrain from considering the types of limiting copulas that may exist in the tails.

The Gaussian copula is chosen because of its wide use in practice and its relatively easy implementation, which makes it tractable for risk management of large portfolios. Unfortunately, it does not incorporate the tail dependence usually exhibited by multivariate return data (see Section 2.1.1). Therefore, we also consider the t copula, which has the same tail dependence in both the upper and lower tail. However, it is often claimed that joint negative returns on stocks show more tail dependence than joint positive returns. To account for this asymmetry, we finally consider the Gumbel copula, which has upper tail dependence but no lower tail dependence; we fit this copula to the standardized residuals of the negative return series.

The method is implemented as follows: First, we fit univariate GARCH-type models to each of the $d$ component series of $X_{t-n+1}, \ldots, X_t$ using QML, providing us with volatility matrix estimates $\{\hat{\Delta}_s : s = t-n+1, \ldots, t\}$ and mean vector estimates $\{\hat{\mu}_s : s = t-n+1, \ldots, t\}$, from which we calculate the standardized residuals $\{\hat{Y}_s = \hat{\Delta}_s^{-1}(X_s - \hat{\mu}_s) : s = t-n+1, \ldots, t\}$. Based on the standardized residuals, we estimate the marginal distribution functions $F_1, \ldots, F_d$ in the manner used in the HS-CONDEVT method: we model the body of each marginal distribution by its empirical distribution and fit a GPD to each margin's upper and lower tails using the log-likelihood function in (2.42). As in the HS-CONDEVT method, we fix the number of
threshold exceedances in both the upper and lower tail to 100. Following this, we construct a pseudo-sample of observations from the copula we wish to fit. This pseudo-sample consists of the vectors $\hat{U}_{t-n+1}, \ldots, \hat{U}_t$, where

$$\hat{U}_s = (\hat{U}_{s,1}, \ldots, \hat{U}_{s,d}) = \big(\hat{F}_1(\hat{Y}_{s,1}), \ldots, \hat{F}_d(\hat{Y}_{s,d})\big). \qquad (3.20)$$

The parameters of the copula are then estimated by ML based on the pseudo-sample. In the following we describe how we fit the three different copulas using ML.

Fitting a Gaussian copula. Using the copula density in (2.55), we set up the log-likelihood function¹ as

$$\ln L(P; \hat{U}_{t-n+1}, \ldots, \hat{U}_t) = \sum_{s=t-n+1}^{t} \ln c^{\mathrm{Ga}}_{P}(\hat{U}_s) = -\frac{1}{2}\sum_{s=t-n+1}^{t} \big( \ln |P| + \psi_s' P^{-1} \psi_s \big), \qquad (3.21)$$

where $\psi_s = \big(\Phi^{-1}(\hat{U}_{s,1}), \ldots, \Phi^{-1}(\hat{U}_{s,d})\big)'$. The MLE $\hat{P}$ is obtained by maximizing (3.21) with respect to $P$ over the parameter space $\mathcal{P}$, the set of all possible linear correlation matrices. Searching over $\mathcal{P}$ can be very slow in higher dimensions due to the large number of parameters to be estimated. However, it is possible to obtain an approximate analytical solution. This is achieved by maximizing (3.21) over the set of all covariance matrices instead of $\mathcal{P}$, which leads to the analytical solution $\hat{\Sigma} = \frac{1}{n}\sum_{s=t-n+1}^{t} \psi_s \psi_s'$. The normalizing operator is applied at the end to obtain the proxy $\hat{P} = \wp(\hat{\Sigma})$.

¹ Note that we have removed the constant term of the log-likelihood function, as it is not relevant to the maximization with respect to $P$.

Fitting a t copula. Using the copula density in (2.57), we set up the log-likelihood function as

$$\ln L(\nu, P; \hat{U}_{t-n+1}, \ldots, \hat{U}_t) = \sum_{s=t-n+1}^{t} \ln c^{t}_{\nu,P}(\hat{U}_s) = n \ln \Gamma\big(\tfrac{\nu+d}{2}\big) + n(d-1) \ln \Gamma\big(\tfrac{\nu}{2}\big) - nd \ln \Gamma\big(\tfrac{\nu+1}{2}\big) - \frac{n}{2}\ln|P| - \frac{\nu+d}{2}\sum_{s=t-n+1}^{t}\ln\Big(1 + \frac{1}{\nu}\,\psi_s' P^{-1}\psi_s\Big) + \frac{\nu+1}{2}\sum_{s=t-n+1}^{t}\sum_{j=1}^{d}\ln\Big(1 + \frac{\psi_{s,j}^2}{\nu}\Big), \qquad (3.22)$$
where $\psi_s = \big(t^{-1}_{\nu}(\hat{U}_{s,1}), \ldots, t^{-1}_{\nu}(\hat{U}_{s,d})\big)'$ and $\psi_{s,j} = t^{-1}_{\nu}(\hat{U}_{s,j})$. For low dimensions, one could maximize (3.22) over the degrees of freedom $\nu$ and the set of all possible correlation matrices. However, this procedure can be very slow in higher dimensions [McNeil et al., 2005]. We therefore choose to fit the t copula using the approximate ML method proposed by Bouyé et al. [2000]. They suggest maximizing (3.22) iteratively with respect to $\nu$ (holding $P$ constant), where the estimate of $P$ is updated in each iteration according to

$$\hat{P}_{q+1} = \frac{1}{n}\,\frac{\nu + d}{\nu}\sum_{s=t-n+1}^{t} \frac{\psi_s \psi_s'}{1 + \frac{1}{\nu}\,\psi_s' \hat{P}_q^{-1} \psi_s}, \qquad (3.23)$$

where $q$ is the number of iterations until convergence. We use $\wp(\hat{\Sigma}) = \wp\big(\frac{1}{n}\sum_{s=t-n+1}^{t}\psi_s\psi_s'\big)$ as the initial estimate $\hat{P}_0$. For large sample sizes the procedure should lead to estimates close to the MLEs.

Fitting a Gumbel copula. We fit the Gumbel copula by ML, maximizing the log-likelihood function

$$\ln L(\theta; \hat{U}_{t-n+1}, \ldots, \hat{U}_t) = \sum_{s=t-n+1}^{t} \ln c^{\mathrm{Gu}}_{\theta}(\hat{U}_s) \qquad (3.24)$$

with respect to the dependence parameter $\theta$, subject to $\theta \ge 1$, where $c^{\mathrm{Gu}}_{\theta}$ denotes the density of the Gumbel copula. In the three-dimensional case, the Gumbel copula density is given in (2.62); the expression is rather unwieldy, so we do not write it out.

After fitting the copula to the pseudo-sample, we simulate $m$ future paths of the devolatized process $\{\tilde{Y}^{(j)}_{t+i} = \big(\hat{F}_1^{-1}(\tilde{U}^{(j)}_{t+i,1}), \ldots, \hat{F}_d^{-1}(\tilde{U}^{(j)}_{t+i,d})\big) : j = 1, \ldots, m,\ i = 1, \ldots, h\}$, where the elements $\{\tilde{U}^{(j)}_{t+i} = (\tilde{U}^{(j)}_{t+i,1}, \ldots, \tilde{U}^{(j)}_{t+i,d}) : j = 1, \ldots, m,\ i = 1, \ldots, h\}$ are uniformly distributed random variates simulated from the fitted copula. The procedure for simulating from each of the candidate copulas is described below.

Simulating from a Gaussian copula. We simulate from the Gaussian copula by first generating a random vector $X \sim N_d(0, P)$ and then transforming each component with its marginal distribution function to obtain a random vector $U = (U_1, \ldots, U_d) = (\Phi(X_1), \ldots, \Phi(X_d))$, which has distribution function $C^{\mathrm{Ga}}_{P}$.

Simulating from a t copula. Similarly, we simulate from the t copula by generating a vector $X \sim t_d(\nu, 0, P)$ and then transforming each component with its marginal distribution function to obtain a random vector $U = (U_1, \ldots, U_d) = (t_{\nu}(X_1), \ldots, t_{\nu}(X_d))$, which has distribution function $C^{t}_{\nu,P}$.
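The Gaussian- and t-copula sampling steps just described translate directly into code. The sketch below is our own illustration using scipy's univariate normal, t, and chi-square distributions; the correlation matrix and degrees of freedom are hypothetical inputs, not estimates from the thesis.

import numpy as np
from scipy.stats import norm, t as student_t

def sample_gaussian_copula(P, size, rng=None):
    """U = (Phi(X_1), ..., Phi(X_d)) with X ~ N_d(0, P)."""
    rng = np.random.default_rng(rng)
    X = rng.multivariate_normal(np.zeros(len(P)), P, size=size)
    return norm.cdf(X)

def sample_t_copula(P, nu, size, rng=None):
    """U = (t_nu(X_1), ..., t_nu(X_d)) with X ~ t_d(nu, 0, P); the
    multivariate t is generated as N_d(0, P) / sqrt(chi2_nu / nu)."""
    rng = np.random.default_rng(rng)
    Z = rng.multivariate_normal(np.zeros(len(P)), P, size=size)
    W = rng.chisquare(nu, size=size) / nu
    X = Z / np.sqrt(W)[:, None]
    return student_t.cdf(X, nu)

# usage with a hypothetical correlation matrix for d = 3 risk factors
P = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
U_gaussian = sample_gaussian_copula(P, size=5, rng=0)
U_student = sample_t_copula(P, nu=6.0, size=5, rng=0)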
Simulating from a Gumbel copula. To simulate from the Gumbel copula, we use the Laplace-Stieltjes transform in (2.63). We generate a variate $V$ with distribution function $G$ such that $\hat{G}$, the Laplace-Stieltjes transform of $G$, is the inverse of the generator $\varphi$ of the Gumbel copula. The distribution function of a positive stable random variate $V \sim \mathrm{St}(1/\theta, 1, \gamma, 0)$, where $\gamma = (\cos(\pi/(2\theta)))^{\theta}$ and $\theta > 1$, has Laplace-Stieltjes transform $\hat{G}(t) = \exp(-t^{1/\theta})$, which is the inverse of the Gumbel copula generator $\varphi(t) = \hat{G}^{-1}(t) = (-\ln t)^{\theta}$.

To generate $V$, we first generate a standardized stable variate $Z \sim \mathrm{St}(\alpha, \beta, 1, 0)$ using Theorem 1.19 in Nolan [2009]. Let $\Theta$ and $W$ be independent with $\Theta \sim U(-\frac{\pi}{2}, \frac{\pi}{2})$ and $W$ exponentially distributed with unit mean, and let $\varsigma := \arctan(\beta \tan(\pi\alpha/2))/\alpha$. For any $0 < \alpha \le 2$ and $-1 < \beta \le 1$, the random variate

$$Z = \frac{\sin\big(\alpha(\varsigma + \Theta)\big)}{\big(\cos(\alpha\varsigma)\cos\Theta\big)^{1/\alpha}} \left(\frac{\cos\big(\alpha\varsigma + (\alpha-1)\Theta\big)}{W}\right)^{(1-\alpha)/\alpha}, \qquad \alpha \ne 1, \qquad (3.25)$$

has a $\mathrm{St}(\alpha, \beta, 1, 0)$ distribution. Generating two independent $U(0,1)$ random variates $U_1$ and $U_2$, we set $\Theta = \pi(U_1 - 1/2)$ and $W = -\ln(U_2)$. Also, observe that when $\beta = 1$, $\varsigma$ reduces to $\pi/2$. To obtain $V \sim \mathrm{St}(\alpha, 1, \gamma, 0)$, where $\alpha := 1/\theta$, we scale $Z$ with $\gamma = (\cos(\pi/(2\theta)))^{\theta} = (\cos(\alpha\pi/2))^{1/\alpha}$ and obtain

$$V = \frac{\sin\big(\alpha(\pi/2 + \Theta)\big)}{(\cos\Theta)^{1/\alpha}} \left(\frac{\cos\big(\alpha\pi/2 + (\alpha-1)\Theta\big)}{W}\right)^{(1-\alpha)/\alpha}, \qquad \alpha \ne 1. \qquad (3.26)$$

From a set of simulated independent uniform variates $U_1, \ldots, U_d$, we then generate the random vector

$$U = \Big(\hat{G}\big(-\tfrac{\ln U_1}{V}\big), \ldots, \hat{G}\big(-\tfrac{\ln U_d}{V}\big)\Big) = \Big(\exp\big(-(-\tfrac{\ln U_1}{V})^{1/\theta}\big), \ldots, \exp\big(-(-\tfrac{\ln U_d}{V})^{1/\theta}\big)\Big),$$

which has distribution function $C^{\mathrm{Gu}}_{\theta}$.

Based on the simulated paths of the devolatized process $\tilde{Y}^{(j)}_{t+1}, \ldots, \tilde{Y}^{(j)}_{t+h}$ for $j = 1, \ldots, m$, we simulate $m$ future paths of the process $\{X_t\}_{t \in \mathbb{Z}}$ using the fitted univariate GARCH-type models, which gives us $\tilde{X}^{(j)}_{t+1}, \ldots, \tilde{X}^{(j)}_{t+h}$ for $j = 1, \ldots, m$. We then apply the $h$-period loss operator to the simulated data to obtain MC simulated losses $\{\tilde{L}^{(j)(h)}_{t+h} = l^{(h)}_{[t]}\big(\sum_{i=1}^{h} \tilde{X}^{(j)}_{t+i}\big) : j = 1, \ldots, m\}$. We obtain $h$-period estimates of VaR and ES using the empirical estimation procedure described under the HS method applied to the $h$-period simulated losses.

3.2.3 Dynamic Models for Changing Volatility

In this section we briefly discuss the univariate dynamic models that we use in the methods described in the previous two sections to account for time-varying volatility and asymmetric effects of positive and negative shocks in financial return series (cf. Section 2.1.1).
We assume that the individual return series processes $\{X_t\}_{t \in \mathbb{Z}}$ of the three Danish stocks are adapted to the filtration $\{\mathcal{F}_t\}_{t \in \mathbb{Z}}$ and that each follows a stationary $p$th-order autoregressive (AR($p$)) process $X_t = \mu_t + \epsilon_t$, where $\mu_t$ is the conditional mean and $\epsilon_t$ is the innovation with conditional mean zero and conditional variance $\sigma_t^2$, measurable with respect to $\mathcal{F}_{t-1}$. Thus, we assume the model

$$X_t = \phi_0 + \sum_{i=1}^{p} \phi_i X_{t-i} + \epsilon_t, \qquad \epsilon_t = \sigma_t Z_t, \qquad (3.27)$$

where $\{Z_t\}_{t \in \mathbb{Z}}$ is a strict white noise process with mean zero and variance one. In the following we consider three GARCH-type models for the dynamics of the strictly positive process $\{\sigma_t\}_{t \in \mathbb{Z}}$: (1) the GARCH model, (2) the EGARCH model, and (3) the GJR-GARCH model. The choice of dynamic time series models and model orders in the empirical study is based on information criteria such as AIC and BIC for the fitted models, as well as on various diagnostic tests of the models' ability to pre-whiten the return series data.

GARCH. To capture the serial dependence of volatility in financial return series, Engle [1982] suggested the autoregressive conditional heteroskedastic (ARCH) model, in which the conditional variance is modeled as a linear function of past squared innovations. The general ARCH($q$) model has the form

$$\sigma_t^2 = \omega + \sum_{j=1}^{q} \alpha_j \epsilon_{t-j}^2, \qquad (3.28)$$

where $\omega > 0$ and $\alpha_j \ge 0$, $j = 1, \ldots, q$, in order to keep the conditional variance positive. Unfortunately, we often need $q$ to be large in order to fit the data. As a way to model persistent movements in volatility without estimating a large number of parameters, Bollerslev [1986] proposed a more parsimonious model: the generalized autoregressive conditional heteroskedastic (GARCH) model. The general GARCH($p$, $q$) model is

$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \epsilon_{t-j}^2, \qquad (3.29)$$

where $\omega > 0$, $\alpha_j \ge 0$ for $j = 1, \ldots, q$, and $\beta_i \ge 0$ for $i = 1, \ldots, p$; weaker conditions that still ensure $\sigma_t^2 > 0$ are given in Nelson and Cao [1992]. The model is a generalized version of the ARCH model in the sense that the squared conditional volatility $\sigma_t^2$ is a linear function of past squared conditional volatilities as well as past squared innovations of the process.
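To make the recursion concrete, the sketch below filters a series of innovations through a GARCH(1,1) variance equation, i.e. (3.29) with p = q = 1, for given parameter values; the (Q)ML estimation step is omitted and the parameter values shown are purely illustrative.

import numpy as np

def garch11_filter(eps, omega, alpha, beta):
    """Run sigma_t^2 = omega + beta * sigma_{t-1}^2 + alpha * eps_{t-1}^2
    through a series of innovations eps and return the conditional
    variances, including the one-step-ahead forecast."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(eps.size + 1)
    sigma2[0] = eps.var()                     # initialize at the sample variance
    for s in range(eps.size):
        sigma2[s + 1] = omega + beta * sigma2[s] + alpha * eps[s] ** 2
    return sigma2

# illustrative parameter values; in the thesis they are estimated by (Q)ML
rng = np.random.default_rng(0)
eps = rng.standard_normal(1_000) * 1.3
sigma2 = garch11_filter(eps, omega=0.05, alpha=0.08, beta=0.90)
sigma_forecast = np.sqrt(sigma2[-1])          # feeds into (3.6)-(3.10)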
The ARCH and GARCH models are symmetric in the sense that negative and positive shocks have the same effect on volatility: the signs of the innovations have no effect on the conditional volatility, since only the squared innovations enter the conditional variance equation [Campbell et al., 1997]. This is, however, inconsistent with the stylized fact that negative shocks tend to have a larger impact on volatility than positive shocks of the same magnitude. As the leverage (i.e. the ratio of debt to equity) of a stock increases when its price drops, this phenomenon is called the leverage effect [Black, 1976]. In the following we consider two extensions of the GARCH model that take this asymmetry into account.

EGARCH. The exponential generalized autoregressive conditional heteroskedastic (EGARCH) model proposed by Nelson [1991] explicitly allows for asymmetries in the relationship between return and volatility. The general EGARCH($p$, $q$) model may be expressed as

$$\ln(\sigma_t^2) = \omega + \sum_{i=1}^{p} \beta_i \ln(\sigma_{t-i}^2) + \sum_{j=1}^{q} \gamma_j \frac{\epsilon_{t-j}}{\sigma_{t-j}} + \sum_{j=1}^{q} \alpha_j \left[ \frac{|\epsilon_{t-j}|}{\sigma_{t-j}} - E\!\left(\frac{|\epsilon_{t-j}|}{\sigma_{t-j}}\right) \right]. \qquad (3.30)$$

By parameterizing the logarithm of the conditional variance rather than the conditional variance itself, no inequality constraints are needed to ensure positive conditional variances. The expected value of the absolute standardized innovation, $E(|\epsilon_{t-j}|/\sigma_{t-j})$, depends on the assumed innovation distribution. Under the assumption of normally distributed innovations we have $E(|\epsilon_{t-j}|/\sigma_{t-j}) = \sqrt{2/\pi}$, while if we assume $t$ innovations we have

$$E\!\left(\frac{|\epsilon_{t-j}|}{\sigma_{t-j}}\right) = \frac{\sqrt{\nu - 2}\;\Gamma\big(\frac{\nu-1}{2}\big)}{\sqrt{\pi}\;\Gamma\big(\frac{\nu}{2}\big)},$$

provided that $\nu > 2$.

The EGARCH model differs from the GARCH model in three respects. First, it allows positive and negative shocks to have a different impact on volatility. If the leverage effect holds, we would expect the coefficients $\gamma_j$ to be negative, whereby negative shocks have a larger impact on future volatility than positive shocks of the same magnitude. Second, the EGARCH model allows large shocks to have a greater impact on volatility. Finally, in contrast to the GARCH model (as well as the GJR-GARCH), which allows for volatility clustering (i.e. persistence) through a combination of the $\beta_i$ and $\alpha_j$ terms, persistence is entirely captured by the $\beta_i$ terms in the EGARCH model.

GJR-GARCH. The Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model proposed by Glosten et al. [1993] also allows the conditional variance to respond differently to past negative and positive innovations, but
the manner in which it does so is different from the EGARCH model. In the EGARCH model the leverage coefficients $\gamma_j$ are applied to the actual innovation $\epsilon_{t-j}$, while in the GJR-GARCH model the leverage coefficients enter the model through a Boolean indicator. The general GJR-GARCH($p$, $q$) model is

$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \epsilon_{t-j}^2 + \sum_{j=1}^{q} \gamma_j \epsilon_{t-j}^2\, I_{\{\epsilon_{t-j} < 0\}}, \qquad (3.31)$$

where $\omega > 0$, $\beta_i \ge 0$ for $i = 1, \ldots, p$, and $\alpha_j \ge 0$ and $\alpha_j + \gamma_j \ge 0$ for $j = 1, \ldots, q$. If the leverage effect hypothesis holds, we would expect the leverage coefficients $\gamma_j$ to be positive. $I_{\{\cdot\}}$ denotes the indicator function, which returns the value one if the condition (here, that the innovation falls below the threshold level zero) is satisfied and zero otherwise. Thus, the GJR-GARCH model is closely related to the Threshold GARCH (TGARCH) model proposed by Zakoian [1994]; in the TGARCH model, however, one models the conditional standard deviation instead of the conditional variance.

Fitting the GARCH-type models using quasi-maximum likelihood. In general, we assume that the innovations in the loss process $\{L_t\}_{t \in \mathbb{Z}}$ and in the component series of the multivariate return series process $\{X_t\}_{t \in \mathbb{Z}}$ are conditionally non-normal, but we estimate the parameters of the processes by fitting GARCH-type models with normal innovations. This model-fitting procedure is known as quasi-maximum likelihood or QML. By estimating the models using QML we still obtain consistent estimators of the model parameters, even though we have (possibly erroneously) assumed conditionally normally distributed innovations. Moreover, the parameter estimators are asymptotically normal² provided that the true innovation distribution has a finite fourth moment [Bollerslev and Wooldridge, 1992].³

² The form of the asymptotic covariance matrix changes, however, compared to the ML case where the model is assumed to be correctly specified.
³ Lee and Hansen [1994] prove consistency and asymptotic normality of QMLEs for a GARCH(1,1) model, while Berkes et al. [2003] prove these properties for the general GARCH(p, q) model.

3.3 Backtesting Methodology

In the previous sections we have presented a number of different methods for estimating VaR and ES. In this section we discuss how to compare the performance of these methods. For this purpose, we employ the widely used technique for statistical evaluation known as backtesting. To backtest the methods, we continually implement them on a rolling subset (the estimation window) of the total sample and compare the resulting
estimates of VaR and ES with the actually observed changes in the portfolio value. We backtest VaR and ES estimates at the commonly used 95%, 97.5%, 99%, and 99.5% confidence levels. When evaluating the methods' ability to estimate VaR, we compare the observed number of exceedances of the VaR estimate (we call these violations) with the expected number of exceedances given the confidence level used. For ES, we evaluate the methods' performance by comparing the actual loss incurred when VaR is exceeded with the ES estimate.

We use a rolling estimation window of 1,000 daily risk factor return observations. Starting with the first 1,000 daily observations of the data series, the estimation window is continually moved one day ahead, providing us with 3,000 one-day VaR and ES estimates, which are backtested against the actual changes in the portfolio value observed over the period. From a regulatory perspective, it is also of interest to investigate the relative performance of the competing methods based on 10-day risk measures, because the Basel Committee requires that VaR for banks is calculated using a 10-day holding period in the case of market risk [Basel Committee on Banking Supervision, 2004]. However, when comparing 10-day outcomes with 10-day risk measures, two issues arise: (1) risk measures are generally calculated assuming a constant portfolio composition, but most financial institutions adjust their portfolios on a daily basis, and (2) using overlapping 10-day returns introduces dependence into the comparison, which complicates statistical inference [McNeil et al., 2005]. The issue of dependence may be avoided by basing the backtest on non-overlapping periods (i.e. moving the estimation window 10 days forward after each estimation). However, this would greatly reduce the number of observations and hence the statistical validity of any tests performed. To avoid the dependence issue, we therefore only make a qualitative comparison of the methods when evaluating their performance over 10-day horizons.

The following two sections present the criteria on which we compare the implemented methods and the statistical techniques used to assess the fulfillment of these criteria.

3.3.1 Backtesting Value at Risk

With respect to VaR, the methods are evaluated on the following two criteria:

1. Does the observed fraction of VaR violations $\pi$ match the expected fraction of violations $\lambda$?
2. Do the VaR violations fall randomly in time?

For a given confidence level $\alpha$, the expected fraction of violations of $\mathrm{VaR}_{\alpha}$ is defined as $\lambda := 1 - \alpha$. If $\pi$ is larger than $\lambda$, the method underestimates the
funds at risk, which may lead to financial distress because insufficient risk capital is in place. If $\pi$ is smaller than $\lambda$, the method overestimates the funds at risk, causing an unnecessarily large allocation of risk capital. For a given sample of loss observations and VaR estimates, we may have a match between $\pi$ and $\lambda$, but if all violations are clustered within the same short period, the risk of financial distress is severely higher than if the violations are randomly scattered across time. Clustering of violations indicates that the probability of a new VaR violation increases when a violation has just occurred (positive dependence), which can be a sign of model misspecification.

For 1-day VaR estimates, we can formally test the two criteria independently and jointly. We employ the likelihood ratio (LR) framework of Christoffersen [1998], which specifies an LR test of correct unconditional coverage, an LR test of independence between violations, and a joint LR test of correct coverage and independence (i.e. conditional coverage).

Unconditional Coverage Testing. First, we want to test whether the observed fraction of violations $\pi$ obtained using a particular method is significantly different from the expected fraction $\lambda$. We can test the null hypothesis that $\pi = \lambda$ using the likelihood ratio test statistic

$$\mathrm{LR}_{\mathrm{uc}} = -2\ln\!\left[\frac{(1-\lambda)^{n_0}\lambda^{n_1}}{(1-\hat{\pi})^{n_0}\hat{\pi}^{n_1}}\right] \overset{a}{\sim} \chi^2(1), \qquad (3.32)$$

where $n$ is the number of observations, $n_1$ is the number of violations, and $n_0 = n - n_1$ is the number of non-violations. The fraction of violations is simply estimated by $\hat{\pi} = n_1/n$. The test statistic is asymptotically chi-squared distributed with one degree of freedom.⁴

⁴ Kupiec [1995] applies similar tests for unconditional coverage.

Independence Testing. Next, we want to test the second criterion, i.e. whether violations fall randomly in time. We can test the null hypothesis of independence (the probability of a violation tomorrow does not depend on whether or not there is a violation today) using a likelihood ratio test. Let $n_{ij}$ denote the number of observations with a $j$ following an $i$, and let $\pi_{ij}$ denote the probability of an observation with a $j$ following an $i$, where $i$ and $j$ can take the values 0 or 1, denoting a non-violation and a violation, respectively. We have the test statistic

$$\mathrm{LR}_{\mathrm{ind}} = -2\ln\!\left[\frac{(1-\hat{\pi})^{n_0}\hat{\pi}^{n_1}}{(1-\hat{\pi}_{01})^{n_{00}}\hat{\pi}_{01}^{n_{01}}(1-\hat{\pi}_{11})^{n_{10}}\hat{\pi}_{11}^{n_{11}}}\right] \overset{a}{\sim} \chi^2(1), \qquad (3.33)$$

where the probability of a violation day following a non-violation day, $\pi_{01}$, can be estimated by the maximum likelihood estimator $\hat{\pi}_{01} = n_{01}/(n_{00} + n_{01})$. Using the fact that the probabilities sum to one, it follows that $\hat{\pi}_{00} = 1 - \hat{\pi}_{01}$. Correspondingly, the probability of a violation day following a violation day can be estimated by $\hat{\pi}_{11} = n_{11}/(n_{10} + n_{11})$, and it follows that $\hat{\pi}_{10} = 1 - \hat{\pi}_{11}$. Note that the independence hypothesis corresponds to $\pi_{01} = \pi_{11}$.
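The two test statistics (3.32) and (3.33) are easy to compute from the series of violation indicators. The sketch below is our own implementation of the formulas as printed (the conditional coverage statistic discussed next is simply their sum); the use of scipy's xlogy to handle zero counts and the stand-in data are our assumptions.

import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2

def lr_unconditional(violations, lam):
    """LR test of correct unconditional coverage, eq. (3.32):
    compares the observed violation rate with lam = 1 - alpha."""
    v = np.asarray(violations, dtype=int)
    n, n1 = v.size, int(v.sum())
    n0 = n - n1
    pi_hat = n1 / n
    lr = -2.0 * (xlogy(n0, 1.0 - lam) + xlogy(n1, lam)
                 - xlogy(n0, 1.0 - pi_hat) - xlogy(n1, pi_hat))
    return lr, chi2.sf(lr, df=1)

def lr_independence(violations):
    """LR test of independence between violations, eq. (3.33),
    based on the first-order transition counts n_ij."""
    v = np.asarray(violations, dtype=int)
    n, n1 = v.size, int(v.sum())
    n0 = n - n1
    pi_hat = n1 / n
    pairs = list(zip(v[:-1], v[1:]))
    n00, n01 = pairs.count((0, 0)), pairs.count((0, 1))
    n10, n11 = pairs.count((1, 0)), pairs.count((1, 1))
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0
    num = xlogy(n0, 1.0 - pi_hat) + xlogy(n1, pi_hat)
    den = (xlogy(n00, 1.0 - pi01) + xlogy(n01, pi01)
           + xlogy(n10, 1.0 - pi11) + xlogy(n11, pi11))
    lr = -2.0 * (num - den)
    return lr, chi2.sf(lr, df=1)

# usage: a 99% VaR backtest over 3,000 days with simulated violation indicators
rng = np.random.default_rng(0)
viol = (rng.uniform(size=3_000) < 0.01).astype(int)
lr_uc, p_uc = lr_unconditional(viol, lam=0.01)
lr_ind, p_ind = lr_independence(viol)
lr_cc = lr_uc + lr_ind    # conditional coverage statistic, eq. (3.34) below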
Conditional Coverage Testing. Ultimately, we would like to test both evaluation criteria jointly. We conduct a joint test of correct coverage and independence, which we refer to as conditional coverage, simply by summing the two previous likelihood ratio test statistics:

$$\mathrm{LR}_{\mathrm{cc}} = \mathrm{LR}_{\mathrm{uc}} + \mathrm{LR}_{\mathrm{ind}} \overset{a}{\sim} \chi^2(2). \qquad (3.34)$$

The test statistic is asymptotically chi-squared distributed with two degrees of freedom.

3.3.2 Backtesting Expected Shortfall

In Section 2.1.3 we defined ES and remarked that it has the desirable property of subadditivity, which makes it an attractive measure of risk. For this reason, we introduce a third evaluation criterion:

3. Does the size of the violating loss match the expected shortfall?

Thus, we evaluate the methods based on the discrepancy between the $h$-day expected shortfall estimate and the observed $h$-day loss given that VaR is exceeded. For 1-day ES estimates, we can conduct a formal test suggested by McNeil and Frey [2000]. The test procedure is described below.

Expected Shortfall Testing. Let $\mathrm{ES}^{t}_{\alpha}$ be the expected shortfall of the (continuous) conditional loss distribution $F_{L_{t+1}\mid\mathcal{F}_t}$ and define the discrepancy as $S_{t+1} = (L_{t+1} - \mathrm{ES}^{t}_{\alpha})I_{t+1}$, where $I_{t+1} := I_{\{L_{t+1} > \mathrm{VaR}^{t}_{\alpha}\}}$ is a violation indicator taking the value one on violation days and zero otherwise. Then, for an arbitrary loss process $\{L_t\}_{t \in \mathbb{Z}}$, it follows from relation (2.9) that the process $\{S_t\}_{t \in \mathbb{Z}}$ satisfies the identity

$$E(S_{t+1} \mid \mathcal{F}_t) = 0. \qquad (3.35)$$

Under the assumption that the loss process $\{L_t\}_{t \in \mathbb{Z}}$ follows a stationary model of the form $L_t = \mu_t + \sigma_t Z_t$, defined as in Section 3.2, we have

$$\mathrm{VaR}^{t}_{\alpha} = \mu_{t+1} + \sigma_{t+1}\, q_{\alpha}(Z) \qquad \text{and} \qquad \mathrm{ES}^{t}_{\alpha} = \mu_{t+1} + \sigma_{t+1}\, \mathrm{ES}_{\alpha}(Z),$$
where $q_{\alpha}(Z)$ is the $\alpha$-quantile and $\mathrm{ES}_{\alpha}(Z)$ is the expected shortfall of the innovation distribution, respectively. From this, we can write the discrepancy as $S_{t+1} = \sigma_{t+1}\big(Z_{t+1} - \mathrm{ES}_{\alpha}(Z)\big)I_{\{Z_{t+1} > q_{\alpha}(Z)\}}$. With conditional estimates of the expected shortfall and the volatility of the loss process, we form exceedance residuals of the form

$$\hat{R}_{t+1} := \hat{S}_{t+1}/\hat{\sigma}_{t+1}, \qquad \hat{S}_{t+1} := \big(L_{t+1} - \widehat{\mathrm{ES}}^{t}_{\alpha}\big)\hat{I}_{t+1}, \qquad (3.36)$$

where $\hat{I}_{t+1}$ is the violation indicator defined previously. Under the null hypothesis that we have correctly estimated ES and correctly modeled the dynamics of the loss process, these residuals are expected to behave as realizations of iid variables from a distribution with zero mean [McNeil et al., 2005].

The estimate of $\sigma_{t+1}$ is obtained in a different way in each method. In the conditional univariate methods, we obtain the estimate $\hat{\sigma}_{t+1}$ directly from the GARCH-type model fitted to the portfolio losses. In the conditional multivariate methods, we estimate $\sigma_{t+1}$ by the sample standard deviation estimator applied to the simulated 1-day losses $\tilde{L}^{(1)}_{t+1}, \ldots, \tilde{L}^{(m)}_{t+1}$. We also wish to compare the performance of the unconditional methods, i.e. the HS method and the VC method, with the performance of the conditional methods. For the unconditional methods, we therefore standardize the discrepancy using the sample standard deviation of the historically simulated portfolio losses in the HS method and the estimate of $\sqrt{w'\hat{\Sigma} w}$ in the VC method.

To test the null hypothesis that the residuals come from a population with mean zero, we perform a bootstrap test that makes no assumptions about the underlying distribution of the residuals (see Efron and Tibshirani [1993] for an accessible introduction to bootstrapping). The bootstrap test is based on the distribution of the test statistic

$$t(\hat{R}_{t+1}) = \frac{\bar{R}_{t+1}}{\hat{\sigma}_{R_{t+1}}/\sqrt{n}}, \qquad (3.37)$$

where $n$ here denotes the number of violations in the sample, $\hat{R}_{t+1}$ is the vector of non-zero exceedance residuals $\hat{R}_{t+1} = (\hat{R}_{t+1,1}, \ldots, \hat{R}_{t+1,n})$, and $\bar{R}_{t+1}$ and $\hat{\sigma}_{R_{t+1}}$ denote their sample mean and sample standard deviation, respectively. We conduct a one-sided test against the alternative that the mean is greater than zero, since it is most likely that expected shortfall is systematically underestimated [McNeil and Frey, 2000]. We want to estimate the probability that a random variable $R^{*}_{t+1}$ distributed according to the null hypothesis (i.e. with a zero-mean distribution) exceeds the observed value. For this purpose we need a distribution $\hat{F}$ that estimates the population of residuals under the null. A way to obtain such a null distribution is to use a version
of the empirical distribution that has been transformed to have the desired mean; in our case, mean zero. In other words, we estimate the null distribution $F$ by the empirical distribution of the centered values $\tilde{R}_{t+1,i} = \hat{R}_{t+1,i} - \bar{R}_{t+1}$ for $i = 1, \ldots, n$. We draw bootstrap samples $R^{*}_{t+1,1}, \ldots, R^{*}_{t+1,n}$ with replacement from $\tilde{R}_{t+1,1}, \ldots, \tilde{R}_{t+1,n}$ and calculate for each bootstrap sample the test statistic

$$t(R^{*}_{t+1}) = \frac{\bar{R}^{*}_{t+1}}{\hat{\sigma}_{R^{*}_{t+1}}/\sqrt{n}}, \qquad (3.38)$$

where $\bar{R}^{*}_{t+1}$ is the average of the bootstrap sample and $\hat{\sigma}_{R^{*}_{t+1}}$ is the standard deviation of the bootstrap sample. Finally, we count the number of bootstrap test statistics above the observed test statistic $t(\hat{R}_{t+1})$. The fraction of exceeding bootstrap statistics out of the 20,000 bootstrap samples is taken as the asymptotic p-value, which we use to compare the different methods' ability to accurately estimate 1-day ES.
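A minimal implementation of this bootstrap test of zero-mean exceedance residuals might look as follows. It is our own sketch of the procedure just described; the choice of ddof = 1 for the sample standard deviation and the simulated residual inputs are assumptions for illustration only.

import numpy as np

def bootstrap_es_test(residuals, n_boot=20_000, rng=None):
    """One-sided bootstrap test of H0: the exceedance residuals have zero
    mean, against the alternative that the mean is greater than zero."""
    rng = np.random.default_rng(rng)
    r = np.asarray(residuals, dtype=float)
    n = r.size
    t_obs = r.mean() / (r.std(ddof=1) / np.sqrt(n))
    centered = r - r.mean()                   # impose the null (zero mean)
    samples = rng.choice(centered, size=(n_boot, n), replace=True)
    t_boot = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
    p_value = np.mean(t_boot > t_obs)         # fraction above the observed statistic
    return t_obs, p_value

# usage with simulated stand-in exceedance residuals
rng = np.random.default_rng(0)
resid = rng.normal(loc=0.1, scale=1.0, size=30)
print(bootstrap_es_test(resid, rng=1))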
Chapter 4
Empirical Results

This chapter presents the results of our empirical analysis of the performance of the selected risk measurement methods. Section 4.1 focuses on the results for the univariate methods and Section 4.2 focuses on the results for the multivariate methods. The two sections have the same three-part structure. First, we perform a preliminary data analysis. Second, we compare different dynamic model specifications to find the most suitable model(s) for our data. Third, we present the results of the backtests and discuss the relative performance of each method. Lastly, we compare the overall performance of all the methods in Section 4.3.

4.1 Univariate Risk Measurement Methods

The methods considered in this section are HS, HS-GARCH, HS-GARCH-t, FHS, and HS-CONDEVT. Section 4.1.1 presents the results of the preliminary data analysis. In Section 4.1.2, we select a suitable dynamic time series model for the portfolio losses based on the AIC and BIC information criteria and evaluate the adequacy of the selected model via statistical and graphical diagnostics. The empirical results of the backtest and the comparative performance analysis of the univariate methods are finally presented in Section 4.1.3.

4.1.1 Preliminary Data Analysis and Descriptive Statistics

The time series of portfolio losses is depicted in Figure 4.1. From the plot, the assumption of stationarity seems reasonable. However, the plot also appears to confirm the presence of volatility clustering. Table 4.1 contains descriptive statistics for the portfolio losses in the first estimation window (called the in-sample) and in the total sample. The in-sample and total sample show similar non-normal characteristics, such as excess kurtosis and skewness. Normality is formally rejected by the
Jarque-Bera test for both the in-sample and the total sample.

[Figure 4.1: Time series of portfolio losses.]

The Ljung-Box test computed for 20 lagged autocorrelations rejects the null hypothesis of no serial correlation in the raw portfolio losses for the total sample but not for the in-sample. The rejection is supported by the correlograms in Figure 4.2. In addition, the correlograms and the Ljung-Box test for 20 lagged autocorrelations indicate the presence of serial correlation in the squared portfolio losses, implying temporal dependence in the portfolio losses. This is confirmed by the Lagrange Multiplier test for ARCH effects, which rejects the null hypothesis of conditional homoskedasticity in both the in-sample and the total sample. Thus, we conclude that there is considerable evidence against an iid hypothesis for the portfolio losses, which supports the choice of pre-whitening the data before GPD modeling in the EVT-based method.

4.1.2 Dynamic Model Selection

In this section, we investigate which dynamic model specification is the most appropriate for the portfolio losses. Ideally, we would fit every model likely to provide a good description of the time series process and decide upon the best model using graphical and statistical diagnostic tools in each rolling estimation window of the backtest. However, due to the length of our backtest period, we choose the more feasible approach of evaluating only the total-sample performance of the models. Specifically, we base our initial model choice on the Akaike and Bayesian information criteria (AIC and BIC) in order to determine the model that provides the best fit to the total sample without overfitting.

The majority of the models are estimated using QML, i.e. we fit the models as if the innovations were conditionally normally distributed, even though we are well aware that this may not be true. The HS-GARCH-t method is one exception, where we estimate the model parameters using ML under the explicit assumption of $t$ distributed innovations. Thus, we need to determine two dynamic models: one with $t$ distributed innovations to be used in HS-GARCH-t, and one fitted by QML to be used in HS-GARCH, FHS, and HS-CONDEVT.
Table 4.1: Descriptive statistics for the portfolio losses

                 In-sample       Total sample
Start date       6-Feb-95        6-Feb-95
End date         4-Dec-98        4-Jun-10
Observations     1000            4000
Mean             -0.0834         -0.0598
Median           -0.0890         -0.0165
Maximum          6.4344          10.4484
Minimum          -3.6635         -9.1031
Std. Dev.        1.0473          1.3599
Kurtosis         6.4980          8.7421
Skewness         0.6830          0.2015
Jarque-Bera      580.5504 (0.0000)   5.5057 × 10^3 (0.0000)
LB(20)           19.3927 (0.4964)    58.5912 (0.0000)
LB^2(20)         602.9520 (0.0000)   4.0724 × 10^3 (0.0000)
LM(20)           209.2441 (0.0000)   887.2602 (0.0000)

In-sample corresponds to the first 1000 observations. LB(20) and LB^2(20) are the Ljung-Box Q test computed with 20 lags applied to the raw portfolio losses and their squared values, respectively. LM(20) is the LM test for ARCH effects computed with 20 lags. Probability values are stated in parentheses.
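The test statistics in Table 4.1 are standard and can be reproduced with most statistical software. As a hedged illustration only (not the code used in the thesis), the following sketch computes diagnostics of the same kind with SciPy and statsmodels, assuming `losses` is a one-dimensional array of daily portfolio losses in percent.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch

    def describe_losses(losses, lags=20):
        """Descriptive statistics and diagnostics of the kind shown in Table 4.1."""
        losses = np.asarray(losses, dtype=float)
        jb_stat, jb_pval = stats.jarque_bera(losses)           # normality
        lb = acorr_ljungbox(losses, lags=[lags])                # raw series
        lb2 = acorr_ljungbox(losses ** 2, lags=[lags])          # squared series
        lm_stat, lm_pval, _, _ = het_arch(losses, nlags=lags)   # ARCH effects
        return {
            "mean": losses.mean(),
            "std": losses.std(ddof=1),
            "skewness": stats.skew(losses),
            "kurtosis": stats.kurtosis(losses, fisher=False),   # raw kurtosis, 3 = normal
            "Jarque-Bera": (jb_stat, jb_pval),
            "LB(20)": (lb["lb_stat"].iloc[-1], lb["lb_pvalue"].iloc[-1]),
            "LB^2(20)": (lb2["lb_stat"].iloc[-1], lb2["lb_pvalue"].iloc[-1]),
            "LM(20)": (lm_stat, lm_pval),
        }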
Figure 4.2: Correlograms for the in-sample raw portfolio losses (A) and their squared values (B) as well as for the total sample raw portfolio losses (C) and their squared values (D).

Figure 4.3 depicts the results from fitting 18 plausible specifications by QML and by ML based on t distributed innovations. We observe that models fitted under the assumption of t distributed innovations generally have a better overall fit; both AIC and BIC prefer the AR(1)-GJR(1,1) specification with t distributed innovations, and we therefore use this specification in HS-GARCH-t. Among the models with normal innovations, the AR(1)-GJR(1,1) specification is also favored by the criteria, and we choose this specification for modeling portfolio losses in HS-GARCH, FHS, and HS-CONDEVT.

To obtain standardized residuals, we fit the AR(1)-GJR(1,1) specification to both the in-sample and the total sample using QML and ML under the assumption of t distributed innovations. We investigate the properties of the standardized residuals with the purpose of evaluating the adequacy of the AR(1)-GJR(1,1) specification. Descriptive statistics for the standardized residuals and parameter estimates from each of the estimations are presented in Table 4.2. We observe that neither conditional heteroskedasticity (ARCH effects) nor serial correlation can be detected in the standardized residuals. The LM test for ARCH effects is insignificant, and the Ljung-Box test computed for 20 lagged autocorrelations cannot reject the null hypothesis of no serial correlation in the raw and the squared standardized residuals. The correlograms displayed in Figure 4.4 and Figure 4.5 are equally unable to detect serial correlation. This indicates that the AR(1)-GJR(1,1) specification adequately pre-whitens the data.
Figure 4.3: Information criteria values for the fitted models. AR1 denotes a first order autoregressive mean specification, t and n denote the distribution assumption, and GPQ denotes a GARCH-type variance specification of order P and Q.

Note that normality is clearly rejected due to excess kurtosis and non-zero skewness. Furthermore, we investigate the properties of the underlying distribution with QQ-plots of the standardized residuals obtained from the AR(1)-GJR(1,1) model fitted by QML versus a normal and a t distribution (see Figure 4.6). We see from Figure 4.6 that both the in-sample and total sample QQ-plots indicate heavy tails. This is apparent from plot (A) and plot (C), where we observe that the theoretical quantiles from the normal distribution are smaller than the empirical quantiles. The in-sample evidence in plot (A) indicates that the right tail of the underlying distribution is somewhat heavier than that of the normal distribution, while the total sample evidence in plot (C) shows strong signs of an underlying distribution with heavy tails. The strong linear trend in plot (B) and plot (D) indicates that the t distribution is more capable of capturing the fat-tail behavior than the normal distribution.

4.1.3 Relative Performance of the Methods

This section presents the results of the backtests of the methods' performance with respect to estimating VaR and ES accurately. The risk measures are estimated for day t+1 and day t+10 using a rolling window of 1,000 portfolio losses L_{t-999}, ..., L_t. Every time the estimation window is rolled forward, the AR(1)-GJR(1,1) model is refitted. For the sake of practical relevance and to increase the amount of empirical evidence in our study, we compute risk measures for both tails of the loss distribution. The left tail corresponds to a short trading position in the portfolio, while the right tail corresponds to a long trading position. We begin by discussing the results of the VaR-based backtest, saving the results of the ES-based backtest for last.
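To make the rolling scheme concrete, the following sketch shows how such a backtest loop could be organized. It is an illustration under stated assumptions, not the thesis implementation: `losses` is a one-dimensional array of portfolio losses, and `estimate_var(window, alpha, tail)` is a hypothetical function that refits the chosen model on the window and returns the 1-day VaR.

    import numpy as np

    WINDOW = 1000                         # length of the rolling estimation window

    def backtest_var(losses, estimate_var, alpha=0.99, tail="right"):
        """Return a boolean array of VaR violations over the backtest period."""
        losses = np.asarray(losses, dtype=float)
        hits = []
        for t in range(WINDOW, len(losses)):
            window = losses[t - WINDOW:t]                 # the 1,000 losses preceding day t
            var_next = estimate_var(window, alpha, tail)  # model refitted inside
            realized = losses[t]
            # a violation is a loss beyond the VaR estimate in the relevant tail
            hit = realized > var_next if tail == "right" else realized < var_next
            hits.append(hit)
        return np.array(hits, dtype=bool)

    # The observed violation ratio is then compared with the expected ratio 1 - alpha:
    # ratio = backtest_var(losses, estimate_var, alpha=0.99).mean()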
Table 4.2: Parameter estimates and descriptive statistics for the standardized residuals

                     In-sample                              Total sample
                     Normal               t                 Normal               t
φ̂_0                 -0.0975 (0.0003)     -0.1014 (0.0002)  -0.0703 (0.0000)     -0.0749 (0.0000)
φ̂_1                 0.1201 (0.0002)      0.1132 (0.0005)   0.0578 (0.0002)      0.0539 (0.0005)
ω̂                   0.0035 (0.2885)      0.0050 (0.2559)   0.0098 (0.0000)      0.0154 (0.0001)
α̂_1                 0.9424 (0.0000)      0.9376 (0.0000)   0.9417 (0.0000)      0.9280 (0.0000)
β̂_1                 0.0445 (0.0002)      0.0463 (0.0018)   0.0677 (0.0000)      0.0840 (0.0000)
γ̂_1                 0.0263 (0.2441)      0.0294 (0.2739)   -0.0282 (0.0000)     -0.0403 (0.0015)
Degrees of freedom                        21.0840 (0.1149)                       6.8677 (0.0000)
Kurtosis             3.2973               3.3210            4.8973               5.0029
Skewness             0.2142               0.2192            0.2200               0.2148
Jarque-Bera          11.1237 (0.0072)     12.0782 (0.0054)  629.7674 (0.0000)    696.6568 (0.0000)
LB(20)               12.5492 (0.8959)     12.3264 (0.9044)  20.8383 (0.4067)     20.5837 (0.4220)
LB^2(20)             16.8746 (0.6611)     16.1090 (0.7098)  14.5651 (0.8007)     12.6452 (0.8921)
LM(20)               16.7289 (0.6705)     16.0500 (0.7135)  14.2537 (0.8174)     12.7585 (0.8875)

In-sample corresponds to the first 1000 observations. LB(20) and LB^2(20) are the Ljung-Box Q test computed with 20 lags applied to the raw standardized residuals and their squared values, respectively. LM(20) is the LM test for ARCH effects computed with 20 lags. Probability values are stated in parentheses.
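The estimates in Table 4.2 come from fitting the AR(1)-GJR(1,1) specification by QML and by ML with t distributed innovations. A minimal sketch of how such a fit can be produced with the third-party Python arch package is shown below, assuming `losses` holds the portfolio loss series; this is an illustration only, not the estimation code used in the thesis, and the package's parameterization need not coincide with the one reported in Table 4.2.

    from arch import arch_model

    def fit_ar1_gjr11(losses, dist="normal"):
        """AR(1) mean with a GJR-GARCH(1,1) variance equation (o=1 adds the asymmetry term)."""
        model = arch_model(losses, mean="AR", lags=1,
                           vol="GARCH", p=1, o=1, q=1, dist=dist)
        return model.fit(disp="off")

    res_qml = fit_ar1_gjr11(losses, dist="normal")   # QML fit (normal innovations)
    res_t = fit_ar1_gjr11(losses, dist="t")          # ML fit with t innovations

    print("AIC:", res_qml.aic, res_t.aic)            # criteria as in Figure 4.3
    print("BIC:", res_qml.bic, res_t.bic)

    std_resid = res_qml.std_resid.dropna()           # inputs to the diagnostics in Table 4.2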
Figure 4.4: Correlograms for the in-sample raw standardized residuals (A) and their squared values (B) as well as for the total sample raw standardized residuals (C) and their squared values (D), extracted from the AR(1)-GJR(1,1) model fitted with the QML estimation method.
Figure 4.5: Correlograms for the in-sample raw standardized residuals (A) and their squared values (B) as well as for the total sample raw standardized residuals (C) and their squared values (D), extracted from the AR(1)-GJR(1,1) model fitted with ML under the assumption of t distributed innovations.
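Before turning to the backtest results, recall from Section 3.2.1 that HS-CONDEVT fits a GPD to the N_u = 100 largest of these standardized residuals to obtain its tail quantiles. The sketch below, assuming an array `std_resid` of standardized residuals, illustrates the GPD-based quantile and expected shortfall of the innovation distribution in the spirit of McNeil and Frey [2000]; the exact estimation details used in the thesis may differ.

    import numpy as np
    from scipy.stats import genpareto

    def gpd_tail_risk(std_resid, alpha=0.99, n_u=100):
        """GPD-based VaR and ES of the innovation distribution (right tail)."""
        z = np.sort(np.asarray(std_resid, dtype=float))
        n = len(z)
        u = z[-(n_u + 1)]                        # threshold: the (N_u + 1)-th largest residual
        excesses = z[-n_u:] - u                  # the N_u excesses over the threshold
        xi, _, beta = genpareto.fit(excesses, floc=0)   # shape xi and scale beta
        # tail formulas assume xi != 0 and, for the ES, xi < 1
        var_z = u + beta / xi * (((1 - alpha) * n / n_u) ** (-xi) - 1)
        es_z = var_z / (1 - xi) + (beta - xi * u) / (1 - xi)
        return var_z, es_z

    # The conditional 1-day risk measures then combine these innovation quantiles with the
    # fitted model's forecasts, e.g. VaR_{t+1} = mu_{t+1} + sigma_{t+1} * var_z.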
Figure 4.6: QQ-plots for the in-sample standardized residuals versus a normal distribution (A) and a t distribution (B) as well as for the total sample standardized residuals versus a normal distribution (C) and a t distribution (D).

Value at Risk

1-day Value at Risk. Table 4.3 and Table 4.4 document the results of the backtest of the methods based on 1-day VaR estimates for the right and left tail of the loss distribution, respectively. We summarize the results for both tails in the following.

The unconditional HS method is the worst performing method by a large margin. For nearly all confidence levels of VaR and in both tails, we reject the null hypotheses of unconditional coverage, independence, and conditional coverage at any reasonable significance level. Unlike the other univariate methods, HS adapts to changing market uncertainty (changing volatility) only very slowly, which is the likely cause of the poor performance. The significantly high values of the LR_ind test statistics confirm that the slow response to changing volatility results in clusters of VaR violations.

The conditional HS-GARCH method is a definite improvement over the unconditional HS method. At the 1% significance level, we fail to reject the null hypotheses of unconditional and conditional coverage in both tails for the 95% VaR estimates. However, for the 99% and 99.5% VaR estimates, we reject the hypotheses of unconditional and conditional coverage in both tails at the 1% significance level.
Table 4.3: 1-day VaR-based backtest results: Right tail

Model          λ      π      LR_uc      LR_ind     LR_cc        R_uc   R_cc   VaR
HS             5      6.47   12.4853*   30.3323*   42.8176*     5      5      1.9395
               2.5    3.33   7.7507*    36.1377*   43.8884*     5      5      2.6325
               1      1.73   13.3682*   18.0258*   31.3940*     4      5      3.5548
               0.5    0.77   3.6839     6.3662     10.0501*     4      4      4.5040
HS-GARCH       5      5.30   0.5580     2.4102     2.9682       1      1      2.0613
               2.5    3.13   4.5753     6.4555     11.0308*     4      4      2.4678
               1      1.73   13.3682*   1.0380     14.4062*     4      4      2.9404
               0.5    1.30   26.7234*   0.3849     27.1083*     5      5      3.2622
HS-GARCH-t     5      5.60   2.1924     2.2189     4.4113       2      2      1.9990
               2.5    2.97   2.5315     0.0497     2.5812       3      1      2.5191
               1      1.27   1.9871     0.9754     2.9625       3      3      3.2271
               0.5    0.70   2.1439     0.2962     2.4401       3      3      3.7944
FHS            5      5.87   4.5047     4.1604     8.6651*      4      4      1.9894
               2.5    2.93   2.1912     3.5632     5.7544       1      2      2.5584
               1      1.07   0.1318     0.8598     0.9916       2      2      3.3986
               0.5    0.53   0.0656     0.1716     0.2372       1      1      4.2723
HS-CONDEVT     5      5.67   2.6961     2.9213     5.6174       3      3      1.9980
               2.5    2.93   2.1912     3.5632     5.7544       1      2      2.5751
               1      1.03   0.0333     0.6476     0.6809       1      1      3.3920
               0.5    0.53   0.0656     0.1716     0.2372       1      1      4.0563

* Significance at the 1% level. The critical value (CV) for significance at the 1% level is 9.2103 for LR_cc and 6.6349 for LR_uc and LR_ind. The CV for significance at the 5% level is 5.9915 for LR_cc and 3.8415 for LR_uc and LR_ind. λ = 1 − α is the expected fraction of violations in percent. The observed violation ratio in percent is denoted π. LR_uc, LR_ind, and LR_cc are the likelihood ratio test statistics for the unconditional coverage, independence, and conditional coverage tests, respectively. R_uc and R_cc are the ranks based on the size of LR_uc and LR_cc, respectively. VaR is the average VaR in percent over the backtest period.
The coverage failure is probably related to the use of the normal distribution for estimating risk measures. As indicated by the QQ-plots in Figure 4.6, the underlying distribution is most likely heavy tailed, so the normal distribution is a crude approximation.

For the HS-GARCH-t method, we fail to reject the null hypotheses of unconditional coverage, independence, and conditional coverage at the 5% significance level in almost all cases. The one exception is the 99% VaR estimates in the left tail, where we reject the independence hypothesis at the 5% significance level. The otherwise low values of the LR_ind test statistics suggest that the method quickly adapts to changing market uncertainty. While the method is generally top ranked in the left tail, it receives a mediocre ranking in the right tail. Taken together, the assumption of t distributed innovations appears to offer an improvement over the HS-GARCH method.

The performance of the FHS method seems quite similar to that of HS-GARCH-t. We fail to reject the null hypothesis of independence at the 5% significance level in all cases except for the 95% VaR estimates in the right tail. While we experience a few more VaR violations than expected in the left tail, the observed number of violations in the right tail is generally closer to the expected number compared to HS-GARCH-t. The one exception is the 95% confidence level in the right tail, where we reject the null hypotheses of unconditional coverage, independence, and conditional coverage at the 5% significance level. Nonetheless, the method still outperforms both HS and HS-GARCH and is a viable alternative to HS-GARCH-t.

The EVT-based method HS-CONDEVT is generally top ranked in the right tail and does a less impressive but still fair job in the left tail. At the 5% significance level, we fail to reject the null hypotheses of unconditional coverage, independence, and conditional coverage in almost all cases. The one exception is the 99% VaR estimates in the left tail, where we reject the null hypotheses of unconditional and conditional coverage at the 5% significance level, but fail to reject the null at the 1% significance level. In conclusion, HS-CONDEVT outperforms HS and HS-GARCH, and the method is a good alternative to HS-GARCH-t and FHS.

For illustrative purposes, we have plotted the 99% 1-day VaR estimates against the portfolio losses (see Figure 4.7). It is clear from the staircase pattern in Figure 4.7 that the unconditional HS method leads to clusters of VaR violations. In contrast, the conditional methods are able to respond to changing volatility and consequently give fewer violations and less clustering of VaR violations.

10-day Value at Risk. We continue with the 10-day VaR-based backtest results for the univariate methods, which are reported in Table 4.5. We provide a qualitative comparison of the methods based on the absolute difference (AD) between the observed fraction and the expected fraction of VaR violations.
Table 4.4: 1-day VaR-based backtest results: Left tail

Model          λ      π      LR_uc      LR_ind     LR_cc        R_uc   R_cc   VaR
HS             5      6.63   15.3532*   16.8997*   32.2529*     5      5      -2.0634
               2.5    3.73   16.2964*   8.5355*    24.8319*     5      5      -2.6094
               1      1.93   20.7373*   7.7358*    28.4731*     5      5      -3.3457
               0.5    1.03   13.0940*   4.1540     17.2480*     5      5      -3.9823
HS-GARCH       5      5.20   0.2495     0.1779     0.4274       1      1      -2.1823
               2.5    2.77   0.8464     0.2084     1.0548       2      2      -2.5888
               1      1.70   12.2729*   0.0201     12.2930*     4      4      -3.0614
               0.5    0.97   10.3020*   0.5663     10.8683*     4      4      -3.3832
HS-GARCH-t     5      5.53   1.7390     0.3790     2.1180       2      2      -2.1352
               2.5    2.73   0.6507     0.2461     0.8968       1      1      -2.6553
               1      1.00   0.0000     4.3867     4.3867       1      1      -3.3633
               0.5    0.40   0.6476     0.0964     0.7440       1      1      -3.9306
FHS            5      5.67   2.6961     0.0152     2.7113       4      4      -2.0999
               2.5    2.93   2.1912     0.0684     2.2596       3      3      -2.5647
               1      1.43   5.0172     0.2074     5.2246       2      2      -3.1311
               0.5    0.77   3.6839     0.3555     4.0394       2      2      -3.5209
HS-CONDEVT     5      5.60   2.1924     0.0406     2.2330       3      3      -2.1026
               2.5    3.07   3.6903     0.0117     3.7020       4      4      -2.5508
               1      1.47   5.7694     0.1723     5.9417       3      3      -3.1276
               0.5    0.77   3.6839     0.3555     4.0394       2      2      -3.5580

* Significance at the 1% level. The critical value (CV) for significance at the 1% level is 9.2103 for LR_cc and 6.6349 for LR_uc and LR_ind. The CV for significance at the 5% level is 5.9915 for LR_cc and 3.8415 for LR_uc and LR_ind. λ = 1 − α is the expected fraction of violations in percent. The observed violation ratio in percent is denoted π. LR_uc, LR_ind, and LR_cc are the likelihood ratio test statistics for the unconditional coverage, independence, and conditional coverage tests, respectively. R_uc and R_cc are the ranks based on the size of LR_uc and LR_cc, respectively. VaR is the average VaR in percent over the backtest period.
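The LR_uc, LR_ind, and LR_cc statistics in Tables 4.3 and 4.4 are the likelihood ratio tests for unconditional coverage, independence, and conditional coverage used in the backtest (Section 3.3.1). A minimal sketch of how they can be computed from a series of violation indicators is given below; `hits` and `lam` (the expected violation probability, e.g. 0.01) are illustrative inputs, and the handling of degenerate counts is a simplification.

    import numpy as np
    from scipy.stats import chi2

    def coverage_tests(hits, lam):
        """Kupiec/Christoffersen LR tests from a 0/1 series of VaR violations."""
        hits = np.asarray(hits, dtype=int)
        n1 = int(hits.sum())
        n0 = len(hits) - n1

        def loglik(p, stay, move):       # stay*log(1-p) + move*log(p), with 0*log(0) := 0
            out = 0.0
            if stay:
                out += stay * np.log(1 - p)
            if move:
                out += move * np.log(p)
            return out

        pi_hat = n1 / len(hits)
        lr_uc = -2 * (loglik(lam, n0, n1) - loglik(pi_hat, n0, n1))

        prev, curr = hits[:-1], hits[1:]  # transition counts for the Markov independence test
        n00 = int(((prev == 0) & (curr == 0)).sum()); n01 = int(((prev == 0) & (curr == 1)).sum())
        n10 = int(((prev == 1) & (curr == 0)).sum()); n11 = int(((prev == 1) & (curr == 1)).sum())
        pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
        pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
        pi1 = (n01 + n11) / (n00 + n01 + n10 + n11)
        lr_ind = -2 * (loglik(pi1, n00 + n10, n01 + n11)
                       - loglik(pi01, n00, n01) - loglik(pi11, n10, n11))

        lr_cc = lr_uc + lr_ind
        pvalues = chi2.sf([lr_uc, lr_ind, lr_cc], df=[1, 1, 2])
        return lr_uc, lr_ind, lr_cc, pvalues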
Figure 4.7: 1-day VaR_{α=99%} estimates plotted against the portfolio losses.

Table 4.5: 10-day VaR-based backtest results

                          Left tail                              Right tail
Method         λ      π        AD       R   VaR            π        AD       R   VaR
HS             5      5.7840   0.7840   5   -7.0815        6.6199   1.6199   5   6.2338
               2.5    3.5774   1.0774   5   -8.3981        4.1123   1.6123   5   7.7138
               1      2.1398   1.1398   5   -9.9612        2.2735   1.2735   5   9.5189
               0.5    1.5379   1.0379   5   -11.0508       1.5045   1.0045   5   10.8027
HS-GARCH       5      4.3798   0.6202   3   -7.5762        5.5165   0.5165   3   6.3399
               2.5    2.1732   0.3268   3   -8.9963        2.9087   0.4087   3   7.9201
               1      1.0030   0.0030   1   -10.6723       1.4376   0.4376   4   10.1257
               0.5    0.4681   0.0319   2   -11.9656       0.8358   0.3358   4   11.4975
HS-GARCH-t     5      4.3129   0.6871   4   -7.5213        5.7172   0.7172   4   6.2520
               2.5    2.0060   0.4940   4   -9.0186        2.9087   0.4087   3   8.0000
               1      0.7690   0.2310   4   -10.9479       1.3373   0.3373   1   10.3684
               0.5    0.3678   0.1322   4   -12.4296       0.6018   0.1018   2   12.3288
FHS            5      4.8813   0.1187   1   -7.3234        5.0485   0.0485   1   6.5514
               2.5    2.4407   0.0593   2   -8.7307        2.6078   0.1078   1   8.2923
               1      1.1033   0.1033   2   -10.4663       1.3373   0.3373   1   10.5986
               0.5    0.4346   0.0654   3   -11.7230       0.6687   0.1687   3   12.3955
HS-CONDEVT     5      4.8144   0.1856   2   -7.3928        5.4162   0.4162   2   6.4126
               2.5    2.4741   0.0259   1   -8.7591        2.6413   0.1413   2   8.2625
               1      1.1033   0.1033   2   -10.4244       1.3373   0.3373   1   10.5870
               0.5    0.5015   0.0015   1   -11.8762       0.5684   0.0684   1   12.2070

The table reports 10-day VaR-based backtest results. λ = 1 − α is the expected fraction of violations in percent. The observed violation ratio in percent is denoted π. R is the rank based on the size of the absolute difference (AD) between λ and π. VaR is the average VaR in percent over the backtest period.
The unconditional HS method achieves the highest AD between the observed and the expected fraction of violations and therefore receives the lowest rank in all cases. The excessively high observed violation ratios imply that the method severely underestimates the 10-day VaR.

HS-GARCH also seems to underestimate VaR in the right tail at all confidence levels. In the left tail, the method appears to overestimate VaR at the two lower confidence levels but obtains a low AD at the 99% and 99.5% levels. As a result, the method is only ranked third or fourth in all cases.

In the right tail, HS-GARCH-t achieves the lowest AD at the 99% level, and it is ranked second best at the 99.5% level. However, it is only ranked fourth best in the left tail, where it seems to overestimate VaR at all confidence levels. The inherent distributional symmetry in the HS-GARCH and HS-GARCH-t methods may provide a partial explanation for the pattern of over- and underestimation observed in the results of the two methods.

The FHS method appears to perform very well in both tails at the 95%, 97.5%, and 99% confidence levels. The observed violation ratios are generally close to the expected ratios. In consequence, FHS seems to outperform HS, HS-GARCH, and HS-GARCH-t overall.

Finally, the observed violation ratios of HS-CONDEVT are close to the expected fraction of violations, and the method is ranked first or second in all cases. Thus, based on a qualitative assessment, HS-CONDEVT appears to be the best suited method for 10-day VaR estimation.

For illustrative purposes, we plot the 10-day VaR_{α=99%} estimates against the portfolio losses in Figure 4.8. As before, we observe that the unconditional HS method responds very slowly to changes in volatility, which results in clustering of VaR violations and an excessive number of violations. Conversely, we see that the conditional methods adapt to changing volatility and consequently give fewer violations and less violation clustering.

Expected Shortfall

So far, we have only considered the frequency of VaR violations. In the following we consider the size of the losses occurring when VaR is violated. Specifically, we investigate the discrepancy between the ES estimate and the portfolio loss on days where VaR is exceeded. For 1-day ES estimates, we test whether the average discrepancy is statistically significant, while the discrepancy is only evaluated qualitatively in the case of 10-day ES estimates.

1-day Expected Shortfall. Table 4.6 documents the results of the 1-day ES-based backtest. The average discrepancies for the unconditional HS method are the highest among the methods at nearly all confidence levels. At the 1% significance level, we reject the null hypothesis of zero mean exceedance residuals for the 99% and 99.5% ES estimates in the right tail and for the 95% and 97.5% ES estimates in the left tail.
Figure 4.8: 10-day VaR_{α=99%} estimates plotted against the 10-day portfolio losses.

Table 4.6: 1-day ES-based backtest results

                λ = 5              λ = 2.5            λ = 1              λ = 0.5
Right tail
HS              0.1421 (0.0265)    0.2358 (0.0141)    1.0324 (0.0000)    1.3535 (0.0000)
HS-GARCH        0.3142 (0.0000)    0.4438 (0.0000)    0.5996 (0.0000)    0.6200 (0.0000)
HS-GARCH-t      0.0771 (0.0535)    0.0987 (0.1205)    0.2309 (0.0602)    0.3133 (0.0477)
FHS             -0.0300 (0.5790)   -0.0107 (0.4818)   0.2066 (0.1393)    0.3849 (0.0589)
HS-CONDEVT      -0.0054 (0.4258)   -0.0036 (0.4664)   0.2804 (0.0534)    0.4329 (0.0193)
Left tail
HS              0.2190 (0.0008)    0.2719 (0.0018)    0.2412 (0.0245)    0.3748 (0.0239)
HS-GARCH        0.1607 (0.0004)    0.2968 (0.0000)    0.2201 (0.0122)    0.3001 (0.0226)
HS-GARCH-t      -0.0442 (0.7436)   -0.0434 (0.7585)   -0.1334 (0.7316)   -0.0394 (0.4696)
FHS             0.0757 (0.0514)    0.1431 (0.0351)    0.0483 (0.2712)    0.0929 (0.2660)
HS-CONDEVT      0.0811 (0.0379)    0.1042 (0.0849)    0.0295 (0.3720)    0.1685 (0.1229)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days, measured in percentage points. Probability values (in parentheses) are from a one-sided bootstrap test of the null hypothesis of zero mean exceedance residuals.
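The p-values in Table 4.6 come from a one-sided bootstrap test of the null hypothesis of zero mean exceedance residuals, in the spirit of McNeil and Frey [2000]. The sketch below, assuming `resid` holds the discrepancies between realized losses and ES estimates on violation days, illustrates one standard way such a test can be carried out; the exact resampling scheme and any standardization used in the thesis may differ.

    import numpy as np

    def bootstrap_zero_mean_pvalue(resid, n_boot=10_000, seed=0):
        """One-sided bootstrap p-value for H0: mean exceedance residual = 0 (vs. > 0)."""
        rng = np.random.default_rng(seed)
        resid = np.asarray(resid, dtype=float)
        observed = resid.mean()
        centered = resid - observed                      # impose the null of zero mean
        idx = rng.integers(0, len(resid), size=(n_boot, len(resid)))
        boot_means = centered[idx].mean(axis=1)          # resampled means under H0
        return float((boot_means >= observed).mean())    # small p-value: ES underestimated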
Moreover, at the 5% significance level, we reject the null hypothesis in all cases, providing statistical evidence that the method significantly underestimates ES. Based on these results, we find the HS method inappropriate for measuring 1-day ES.

HS-GARCH has lower average discrepancies than the unconditional HS method at most confidence levels. However, at the 5% significance level, we reject the null hypothesis of zero mean exceedance residuals in all cases, providing statistical evidence that the method significantly underestimates ES. At the 1% significance level, we also reject the null in all cases except for the 99% and 99.5% ES estimates in the left tail. Consequently, we find the HS-GARCH method unfit for measuring 1-day ES. These observations are in line with the findings of McNeil and Frey [2000].

Compared to HS and HS-GARCH, HS-GARCH-t has the best performance. The average discrepancies are generally quite low in both tails. At the 1% significance level, we fail to reject the null hypothesis of zero mean exceedance residuals in all cases, and at the 5% level we only find statistical evidence that the method underestimates ES at the 99.5% confidence level in the right tail.

Based on the sizes of the average discrepancies, FHS generally has the best performance, in close competition with HS-GARCH-t and HS-CONDEVT. We fail to reject the null in all cases at the 1% significance level. At the 5% significance level, we only find statistical evidence that the method underestimates ES at the 97.5% confidence level in the left tail.

Similar to HS-GARCH-t and FHS, HS-CONDEVT performs quite well in general. It achieves the average discrepancies closest to zero at the lower confidence levels (95% and 97.5%) and in the left tail at the 99% confidence level. At the 1% significance level, we fail to reject the null hypothesis in all cases. However, at the 5% significance level, we reject the null for the 99.5% ES estimates in the right tail and for the 95% ES estimates in the left tail, providing statistical evidence that the method underestimates ES in these two cases. In consequence, we conclude that HS-CONDEVT outperforms both HS and HS-GARCH, and the method proves to be a good alternative to FHS and HS-GARCH-t when it comes to estimation of 1-day ES.

10-day Expected Shortfall. Table 4.7 documents the results of the 10-day ES-based backtest. We provide a qualitative comparison of the methods based on the average discrepancies and mean squared errors (MSE).

HS achieves the highest average discrepancies and MSE values among the methods at all confidence levels in both tails. So, based on a qualitative assessment, it appears to be unfit for 10-day ES estimation in comparison with the other methods.
Table 4.7: 10-day ES-based backtest results

                λ = 5              λ = 2.5            λ = 1              λ = 0.5
Right tail
HS              1.8988 (30.0444)   2.5260 (38.7520)   3.7600 (53.9178)   5.1364 (67.7022)
HS-GARCH        0.5280 (12.6024)   0.8602 (15.6513)   1.3681 (17.7006)   2.0221 (21.5346)
HS-GARCH-t      0.1694 (12.1577)   0.2959 (14.5984)   0.4097 (15.2149)   1.3526 (17.8373)
FHS             0.2505 (13.1743)   0.5754 (14.7532)   0.4144 (15.3427)   0.8875 (17.4529)
HS-CONDEVT      0.1214 (12.7039)   0.4857 (14.7880)   0.4396 (14.9392)   1.1536 (18.0018)
Left tail
HS              1.1824 (9.6302)    1.3421 (8.9430)    1.2066 (7.1672)    0.9957 (5.2222)
HS-GARCH        0.1976 (5.7663)    0.4287 (6.0284)    0.4160 (5.5731)    0.6663 (5.3719)
HS-GARCH-t      0.1358 (5.9056)    0.3924 (5.5650)    0.4590 (5.3149)    0.5398 (3.0908)
FHS             0.2528 (5.7401)    0.4550 (5.8364)    0.4550 (5.3325)    0.6739 (4.4807)
HS-CONDEVT      0.2050 (5.7893)    0.3532 (5.6954)    0.2102 (5.0926)    0.5640 (3.9021)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days, measured in percentage points. Mean squared errors are reported in parentheses.

HS-GARCH has the second highest positive discrepancies and MSE values at the confidence levels 97.5% through 99.5% in both tails. Moreover, it has the largest MSE in the left tail at the 99.5% level. Thus, from a qualitative assessment, it appears that the method underestimates ES, especially at high confidence levels.

HS-GARCH-t seems to perform reasonably well overall. It has the average discrepancy closest to zero at the 97.5% and 99% confidence levels in the right tail and at the 95% and 99.5% levels in the left tail. Moreover, the method achieves the lowest MSE at the 95% and 97.5% confidence levels in the right tail and at the 97.5% and 99.5% levels in the left tail.

Based on MSE, the performance of FHS seems mediocre; its MSE values and average discrepancies are generally higher than those obtained by HS-GARCH-t and HS-CONDEVT. However, the method has the lowest MSE and average discrepancy at the 99.5% level in the right tail.

Finally, HS-CONDEVT attains the lowest average discrepancy for the 95% ES estimates in the right tail and for the 97.5% ES estimates in the left tail. Moreover, HS-CONDEVT is superior to the other univariate methods at the 99% level in both tails in terms of MSE values. Thus, from a qualitative assessment, HS-CONDEVT proves to be a good alternative to FHS and HS-GARCH-t.

Sensitivity Analysis

In Section 3.2.1 we described how HS-CONDEVT is implemented by fitting a GPD to the N_u = 100 excess standardized residuals. In the following we investigate the sensitivity of the performance of HS-CONDEVT to the size of N_u. We focus on the performance with respect to estimating 1-day VaR and ES. This analysis, inspired by Marimoutou et al. [2009], involves repeated evaluation of HS-CONDEVT over a range of values for N_u that have been found to be appropriate in simulation studies performed by McNeil and Frey [2000] and Kuester et al. [2006]. Specifically, we start from N_u = 140 and decrease the value by 10 until we reach N_u = 70. Table 4.8 and Table 4.9 report the results of the sensitivity analyses of the VaR-based and ES-based backtests, respectively.
Table 4.8: Sensitivity analysis: VaR-based backtest results

       Left tail                                      Right tail
N_u    λ = 5     λ = 2.5   λ = 1     λ = 0.5      λ = 5     λ = 2.5   λ = 1     λ = 0.5
140    5.63 (3)  2.83 (2)  1.37 (1)  0.73 (2)     5.73 (3)  2.97 (3)  1.03 (1)  0.53 (1)
130    5.63 (3)  2.87 (2)  1.43 (3)  0.73 (2)     5.77 (3)  2.93 (2)  1.03 (1)  0.53 (1)
120    5.73 (4)  3.00 (4)  1.47 (3)  0.73 (2)     5.77 (3)  2.90 (2)  1.07 (1)  0.53 (1)
110    5.60 (3)  3.03 (4)  1.47 (3)  0.80 (3)     5.73 (3)  2.93 (2)  1.07 (1)  0.53 (1)
100    5.60 (3)  3.07 (4)  1.47 (3)  0.77 (2)     5.67 (3)  2.93 (2)  1.03 (1)  0.53 (1)
90     5.60 (3)  3.00 (4)  1.47 (3)  0.73 (2)     5.77 (3)  2.93 (2)  1.03 (1)  0.53 (1)
80     5.60 (3)  3.07 (4)  1.50 (3)  0.80 (3)     5.70 (3)  2.90 (2)  1.00 (1)  0.53 (1)
70     5.70 (3)  3.03 (4)  1.50 (3)  0.77 (2)     5.70 (3)  2.90 (2)  1.10 (2)  0.53 (1)

The table reports violation ratios for HS-CONDEVT obtained by using different numbers of excess losses N_u in the implementation of the method. λ = 1 − α is the expected probability in percent. The numbers in parentheses show the rank of HS-CONDEVT based on the likelihood ratio test statistic for conditional coverage.

Both the left and right tail findings show that the 1-day VaR-based backtest results are robust. The ranking of HS-CONDEVT relative to the competing univariate methods in the right tail persists in all cases, except at the 99% confidence level for N_u = 70, where the rank of HS-CONDEVT shifts from one to two, and at the 97.5% level for N_u = 140, where the rank shifts from two to three. However, due to small differences in the values of the LR_cc test statistics in the 1-day VaR backtest in the left tail, we observe a few more rank shifts compared to the right tail. If the choice of N_u had been 140 instead of 100, the ranking of HS-CONDEVT would have improved at the 97.5% and 99% confidence levels in the left tail. Nonetheless, we find that the small changes in the ranking observed in the sensitivity analysis are negligible with respect to the overall conclusions made previously.

The findings in the sensitivity analysis of the ES-based backtest results (presented in Table 4.9) are similarly robust. The average discrepancies and p-values change very little in all cases, except at the 99% confidence level in the left tail for N_u = 140, where we observe an increase in the average discrepancy of about 0.07 percentage points compared to the case of N_u = 100, but not enough to reject the null hypothesis of zero mean residuals at any reasonable significance level. Consequently, we find no reason to alter any of the conclusions made in the previous sections.

4.2 Multivariate Risk Measurement Methods

This section presents the empirical results for the multivariate methods: VC, VC-EWMA, CCC-GARCH, DCC-GARCH, and MCONDEVT. The latter method is implemented in three different versions: one with a Gaussian copula denoted by MCONDEVT-Ga, one with a t copula denoted by MCONDEVT-t, and one with a Gumbel copula denoted by MCONDEVT-Gu.
Table 4.9: Sensitivity analysis: ES-based backtest results

       λ = 5              λ = 2.5            λ = 1              λ = 0.5
Right tail
140    -0.0148 (0.4704)   -0.0129 (0.5077)   0.2932 (0.0569)    0.4724 (0.0127)
130    -0.0202 (0.4908)   -0.0030 (0.4637)   0.2830 (0.0607)    0.4264 (0.0154)
120    -0.0200 (0.4891)   0.0032 (0.4305)    0.2395 (0.0917)    0.4255 (0.0158)
110    -0.0163 (0.4769)   -0.0118 (0.4745)   0.2291 (0.0880)    0.4015 (0.0222)
100    -0.0054 (0.4258)   -0.0036 (0.4664)   0.2804 (0.0534)    0.4329 (0.0193)
90     -0.0182 (0.4929)   -0.0066 (0.4773)   0.2776 (0.0582)    0.4363 (0.0256)
80     -0.0104 (0.4540)   0.0063 (0.4303)    0.3261 (0.0359)    0.4673 (0.0157)
70     -0.0130 (0.4571)   0.0109 (0.4299)    0.2247 (0.1003)    0.4629 (0.0169)
Left tail
140    0.0663 (0.0684)    0.1691 (0.0131)    0.1035 (0.1431)    0.1477 (0.1535)
130    0.0705 (0.0607)    0.1540 (0.0217)    0.0676 (0.2528)    0.1368 (0.1569)
120    0.0650 (0.0821)    0.1169 (0.0620)    0.0403 (0.3442)    0.1111 (0.1728)
110    0.0797 (0.0396)    0.1107 (0.0720)    0.0328 (0.3635)    0.1412 (0.1526)
100    0.0811 (0.0379)    0.1042 (0.0849)    0.0295 (0.3720)    0.1685 (0.1229)
90     0.0832 (0.0347)    0.1218 (0.0555)    0.0286 (0.3854)    0.1728 (0.1279)
80     0.0869 (0.0314)    0.1079 (0.0761)    0.0163 (0.4173)    0.1366 (0.1717)
70     0.0708 (0.0647)    0.1123 (0.0680)    0.0265 (0.3960)    0.1850 (0.1083)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days, measured in percentage points. VaR and ES estimates were obtained using different numbers of excess losses N_u in the implementation of HS-CONDEVT. λ = 1 − α is the expected probability in percent. Probability values (in parentheses) are from a one-sided bootstrap test of the null hypothesis of zero mean exceedance residuals.

Section 4.2.1 presents the results of the preliminary data analysis of the three univariate risk-factor return series. In Section 4.2.2, we select suitable dynamic models for the return series based on AIC and BIC and evaluate the adequacy of the selected models via statistical and graphical diagnostics. The empirical results of the backtest and the comparative performance analysis of the multivariate methods are finally presented in Section 4.2.3.

4.2.1 Preliminary Data Analysis and Descriptive Statistics

Figure 4.9 illustrates the time series of risk-factor returns for the total sample period. We observe that the assumption of stationarity seems reasonable for all three risk-factor return series. Also, the series exhibit signs of volatility clustering, i.e. there is a tendency for extreme returns to be followed by other extreme returns of either sign.

Table 4.10 contains descriptive statistics for the three return series. We observe that the in-sample and total sample have similar non-normal characteristics, such as negative skewness and excess kurtosis. Normality is formally rejected in all cases by the Jarque-Bera test. Correlograms (not depicted) and Ljung-Box tests computed for 20 lagged autocorrelations reject the null hypothesis of no serial correlation in all three series of squared returns, which implies temporal dependence in the time series of risk-factor returns.
Figure 4.9: Time series of risk-factor returns. These are log-returns on (A) NOVO B, (B) CARLS B, and (C) DANSKE.

Table 4.10: Descriptive statistics for the risk-factor return series

             NOVO B                                  CARLS B                                 DANSKE
             In-sample            TS                 In-sample            TS                 In-sample            TS
Start date   6-Feb-95             6-Feb-95           6-Feb-95             6-Feb-95           6-Feb-95             6-Feb-95
End date     4-Dec-98             4-Jun-10           4-Dec-98             4-Jun-10           4-Dec-98             4-Jun-10
Obs.         1000                 4000               1000                 4000               1000                 4000
Mean         0.0995               0.0780             0.0298               0.0265             0.1058               0.0470
Maximum      6.5099               16.6833            6.2849               14.5466            6.6490               13.9763
Minimum      -13.2896             -24.0225           -6.7444              -16.7053           -9.9090              -17.1851
Std. Dev.    1.4847               1.8558             1.3324               1.9960             1.5296               1.9187
Kurtosis     11.9689              16.5145            5.9485               12.5100            8.3973               10.0883
Skewness     -0.7053              -0.6033            -0.1311              -0.2823            -0.4789              -0.1068
JB           3.397 × 10^3 (0.0000)  3.060 × 10^4 (0.0000)  360.0230 (0.0000)  1.508 × 10^4 (0.0000)  1.237 × 10^3 (0.0000)  8.357 × 10^3 (0.0000)
LB(20)       51.9413 (0.0001)     44.6615 (0.0012)   20.0582 (0.4543)     59.0284 (0.0000)   31.1055 (0.0538)     79.2597 (0.0000)
LB^2(20)     200.9346 (0.0000)    180.8870 (0.0000)  346.7079 (0.0000)    2.591 × 10^3 (0.0000)  414.928 (0.0000)  3.278 × 10^3 (0.0000)
LM(20)       98.5455 (0.0000)     123.7065 (0.0000)  111.3371 (0.0000)    695.2555 (0.0000)  200.4458 (0.0000)    835.9011 (0.0000)

TS is short for total sample. In-sample corresponds to the first 1000 observations. JB is the Jarque-Bera test for normality. LB(20) and LB^2(20) are the Ljung-Box Q test computed with 20 lags applied to the raw returns and the squared returns, respectively. LM(20) is the LM test for ARCH effects computed with 20 lags. Probability values are stated in parentheses.
Moreover, the Ljung-Box tests computed for 20 lagged autocorrelations and the correlograms (not depicted) indicate that there is serial correlation in the raw return series in the total sample. We conclude from these findings that there is considerable evidence against an iid hypothesis for the risk-factor return series, which supports the choice of pre-whitening the data before GPD modeling in the EVT-based methods.

4.2.2 Dynamic Model Selection

To determine which dynamic model specifications are most appropriate for the risk-factor return series, we follow the same procedure as in Section 4.1.2. Recall that we consider the GARCH, GJR, and EGARCH models as candidates and evaluate which type of model provides the best fit to the total sample based on the Akaike and Bayes information criteria. Table B.1 in Appendix B provides an overview of AIC and BIC values for 18 plausible dynamic model specifications fitted by QML. Both information criteria tend to prefer the EGARCH model specifications. For NOVO B and DANSKE, an EGARCH(2,1) appears to provide the best fit, while an AR(1)-EGARCH(1,2) does so for CARLS B. However, diagnostic analyses of the standardized residuals reveal that these do not behave like realizations of a strict white noise process, as they show significant serial correlation. Analyzing the standardized residuals of the alternative models shows that the AR(1)-GARCH(1,1) does a good job of pre-whitening the series. Also, the AR(1)-GARCH(1,1) is more parsimonious than the alternative models, which is an important consideration when dealing with multivariate risk factors. When we increase the number of risk factors, the number of parameters to be estimated increases significantly; hence we want to use the most parsimonious models possible. Based on these considerations, we choose to fit an AR(1)-GARCH(1,1) to all three risk-factor return series.

Table 4.11 reports parameter estimates for the AR(1)-GARCH(1,1) models fitted to the return series by QML as well as descriptive statistics for the resulting standardized residuals. The descriptive statistics reveal that the standardized residuals are skewed and have excess kurtosis. Not surprisingly, normality is clearly rejected by the Jarque-Bera test. The LM test for ARCH effects fails to reject the null hypothesis of conditional homoskedasticity. Moreover, both the correlograms (not depicted) and the Ljung-Box tests applied to the raw and squared standardized residuals fail to reject the null hypothesis of no serial correlation, implying that the AR(1)-GARCH(1,1) models adequately pre-whiten the risk-factor return series.

4.2.3 Relative Performance of the Methods

This section presents the results of the backtest of the multivariate methods based on VaR and ES. We use a rolling window of 1,000 days of historical risk-factor returns X_{t-999}, ..., X_t to estimate VaR and ES for day t+1 and day t+10 for both tails of the loss distribution. The AR(1)-GARCH(1,1) models are refitted each time the window is rolled forward.
Table 4.11: Parameter estimates and descriptive statistics for the standardized residuals

             NOVO B                                  CARLS B                                 DANSKE
             In-sample            TS                 In-sample            TS                 In-sample            TS
φ̂_0         0.1141 (0.0022)      0.1012 (0.0001)    0.0584 (0.1097)      0.0565 (0.0136)    0.1051 (0.0053)      0.0772 (0.0001)
φ̂_1         0.1773 (0.0000)      0.0361 (0.0167)    0.0761 (0.0113)      0.0407 (0.0075)    0.1000 (0.0006)      0.0262 (0.0001)
ω̂           0.0303 (0.0000)      0.0058 (0.0000)    0.0064 (0.2087)      0.0162 (0.0000)    0.0085 (0.1406)      0.0366 (0.0000)
α̂_1         0.8794 (0.0000)      0.9807 (0.0000)    0.9654 (0.0000)      0.9588 (0.0000)    0.9459 (0.0000)      0.8729 (0.0000)
β̂_1         0.1124 (0.0000)      0.0184 (0.0000)    0.0332 (0.0000)      0.0383 (0.0000)    0.0530 (0.0000)      0.1247 (0.0000)
Kurtosis     5.3508               18.1506            4.5569               9.4568             4.4944               4.8235
Skewness     0.1849               -0.7889            0.1041               -0.2201            0.0204               -0.0473
Jarque-Bera  232.4830 (0.0000)    3.857 × 10^4 (0.0000)  101.0236 (0.0000)  6.960 × 10^3 (0.0000)  91.4523 (0.0000)  553.3901 (0.0000)
LB(20)       16.6660 (0.6745)     27.5158 (0.1214)   17.9287 (0.5921)     22.0353 (0.3386)   11.6819 (0.9266)     25.6581 (0.1774)
LB^2(20)     20.1165 (0.4507)     3.8548 (1.0000)    9.6204 (0.9745)      8.1173 (0.9911)    11.5638 (0.9303)     24.0515 (0.2401)
LM(20)       19.6877 (0.4776)     3.7974 (0.4776)    9.7168 (0.9730)      7.9166 (0.9730)    13.0924 (0.8734)     23.8934 (0.8734)

TS is short for total sample. In-sample corresponds to the first 1000 observations. LB(20) and LB^2(20) are the Ljung-Box Q test computed with 20 lags applied to the raw standardized residuals and their squared values, respectively. LM(20) is the LM test for ARCH effects computed with 20 lags. Probability values are stated in parentheses.
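In the MCONDEVT methods, the AR(1)-GARCH(1,1) margins summarized in Table 4.11 are combined through a copula fitted to the filtered standardized residuals, and the portfolio loss distribution is obtained by simulation. The sketch below illustrates the idea for a Gaussian copula only, using a simple normal-scores correlation estimate; `marginal_ppf` (one quantile function per risk factor, e.g. empirical in the body and GPD-based in the tails), `mu_next`, `sigma_next`, and `weights` are hypothetical inputs, and the thesis's t and Gumbel copula variants and its estimation choices are not reproduced here.

    import numpy as np
    from scipy.stats import norm

    def simulate_portfolio_losses(std_resid, marginal_ppf, mu_next, sigma_next,
                                  weights, n_sim=10_000, seed=0):
        """Simulate next-day portfolio losses via a Gaussian copula over GARCH margins."""
        rng = np.random.default_rng(seed)
        d = std_resid.shape[1]
        # Gaussian copula parameter: correlation of the normal scores of the residuals
        ranks = np.argsort(np.argsort(std_resid, axis=0), axis=0) + 1
        u = ranks / (std_resid.shape[0] + 1)
        corr = np.corrcoef(norm.ppf(u), rowvar=False)
        # simulate from the copula and map to innovations via the marginal quantiles
        z = rng.multivariate_normal(np.zeros(d), corr, size=n_sim)
        u_sim = norm.cdf(z)
        innovations = np.column_stack([marginal_ppf[j](u_sim[:, j]) for j in range(d)])
        # one-step-ahead returns from the AR-GARCH forecasts of each margin
        returns = mu_next + sigma_next * innovations
        losses = -(returns @ weights)            # portfolio loss for the given weights
        return losses                            # VaR/ES follow as empirical quantities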
Table 4.12: 1-day VaR-based backtest results: Right tail

Model            λ      π      LR_uc       LR_ind     LR_cc        R_uc   R_cc   VaR
VC               5      5.73   3.2498      37.5661*   40.8159*     7      7      2.0716
                 2.5    3.97   22.5344*    31.5493*   54.0837*     7      7      2.4767
                 1      2.83   68.0720*    25.5259*   93.5979*     7      7      2.9476
                 0.5    2.23   97.4613*    23.3134*   120.7747*    7      7      3.2684
VC-EWMA          5      5.00   0.0000      10.1446*   10.1446*     1      6      2.1130
                 2.5    2.90   1.8744      13.9676*   15.8420*     6      6      2.5260
                 1      1.87   18.1336*    5.1509     23.2845*     6      6      3.0062
                 0.5    1.33   28.6763*    0.3347     29.0110*     6      6      3.3332
CCC-GARCH        5      4.63   0.8695      5.8134     6.6829       6      4      2.1798
                 2.5    2.70   0.4800      2.8471     3.3271       4      4      2.6268
                 1      1.63   10.2029*    3.7327     13.9356*     4      4      3.0991
                 0.5    1.27   24.8224*    2.7844     27.6068*     5      5      3.3425
DCC-GARCH        5      5.13   0.1113      5.7399     5.8512       3      2      2.1066
                 2.5    3.03   3.2813      3.0798     6.3611       5      5      2.5351
                 1      1.73   13.3682*    3.1925     16.5607*     5      5      3.0275
                 0.5    1.17   19.4452*    0.6281     20.0733*     4      4      3.4104
MCONDEVT-Ga      5      5.00   0.0000      6.6133     6.6133       1      3      2.1188
                 2.5    2.60   0.1215      1.5932     1.7147       3      1      2.6609
                 1      1.23   1.5358      2.9554     4.4912       3      3      3.3584
                 0.5    0.70   2.1439      0.2962     2.4401       3      3      3.9248
MCONDEVT-t       5      5.23   0.3389      5.1307     5.4696       4      1      2.1047
                 2.5    2.53   0.0136      1.8176     1.8312       1      2      2.6685
                 1      1.13   0.5165      3.5153     4.0318       2      2      3.4435
                 0.5    0.67   1.5157      0.2685     1.7842       2      2      4.0916
MCONDEVT-Gu      5      5.33   0.6875      7.4856*    8.1731       5      5      2.0758
                 2.5    2.47   0.0137      2.0594     2.0731       2      3      2.6927
                 1      1.03   0.0333      0.9475     0.9808       1      1      3.6117
                 0.5    0.47   0.0685      0.1313     0.1998       1      1      4.3706

* Significance at the 1% level. The critical value (CV) for significance at the 1% level is 9.2103 for LR_cc and 6.6349 for LR_uc and LR_ind. The CV for significance at the 5% level is 5.9915 for LR_cc and 3.8415 for LR_uc and LR_ind. λ = 1 − α is the expected fraction of violations in percent. The observed violation ratio in percent is denoted π. LR_uc, LR_ind, and LR_cc are the likelihood ratio test statistics for the unconditional coverage, independence, and conditional coverage tests, respectively. R_uc and R_cc are the ranks based on the size of LR_uc and LR_cc, respectively. VaR is the average VaR in percent over the backtest period.

Value at Risk

1-day Value at Risk. Table 4.12 and Table 4.13 document the results of the VaR-based backtest for the right and left tail of the loss distribution, respectively. We summarize the results for both positions in the following.

The unconditional VC method is ranked at the bottom at all confidence levels in both tails. At the 5% significance level, we reject the null hypotheses of unconditional coverage, independence, and conditional coverage in almost all cases, and at the 1% significance level we only fail to reject the null hypothesis of unconditional coverage at the 95% confidence level in both tails.
Thus, the overall picture is that the method systematically underestimates VaR. The poor performance may partly be ascribed to the erroneous normality assumption underlying the VC method, which makes it unable to account for the heavy tails of the return series.

Based on the results, the conditional VC-EWMA method appears to be superior to the unconditional VC method, but it is only ranked second last at confidence levels above 95%. At the 5% significance level, we reject the null hypotheses of unconditional and conditional coverage for the 99% and 99.5% VaR estimates in both tails, providing statistical evidence that the method significantly underestimates VaR at high confidence levels. The coverage failure is most likely related to the conditional normality assumption. At the 1% significance level, we reject the null hypotheses of independence and conditional coverage for the 95% and 97.5% right tail VaR estimates due to significant clustering of VaR violations, indicating that the simple EWMA model for the covariance matrix is not able to fully account for changes in market uncertainty (volatility).

The CCC-GARCH method shows mixed performance in terms of coverage. The observed violation ratios tend to be larger than the expected ratios in both tails, indicating that the method underestimates VaR. At the 5% significance level, we reject the null hypotheses of unconditional and conditional coverage for the 99.5% left tail VaR estimates. Similarly, at the 1% significance level, we reject the null hypotheses of unconditional and conditional coverage for the 99% and 99.5% right tail VaR estimates. Taken together, the empirical evidence implies that the method significantly underestimates VaR at the higher confidence levels. The coverage failure may be explained in part by the conditional normality assumption. Our failure to reject the null hypothesis of independence at the 5% significance level in all cases, except for the 95% right tail VaR estimates, indicates that the multivariate GARCH specification with a constant conditional correlation structure is able to account for changing volatility.

Based on the results, DCC-GARCH performs better in comparison to VC, VC-EWMA, and CCC-GARCH. However, at the 1% significance level, we reject the null hypotheses of unconditional and conditional coverage for the 99% and 99.5% right tail VaR estimates, providing statistical evidence that the method underestimates VaR at these high confidence levels in the right tail. In the left tail, we fail to reject the null hypotheses of unconditional and conditional coverage at all confidence levels using a 1% significance level, but we would reject unconditional coverage at the 99.5% confidence level using a 5% significance level. Again, we suspect that the coverage failure can be attributed to the conditional normality assumption. Finally, similar to CCC-GARCH, we generally fail to find statistical evidence of dependence between VaR violations, suggesting that the method appropriately adapts to changing market uncertainty.
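For reference, the conditional covariance forecasts behind VC-EWMA follow an exponentially weighted recursion of the RiskMetrics type. The sketch below, assuming a (T, d) array `returns` of risk-factor returns and a decay factor of 0.94 (a common choice for daily data, not necessarily the value used in the thesis), shows the update.

    import numpy as np

    def ewma_covariance(returns, decay=0.94):
        """One-step-ahead EWMA covariance: Sigma_t = decay*Sigma_{t-1} + (1-decay)*x x'."""
        x = np.asarray(returns, dtype=float)
        sigma = np.cov(x, rowvar=False)              # initialize with the sample covariance
        for row in x:
            sigma = decay * sigma + (1 - decay) * np.outer(row, row)
        return sigma                                 # forecast for the next trading day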
Let us turn to the results of the EVT-based methods. MCONDEVT-Ga generally produces violation ratios close to the expected ratios at all confidence levels, and the method is ranked first to third best in both tails. At the 5% significance level, we generally fail to reject the null hypotheses of unconditional coverage, independence, and conditional coverage in both tails, with the exception of the 95% confidence level in the right tail, where we reject conditional coverage due to significant dependence between VaR violations.

MCONDEVT-t is similar in performance to MCONDEVT-Ga, achieving VaR violation ratios close to the expected ratios. In the right tail, the method receives first and second ranks at the 97.5%, 99%, and 99.5% confidence levels. Moreover, we generally fail to reject the null hypotheses of unconditional coverage, independence, and conditional coverage in all cases at the 5% significance level.

Finally, we have the MCONDEVT-Gu method, which outperforms the competing multivariate methods at the 99% confidence level in the right tail and at the 99.5% level in both tails in terms of the smallest difference between the observed and expected violation ratios. However, at the lower confidence levels, the performance is mediocre and the method appears to have a tendency to underestimate VaR. We generally fail to reject the null hypotheses of unconditional coverage, independence, and conditional coverage in both tails. However, at the 5% significance level, we reject conditional coverage for the 95% right tail VaR estimates due to significant dependence between VaR violations.

Based on the above backtest results, the MCONDEVT methods are favored for estimating 1-day VaR, especially at the 97.5%, 99%, and 99.5% confidence levels. Across the board, they achieve violation ratios close to the expected ratios. While MCONDEVT-t and MCONDEVT-Gu are the preferred choices in the right tail, MCONDEVT-Ga shows the best performance in the left tail.

For the sake of illustration, we plot the 99% level 1-day VaR estimates against the portfolio losses (see Figure 4.10). Observe that the unconditional VC method is almost unresponsive to changing volatility, which leads to clustering of VaR violations. This fits well with the general belief that unconditional methods are less suitable for risk measurement over short horizons, such as day-to-day market risk management.

10-day Value at Risk. To evaluate the methods' performance with respect to estimating 10-day VaR, we follow the same procedure as in Section 4.1.3. As previously mentioned, the dependence introduced in the analysis from using overlapping returns prevents us from using the statistical tests implemented earlier. Instead, we simply provide a qualitative comparison of the methods based on the absolute differences (AD) between the observed violation ratios and the expected ratios. Table 4.14 reports the 10-day VaR-based backtest results for the multivariate methods.
Table 4.13: 1-day VaR-based backtest results: Left tail

Model            λ      π      LR_uc      LR_ind     LR_cc       R_uc   R_cc   VaR
VC               5      5.93   5.2048     12.8687*   18.0735*    7      7      -2.1574
                 2.5    3.63   13.8982*   5.1938     19.0920*    7      7      -2.5624
                 1      2.23   34.1312*   11.5518*   45.6830*    7      7      -3.0334
                 0.5    1.83   63.4596*   12.3762*   75.8358*    7      7      -3.3541
VC-EWMA          5      5.17   0.1736     0.1311     0.3047      1      1      -2.1988
                 2.5    3.00   2.8949     0.2114     3.1063      6      6      -2.6118
                 1      1.70   12.2729*   1.7647     14.0376*    6      6      -3.0920
                 0.5    1.00   11.6643*   0.6063     12.2706*    6      6      -3.4190
CCC-GARCH        5      4.40   2.3654     0.0067     2.3721      5      5      -2.2739
                 2.5    2.47   0.0137     0.4668     0.4805      1      1      -2.7030
                 1      1.33   3.0483     0.3347     3.3830      5      5      -3.1654
                 0.5    0.90   7.7888*    0.4906     8.2794      5      5      -3.4868
DCC-GARCH        5      4.33   2.9338     0.5723     3.5061      6      6      -2.2760
                 2.5    2.47   0.0137     0.4668     0.4805      1      1      -2.7170
                 1      1.27   1.9871     0.9754     2.9625      3      4      -3.2161
                 0.5    0.80   4.5873     0.3872     4.9745      4      4      -3.5566
MCONDEVT-Ga      5      4.60   1.0371     0.0216     1.0587      3      3      -2.2245
                 2.5    2.47   0.0137     0.4668     0.4805      1      1      -2.7491
                 1      1.03   0.0333     0.6476     0.6809      1      2      -3.4210
                 0.5    0.40   0.6476     0.0964     0.7440      2      1      -3.9281
MCONDEVT-t       5      4.67   0.7170     0.0495     0.7665      2      2      -2.2119
                 2.5    2.37   0.2227     0.3366     0.5593      4      4      -2.7764
                 1      0.97   0.0340     0.5663     0.6003      2      1      -3.5160
                 0.5    0.37   1.1819     0.0810     1.2629      3      3      -4.1001
MCONDEVT-Gu      5      5.53   1.7390     0.0783     1.8173      4      4      -2.1177
                 2.5    2.93   2.1912     0.0684     2.2596      5      5      -2.5791
                 1      1.30   2.4917     0.3849     2.8766      4      3      -3.1955
                 0.5    0.60   0.5666     0.2174     0.7840      1      2      -3.6614

* Significance at the 1% level. The critical value (CV) for significance at the 1% level is 9.2103 for LR_cc and 6.6349 for LR_uc and LR_ind. The CV for significance at the 5% level is 5.9915 for LR_cc and 3.8415 for LR_uc and LR_ind. λ = 1 − α is the expected fraction of violations in percent. The observed violation ratio in percent is denoted π. LR_uc, LR_ind, and LR_cc are the likelihood ratio test statistics for the unconditional coverage, independence, and conditional coverage tests, respectively. R_uc and R_cc are the ranks based on the size of LR_uc and LR_cc, respectively. VaR is the average VaR in percent over the backtest period.
Table 4.14: 10-day VaR-based backtest results

                           Left tail                              Right tail
Method          λ      π        AD       R   VaR            π        AD       R   VaR
VC              5      6.1852   1.1852   7   -7.1155        6.0181   1.0181   7   6.2576
                2.5    4.0455   1.5455   7   -8.3965        4.0455   1.5455   7   7.5386
                1      2.7081   1.7081   7   -9.8859        2.2066   1.2066   7   9.0280
                0.5    1.9057   1.4057   7   -10.9000       1.7051   1.2051   7   10.0421
VC-EWMA         5      6.2521   1.2521   6   -7.2427        5.1488   0.1488   5   6.1764
                2.5    3.2096   0.7096   6   -8.6282        2.9087   0.4087   6   7.5690
                1      1.2705   0.2705   6   -10.3636       1.5045   0.5045   6   9.2504
                0.5    0.6687   0.1687   6   -11.4735       0.9361   0.4361   5   10.5211
CCC-GARCH       5      3.9786   1.0214   4   -7.9164        5.0819   0.0819   1   6.4815
                2.5    1.9392   0.5608   4   -9.4100        2.8419   0.3419   4   7.9303
                1      1.0030   0.0030   1   -11.1296       1.4376   0.4376   5   9.7669
                0.5    0.5015   0.0015   1   -12.3725       0.9696   0.4696   6   11.0339
DCC-GARCH       5      3.8449   1.1551   5   -8.0370        4.9147   0.0853   2   6.5391
                2.5    1.9057   0.5943   5   -9.5231        2.7081   0.2081   3   8.0019
                1      0.8693   0.1307   5   -11.4763       1.3708   0.3708   4   9.9132
                0.5    0.3343   0.1657   5   -12.8995       0.8358   0.3358   4   11.4033
MCONDEVT-Ga     5      4.5804   0.4196   2   -7.6857        5.3159   0.3159   4   6.5049
                2.5    2.3404   0.1596   2   -9.1779        2.6078   0.1078   2   8.1272
                1      0.9696   0.0304   2   -11.0828       1.3039   0.3039   3   10.2602
                0.5    0.5015   0.0015   1   -12.5615       0.6687   0.1687   3   11.8588
MCONDEVT-t      5      4.5804   0.4196   2   -7.6758        5.1822   0.1822   3   6.5494
                2.5    2.2066   0.2934   3   -9.2722        2.5410   0.0410   1   8.1974
                1      0.9361   0.0639   3   -11.2869       1.2705   0.2705   2   10.3614
                0.5    0.4681   0.0319   3   -12.8117       0.6352   0.1352   2   12.0764
MCONDEVT-Gu     5      4.7141   0.2859   1   -7.5918        4.5470   0.4530   6   6.7493
                2.5    2.4072   0.0928   1   -9.0971        2.1398   0.3602   5   8.6042
                1      1.0699   0.0699   4   -10.9027       0.9361   0.0639   1   11.2612
                0.5    0.6352   0.1352   4   -12.2516       0.4681   0.0319   1   13.4587

The table reports 10-day VaR-based backtest results. λ = 1 − α is the expected probability in percent. The observed violation ratio in percent is denoted by π. R is the rank based on the size of the absolute difference (AD) between λ and π. VaR is the average VaR in percent over the backtest period.
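Two different routes lie behind the 10-day figures in Table 4.14: VC scales its 1-day VaR by the square root of time, while the conditional methods simulate 10-day paths by Monte Carlo. The sketch below contrasts the two using, for illustration only, a generic GARCH(1,1) with innovations resampled from the standardized residuals (an FHS-style simulation); the thesis's own specifications, such as AR(1)-GARCH(1,1) margins combined through a copula, are richer but follow the same pattern, and all parameter names here are generic assumptions.

    import numpy as np

    def var_sqrt_time(var_1day, horizon=10):
        """Square-root-of-time scaling of a 1-day VaR (the rule used by VC)."""
        return np.sqrt(horizon) * var_1day

    def var_mc_garch11(omega, arch, garch, sigma2_last, eps_last, std_resid,
                       level=0.99, horizon=10, n_paths=10_000, seed=0):
        """Empirical 10-day VaR from simulated GARCH(1,1) paths with resampled innovations."""
        rng = np.random.default_rng(seed)
        # one-step-ahead variance from the last observed shock and variance
        sigma2 = np.full(n_paths, omega + arch * eps_last ** 2 + garch * sigma2_last)
        totals = np.zeros(n_paths)
        for _ in range(horizon):
            z = rng.choice(std_resid, size=n_paths, replace=True)   # FHS-style innovations
            eps = np.sqrt(sigma2) * z
            totals += eps                                           # zero conditional mean assumed
            sigma2 = omega + arch * eps ** 2 + garch * sigma2       # update every path
        return np.quantile(totals, level)                           # empirical 10-day VaR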
Figure 4.10: 1-day VaR_{α=99%} estimates plotted against the portfolio losses.

The unconditional VC method serves as a poor method for estimating the 10-day VaR. Based on the absolute difference between the observed ratio of violations and the expected ratio, the method is bottom ranked at all confidence levels in both tails. The apparent underestimation of VaR can probably be ascribed to the crude normality assumption and the unconditional nature of the method. Moreover, VC is the only method which bases the multiple-period forecasts on the simple square-root-of-time rule and not on MC simulation as the other methods do.

The conditional VC-EWMA method shows an improved performance over the pure VC method, demonstrating that even a relatively simple volatility updating scheme can improve VaR estimation. However, the method generally underestimates VaR in both tails and is ranked second or third last at all confidence levels.

The results are more equivocal for the CCC-GARCH method. The method has a general tendency to underestimate VaR in the right tail, except at the 95% level where the method is top ranked. In the left tail, however, the method overestimates VaR at the two lower confidence levels but shows remarkably good performance at the two highest levels.

It seems that DCC-GARCH generally overestimates VaR in the left tail at all confidence levels, but the method has a better performance than CCC-GARCH in the right tail and receives a higher ranking at the three highest confidence levels.
MCONDEVT-Ga has a general tendency to slightly overestimate VaR in the left tail and to slightly underestimate VaR in the right tail. However, the performance in terms of the absolute difference between the observed and expected violation ratios is generally better than that of VC, VC-EWMA, CCC-GARCH, and DCC-GARCH.

MCONDEVT-t has a good overall performance and is slightly better than MCONDEVT-Ga in the right tail, but slightly worse in the left tail. Once again, we observe a general tendency for overestimation in the left tail and underestimation in the right tail.

Finally, the MCONDEVT-Gu method has the lowest absolute differences among the multivariate methods in the right tail at the two highest levels. However, the method overestimates VaR at the two lower levels in the right tail. In the left tail, the method appears to slightly underestimate VaR at the higher levels, but seems to accurately estimate VaR at the two lower levels.

Overall, we conclude that the results are in favor of the EVT-based methods, but no single method performs uniformly well at all confidence levels and in both tails. In the left tail, MCONDEVT-Ga seems to have the best performance in general. The right tail results generally favor MCONDEVT-t, but similar to the 1-day VaR-based backtest, MCONDEVT-Gu is the preferred choice at the 99% and 99.5% confidence levels.

For illustrative purposes, we plot the 10-day VaR_{α=99%} estimates against the 10-day portfolio losses (see Figure 4.11). The pattern of the estimates indicates that the unconditional VC method has a slow response to increasing market volatility, which leads to clustering of violations.

Expected Shortfall

As in Section 4.1.3, we investigate the average discrepancy between the ES estimate and the portfolio loss on VaR violation days. While the average discrepancy will be evaluated statistically in the case of 1-day ES estimates, we only give a qualitative assessment in the case of 10-day ES estimates.

1-day Expected Shortfall. Table 4.15 documents the results of the 1-day ES-based backtest. The unconditional VC method has the largest average discrepancies, and we strongly reject the null hypothesis of zero mean exceedance residuals, providing statistical evidence that the method significantly underestimates ES. As a result, VC is the worst performing method with respect to estimating 1-day ES.

Similarly, at the 5% significance level, we reject the null hypothesis of zero mean for VC-EWMA, CCC-GARCH, and DCC-GARCH in nearly all cases. At the 1% significance level, we only fail to reject the null for DCC-GARCH at the 99% confidence level and for both CCC-GARCH and DCC-GARCH at the 99.5% level in the left tail. Apparently, these three methods are not able to appropriately account for the fat-tailedness of the underlying distribution, resulting in poor performance with respect to estimating ES.
Figure 4.11: 10-day VaR_{α=99%} estimates plotted against the 10-day portfolio losses.

Table 4.15: 1-day ES-based backtest results

                λ = 5              λ = 2.5            λ = 1              λ = 0.5
Right tail
VC              0.6716 (0.0000)    0.7983 (0.0000)    0.8356 (0.0000)    0.8496 (0.0000)
VC-EWMA         0.3347 (0.0000)    0.5136 (0.0000)    0.5511 (0.0000)    0.6185 (0.0000)
CCC-GARCH       0.3328 (0.0000)    0.4818 (0.0000)    0.5525 (0.0000)    0.6629 (0.0000)
DCC-GARCH       0.3097 (0.0000)    0.4424 (0.0000)    0.5613 (0.0000)    0.6488 (0.0000)
MCONDEVT-Ga     0.1036 (0.0389)    0.1962 (0.0160)    0.3042 (0.0254)    0.3679 (0.0316)
MCONDEVT-t      -0.0071 (0.3705)   0.0953 (0.1468)    0.1593 (0.1944)    0.0133 (0.4431)
MCONDEVT-Gu     -0.1022 (0.7844)   -0.0354 (0.4823)   -0.0557 (0.6313)   -0.1332 (0.5409)
Left tail
VC              0.6716 (0.0000)    0.7983 (0.0000)    0.8356 (0.0000)    0.8496 (0.0000)
VC-EWMA         0.1747 (0.0001)    0.2088 (0.0001)    0.2764 (0.0004)    0.3759 (0.0004)
CCC-GARCH       0.1601 (0.0004)    0.2921 (0.0000)    0.3234 (0.0017)    0.3073 (0.0213)
DCC-GARCH       0.1609 (0.0013)    0.2441 (0.0010)    0.2753 (0.0107)    0.2628 (0.0555)
MCONDEVT-Ga     -0.0245 (0.4832)   -0.0087 (0.6807)   -0.0328 (0.6579)   0.2175 (0.1253)
MCONDEVT-t      -0.0947 (0.8257)   -0.1008 (0.8814)   -0.2090 (0.8759)   -0.1725 (0.5631)
MCONDEVT-Gu     0.0539 (0.1392)    0.1246 (0.1206)    0.1391 (0.2022)    0.2638 (0.0720)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days, measured in percentage points. Probability values (in parentheses) are from a one-sided bootstrap test of the null hypothesis of zero mean exceedance residuals.
The right tail results indicate that MCONDEVT-Ga has a tendency to underestimate ES, as indicated by positive average discrepancies. In fact, at the 5% significance level, we reject the null hypothesis of zero mean at all confidence levels. In the left tail, we fail to reject the null at any reasonable significance level. MCONDEVT-t has a remarkably good performance in both tails. The average discrepancies are relatively close to zero in general and we fail to reject the null hypothesis of zero mean exceedance residuals at any reasonable significance level in all cases. The method has the average discrepancy closest to zero at the 95% confidence level in the right tail and at the 99.5% level in both tails. MCONDEVT-Gu also has remarkably good performance in both tails. The average discrepancies are relatively close to zero in all cases. Notice that the average discrepancies in the right tail are negative, indicating that the method overestimates ES, whereas the average discrepancies in the left tail are positive, indicating underestimation of ES. However, none of the discrepancies are statistically significant at the 5% significance level and we fail to reject the null hypothesis of zero mean. Overall, we conclude that the EVT-based methods show the best performance among the multivariate methods with respect to estimating 1-day ES. MCONDEVT-Ga is favored by the left tail results, while MCONDEVT-t and MCONDEVT-Gu both show good performance in both tails in terms of high p-values and average discrepancies relatively close to zero.
10-day Expected Shortfall. Table 4.16 reports the 10-day ES-based backtest results for the multivariate methods. Based on the average discrepancies and MSE values, we compare the performance of the methods with respect to estimating 10-day ES. The VC method has the largest average discrepancies and MSE values among the multivariate methods. Again, note that VC is the only method which uses the simple square-root-of-time rule when estimating 10-day ES, which presumably contributes to the exhibited performance. The results indicate that the method is unfit for 10-day ES estimation. VC-EWMA clearly outperforms VC and achieves slightly smaller average discrepancies and MSE values than those obtained by CCC-GARCH and DCC-GARCH in the right tail. In fact, it achieves the lowest MSE values among the methods at the 95% and 97.5% confidence levels in the right tail. However, it has the second largest average discrepancies and relatively large MSE values in the left tail. CCC-GARCH is similar in performance to DCC-GARCH. Based on average discrepancies and MSE values, the method is the best performing at the 99% and 99.5% confidence levels in the left tail, while the DCC-GARCH method does best at the 95% and 97.5% confidence levels. However, in the right tail, the methods are outperformed by the MCONDEVT methods.
Table 4.16: 10-day ES-based backtest results

Right tail
                 λ = 5             λ = 2.5           λ = 1             λ = 0.5
VC               2.1779(31.2677)   2.7285(39.2380)   4.0643(55.1868)   4.7806(63.8289)
VC-EWMA          0.8373(11.7617)   0.8830(13.0087)   1.4992(15.3175)   1.2578(15.1880)
CCC-GARCH        0.8575(14.1572)   1.2133(16.9011)   2.0136(21.4172)   1.8153(20.8154)
DCC-GARCH        0.8314(14.1752)   1.1747(17.1667)   1.5098(20.0665)   1.9434(20.4383)
MCONDEVT-Ga      0.5724(12.8585)   0.8563(15.8603)   1.3001(17.4155)   0.8446(16.3599)
MCONDEVT-t       0.4687(12.7308)   0.6914(15.7160)   1.0948(17.5531)   1.0884(16.4502)
MCONDEVT-Gu      0.0155(12.9294)   0.2964(15.4360)   0.1129(14.9641)   0.5481(11.6450)

Left tail
VC               1.2895(10.1830)   1.4616(9.8075)    1.4973(8.8218)    1.5030(7.9265)
VC-EWMA          0.8473(8.7617)    0.8833(7.0087)    1.4952(5.3175)    1.2578(4.1880)
CCC-GARCH       -0.0096(4.2325)    0.1080(3.4185)   -0.4395(2.3044)   -0.5968(1.3137)
DCC-GARCH       -0.1480(4.0129)   -0.1604(3.2356)   -0.8541(3.3279)   -0.8744(1.3872)
MCONDEVT-Ga     -0.0619(4.6232)   -0.0710(3.9956)   -0.8334(3.2491)   -0.9348(2.5290)
MCONDEVT-t      -0.1585(4.6798)   -0.1627(4.0951)   -0.9360(3.5651)   -1.5262(4.0666)
MCONDEVT-Gu      0.0372(4.9102)    0.1089(4.2559)   -0.3487(2.7361)   -1.0880(3.4184)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days measured in percentage points. Mean squared errors are reported in parentheses.

Let us turn to the MCONDEVT methods. The right tail results indicate that MCONDEVT-Ga outperforms VC, CCC-GARCH and DCC-GARCH at all confidence levels. However, it seems that the method has a general tendency to underestimate ES in the right tail, as indicated by the positive average discrepancies, but to overestimate ES in the left tail, as indicated by negative average discrepancies. MCONDEVT-t has the same pattern of over- and underestimation as MCONDEVT-Ga, but generally achieves lower average discrepancies and MSE values in the right tail at most confidence levels. MCONDEVT-Gu has the average discrepancies closest to zero in the right tail and achieves the lowest MSE at the 99% and 99.5% confidence levels. In the left tail, it has the average discrepancy closest to zero among the competing methods at the 95% and 99% level and generally quite low MSE values. Overall, it appears from the findings that the MCONDEVT methods perform well with respect to estimating 10-day ES. While the left tail results seem to favor CCC-GARCH and DCC-GARCH, the MCONDEVT methods are not far behind in terms of performance measured on average discrepancies and MSE values. In the right tail, the MCONDEVT-Gu method seems to be the preferred choice, but the method is closely followed by MCONDEVT-Ga and MCONDEVT-t.
Sensitivity Analysis
As in Section 4.1.3, we examine the sensitivity of the 1-day backtest results. Specifically, we focus on the results of MCONDEVT-t due to its overall good performance and evaluate the sensitivity of the results to the choice of N_u. Table 4.17 reports the results of the sensitivity analysis of the VaR-based backtest.
Table 4.17: Sensitivity analysis: VaR-based backtest results

         Left tail                               Right tail
N_u      λ = 5    λ = 2.5  λ = 1    λ = 0.5      λ = 5    λ = 2.5  λ = 1    λ = 0.5
140      4.67(2)  2.37(4)  0.93(1)  0.37(3)      5.10(2)  2.47(2)  1.10(1)  0.67(2)
130      4.73(2)  2.37(4)  0.90(2)  0.37(3)      5.10(2)  2.47(2)  1.13(2)  0.67(2)
120      4.70(2)  2.33(4)  0.97(1)  0.37(3)      5.10(2)  2.53(2)  1.13(2)  0.67(2)
110      4.70(2)  2.37(4)  0.97(1)  0.37(3)      5.17(2)  2.53(2)  1.13(2)  0.67(2)
100      4.67(2)  2.37(4)  0.97(1)  0.37(3)      5.23(2)  2.53(2)  1.13(2)  0.67(2)
90       4.70(2)  2.37(4)  0.93(1)  0.37(3)      5.23(2)  2.53(2)  1.13(2)  0.67(2)
80       4.70(2)  2.37(4)  0.93(1)  0.37(3)      5.23(2)  2.53(2)  1.13(2)  0.67(2)
70       4.73(2)  2.37(4)  0.90(2)  0.37(3)      5.20(2)  2.53(2)  1.13(2)  0.67(2)

The table reports violation ratios for MCONDEVT-t obtained by using different numbers of exceedances N_u in the implementation of the method. λ = 1 − α is the expected probability in percent. The numbers in parentheses show the rank of MCONDEVT-t based on the likelihood ratio test statistic for conditional coverage.

The violation ratios and rankings based on conditional coverage are robust with respect to the choice of N_u in the specified range. The ranking in the left and right tail persists in nearly all cases, and we conclude from the findings that the VaR-based backtest results are robust and that they support the choice of N_u in the range of 100. Table 4.18 shows the results of the sensitivity analysis of the 1-day ES-based backtest. The average discrepancies are fairly stable with respect to the values of N_u. However, we note that in the right tail higher values of N_u lead to increasing average discrepancies, suggesting that N_u should be kept in an interval between 70 and 100 exceedances. On the other hand, a similar pattern is not observable in the left tail. Overall, we find no reason to alter any of the previously made conclusions.
4.3 Overall Comparison of Backtest Results
So far we have compared the univariate EVT-based HS-CONDEVT method with other univariate alternatives and the three variants of the multivariate EVT-based MCONDEVT method with alternative multivariate methods, separately. However, there is no reason why the multivariate and univariate methods cannot be compared with each other when the overall purpose is to measure the risk of our three-asset portfolio. Thus, this section is devoted to an overall evaluation of the methods' risk measurement performance, i.e. a search for the methods that most satisfactorily meet the evaluation criteria presented in Section 3.3:
1. Does the observed fraction of VaR violations π match the expected fraction of violations λ?
2. Do the VaR violations fall randomly in time?
3. Does the size of the violating loss match the expected shortfall?
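The first two criteria are assessed with likelihood ratio tests for coverage and independence in the spirit of Kupiec [1995] and Christoffersen [1998]; see Section 3.3. As a point of reference, a minimal, self-contained sketch of how the violation ratio and the LR statistics can be computed from a 0/1 violation series is given below. The names are illustrative, not the code used in the study, and the snippet assumes the series contains both violations and non-violations.

```python
import numpy as np
from scipy.stats import chi2

def coverage_tests(violations, lam):
    """Violation ratio, unconditional coverage, independence, and
    conditional coverage LR tests; lam is the expected violation
    probability as a fraction (e.g. 0.01 for the 99% level)."""
    v = np.asarray(violations, dtype=int)
    n, n1 = v.size, int(v.sum())
    n0 = n - n1
    pi = n1 / n                                   # observed violation ratio

    # LR test of unconditional coverage (H0: pi = lam), chi2 with 1 df
    lr_uc = -2 * ((n0 * np.log(1 - lam) + n1 * np.log(lam))
                  - (n0 * np.log(1 - pi) + n1 * np.log(pi)))

    # Transition counts for the first-order Markov independence test
    a, b = v[:-1], v[1:]
    n00 = int(np.sum((a == 0) & (b == 0)))
    n01 = int(np.sum((a == 0) & (b == 1)))
    n10 = int(np.sum((a == 1) & (b == 0)))
    n11 = int(np.sum((a == 1) & (b == 1)))
    p01 = n01 / (n00 + n01)
    p11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0
    p1 = (n01 + n11) / (n00 + n01 + n10 + n11)

    def ll(p, k0, k1):                            # Bernoulli log-likelihood, guarding log(0)
        out = 0.0
        if k0 > 0:
            out += k0 * np.log(1 - p)
        if k1 > 0:
            out += k1 * np.log(p)
        return out

    lr_ind = -2 * (ll(p1, n00 + n10, n01 + n11)
                   - (ll(p01, n00, n01) + ll(p11, n10, n11)))
    lr_cc = lr_uc + lr_ind                        # conditional coverage, chi2 with 2 df

    return {"ratio": pi,
            "p_uc": float(1 - chi2.cdf(lr_uc, 1)),
            "p_ind": float(1 - chi2.cdf(lr_ind, 1)),
            "p_cc": float(1 - chi2.cdf(lr_cc, 2))}
```

The third criterion is checked with the bootstrap test on exceedance residuals sketched after Table 4.15.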
Table 4.18: Sensitivity analysis: ES-based backtest results

Right tail
N_u     λ = 5             λ = 2.5           λ = 1             λ = 0.5
140     0.0242(0.2347)    0.1540(0.0553)    0.2436(0.0838)    0.1447(0.2601)
130     0.0215(0.2429)    0.1436(0.0686)    0.2022(0.1256)    0.1192(0.2764)
120     0.0224(0.2424)    0.1085(0.1163)    0.1996(0.1280)    0.0996(0.3008)
110     0.0112(0.3100)    0.1076(0.1080)    0.1707(0.1651)    0.0271(0.3947)
100    -0.0071(0.3705)    0.0953(0.1468)    0.1593(0.1944)    0.0133(0.4431)
90     -0.0082(0.3664)    0.0976(0.1308)    0.1609(0.1898)    0.0134(0.4276)
80     -0.0087(0.3656)    0.0940(0.1326)    0.1433(0.2153)   -0.0008(0.4460)
70     -0.0051(0.3468)    0.0912(0.1344)    0.1438(0.2140)   -0.0097(0.4637)

Left tail
140    -0.0891(0.8038)   -0.0874(0.8545)   -0.1590(0.8165)   -0.1304(0.5177)
130    -0.1003(0.8537)   -0.0918(0.8672)   -0.1440(0.7979)   -0.1227(0.5173)
120    -0.0976(0.8282)   -0.0795(0.8501)   -0.1813(0.8542)   -0.1299(0.5283)
110    -0.0961(0.8375)   -0.0974(0.8766)   -0.1951(0.8643)   -0.1501(0.5465)
100    -0.0947(0.8257)   -0.1008(0.8814)   -0.2090(0.8759)   -0.1725(0.5631)
90     -0.0957(0.8388)   -0.1017(0.8848)   -0.1923(0.8602)   -0.2052(0.6018)
80     -0.0971(0.8377)   -0.0946(0.8814)   -0.1898(0.8484)   -0.2117(0.5956)
70     -0.1042(0.8536)   -0.1201(0.9154)   -0.1843(0.8434)   -0.2721(0.6501)

The table contains the average discrepancy between the portfolio loss and the ES estimate on VaR violation days, measured in percentage points. VaR and ES estimates were obtained using different numbers of excess losses N_u in the implementation of MCONDEVT-t. λ = 1 − α is the expected probability in percent. Probability values (in parentheses) are from a one-sided bootstrap test of the null hypothesis of zero mean exceedance residuals.

In this evaluation, we focus on the results supported by statistical tests, but we comment on whether or not the results from the qualitative comparisons of the 10-day risk measurement performance support the conclusions. The combined results of the performed backtests are provided in Tables B.2-B.6 in Appendix B.
Right Tail Results
λ = 0.5%: The methods with the observed violation ratio π closest to the expected ratio are FHS, HS-CONDEVT, and MCONDEVT-Gu, followed by MCONDEVT-t. MCONDEVT-Gu and MCONDEVT-t both meet the three evaluation criteria at any reasonable level of significance, while the two univariate methods have difficulties in meeting the third criterion; we are close to rejecting the null hypothesis of zero mean exceedance residuals for FHS at the 5% significance level and we reject the null for HS-CONDEVT at the 5% significance level. The 10-day VaR- and ES-based backtest results seem to favor the MCONDEVT methods over FHS in terms of mean squared error (MSE) values and the absolute difference (AD) between λ and π.
λ = 1%: The methods with the observed fraction of violations π closest to the expected λ are HS-CONDEVT and MCONDEVT-Gu, but the methods
are closely followed by FHS, MCONDEVT-t and MCONDEVT-Ga. All the methods meet the three evaluation criteria at the 5% level of significance. However, HS-CONDEVT and MCONDEVT-Ga fail to satisfy the third criterion at the 10% significance level. All the MCONDEVT methods seem to be favored over the univariate methods by the 10-day VaR-based backtest results. MCONDEVT-Gu obtains the average discrepancy closest to zero in both the 1-day and 10-day ES-based backtest and the lowest MSE in the 10-day ES-based backtest.
λ = 2.5%: The violation ratios obtained by MCONDEVT-t and MCONDEVT-Gu are the closest to λ in the right tail, but the methods are closely followed by MCONDEVT-Ga. The three methods meet the evaluation criteria at any reasonable significance level, except for MCONDEVT-Ga, which fails to meet the third criterion at the 5% significance level. MCONDEVT-Gu obtains the average discrepancy closest to zero in both the 1-day and 10-day ES-based backtest. Based on a qualitative assessment, we would prefer MCONDEVT-t over MCONDEVT-Gu based on the 10-day VaR-based backtest results.
λ = 5%: VC-EWMA and MCONDEVT-Ga obtain violation ratios in line with the expected λ, while DCC-GARCH, MCONDEVT-t, and HS-GARCH-t obtain violation ratios close to the expected ratios. VC-EWMA fails to meet the second and third evaluation criteria at any reasonable level of significance. MCONDEVT-Ga fails to satisfy the second and third criteria at the 5% significance level. DCC-GARCH meets the first two criteria at the 5% level of significance but fails to meet the third criterion, as the null hypothesis of zero mean exceedance residuals is rejected at any reasonable level of significance. The null hypotheses of conditional coverage and zero mean exceedance residuals cannot be rejected for MCONDEVT-t at the 5% significance level. However, conditional coverage is rejected at the 10% significance level. The only method that satisfies all evaluation criteria at any reasonable level of significance is HS-GARCH-t. While the 10-day VaR-based backtest results are clearly in favor of MCONDEVT-t, the 10-day ES-based backtest results indicate a slightly better performance by HS-GARCH-t.
Left Tail Results
λ = 0.5%: The methods that come nearest to the 0.5% expected violation ratio are HS-GARCH-t, MCONDEVT-Ga, and MCONDEVT-Gu, closely followed by MCONDEVT-t. All methods meet the three evaluation criteria at the 5% level of significance. However, MCONDEVT-Gu fails to meet the third criterion at the 10% significance level. The 10-day VaR-based backtest results are in favor of MCONDEVT-Ga and MCONDEVT-t over MCONDEVT-Gu and HS-GARCH-t. On the other hand, the 10-day ES-based backtest results are more in favor of HS-GARCH-t compared to the
EVT-based methods.
λ = 1%: The smallest difference between the actual π and the expected λ is observed for HS-GARCH-t, closely followed by MCONDEVT-Ga and MCONDEVT-t. Independence between VaR violations is rejected for HS-GARCH-t at the 5% significance level, providing statistical evidence that the method fails to meet the second criterion. MCONDEVT-Ga and MCONDEVT-t, on the other hand, meet the three evaluation criteria at any reasonable level of significance. The 10-day VaR-based backtest results also appear to support a choice of MCONDEVT-Ga and MCONDEVT-t due to their small AD values, and so do the relatively small MSE values in the 10-day ES-based backtest.
λ = 2.5%: CCC-GARCH, DCC-GARCH, and MCONDEVT-Ga come closest to the expected fraction of violations, but are closely followed by MCONDEVT-t. CCC-GARCH and DCC-GARCH fail to meet the third evaluation criterion at any reasonable level of significance. MCONDEVT-Ga and MCONDEVT-t meet all three criteria at any reasonable level of significance. Moreover, the 10-day VaR-based backtest results indicate that MCONDEVT-Ga and MCONDEVT-t are superior to CCC-GARCH and DCC-GARCH.
λ = 5%: The method with the observed violation ratio closest to λ = 5% in the left tail is VC-EWMA, followed by HS-GARCH, MCONDEVT-Ga, and MCONDEVT-t. VC-EWMA and HS-GARCH meet the first and second criteria but fail to meet the third criterion at any reasonable level of significance. MCONDEVT-Ga and MCONDEVT-t meet the three evaluation criteria at any reasonable level of significance and seem to be preferred by the 10-day VaR- and ES-based backtest results.
Final Remarks
The overall evaluation of the empirical results makes it clear that the multivariate EVT-based methods rank among the best performing methods. In particular, MCONDEVT-t proves to be a powerful alternative to the competing methods as it attains violation ratios closely matching the expected fraction of violations in all cases while meeting the three evaluation criteria at any reasonable level of significance. Moreover, the good performance is supported by the 10-day backtest results. The univariate EVT-based HS-CONDEVT method proves to be a very viable alternative to FHS and HS-GARCH-t, but the empirical investigation gives no indication of a statistically significant advantage over these methods (or the multivariate EVT-based methods). MCONDEVT-Gu is one of the best methods in the right tail and the left tail performance of the method is fair, but less impressive. Finally, MCONDEVT-Ga provides an excellent
left tail performance, but the right tail performance shows that the method has a tendency to underestimate ES.
Chapter 5
Reflections
In this brief chapter we start out by discussing the implications and limitations of our study in Section 5.1. This discussion is continued in Section 5.2, where we discuss the applicability and feasibility of the EVT-based risk measurement methods in practice. Finally, in Section 5.3 we outline ideas for further research within extreme value theory applied to financial risk management.
5.1 Implications and Limitations of the Results
Our study adds to the growing body of theoretical and empirical research on extreme value theory in finance. From a research-related perspective, the primary contributions can be divided into two categories: theoretical contributions and empirical contributions. On the theoretical front, we have given a concise yet rigorous treatment of extreme value theory, providing academics and practitioners with a quick and accessible overview of the main theorems, models, and statistical issues of the theory. Naturally, we have had to leave out topics and details which others may have found important. For instance, semi-parametric threshold methods based on the so-called Hill estimator are not covered. However, we have striven to include the results that we consider the most important to the application of the theory in financial risk management while still giving a broad description of the theory. In addition, we have tried to achieve an appropriate balance between the theoretical foundation and the implementation issues of the theory, giving the reader the opportunity to understand the theory as well as the necessary tools to apply it. On the empirical front, we have provided both quantitative and qualitative evidence on the performance of EVT-based risk measurement methods. Our empirical study differs from previous empirical research in several aspects. First of all, unlike previous studies, which for the most part only consider VaR, we evaluate the methods' performance using a three-criteria
evaluation framework where the methods' ability to accurately estimate both VaR and ES is assessed. This framework provides a more systematic and comprehensive evaluation of the methods' performance. Also, existing empirical studies on EVT primarily focus on a univariate setting, i.e. where only a single risk factor is accounted for. In contrast, we provide a more comprehensive evaluation setup by investigating the performance of EVT-based risk measurement methods in both a univariate and a multivariate setting. Furthermore, to determine the methods' relative performance, we compare the EVT-based methods with a set of more traditional, alternative methods for risk measurement. Finally, our study is one of the very few in extreme value theory based on Danish stock data (another study in EVT using Danish stock data is Lauridsen [2000]). Based on the empirical study, we reached the overall conclusion that the EVT-based methods performed as well as or better than the alternative risk measurement methods that we considered, see Chapter 4. However, there are certain potential limitations to our empirical study that need to be addressed. The first potential limitation concerns the data used in the study. We consider a hypothetical portfolio containing three large Danish stocks. Consequently, the general validity of our results beyond the context of the Danish stock market may be questioned. However, to comment on this criticism, we note that the return series used in the study exhibit the typical empirical properties, the so-called stylized facts, of financial return series data, such as excess kurtosis and volatility clustering. Thus, in this sense, our results give a clear indication of the general performance of the EVT-based methods when using stock return data. An additional criticism might be that the portfolio we consider is too simplistic in its asset composition. We only consider a portfolio composed of stocks, while trading portfolios of financial institutions typically include a variety of different asset classes, such as options, futures, bonds, currencies, etc. In order to provide evidence on the performance of EVT-based risk measurement methods for other asset classes, further research is needed. Also, in relation to this, it may be argued that a portfolio exposed to only three risk factors is too simplistic and unrealistic, considering that large trading portfolios of financial institutions can be exposed to several thousand risk factors [Berkowitz and O'Brien, 2002]. Consequently, this leads to the following question: Are our results scalable to real, high-dimensional portfolios? Again, further research is needed to answer this question. Others might argue that our study is limited in the sense that we use a single estimation window size of n = 1,000 daily return observations for the evaluation of the risk measurement methods. Several studies (e.g. Marimoutou et al. [2009] and McNeil and Frey [2000]) use the same window size as us, while others (e.g. Lauridsen [2000] and Danielsson and De Vries [2000]) prefer 1,500 daily observations for backtesting. Yet others (e.g. Kuester et al.
[2006]) use smaller window sizes down to 250 observations. However, when estimating the risk of low-probability, extreme events it can be argued that one should always use as large a sample as possible to ensure the inclusion of extreme observations. Danielsson and De Vries [2000] argue that 250 daily observations are insufficient for estimating 99% VaR and advocate the use of large samples to increase the precision of the VaR estimates. On the other hand, relevant and synchronized data may not be available for all risk factors. With these arguments in mind, we consider the use of a window size of 1,000 observations a prudent compromise. Further limitations concern the methods considered in the study. As mentioned, we found that the EVT-based methods generally performed well compared to a selected number of alternative methods. This does not mean, however, that there do not exist other methods that perform even better. The collection of methods used in our study is by no means exhaustive and one may argue that we should have included other methods. For instance, some may criticize our study for not including other, more advanced MGARCH models, such as the DVEC model proposed by Bollerslev et al. [1988] or the Baba-Engle-Kraft-Kroner (BEKK) model described in Engle and Kroner [1995]. However, our choice of MGARCH methods was generally motivated by the fact that they lend themselves to estimation in stages. This two-stage estimation form makes the methods tractable for risk management of large portfolios by avoiding numerical optimizations in high dimensions [Christoffersen, 2003]. Most other MGARCH models are affected by the issue of dimensionality, which means that the number of parameters in the models to be estimated simultaneously increases rapidly with the number of risk factors to be modeled, making estimation infeasible in high-dimensional applications. In the following, we discuss the practical applicability of the EVT-based methods for risk measurement.
5.2 Applicability in Practice
In financial risk management, balancing accuracy against complexity is often an issue. Compared with the alternative methods considered in this study, the EVT-based methods generally offer a higher degree of accuracy in exchange for more complexity. From the perspective of a risk manager, this trade-off is very important. In this section we therefore discuss the practical feasibility and applicability of the implemented EVT-based methods. From a practical perspective, the univariate HS-CONDEVT method offers the simplest way of obtaining conditional risk measures based on EVT. By working with portfolio losses, we avoid modeling the joint distribution of the component risk factor return series without resorting to dimension reduction through principal components or factor analysis.
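To make this concrete, a minimal sketch of the conditional EVT step for the portfolio loss series might look as follows. It is an illustration under simplifying assumptions, not the implementation used in the study: the GARCH-type filtering, the threshold rule, and the forecast construction are taken as given, and all function and variable names are ours.

```python
import numpy as np
from scipy.stats import genpareto

def conditional_evt_var_es(std_resid, mu_fc, sigma_fc, alpha=0.99, n_u=100):
    """GPD-tail step on GARCH-standardized residuals of the loss series,
    returning one-step-ahead conditional VaR and ES estimates."""
    z = np.sort(np.asarray(std_resid))
    n = z.size
    u = z[-(n_u + 1)]                     # threshold: the (n_u + 1)-th largest residual
    excess = z[-n_u:] - u                 # the n_u excesses over the threshold

    # Fit a GPD to the excesses with the location fixed at zero
    xi, _, beta = genpareto.fit(excess, floc=0)

    # GPD-based quantile and expected shortfall of the residual distribution
    # (assumes xi != 0 for the quantile formula and xi < 1 for the ES formula)
    z_var = u + beta / xi * (((n / n_u) * (1 - alpha)) ** (-xi) - 1)
    z_es = z_var / (1 - xi) + (beta - xi * u) / (1 - xi)

    # Scale back with the conditional mean and volatility forecasts
    return mu_fc + sigma_fc * z_var, mu_fc + sigma_fc * z_es
```

Applying the same steps to the negated loss series gives the estimates for the other tail.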
Moreover, the method is easy to implement for financial institutions that already employ historical simulation for their risk measurement purposes. However, there are two possible points of critique of the method. The first point is that the process of generating portfolio losses may be very time demanding if the portfolio contains many derivatives that can only be priced using Monte Carlo methods. Thus, for each day in the historical sample, it would be necessary to simulate thousands of price paths for each of the underlying assets in order to price the derivatives. The second point is that by only modeling portfolio losses we may disregard important information about the dependencies between the risk factors. The reduced form prevents the possibility of tracking joint extreme losses, which are the most dangerous from the perspective of a risk manager. Being a multivariate method, MCONDEVT allows for portfolio optimization and active risk management (e.g. minimization of VaR and ES), but at the cost of higher computational difficulty [Andersen et al., 2005, Kuester et al., 2006]. In general, the calibration of multivariate models to high-dimensional portfolios is an almost impossible task without some form of dimension reduction [McNeil et al., 2005]. However, the use of copulas gives MCONDEVT a computational advantage over many conventional multivariate methods. Recall that the flexibility of copulas allows the marginal distributions to be estimated separately from their joint dependence structure. As a result, few parameters are estimated simultaneously, which greatly facilitates estimation (a schematic illustration of this second estimation stage is sketched below). The computational intensity, however, is dependent on the choice of copula. The elliptical copulas considered in this study both have approximate ML methods that make them tractable for higher-dimensional applications, while high dimensions create validation and computational difficulties for the Archimedean class of copulas [Bouyé et al., 2000]. Thus, dependent on the choice of copula, MCONDEVT is relatively suitable for high-dimensional applications. While it would be natural to expect a multivariate method that is capable of accounting for the co-movements of risk factor return series to outperform simple univariate methods applied to portfolio losses, McNeil et al. [2005] demonstrate that HS-CONDEVT in some cases outperforms more advanced multivariate methods. Other evidence in favor of univariate methods includes Brooks and Persand [2003], who find that the advantage of using a DVEC model rather than a simple univariate GARCH model for volatility estimation is negligible. In another study, Berkowitz and O'Brien [2002] find that a simple univariate ARMA-GARCH model applied to portfolio losses outperforms more advanced multivariate models in estimating VaR. In conclusion, empirical studies suggest that univariate methods, such as HS-CONDEVT, can at least be a useful complement to the toolbox of the risk manager, but multivariate methods, such as MCONDEVT, are still required for active risk management [Andersen et al., 2005].
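Returning to the two-stage estimation point above, the second stage is particularly simple in the Gaussian case: transform the standardized residuals from the separately fitted margins to uniforms, map them to normal scores, and estimate their correlation matrix, which is a standard approximation to the Gaussian copula maximum likelihood estimate. The sketch below is purely illustrative; all names are ours and the marginal models are assumed to have been fitted beforehand.

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_copula_second_stage(std_resid):
    """Second stage of a margins-first, copula-second fit for a Gaussian copula.
    std_resid: (T, d) array of standardized residuals, one column per risk factor."""
    T, d = std_resid.shape

    # Probability-integral transform through the empirical margins
    # (ranks scaled by T + 1 to keep the uniforms away from 0 and 1)
    U = np.column_stack([rankdata(std_resid[:, j]) / (T + 1) for j in range(d)])

    # Normal scores and their correlation matrix, an approximation to the
    # Gaussian copula ML estimate of the dependence parameters
    Z = norm.ppf(U)
    return np.corrcoef(Z, rowvar=False)
```

Simulation from the fitted model then reverses these steps, with only the copula-sampling part differing across the Gaussian, t, and Gumbel variants.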
From a practical perspective, unconditional methods may arguably be more feasible for risk management than conditional methods, as unconditional risk measure estimates are more stable over time. Thus, it is less costly to adjust the risk capital according to unconditional risk measure estimates. Due to the higher volatility of conditional estimates, financial institutions often prefer to use unconditional methods to set trading limits for their portfolio managers [Danielsson and Morimoto, 2000]. On the other hand, in accordance with McNeil and Frey [2000], it can be argued that unconditional methods give a false image of the current market risk exposure as they fail to account for the time-varying volatility exhibited by most financial return series. In contrast, conditional methods warn the portfolio manager when market volatility increases, giving him or her the opportunity to react accordingly. Moreover, while some financial institutions are not able (or willing) to adjust their risk capital as rapidly as conditional measures prescribe, they may instead react by changing their market risk exposure.
5.3 Ideas for Further Research
As mentioned, EVT is a recent addition to the toolbox for management of financial risk. Several studies have documented the strengths that EVT has to offer to the discipline of risk management, ours included. However, there is still potential and need for further development and refinement, both theoretically and empirically. In this brief section, we therefore propose possible ideas for further research. On the empirical front, our study could be extended and supplemented in several ways. First of all, one could apply the methods that we have studied to stock data from other countries and to data from other asset classes, such as options, futures, currencies, and commodities (oil, electricity, etc.). In addition, one might also examine the effects of using data with other time intervals, such as high-frequency data (hourly or bi-hourly) or lower-frequency data (weekly or bi-weekly). Using bi-weekly data would solve the issue of dependence that arises when backtesting with 10-day risk measures, as discussed in Section 3.3. Another extension would be to use alternative specifications for the conditional variance when modeling the portfolio loss series or the individual risk factor return series. A good overview of potential models is given by Hansen and Lunde [2005], who consider 330 different volatility structures for volatility forecasting. Previous empirical studies in EVT have, for example, considered fractionally integrated GARCH models, also known as long memory models, with good results; see e.g. Maghyereh and Al-Zoubi [2008]. In the multivariate EVT-based methods, one might consider alternative types of copulas for modeling the dependence structure. To account for the possible asymmetric dependence structure in multivariate return series data, it may be worth considering copulas that allow for different levels of
tail dependence. Concrete examples of such asymmetric copulas include the class of mixture copulas, such as the skewed t copula introduced in Demarta and McNeil [2005]. Like the t copula used in our study, the skewed t copula is easy to simulate from and thus to use in Monte Carlo-based risk measurement methods [McNeil et al., 2005]. Finally, the performance of the methods may be evaluated with other types of evaluation criteria besides the three considered in our study. Possible alternatives could be based on the loss function proposed by Lopez [1998]. The idea of the loss function is to assign a score to each risk measurement method based on specific concerns, such as the frequency and size of VaR violations. On the theoretical front, possible directions for further research include the exploration of alternative ways of handling the issue of dependent data, other than by using the pre-whitening approach considered in our study. As noted in Section 2.2.2, the pre-whitening approach has the drawback of being a two-stage procedure, which means that the results of the EVT analysis in the second stage will be sensitive to the fitting of the GARCH-type model to the risk factor return series in the first stage. A more recent approach is that of Chavez-Demoulin et al. [2005], further extended in McNeil et al. [2005], who suggest modifying the marked point process of the POT model by incorporating a model for the dependence of frequencies and sizes of exceedances of high thresholds. Specifically, they consider so-called self-exciting processes, also known as Hawkes processes [Hawkes, 1971]. Empirical results on the performance of these models in financial applications are currently sparse, and further empirical research on the potential of these models may therefore prove fruitful. Also, the models have not, to our knowledge, been generalized to a setup with multivariate risk factors in a financial context, so another interesting research direction might be to investigate how multivariate versions of the models can be constructed and what the advantages and limitations of such models are. Another interesting area for further investigation is the strand of research within multivariate EVT that focuses on connecting univariate EVT with the theory of copulas. Some progress has been made in this area. For instance, a class of copulas known as extreme value copulas has been shown to be the natural limiting copulas of multivariate maxima, as described in Section 2.3.2. Also, copula theory has been applied to modeling of multivariate threshold exceedances. A recent development in this area investigates the specific kinds of copulas that emerge in the limit when losses are above (or below) some threshold. Originally described by Juri and Wüthrich [2002], these classes of copulas have been termed limiting threshold copulas or tail dependence copulas. They can be considered as natural limiting models for the dependence structure of multivariate threshold exceedances, just as the GPD can be considered the natural limiting model for threshold exceedances in the univariate case [McNeil et al., 2005]. We briefly discussed these copulas
in Section 2.3.2. So far, however, limiting threshold copulas have not been extensively investigated in dimensions higher than two, nor have the copula limits been thoroughly studied when we have different threshold levels for each risk factor and/or let these tend to infinity (in the case of upper limiting threshold copulas) at different rates. Thus, investigating these issues will give us further insight into the limiting behavior of multivariate risk factors, which is important for estimating tail risk.
Chapter 6
Conclusion
This thesis has investigated extreme value theory (EVT) and its potential in financial risk management. In the research process, we have concentrated on providing answers to the three primary research questions stated in Section 1.1. In the following we summarize the main findings and conclusions that we have reached. A fundamental part of the theoretical foundation of EVT is the so-called Fisher-Tippett theorem, which describes the limiting behavior of appropriately normalized sample maxima. Under the assumption of independently and identically distributed (iid) data, appropriately normalized maxima converge in distribution to a generalized extreme value (GEV) distribution, and we say that the underlying loss distribution is in the maximum domain of attraction of either the Fréchet, Gumbel, or Weibull class of standard extreme value distributions. Based on this result, one approach to modeling extremes is the so-called block maxima model, in which the distribution of maximum (or minimum) realizations, i.e. typically losses on a risk factor in the case of financial risk management, is directly modeled by a GEV distribution. A more recent approach to modeling extremes is the peaks over threshold (POT) model, which is based on another fundamental result in EVT, the Pickands-Balkema-de Haan theorem. The theorem shows that if the underlying loss distribution is in the maximum domain of attraction of an extreme value distribution, the generalized Pareto distribution (GPD) is the limiting distribution for the excess losses as the threshold is raised. Thus, the theorem suggests that if we choose a sufficiently high threshold, exceedances will show generalized Pareto behavior. Under the assumption of iid data, the POT model stipulates that the exceedances occur in time according to a homogeneous Poisson process with constant intensity and that the excess amounts (i.e. the sizes of the losses) are iid according to a GPD. Consequently, to model the tail of the underlying loss distribution of a risk factor, we can model the tail of the distribution above a high threshold as a GPD. Compared to the block maxima approach, the POT model uses data more efficiently. As a result, it has achieved the most attention lately, both theoretically and in practical applications. Also, the POT model makes it possible to estimate well-known risk measures such as Value at Risk (VaR) and Expected Shortfall (ES). Thus, for risk management purposes, we favor the POT model.
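For reference, with n losses, N_u exceedances of a high threshold u, and estimated GPD shape and scale parameters ξ and β, the POT tail estimator and the implied risk measures take the familiar form (for nonzero ξ, confidence levels α with 1 − α < N_u/n, and ξ < 1 in the case of ES):
\[
\hat{\bar{F}}(x) \;=\; \frac{N_u}{n}\left(1+\hat{\xi}\,\frac{x-u}{\hat{\beta}}\right)^{-1/\hat{\xi}},\qquad x\ge u,
\]
\[
\widehat{\mathrm{VaR}}_{\alpha} \;=\; u+\frac{\hat{\beta}}{\hat{\xi}}\left[\left(\frac{n}{N_u}\,(1-\alpha)\right)^{-\hat{\xi}}-1\right],\qquad
\widehat{\mathrm{ES}}_{\alpha} \;=\; \frac{\widehat{\mathrm{VaR}}_{\alpha}}{1-\hat{\xi}}+\frac{\hat{\beta}-\hat{\xi}\,u}{1-\hat{\xi}}.
\]
In the conditional variants, these quantities would be computed on the standardized residuals and then scaled back with the conditional mean and volatility forecasts, in line with the pre-whitening approach discussed next.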
However, while the iid assumption may be reasonable in e.g. flood analysis or other disciplines to which EVT has been applied, independence can hardly be assumed in financial applications, such as management of financial risk, where extreme events tend to occur in clusters caused by temporal dependence in the data. Research has offered various attempts to deal with the issue of dependent data. The approach that we have adopted is to pre-whiten the return series using a volatility model such as GARCH and then conduct the EVT analysis on the standardized residuals. To evaluate the performance of EVT-based risk measurement methods, we conducted an empirical study based on an equally-weighted portfolio composed of three Danish stocks. The performance of the methods was evaluated based on their ability to accurately estimate risk measures such as Value at Risk and Expected Shortfall for 1-day and 10-day forecast horizons. Furthermore, we considered the risk of both a short and a long trading position in the portfolio. Based on the empirical results, we generally find that the EVT-based methods perform as well as or better than a number of alternative risk measurement methods. Treating the portfolio value as a single risk factor, we considered a univariate EVT method, called HS-CONDEVT, which combines GARCH-type modeling of volatility and fitting of a generalized Pareto distribution to the tails of the underlying distribution. Compared to the performance of alternative univariate methods for risk measurement, the empirical results demonstrate that HS-CONDEVT outperforms alternative univariate methods such as historical simulation (HS) and HS combined with a GARCH-type model assuming normally distributed innovations. Moreover, HS-CONDEVT is found to be a viable alternative to filtered HS and to HS combined with a GARCH-type model assuming t distributed innovations. Treating the three stocks in the portfolio as risk factors, we considered a multivariate EVT-based method, called MCONDEVT, which combines a copula with margins based on GARCH-type modeling and GPD fitting. The method was implemented in three variants using three different copulas: a Gaussian, a t, and a Gumbel copula. Comparatively, we find that the variants of the MCONDEVT method outperform other multivariate methods such as variance-covariance (VC), VC combined with a multivariate EWMA model, multivariate GARCH based on a constant conditional correlation structure, and multivariate GARCH based on a dynamic conditional correlation structure. Finally, comparing the performance of the univariate and multivariate methods altogether, we find that the implemented variants of the MCONDEVT method rank among the top performing methods. In comparison with
the competing methods, the variant based on a Gumbel copula appears to have the best performance for the long position and a fair, but less impressive, performance for the short position. The variant based on a Gaussian copula delivers an excellent performance for the short position, but its ability to estimate Expected Shortfall is less adequate for the long position. Finally, the variant based on a t copula appears to have the best performance overall among the competing risk measurement methods.
Bibliography
T.G. Andersen, T. Bollerslev, P.F. Christoffersen, and F.X. Diebold. Practical Volatility and Correlation Modeling for Financial Market Risk Management. NBER Working Paper, 2005.
P. Artzner, F. Delbaen, J.M. Eber, and D. Heath. Coherent Measures of Risk. Mathematical Finance, 9(3):203–228, 1999.
A.A. Balkema and L. de Haan. Residual Life Time at Great Age. The Annals of Probability, 2(5):792–804, 1974.
The Basel Committee on Banking Supervision. Basel II: International Convergence of Capital Measurement and Capital Standards: A Revised Framework. Bank for International Settlements, 2004.
I. Berkes, L. Horváth, and P. Kokoszka. GARCH Processes: Structure and Estimation. Bernoulli, 9(2):201–227, 2003.
J. Berkowitz and J. O'Brien. How Accurate Are Value-at-Risk Models at Commercial Banks? The Journal of Finance, 57(3):1093–1111, 2002.
F. Black. Studies of Stock Price Volatility Changes. In Proceedings of the 1976 Meetings of the Business and Economic Statistics Section, American Statistical Association, pages 177–181, 1976.
T. Bollerslev. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.
T. Bollerslev. Modelling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate Generalized ARCH Model. The Review of Economics and Statistics, 72(3):498–505, 1990.
T. Bollerslev and J.M. Wooldridge. Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time-Varying Covariances. Econometric Reviews, 11(2):143–172, 1992.
T. Bollerslev, R.F. Engle, and J.M. Wooldridge. A Capital Asset Pricing Model with Time-Varying Covariances. The Journal of Political Economy, 96(1):116–131, 1988.
E. Bouyé, V. Durrleman, A. Nikeghbali, G. Riboulet, and T. Roncalli. Copulas for Finance - A Reading Guide and Some Applications. Groupe de Recherche Opérationnelle, Crédit Lyonnais, 2000.
W. Breymann, A. Dias, and P. Embrechts. Dependence Structures for Multivariate High-Frequency Data in Finance. Quantitative Finance, 3(1):1–14, 2003.
E. Brodin and C. Klüppelberg. Extreme Value Theory in Finance. Submitted for publication: Center for Mathematical Sciences, Munich University of Technology, 2006.
C. Brooks and G. Persand. Volatility Forecasting for Risk Management. Journal of Forecasting, 22(1):1–22, 2003.
H.N.E. Byström. Extreme Value Theory and Extremely Large Electricity Price Changes. International Review of Economics & Finance, 14(1):41–55, 2005.
J.Y. Campbell, A.W. Lo, and A.C. MacKinlay. The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ, 1997.
V. Chavez-Demoulin, A.C. Davison, and A.J. McNeil. A Point Process Approach to Value-at-Risk Estimation. Quantitative Finance, 5, 2005.
P.F. Christoffersen. Evaluating Interval Forecasts. International Economic Review, 39(4):841–862, 1998.
P.F. Christoffersen. Elements of Financial Risk Management. Academic Press, San Diego, CA, 2003.
S. Coles. An Introduction to Statistical Modeling of Extreme Values. Springer Verlag, London, 2001.
S. Coles and J.A. Tawn. Modelling Extreme Multivariate Events. Journal of the Royal Statistical Society. Series B (Methodological), 53(2):377–392, 1991.
R. Cont. Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1(2):223–236, 2001.
D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes - Volume I: Elementary Theory and Methods. Springer Verlag, Berlin, 2003.
J. Danielsson. The Value-at-Risk Reference: Key Issues in the Implementation of Market Risk. Risk Books, London, 2007.
J. Danielsson and C. De Vries. Tail Index and Quantile Estimation with Very High Frequency Data. Journal of Empirical Finance, 4(2–3):241–257, 1997.
J. Danielsson and C. De Vries. Value-at-Risk and Extreme Returns. Annales d'Économie et de Statistique, (60):239–270, 2000.
J. Danielsson and Y. Morimoto. Forecasting Extreme Financial Risk: A Critical Analysis of Practical Methods for the Japanese Market. Monetary and Economic Studies, 18(2):25–48, 2000.
L. de Haan. Extremes in Higher Dimensions: The Model and Some Statistics. In Proceedings of the 45th Session International Statistical Institute, volume 26, 1985.
L. de Haan and S.I. Resnick. Limit Theory for Multivariate Sample Extremes. Probability Theory and Related Fields, 40(4):317–337, 1977.
A. Dekkers, J. Einmahl, and L. de Haan. A Moment Estimator for the Index of an Extreme-Value Distribution. The Annals of Statistics, 17(4):1833–1855, 1989.
S. Demarta and A.J. McNeil. The t Copula and Related Copulas. International Statistical Review, 73(1):111–129, 2005.
B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, Boca Raton, FL, 1993.
P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer Verlag, Berlin, 1997.
P. Embrechts, F. Lindskog, and A. McNeil. Modelling Dependence with Copulas and Applications to Risk Management. In: Handbook of Heavy Tailed Distributions in Finance, Chapter 8, pages 329–384, 2003.
R.F. Engle. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica: Journal of the Econometric Society, 50(4):987–1007, 1982.
R.F. Engle. Dynamic Conditional Correlation. Journal of Business and Economic Statistics, 20(3):339–350, 2002.
R.F. Engle and K.F. Kroner. Multivariate Simultaneous Generalized ARCH. Econometric Theory, 11(1):122–150, 1995.
R.F. Engle and K. Sheppard. Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH. NBER Working Paper, 2001.
E.F. Fama. The Behavior of Stock-Market Prices. Journal of Business, 38(1):34–105, 1965.
R.A. Fisher and L.H.C. Tippett. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. In Proceedings of the Cambridge Philosophical Society, volume 24, pages 180–190, 1928.
J. Galambos. Order Statistics of Samples from Multivariate Distributions. Journal of the American Statistical Association, pages 674–680, 1975.
J. Galambos. The Asymptotic Theory of Extreme Order Statistics. Wiley, New York, NY, 1987.
A. Ghorbel and A. Trabelsi. Measure of Financial Risk using Conditional Extreme Value Copulas with EVT Margins. The Journal of Risk, 11(4):51–85, 2009.
M. Gilli and E. Kellezi. An Application of Extreme Value Theory for Measuring Financial Risk. Computational Economics, 27(2):207–228, 2006.
L.R. Glosten, R. Jagannathan, and D.E. Runkle. On the Relation Between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48(5):1779–1801, 1993.
B. Gnedenko. Sur la distribution limite du terme maximum d'une série aléatoire. The Annals of Mathematics, 44(3):423–453, 1943.
E.J. Gumbel. Statistics of Extremes. Columbia University Press, New York, NY, 1958.
P.R. Hansen and A. Lunde. A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)? Journal of Applied Econometrics, 20(7):873–889, 2005.
A.G. Hawkes. Spectra of Some Self-Exciting and Mutually Exciting Point Processes. Biometrika, 58(1):83–96, 1971.
A.G. Hawkes and D. Oakes. A Cluster Process Representation of a Self-Exciting Process. Journal of Applied Probability, 11(3):493–503, 1974.
C. Heij, P. de Boer, P. Franses, T. Kloek, and H. van Dijk. Econometric Methods with Applications in Business and Economics. Oxford University Press, Oxford, 2004.
B.M. Hill. A Simple General Approach to Inference about the Tail of a Distribution. The Annals of Statistics, 3(5):1163–1174, 1975.
J. Hosking. Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics, 34:301–310, 1985.
J. Hosking and J. Wallis. Parameter and Quantile Estimation for the Generalized Pareto Distribution. Technometrics, 29(3):339–349, 1987.
J. Hosking, J. Wallis, and E. Wood. Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics, 27(3):251–261, 1985.
J. Hull and A. White. Incorporating Volatility Updating into the Historical Simulation Method for Value-at-Risk. Journal of Risk, 1(1):5–19, 1998.
D.W. Jansen and C.G. De Vries. On the Frequency of Large Stock Returns: Putting Booms and Busts into Perspective. The Review of Economics and Statistics, 73(1):18–24, 1991.
A.F. Jenkinson. The Frequency Distribution of the Annual Maximum (or Minimum) Values of Meteorological Elements. Quarterly Journal of the Royal Meteorological Society, 81(3):158–171, 1955.
H. Joe. Multivariate Models and Dependence Concepts. Chapman & Hall, London, 1997.
H. Joe, R.L. Smith, and I. Weissman. Bivariate Threshold Methods for Extremes. Journal of the Royal Statistical Society. Series B (Methodological), 54(1):171–183, 1992.
P. Jorion. Value at Risk. McGraw-Hill, New York, NY, 2001.
A. Juri and M.V. Wüthrich. Copula Convergence Theorems for Tail Events. Insurance: Mathematics and Economics, 30(3):405–420, 2002.
C.H. Kimberling. A Probabilistic Interpretation of Complete Monotonicity. Aequationes Mathematicae, 10(2):152–164, 1974.
K. Koedijk, M. Schafgans, and C. De Vries. The Tail Index of Exchange Rate Returns. Journal of International Economics, 29(1–2):93–108, 1990.
S. Kotz and S. Nadarajah. Extreme Value Distributions: Theory and Applications. World Scientific Publishing Company, 2000.
K. Kuester, S. Mittnik, and M.S. Paolella. Value-at-Risk Prediction: A Comparison of Alternative Strategies. Journal of Financial Econometrics, 4(1):53, 2006.
P.H. Kupiec. Techniques for Verifying the Accuracy of Risk Measurement Models. The Journal of Derivatives, 3(2):73–84, 1995.
S. Lauridsen. Estimation of Value at Risk by Extreme Value Methods. Extremes, 3(2):107–144, 2000.
M. Leadbetter. Extremes and Local Dependence in Stationary Sequences. Probability Theory and Related Fields, 65(2):291–306, 1983.
M. Leadbetter. On a Basis for Peaks over Threshold Modeling. Statistics & Probability Letters, 12(4):357–362, 1991.
M. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. Springer Verlag, Berlin, 1983.
S.W. Lee and B.E. Hansen. Asymptotic Properties of the Maximum Likelihood Estimator and Test of the Stability of Parameters of the GARCH and IGARCH Models. Econometric Theory, 10:29–52, 1994.
M. Longin. The Asymptotic Distribution of Extreme Stock Market Returns. Journal of Business, 69(3):383–408, 1996.
J.A. Lopez. Testing Your Risk Tests. The Financial Survey, 20(3):18–20, 1998.
T. Lux. The Stable Paretian Hypothesis and the Frequency of Large Returns: An Examination of Major German Stocks. Applied Financial Economics, 6(6):463–475, 1996.
A.I. Maghyereh and H.A. Al-Zoubi. The Tail Behavior of Extreme Stock Returns in the Gulf Emerging Markets. Studies in Economics and Finance, 25(1):21–37, 2008.
B. Mandelbrot. The Variation of Certain Speculative Prices. Journal of Business, 36(4):394–419, 1963.
V. Marimoutou, B. Raggad, and A. Trabelsi. Extreme Value Theory and Value at Risk: Application to Oil Market. Energy Economics, 31(4):519–530, 2009.
A.J. McNeil. Calculating Quantile Risk Measures for Financial Return Series using Extreme Value Theory. Departement Mathematik, ETH Zentrum, Zurich, 1998.
A.J. McNeil and R. Frey. Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach. Journal of Empirical Finance, 7(3–4):271–300, 2000.
A.J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton, NJ, 2005.
R.B. Nelsen. An Introduction to Copulas. Springer Verlag, London, 1999.
D.B. Nelson. Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica: Journal of the Econometric Society, 59(2):347–370, 1991.
D.B. Nelson and C.Q. Cao. Inequality Constraints in the Univariate GARCH Model. Journal of Business & Economic Statistics, 10(2):229–235, 1992.
J.P. Nolan. Stable Distributions. Manuscript, in preparation, 2009.
J. Pickands. Statistical Inference using Extreme Order Statistics. The Annals of Statistics, 3(1):119–131, 1975.
J. Pickands. Multivariate Extreme Value Distributions. In Proceedings of the 43rd Session International Statistical Institute, Buenos Aires, volume 2, pages 859–878, 1981.
P. Prescott and A. Walden. Maximum Likelihood Estimation of the Parameters of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computation and Simulation, 16(3):241–250, 1983.
J.P. Raoult and R. Worms. Rate of Convergence for the Generalized Pareto Approximation of the Excesses. Advances in Applied Probability, 35(4):1007–1027, 2003.
S.I. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer Verlag, Berlin, 2007.
S.I. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer Verlag, London, 2008.
A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231, 1959.
R.L. Smith. Maximum Likelihood Estimation in a Class of Nonregular Cases. Biometrika, 72(1):67–90, 1985.
R.L. Smith. Estimating Tails of Probability Distributions. The Annals of Statistics, 15(3):1174–1207, 1987.
N.N. Taleb. The Black Swan: The Impact of the Highly Improbable. Random House, Boston, MA, 2007.
R. von Mises. La distribution de la plus grande de n valeurs. Rev. math. Union interbalcanique, 1(1):141–160, 1936.
N. Wagner and T.A. Marsh. Measuring Tail Thickness under GARCH and an Application to Extreme Exchange Rate Changes. Journal of Empirical Finance, 12(1):165–185, 2005.
Z. Wang, Y. Jin, and Y. Zhou. Estimating Portfolio Risk Using GARCH-EVT-Copula Model: An Empirical Study on Exchange Rate Market. Advances in Neural Network Research and Applications, 67:65–72, 2010.
W.K. Wong. Backtesting Trading Risk of Commercial Banks using Expected Shortfall. Journal of Banking & Finance, 32(7):1404–1415, 2007.
J.M. Zakoian. Threshold Heteroskedastic Models. Journal of Economic Dynamics and Control, 18(5):931–955, 1994.