Statistics and Probability Letters 79 (2009) 1097-1104
doi:10.1016/j.spl.2008.12.019

Goodness-of-fit test for tail copulas modeled by elliptical copulas

Deyuan Li (a,*), Liang Peng (b)
(a) School of Management, Fudan University, China
(b) School of Mathematics, Georgia Institute of Technology, USA
(*) Corresponding address: School of Management, Fudan University, Room 736, Siyuan Building, 670 Guoshun Road, 200433 Shanghai, China. E-mail address: deyuanli@fudan.edu.cn (D. Li).

Article history: Received 3 June 2008. Received in revised form 19 November 2008. Accepted 16 December 2008. Available online 4 January 2009.

Abstract

Modeling and estimating a tail copula play an important role in forecasting rare events. Due to their easy simulation, elliptical copulas have been employed in risk management. Recently, Klüppelberg, Kuhn and Peng [Klüppelberg, C., Kuhn, G., Peng, L., 2007. Estimating the tail dependence function of an elliptical distribution. Bernoulli 13 (1), 229-251; Klüppelberg, C., Kuhn, G., Peng, L., 2008. Semi-parametric models for the multivariate tail dependence function - the asymptotically dependent case. Scandinavian Journal of Statistics 35, 701-718] proposed to model a tail copula by an elliptical copula, which results in an explicit parametric model for the tail copula. In this paper, we propose a goodness-of-fit test for such a parametric model, and some real data analyses show that this fit cannot be rejected. We thereby demonstrate the practical applicability of this model.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

The insurance and reinsurance industry is experiencing a rise in both the intensity and the magnitude of losses due to natural and man-made catastrophes. In general, these disasters happen rarely but cost billions of dollars. Moreover, insurance risks exhibit skewed distributions (see Lane (2000)), and heavy tailed distributions and other skewed distributions have been applied to model insurance risks. For example, Matthys et al. (2004) employed heavy tailed distributions to estimate Value-at-Risk for a European car insurance portfolio and for the SOA Group Medical Large Claims Database, which records all the claim amounts exceeding $25,000 over the period 1991-1992; Vandewalle and Beirlant (2006) applied heavy tailed distributions to estimate the risk premium for an excess-of-loss reinsurance policy in excess of a high retention level, with application to the Secura Belgian Re data set on automobile claims from 1998 until 2001; Vernic (2006) applied multivariate skew-normal distributions to derive explicit formulas for computing tail conditional expectation and capital allocation in insurance; Bolance et al. (2003) used transformed density estimation to estimate actuarial loss functions, with application to a data set of automobile claims in the Netherlands; Valdez and Chernih (2003) applied elliptical distributions to derive capital allocation formulas in insurance; Hashorva (2005) studied the tail asymptotic behavior of elliptical random vectors; Frees and Wang (2006) applied elliptical copulas to model dependence over time, with application to automobile liability claims from a sample of 29 towns of Massachusetts from 1994 to 1998.

Due to the Basel II Capital Accord for banking regulation and the Solvency II project for insurance regulation, copulas and tail copulas have attracted much attention in risk management. Suppose $X = (X_1, \dots, X_d)^T$ is a random vector with distribution function $F$ and continuous marginals $F_1, \dots, F_d$. Then the copula of $X$ is defined as

$$
C^X(x_1, \dots, x_d) = F\big(F_1^{\leftarrow}(x_1), \dots, F_d^{\leftarrow}(x_d)\big), \quad x = (x_1, \dots, x_d)^T \in [0, 1]^d, \tag{1.1}
$$

where $F_j^{\leftarrow}$ denotes the generalized inverse function of $F_j$, and the tail copula of $X$ is defined as

$$
\lambda^X(x_1, \dots, x_d) = \lim_{t \to 0} t^{-1} P\big(1 - F_1(X_1) \le t x_1, \dots, 1 - F_d(X_d) \le t x_d\big), \quad x_1, \dots, x_d \ge 0. \tag{1.2}
$$

For applications of copulas and tail copulas in risk management, we refer to the book of McNeil et al. (2005) and the references therein.

In order to forecast rare events, modeling and estimating tail copulas are of importance. Recently, Klüppelberg et al. (2007) and Klüppelberg et al. (2008) proposed to model tail copulas via elliptical copulas, which results in a parametric model for the tail copula. Advantages of this approach are that it is easy to simulate from and straightforward to extend to high dimensions. The estimation procedure involves both tail and non-tail parameters. The details are as follows.

Let $Z = (Z_1, \dots, Z_d)^T$ denote an elliptical random vector satisfying

$$
Z \stackrel{d}{=} G A U, \tag{1.3}
$$

where $G > 0$ is a random variable, $A$ is a deterministic $d \times d$ matrix with $A A^T := \Sigma = (\sigma_{ij})$ and $\operatorname{rank}(\Sigma) = d$, $U$ is a $d$-dimensional random vector uniformly distributed on the unit hyper-sphere $S_d = \{z \in \mathbb{R}^d : z^T z = 1\}$, and $U$ is independent of $G$. Define the linear correlation between $Z_i$ and $Z_j$ as $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii} \sigma_{jj}}$ and denote by $D = (\rho_{ij})$ the linear correlation matrix. Let $A_i$ denote the $i$th row of $A$ and let $F_U$ denote the uniform distribution on $S_d$. By assuming

(A1) $\rho_{ii} > 0$ for $i = 1, \dots, d$ and $|\rho_{ij}| < 1$ for $i \ne j$;
(A2) $\Sigma = D$;
(A3) $\lim_{t \to \infty} P(G > tx)/P(G > t) = x^{-\alpha}$ for $x > 0$ and some $\alpha > 0$;
(A4) $X$ has the same copula as $Z$,

Klüppelberg et al. (2007) showed that the tail copula of $X$ can be written as

$$
\lambda^X(x_1, \dots, x_d) = \frac{\int_{\{u \in S_d :\, A_1 u > 0, \dots, A_d u > 0\}} \bigwedge_{i=1}^{d} x_i (A_i u)^{\alpha} \, dF_U(u)}{\int_{\{u \in S_d :\, A_1 u > 0\}} (A_1 u)^{\alpha} \, dF_U(u)}, \tag{1.4}
$$

where $\bigwedge$ denotes the minimum. Since $\alpha$ and $A_i$ can be estimated via paired data, the estimation of the tail copula via (1.4) does not depend on the dimension $d$; see Klüppelberg et al. (2008) for details.
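As an illustration of the representation (1.3), the following is a minimal sketch of how one might simulate a bivariate elliptical vector with correlation $\rho$ and a regularly varying radial part. It is not taken from the paper: the function name `simulate_bivariate_elliptical`, the choice of a Pareto law for $G$ (which satisfies (A3)), and the lower-triangular choice of $A$ are assumptions made for the example. Only the Python standard library is used.

```python
import math
import random


def simulate_bivariate_elliptical(n, rho, alpha, seed=0):
    """Draw n points Z = G * A * U as in (1.3) for d = 2 (illustrative sketch)."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    sample = []
    for _ in range(n):
        # U is uniform on the unit circle S_2.
        phi = rng.uniform(0.0, 2.0 * math.pi)
        u1, u2 = math.cos(phi), math.sin(phi)
        # Assumed radial law: Pareto with P(G > x) = x**(-alpha) for x >= 1,
        # which is regularly varying with index alpha as required by (A3).
        g = rng.paretovariate(alpha)
        # A = [[1, 0], [rho, s]] satisfies A A^T = [[1, rho], [rho, 1]].
        sample.append((g * u1, g * (rho * u1 + s * u2)))
    return sample


rho, alpha = 0.5, 3.0
sample = simulate_bivariate_elliptical(1000, rho, alpha)
```

With this choice of $A$, the quadratic form $z^T \Sigma^{-1} z$ of each simulated point equals $G^2$, which gives a quick sanity check on the construction.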
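In order to apply this methodology, an important question is how to verify (1.4). In this paper we propose goodness-of-fit tests for condition (1.4). We organize this paper as follows. In Section 2, we propose tests for both iid data and dependent data. Some real data analyses are given in Section 3.

2. Main results

2.1. IID case

Suppose we have iid observations $X_i = (X_{i1}, \dots, X_{id})^T$, $i = 1, \dots, n$, from $X$ which satisfy conditions (A1)-(A4). For each pair $\{(X_{ip}, X_{iq})\}_{i=1}^{n}$ with $p \ne q$, we estimate Kendall's tau $\tau_{pq}$ and $\rho_{pq}$ by

$$
\hat\tau_{pq} = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \operatorname{sign}\big((X_{ip} - X_{jp})(X_{iq} - X_{jq})\big)
\quad \text{and} \quad
\hat\rho_{pq} = \sin\Big(\frac{\pi}{2} \hat\tau_{pq}\Big),
$$

respectively. Let $\hat\alpha_{pq}$ be the unique solution in $\alpha$ of the equation

$$
\tilde\lambda_{p,q}(1, 1) = \lambda_{p,q}(1, 1),
$$

where

$$
g_{pq}(t) = \arctan\Big((t - \hat\rho_{pq}) \big/ \sqrt{1 - \hat\rho_{pq}^2}\Big),
$$

$$
\tilde\lambda_{p,q}(x, y) = \frac{1}{k} \sum_{i=1}^{n} I\Big(1 - F_{np}(X_{ip}) \le \frac{k}{n} x,\; 1 - F_{nq}(X_{iq}) \le \frac{k}{n} y\Big),
$$

$$
\lambda_{p,q}(x_p, x_q) = \frac{x_p \int_{g_{pq}((x_p/x_q)^{1/\alpha})}^{\pi/2} (\cos\theta)^{\alpha} \, d\theta + x_q \int_{g_{pq}((x_p/x_q)^{-1/\alpha})}^{\pi/2} (\cos\theta)^{\alpha} \, d\theta}{\int_{-\pi/2}^{\pi/2} (\cos\theta)^{\alpha} \, d\theta},
$$

$$
F_{np}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_{ip} \le x),
$$

$k = k(n) \to \infty$ and $k/n \to 0$ as $n \to \infty$. The estimator for $\alpha$ in (A3) is then defined as

$$
\hat\alpha = \frac{2}{d(d-1)} \sum_{1 \le p < q \le d} \hat\alpha_{pq}.
$$

Put $\hat D = (\hat\rho_{ij})$. Note that $\hat D$ is not necessarily positive semidefinite. If it is not, we could apply Algorithm 3.3 in Higham (2002) to project the indefinite correlation matrix onto the class of positive semidefinite correlation matrices. Therefore we can obtain $\hat A$ such that $\hat A \hat A^T = \hat D$. Let $\hat A_i$ denote the $i$th row of $\hat A$, let $\hat\lambda(x_1, \dots, x_d)$ denote the right-hand side of (1.4) with $\alpha$ and $A_i$ replaced by $\hat\alpha$ and $\hat A_i$, and let

$$
\tilde\lambda(x_1, \dots, x_d) = \frac{1}{k} \sum_{i=1}^{n} I\Big(1 - F_{n1}(X_{i1}) \le \frac{k}{n} x_1, \dots, 1 - F_{nd}(X_{id}) \le \frac{k}{n} x_d\Big).
$$

Our test statistic is defined as

$$
T_n = \int \big(\tilde\lambda(x_1, \dots, x_d) - \hat\lambda(x_1, \dots, x_d)\big)^2 \, w(x_1, \dots, x_d) \, dx_1 \cdots dx_d,
$$

where $w(x_1, \dots, x_d)$ is a weight function, which may be chosen as $\frac{\partial^d}{\partial x_1 \cdots \partial x_d} \hat\lambda(x_1, \dots, x_d)$ or 1.

In order to derive the asymptotic limit of the test statistic $T_n$, we need the following second order condition: there exists $A(t) \to 0$ as $t \to 0$ such that

$$
\lim_{t \to 0} \frac{t^{-1} P\big(1 - F_1(X_{11}) \le t x_1, \dots, 1 - F_d(X_{1d}) \le t x_d\big) - \lambda^X(x_1, \dots, x_d)}{A(t)} = b(x_1, \dots, x_d) \tag{2.1}
$$

holds locally uniformly on $\bar{\mathbb{R}}^d_+ = \{(x_1, \dots, x_d) \ne (\infty, \dots, \infty) : x_i \in [0, \infty],\ i = 1, \dots, d\}$.

Theorem 1. Suppose (A1)-(A4) and (2.1) hold. Further, assume $k = k(n) \to \infty$, $\sqrt{k} A(k/n) \to 0$ as $n \to \infty$,

$$
\sup_{\bar{\mathbb{R}}^d_+} w(x_1, \dots, x_d) < \infty
\quad \text{and} \quad
\int \Big\{ w(x_1, \dots, x_d) + \Big| \frac{\partial}{\partial\alpha} \lambda(x_1, \dots, x_d; \alpha) \Big| \Big\} \, dx_1 \cdots dx_d < \infty,
$$

where $\lambda(x_1, \dots, x_d; \alpha)$ denotes the right-hand side of (1.4). Then

$$
k T_n \xrightarrow{d} \int \Bigg( W(x_1, \dots, x_d) - \sum_{i=1}^{d} \frac{\partial}{\partial x_i} \lambda^X(x_1, \dots, x_d) W_i(x_i) - \frac{\partial}{\partial\alpha} \lambda(x_1, \dots, x_d; \alpha) \, \frac{2}{d(d-1)} \sum_{1 \le p < q \le d} \frac{W_{p,q}(1, 1) - \frac{\partial}{\partial x_p} \lambda_{p,q}(1, 1) W_p(1) - \frac{\partial}{\partial x_q} \lambda_{p,q}(1, 1) W_q(1)}{\frac{\partial}{\partial\alpha} \lambda_{p,q}(1, 1)} \Bigg)^2 w(x_1, \dots, x_d) \, dx_1 \cdots dx_d
$$

in $B(\bar{\mathbb{R}}^d_+)$ (see Schmidt and Stadtmüller (2006) for details on the convergence in this space), where $W(x_1, \dots, x_d)$ is a Gaussian process with mean zero and covariance structure

$$
E\big[W(x_1, \dots, x_d) W(y_1, \dots, y_d)\big] = \lambda^X(x_1 \wedge y_1, \dots, x_d \wedge y_d),
$$

$W_{i,j}(x_i, x_j)$ is equal to $W(x_1, \dots, x_d)$ with $x_l = \infty$ for $l \ne i, j$, and $W_i(x_i)$ is equal to $W(x_1, \dots, x_d)$ with $x_l = \infty$ for $l \ne i$.

Proof. Using the arguments in the proof of Theorem 5 of Schmidt and Stadtmüller (2006), we have

$$
\sqrt{k} \big(\tilde\lambda(x_1, \dots, x_d) - \lambda^X(x_1, \dots, x_d)\big) \xrightarrow{d} W(x_1, \dots, x_d) - \sum_{i=1}^{d} \frac{\partial}{\partial x_i} \lambda^X(x_1, \dots, x_d) W_i(x_i) \tag{2.2}
$$

in $B(\bar{\mathbb{R}}^d_+)$. As in the proof of Theorem 2.2 of Klüppelberg et al. (2008), we have

$$
\sqrt{k} (\hat\alpha - \alpha) \xrightarrow{d} \frac{2}{d(d-1)} \sum_{1 \le p < q \le d} \frac{W_{p,q}(1, 1) - \frac{\partial}{\partial x_p} \lambda_{p,q}(1, 1) W_p(1) - \frac{\partial}{\partial x_q} \lambda_{p,q}(1, 1) W_q(1)}{\frac{\partial}{\partial\alpha} \lambda_{p,q}(1, 1)}. \tag{2.3}
$$

It follows from (2.3) and a Taylor expansion that

$$
\sqrt{k} \big(\hat\lambda(x_1, \dots, x_d) - \lambda(x_1, \dots, x_d; \alpha)\big) \xrightarrow{d} \frac{\partial}{\partial\alpha} \lambda(x_1, \dots, x_d; \alpha) \, \frac{2}{d(d-1)} \sum_{1 \le p < q \le d} \frac{W_{p,q}(1, 1) - \frac{\partial}{\partial x_p} \lambda_{p,q}(1, 1) W_p(1) - \frac{\partial}{\partial x_q} \lambda_{p,q}(1, 1) W_q(1)}{\frac{\partial}{\partial\alpha} \lambda_{p,q}(1, 1)}. \tag{2.4}
$$

Hence the theorem follows from (2.2) and (2.4).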
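The estimation steps of Section 2.1 for a single pair $(p, q)$ can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: all function names are hypothetical, the integrals of $(\cos\theta)^{\alpha}$ are approximated by a simple midpoint rule, the equation $\tilde\lambda_{p,q}(1,1) = \lambda_{p,q}(1,1)$ is solved by bisection on an assumed bracket $[0.01, 50]$ for $\alpha$, and ties in the data are not treated specially.

```python
import math
from bisect import bisect_right


def kendall_tau(xs, ys):
    """Sample Kendall's tau: 2/(n(n-1)) times the sum of signs over pairs i < j."""
    n, total = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            total += (prod > 0) - (prod < 0)
    return 2.0 * total / (n * (n - 1))


def empirical_tail_copula(xs, ys, k, x=1.0, y=1.0):
    """(1/k) * #{i : 1 - F_n(X_i) <= (k/n) x and 1 - F_n(Y_i) <= (k/n) y}."""
    n = len(xs)
    sx, sy = sorted(xs), sorted(ys)
    count = 0
    for xi, yi in zip(xs, ys):
        above_x = n - bisect_right(sx, xi)  # equals n * (1 - F_n(xi))
        above_y = n - bisect_right(sy, yi)
        if above_x <= k * x and above_y <= k * y:
            count += 1
    return count / k


def cos_power_integral(a, b, alpha, steps=2000):
    """Midpoint-rule approximation of the integral of cos(theta)**alpha on [a, b]."""
    h = (b - a) / steps
    return h * sum(max(math.cos(a + (i + 0.5) * h), 0.0) ** alpha
                   for i in range(steps))


def tail_copula(x, y, alpha, rho):
    """Bivariate parametric tail copula lambda_{p,q}(x, y) for given alpha, rho."""
    def g(t):
        return math.atan((t - rho) / math.sqrt(1.0 - rho * rho))
    r = (x / y) ** (1.0 / alpha)
    upper = math.pi / 2
    num = (x * cos_power_integral(g(r), upper, alpha)
           + y * cos_power_integral(g(1.0 / r), upper, alpha))
    return num / cos_power_integral(-upper, upper, alpha)


def solve_alpha(target, rho, lo=0.01, hi=50.0, tol=1e-6):
    """Bisection for alpha with tail_copula(1, 1, alpha, rho) == target."""
    def f(a):
        return tail_copula(1.0, 1.0, a, rho) - target
    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("target is not attained on the assumed bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:   # the tail copula decreases in alpha, so move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_pair(xs, ys, k):
    """Return (rho_hat, alpha_hat) for one pair of coordinates."""
    rho_hat = math.sin(math.pi * kendall_tau(xs, ys) / 2.0)
    alpha_hat = solve_alpha(empirical_tail_copula(xs, ys, k), rho_hat)
    return rho_hat, alpha_hat
```

For $d > 2$, one would call `estimate_pair` for every pair $p < q$ and average the resulting values of $\hat\alpha_{pq}$, as in the definition of $\hat\alpha$ above. The quadratic-time loop in `kendall_tau` keeps the sketch short; faster algorithms exist for large $n$.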
Fig. 1. Data of LOSS and ALAE without censoring (sample size n = 1500).

In order to test (1.4), we need to compute the P-value of the above test statistic. One way is to simulate the limiting distribution given in Theorem 1. Here we propose to employ the following bootstrap method. Draw $B$ random samples of size $n$ with replacement from $\{(X_{i1}, \dots, X_{id})^T\}_{i=1}^{n}$, say $\{(X^{(j)}_{i1}, \dots, X^{(j)}_{id})^T\}_{i=1}^{n}$, $j = 1, \dots, B$. For each bootstrap sample $\{(X^{(j)}_{i1}, \dots, X^{(j)}_{id})^T\}_{i=1}^{n}$, we compute the bootstrap test statistic, say $T_n^{(j)}$. The P-value is then computed as

$$
\frac{1}{B} \sum_{j=1}^{B} I\big(T_n \le T_n^{(j)}\big).
$$

2.2. Multivariate GARCH models

Recently a flexible class of semiparametric copula-based multivariate GARCH models has been proposed to quantify multivariate risks, in which univariate GARCH models are used to capture the dynamics of individual financial series, and parametric copulas are used to model the contemporaneous dependence among GARCH residuals with nonparametric marginals; see Chen and Fan (2005, 2006) and Chan et al. (2009) for details. In this section we extend the procedure in Section 2.1 to model the residual tail copulas of such multivariate GARCH models.

Suppose the observations $\{Y_t = (Y_{t1}, \dots, Y_{td})^T\}_{t=1}^{n}$ follow the model

$$
Y_{tj} = \sqrt{h_{tj}} \, \epsilon_{tj}, \qquad
h_{tj} = c_j + \sum_{i=1}^{p_j} \alpha_{j,i} Y^2_{t-i,j} + \sum_{i=1}^{q_j} \beta_{j,i} h_{t-i,j}, \qquad j = 1, \dots, d, \tag{2.5}
$$

where $\{\epsilon_t = (\epsilon_{t1}, \dots, \epsilon_{td})^T\}_{t=1}^{n}$ is a sequence of iid random vectors, and $(\epsilon_{11}, \dots, \epsilon_{1d})^T$ satisfies (A1)-(A4) and (2.1).

In order to apply the testing procedure in Section 2.1, we need to estimate the residuals. For each $j = 1, \dots, d$, let $\gamma_j = (c_j, \alpha_{j,1}, \dots, \alpha_{j,p_j}, \beta_{j,1}, \dots, \beta_{j,q_j})^T$ denote the true GARCH parameters associated with the model (2.5). Let $\hat\gamma_j$ denote the quasi-MLE of $\gamma_j$ based on the sample $Y_{1j}, \dots, Y_{nj}$. Then $\epsilon_t$ can be estimated by, say, $\hat\epsilon_t$.
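To make the residual step concrete, the following is a minimal sketch of how standardized residuals might be computed for a single GARCH(1, 1) series once parameter estimates are available. It assumes the parameters `c`, `a` and `b` have already been obtained by quasi-maximum likelihood (that step is not shown), and the start value for the conditional variance, the sample second moment, is an assumption of this example rather than a choice stated in the paper. The function name is hypothetical.

```python
import math


def garch11_residuals(y, c, a, b, h0=None):
    """Residuals eps_t = y_t / sqrt(h_t), h_t = c + a * y_{t-1}**2 + b * h_{t-1}."""
    if h0 is None:
        # Assumed initialization: sample second moment of the series.
        h0 = sum(v * v for v in y) / len(y)
    h = h0
    residuals = []
    for t, value in enumerate(y):
        if t > 0:
            # One-step update of the conditional variance.
            h = c + a * y[t - 1] ** 2 + b * h
        residuals.append(value / math.sqrt(h))
    return residuals


eps_hat = garch11_residuals([1.0, 2.0, 0.5], c=0.1, a=0.2, b=0.7, h0=1.0)
```

Applying this to each of the $d$ series separately yields the vectors $\hat\epsilon_t$ to which the test of Section 2.1 is applied.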
Details on these estimated residuals can be found in Berkes and Horvath (2003). Therefore, the testing procedure in Section 2.1 can be applied to the estimated residuals. Since the approximation rate between the estimated residuals and the true residuals is faster than $k^{-1/2}$, Theorem 1 still holds when $\{(X_{i1}, \dots, X_{id})\}_{i=1}^{n}$ is replaced by the estimated residuals $\{(\hat\epsilon_{i1}, \dots, \hat\epsilon_{id})\}_{i=1}^{n}$. Moreover, the bootstrap approach in Section 2.1 can be applied to the estimated residuals as well, since the quasi-MLEs for $\gamma_j$, $j = 1, \dots, d$, make no contribution to the limiting distribution of the test statistic (see Chan et al. (2009)).

3. Data analysis

In order to assess the practical usefulness of the proposed method of modeling a tail copula by an elliptical copula, we applied the proposed test to two two-dimensional real data sets. The first data set consists of an insurance company's losses and ALAEs; see Fig. 1. This particular data set has been analyzed by Dupuis and Jones (2006), Frees and Valdez (1998), Klugman and Parsa (1999) and Peng (in press). Indeed, Peng (in press) used this data set to show how the model of fitting a tail copula via an elliptical copula can be employed to predict rare events. The second data set consists of exchange rates between the Euro and the US dollar, and those between the British pound and the US dollar, from January 3, 2000 to December 19, 2007 (sample size n = 1995); see Fig. 3 for the log-returns.

Fig. 2. The test statistic $k T_n$ and its P-value are plotted against $k$ for the data set of LOSS and ALAE.

Fig. 3. Daily log-returns of exchange rates between Euro and US dollar are plotted against daily log-returns of exchange rates between British pound and US dollar (sample size n = 1995).

Fig. 4. The test statistic $k T_n$ and its P-value are plotted against $k$ by assuming those log-returns are iid.

Fig. 5. Residuals of the multivariate GARCH(1, 1) model for the log-returns of exchange rates.

First, we apply the proposed test to the insurance data by computing the test statistic $k T_n$ with weight $w(x, y) = \frac{\partial^2}{\partial x \partial y} \hat\lambda(x, y)$ against $k = 10, 15, \dots, 410$; see the upper panel in Fig. 2. For computing the P-values, we employed $B = 1000$ in the proposed bootstrap method. In the lower panel of Fig. 2, we plot the P-values against $k = 10, 15, \dots, 410$.
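The bootstrap P-value computation described in Section 2.1 can be sketched generically as below. This is an illustrative outline only: `bootstrap_p_value` and `toy_statistic` are hypothetical names, and `toy_statistic` is a stand-in for the actual statistic $k T_n$, which would have to be supplied by the user as a function of the sample (and of $k$).

```python
import random


def bootstrap_p_value(sample, statistic, B=1000, seed=0):
    """Fraction of bootstrap statistics that are at least the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    t_obs = statistic(sample)
    hits = 0
    for _ in range(B):
        # Resample n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        if t_obs <= statistic(resample):
            hits += 1
    return hits / B


def toy_statistic(sample):
    """Placeholder statistic (absolute sample mean), NOT the statistic k*T_n."""
    return abs(sum(sample) / len(sample))


# The grid k = 10, 15, ..., 410 used in the data analysis.
ks = list(range(10, 411, 5))

p_toy = bootstrap_p_value([0.3, -1.2, 0.8, 2.1, -0.4], toy_statistic, B=200)
```

In an actual analysis one would evaluate the real test statistic for each `k` in `ks` and plot both the statistic and its bootstrap P-value against `k`, as in the figures.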
Second, we treat the log-returns of exchange rates as iid observations and compute the test statistic $k T_n$ and the P-values as above. In Fig. 4, we plot the computed test statistics and P-values against $k = 10, 15, \dots, 410$.

Third, we fit a GARCH(1, 1) model to each series of the log-returns of exchange rates and then apply the proposed test to the residuals; see Fig. 5 for the residuals. The test statistic $k T_n$ and the P-value are computed as above and plotted against $k = 10, 15, \dots, 410$ in Fig. 6.

Fig. 6. The test statistic $k T_n$ and its P-value are plotted against $k$ for the residuals of the multivariate GARCH(1, 1) model for the log-returns of exchange rates.

From the lower panels in Figs. 2, 4 and 6, we cannot reject the proposed model. In other words, we show that such a model is practically useful, in addition to its advantages of easy simulation and availability in high dimensions.

Acknowledgments

The authors thank a reviewer for his/her constructive comments, which improved the presentation of this paper. Li's research was supported by NNSFC Grant 10801038. Peng's research was supported by NSF grant SES-0631608 and the Society of Actuaries through the Committee on Knowledge Extension Research.

References

Berkes, I., Horvath, L., 2003. Limit results for the empirical process of squared residuals in GARCH models. Stochastic Processes and their Applications 105, 271-298.
Bolance, C., Guillen, M., Nielsen, J.P., 2003. Kernel density estimation of actuarial loss functions. Insurance: Mathematics and Economics 32, 19-36.
Chan, N.H., Chen, J., Chen, X., Fan, Y., Peng, L., 2009. Statistical inference for multivariate residual copula of GARCH models. Statistica Sinica 19 (1), 53-70.
Chen, X., Fan, Y., 2005. Pseudo-likelihood ratio tests for semiparametric multivariate copula model selection. The Canadian Journal of Statistics 33 (2), 389-414.
Chen, X., Fan, Y., 2006. Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. Journal of Econometrics 135, 125-154.
Dupuis, D., Jones, B.L., 2006. Multivariate extreme value theory and its usefulness in understanding risk. North American Actuarial Journal 10, 1-27.
Frees, E.W., Valdez, E.A., 1998. Understanding relationships using copulas. North American Actuarial Journal 2, 1-25.
Frees, E.W., Wang, P., 2006. Copula credibility for aggregate loss models. Insurance: Mathematics and Economics 38, 360-373.
Hashorva, E., 2005. Extremes of asymptotically spherical and elliptical random vectors. Insurance: Mathematics and Economics 36, 285-302.
Higham, N., 2002. Computing the nearest correlation matrix - a problem from finance. IMA Journal of Numerical Analysis 22 (3), 329-343.
Klugman, S.A., Parsa, R., 1999. Fitting bivariate loss distributions with copulas. Insurance: Mathematics and Economics 24, 139-148.
Klüppelberg, C., Kuhn, G., Peng, L., 2007. Estimating the tail dependence function of an elliptical distribution. Bernoulli 13 (1), 229-251.
Klüppelberg, C., Kuhn, G., Peng, L., 2008. Semi-parametric models for the multivariate tail dependence function - the asymptotically dependent case. Scandinavian Journal of Statistics 35, 701-718.
Lane, M.N., 2000. Pricing risk transfer transactions. ASTIN Bulletin 30 (2), 259-293.
Matthys, G., Delafosse, E., Guillou, A., Beirlant, J., 2004. Estimating catastrophic quantile levels for heavy-tailed distributions. Insurance: Mathematics and Economics 34, 517-537.
McNeil, A.J., Frey, R., Embrechts, P., 2005. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
Peng, L., 2009. Estimating the probability of a rare event via elliptical copulas. North American Actuarial Journal (in press).
Schmidt, R., Stadtmüller, U., 2006. Nonparametric estimation of tail dependence. Scandinavian Journal of Statistics 33, 307-335.
Valdez, E.A., Chernih, A., 2003. Wang's capital allocation formula for elliptically contoured distributions. Insurance: Mathematics and Economics 33, 517-532.
Vandewalle, B., Beirlant, J., 2006. On univariate extreme value statistics and the estimation of reinsurance premiums. Insurance: Mathematics and Economics 38, 441-459.
Vernic, R., 2006. Multivariate skew-normal distributions with applications in insurance. Insurance: Mathematics and Economics 38 (2), 413-426.