Penalized Splines - A statistical Idea with numerous Applications...

Göran Kauermann
Ludwig-Maximilians-University Munich

Graz, 7. September 2011
Penalized Splines - A statistical Idea with numerous Applications... which can be used for Copula Estimation
Regression Splines in a Nutshell

The smooth regression model Y = µ(x) + ε is fitted by replacing

µ(x) = β_0 + x β_1 + x² β_2 + Σ_{k=1}^K u_k (x − τ_k)²_+ = X(x)β + Z(x)u = B(x)θ

for knots τ_1, τ_2, ..., τ_K. This yields the estimate

µ̂ = B(BᵀB)⁻¹Bᵀy

where BᵀB has dimension (K + q) × (K + q), with K large, but not too large.
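A minimal sketch of this construction in NumPy, assuming simulated data, equidistant interior knots, and the quadratic truncated-power basis shown above (all variable names are illustrative):

```python
import numpy as np

def spline_basis(x, knots):
    """Quadratic truncated-power basis B(x) = [X(x), Z(x)]."""
    X = np.column_stack([np.ones_like(x), x, x**2])        # X(x): 1, x, x^2
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** 2  # Z(x): (x - tau_k)_+^2
    return np.hstack([X, Z])

rng = np.random.default_rng(1)
n, K = 200, 10
x = np.sort(rng.uniform(0, 2, n))
y = 5 + 3 * np.sin(np.pi * x) + rng.normal(0, 0.3, n)     # simulated smooth signal

knots = np.linspace(0, 2, K + 2)[1:-1]                    # interior knots tau_1..tau_K
B = spline_basis(x, knots)
theta = np.linalg.lstsq(B, y, rcond=None)[0]              # theta = (B'B)^{-1} B'y
mu_hat = B @ theta                                        # fitted curve mu_hat = B theta
```

The fitted curve interpolates the signal more and more closely as K grows, which is exactly what the following slides illustrate.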
Regression Splines

[Figure: the basis functions of X(x) (left panel, "Basis X") and Z(x) (right panel, "Basis Z"), plotted over x ∈ [0, 2].]
Regression Splines

We get the estimate via µ̂ = B(BᵀB)⁻¹Bᵀy

[Figure: fitted curves for K = 5 (left) and K = 10 (right) knots.]
Need for Penalization

[Figures: fit with K = 35 knots, showing a wiggly, overfitted curve; successive slides zoom in on x ∈ [0.72, 0.84] to show the local wiggliness.]
Penalized Least Squares

We penalize the coefficients u by postulating that

Σ_{k=1}^K u_k² = uᵀ I_K u

is small. We minimize the penalized least squares criterion

Σ_{i=1}^n ( y_i − X(x_i)β − Z(x_i)u )² + λ uᵀ I_K u

where λ is called the smoothing (penalization) parameter.
Penalized Regression Splines

We get the estimate via µ̂ = B(BᵀB + P(λ))⁻¹Bᵀy

[Figure: fits with K = 35 knots for λ = 0 (left, wiggly) and λ = 1 (right, smooth).]
Penalized Spline Recipe
(O'Sullivan, 1986; Eilers & Marx, 1996; Ruppert, Wand & Carroll, 2003)

The Penalized Spline Recipe:
1. Take a rich, high-dimensional basis B(x), i.e. choose K generously large.
2. Minimize the penalized least squares criterion
   (Y − Bθ)ᵀ(Y − Bθ) + λ θᵀDθ → min
   with D an adequately chosen penalty matrix, in the simplest case D = diag(0_q, I_K).
3. Choose the penalty parameter λ data-driven (e.g. by cross validation).
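Step 2 of the recipe can be sketched directly, assuming the simplest penalty D = diag(0_q, I_K), a quadratic truncated-power basis, and simulated data (all names illustrative); two values of λ show the shrinkage effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, q = 150, 35, 3
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(4 * np.pi * x) + rng.normal(0, 0.4, n)

knots = np.linspace(0, 1, K + 2)[1:-1]
B = np.hstack([np.column_stack([np.ones(n), x, x**2]),
               np.maximum(x[:, None] - knots, 0.0) ** 2])  # B(x) = [X(x), Z(x)]
D = np.diag(np.r_[np.zeros(q), np.ones(K)])                # D = diag(0_q, I_K)

def fit(lam):
    """theta minimizing (y - B theta)'(y - B theta) + lam * theta' D theta."""
    return np.linalg.solve(B.T @ B + lam * D, B.T @ y)

theta_rough, theta_smooth = fit(1e-6), fit(10.0)
```

Larger λ shrinks the spline coefficients u toward zero, so the fit becomes smoother; choosing λ data-driven is step 3.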
Reformulation

For general splines we can rewrite the penalized estimation as (Wand & Ormerod, 2008, Aust. & N.Z. J. Stat.)

Y = Xβ + Zu + ε

with X low-dimensional and Z high-dimensional, and minimize the penalized least squares criterion

(Y − Xβ − Zu)ᵀ(Y − Xβ − Zu) + λ uᵀDu → min over u, β

where uᵀDu is a quadratic form and the penalty matrix D is usually chosen as the identity matrix.
Linking Penalized Splines with Linear Mixed Models

We formulate the penalty as an a priori normal distribution:

u ~ N(0, σ_u² D⁻¹)    (1)

Now the coefficient vector u is considered as random. Conditioning on u yields

Y | u ~ N(Xβ + Zu, σ_ε² I)    (2)

With (1) and (2) we get a Linear Mixed Model.
Posterior Estimates in a Linear Mixed Model

We estimate u through the posterior Bayes estimate (or equivalently the Best Linear Unbiased Predictor, BLUP)

û = E(u | Y; β) = ( ZᵀZ + (σ_ε²/σ_u²) D )⁻¹ Zᵀ(Y − Xβ)

where the penalty parameter equals λ = σ_ε²/σ_u².
Penalized Spline Smoothing and Linear Mixed Models

We obtain the following results:
- The posterior Bayes estimate is equivalent to the penalized spline estimate.
- The penalty parameter λ = σ_ε²/σ_u² is a regular parameter in the linear mixed model.
- Smoothing (with penalized splines) can be carried out with software for fitting linear mixed models.
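The stated equivalence can be checked numerically: for a fixed λ = σ_ε²/σ_u², the u-part of the joint penalized least squares solution coincides with the BLUP formula evaluated at β̂ (random stand-in design, D = I, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, K = 100, 2, 8
X = np.column_stack([np.ones(n), rng.uniform(size=n)])  # low-dimensional fixed part
Z = rng.normal(size=(n, K))                             # high-dimensional penalized part
y = rng.normal(size=n)
lam = 2.5                                               # lambda = sigma_eps^2 / sigma_u^2

# Joint penalized least squares: penalized normal equations for (beta, u).
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.eye(K)]])
rhs = np.r_[X.T @ y, Z.T @ y]
sol = np.linalg.solve(C, rhs)
beta_hat, u_pen = sol[:q], sol[q:]

# BLUP / posterior Bayes estimate given beta_hat:
u_blup = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ (y - X @ beta_hat))
```

The two solutions agree to machine precision, which is why mixed-model software (estimating λ as a variance ratio) can be used for smoothing.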
Collection of Results
- The smoothing parameter λ = σ_ε²/σ_u² can be estimated by maximum likelihood (Kauermann, 2005, JSPI; Ruppert, Wand & Carroll, 2003). This avoids grid searching!
- The smoothing parameter can be selected in the presence of correlated errors (Krivobokova & Kauermann, 2007, JASA). AIC-based selection fails here!
Collection of Results (cont.)
- Asymptotic results on the number of knots: penalized splines are asymptotically justified! (Kauermann, Krivobokova & Fahrmeir, 2009, JRSS B)
- Number of knots for fixed sample size: a practical decision rule on how to choose K (Kauermann & Opsomer, 2011, Biometrika)
Collection of Results (cont.)
- Local adaptive smoothing: simple and fast computation! (Krivobokova, Crainiceanu & Kauermann, 2008, JCGS)
- Small area estimation and smoothing: combination of smoothing and mixed models! (Opsomer, Claeskens, Ranalli, Kauermann & Breidt, 2008, JRSS B)
- and ...
Outline of (Rest of) Talk

Penalized spline fitting is well applicable beyond regression!

We show how to use penalized splines for copula estimation:
- The idea of sparse grids
- The idea of pair-copulas
Penalized Smooth Estimation of Copulas using Sparse Grids
Copulas

The idea of copulas traces back to Hoeffding (1940) and Sklar (1959). In 1997 the first entry for "copula" appeared in the Encyclopedia of Statistical Sciences.

Wide area of applications and theory:
- mathematics, e.g. Joe (1997) or Nelsen (2006)
- financial econometrics, e.g. Embrechts (2009)
- biostatistics, e.g. Bogaerts & Lesaffre (2008)
- marketing, e.g. Danaher & Smith (2011)
- engineering, e.g. Kelly (2007)
- ecology, e.g. Briggs et al. (2011)
- ...
Definition of Copulas

Sklar's (1959) theorem: the distribution function of a p-dimensional random vector x = (x_1, ..., x_p) can be written as

F(x_1, ..., x_p) = C{F_1(x_1), ..., F_p(x_p)}.

We assume that C(·) is differentiable, so that the density results as

f(x_1, ..., x_p) = c(F_1(x_1), ..., F_p(x_p)) ∏_{j=1}^p f_j(x_j).

The copula carries the multivariate dependence structure.
The Idea of Copulas

A multivariate distribution F(x_1, ..., x_p) decomposes into:
1. The p univariate marginal distributions F_1(x_1), ..., F_p(x_p)
2. The dependence structure in form of the copula density c(u_1, ..., u_p) on the unit cube, i.e. u_j ∈ [0, 1], j = 1, ..., p.

Our task: estimation of c(·) with penalized splines.
Properties of the Copula Density
- The copula density c(u_1, ..., u_p) has the bounded support [0, 1]^p.
- Univariate margins of c(u_1, ..., u_p) are uniform, i.e. c_j(u_j) ≡ 1 for j = 1, ..., p.
- Often, high-density areas are at the boundary of [0, 1]^p.
Demonstration - Normal Copula

[Figure: normal data (left), transformed data (middle), and contours of the true normal copula density (right).]
Demonstration - Clayton Copula

[Figure: Clayton copula data with normal margins (left), transformed data (middle), and contours of the true Clayton copula density (right).]
Nonparametric Copula Estimation

Nonparametric copula estimation is weakly developed, (probably) since:
1. Constraints: the copula density has uniform margins, which is hard to accommodate in classical kernel density estimation.
2. Boundary: the support is bounded, which requires special kernels to avoid boundary bias problems.
3. Dimension: the copula idea is suitable for high dimensions; nonparametric density estimation is not (curse of dimensionality).
Penalized Estimation of a Copula

Penalized estimation of copulas tackles the three problems:
1. Constraints: we will use B-splines and Bernstein polynomials, which easily accommodate the constraints.
2. Boundary: the splines are bounded, so there is no boundary problem.
3. Dimension: we tackle the curse of dimensionality with sparse grids and pair-copulas.
B-spline Fitting of Copulas

Let u_j = F_j(x_j), so that c(u_1, ..., u_p) is a density on [0, 1]^p. Let k = (k_1, ..., k_p) ∈ K be a p-dimensional multi-index. We replace/approximate c(·) by

c(u_1, ..., u_p) ≈ Σ_{k∈K} φ_k(u_1, ..., u_p) b_k := Σ_{k∈K} ∏_{j=1}^p φ_{k_j}(u_j) b_k,

where the φ_l(·) are univariate B-spline density bases.
Marginal B-spline Density Basis

[Figure: the marginal B-spline density basis functions on [0, 1].]
Tensor Product Basis

[Figure: a bivariate tensor-product B-spline basis function on [0, 1]².]
Log-Likelihood

We assume i.i.d. data u_i = (u_{i1}, ..., u_{ip}), for i = 1, ..., n. The log-likelihood equals

l(b) = Σ_{i=1}^n log( Σ_{k∈K} φ_k(u_i) b_k )

with b = (b_k, k ∈ K). We approximate l(b) by a second-order Taylor expansion,

l(b) ≈ l(b_0) + sᵀ(b − b_0) + ½ (b − b_0)ᵀ H (b − b_0) =: Q(b),

and we maximize Q(b).
Penalizing the Log-Likelihood

We need a penalization to obtain a smooth fit. Penalized likelihood:

l_p(b, λ) = l(b) − ½ bᵀD(λ)b ≈ Q(b) − ½ bᵀD(λ)b =: Q_p(b, λ)

where ½ bᵀD(λ)b is the penalty, with penalty matrix D(λ) = Σ_{j=1}^p λ_j D_j.
Constraints on the Parameters

1. The marginal density is uniform, which results in the linear constraint

   c_j(u_j) = Σ_{k_j=1}^K φ_{k_j}(u_j) b_{(j)k_j} = 1    (3)

   with b_{(j)k_j} as marginal spline coefficient.

2. We fit a density with ∫ c(u) du = 1, which results in

   Σ_{k∈K} b_k = 1    (4)

3. The density is positive, which results in

   Σ_{k∈K} φ_k(u_1, ..., u_p) b_k ≥ 0    (5)
Quadratic Programming

We intend to maximize

Q_p(b, λ) → max

subject to the given linear constraints (previous slide), written as

Ab = 1,  Bb ≥ 0

This can be solved with (iterative) quadratic programming (R package quadprog).
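A minimal sketch of the equality-constrained part: maximizing a quadratic sᵀb − ½ bᵀQb subject to Ab = c can be solved in closed form via the KKT system. Here Q and s are random stand-ins for the (negative) Hessian-plus-penalty and the score; the inequality constraints Bb ≥ 0 would additionally require a full QP solver such as quadprog:

```python
import numpy as np

def solve_eq_qp(Q, s, A, c):
    """Maximize s'b - 0.5 b'Qb subject to A b = c, via the KKT linear system."""
    m = A.shape[0]
    KKT = np.block([[Q, A.T],
                    [A, np.zeros((m, m))]])   # [Q A'; A 0] [b; nu] = [s; c]
    sol = np.linalg.solve(KKT, np.r_[s, c])
    return sol[:Q.shape[1]]

rng = np.random.default_rng(0)
p = 6
M = rng.normal(size=(p, p))
Q = M @ M.T + p * np.eye(p)        # positive definite curvature (illustrative)
s = rng.normal(size=p)             # score vector (illustrative)
A = np.ones((1, p))                # sum-to-one constraint: density integrates to 1
b = solve_eq_qp(Q, s, A, np.array([1.0]))
```

At the solution the gradient s − Qb lies in the row space of A (here: a constant vector), which is the first-order optimality condition.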
The Curse of Dimensionality

For p > 2 the dimension of the full tensor-product basis becomes numerically infeasible.

Dimension of the spline basis:

marginal dimension      basis            p = 3     p = 4        p = 5
K =  9 (2³ + 1)         tensor product   729       6,561        59,049
K = 17 (2⁴ + 1)         tensor product   4,913     83,521       1,419,857
K = 33 (2⁵ + 1)         tensor product   35,937    1,185,921    39,135,393
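The tensor-product counts in the table are simply K^p, which makes the exponential growth easy to verify:

```python
# Full tensor-product basis dimension: K coefficients per margin, K**p in total.
sizes = {K: [K ** p for p in (3, 4, 5)] for K in (9, 17, 33)}
for K, dims in sizes.items():
    print(K, dims)
```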
Hierarchical B-splines

[Figures: a regular B-spline basis (left) compared with the hierarchical B-spline bases of levels 0, 1, 2 and 3 (right).]
Tackling the Curse of Dimensionality

The idea is now to build a sparse tensor product by including tensors up to a given hierarchy level only. Sparse grids have been proposed by Zenger (1992, Notes on Numerical Fluid Mechanics).
Sparse grid construction (bivariate case): tensor products are included up to a given total hierarchy level, higher-order combinations are neglected:

Φ^(0)(u_1) ⊗ Φ^(0)(u_2)    Φ^(1)(u_1) ⊗ Φ^(0)(u_2)    Φ^(2)(u_1) ⊗ Φ^(0)(u_2)
Φ^(0)(u_1) ⊗ Φ^(1)(u_2)    Φ^(1)(u_1) ⊗ Φ^(1)(u_2)    neglected
Φ^(0)(u_1) ⊗ Φ^(2)(u_2)    neglected                  neglected
Tackling the Curse of Dimensionality

Sparse grids allow to push back the curse of dimensionality.

Dimension of the spline basis:

marginal dimension      basis               p = 3     p = 4        p = 5
K =  9 (2³ + 1)         tensor product      729       6,561        59,049
                        sparse grid basis   123       297          705
K = 17 (2⁴ + 1)         tensor product      4,913     83,521       1,419,857
                        sparse grid basis   368       961          2,441
K = 33 (2⁵ + 1)         tensor product      35,937    1,185,921    39,135,393
                        sparse grid basis   1,032     2,882        7,763
Example

Daily returns in 2006/2007 from Lufthansa and Deutsche Bank.

[Figure: scatter plots of the daily returns (left) and the transformed daily returns (right), Lufthansa against Deutsche Bank.]
Example

[Figure: estimated copula density surface for the Lufthansa / Deutsche Bank returns 2006/07.]
Example

[Figure: fitted copula densities for the return data — Clayton (loglik = 37.46), Frank (loglik = 38.80), Gumbel (loglik = 31.63) and B-spline (loglik = 49.45).]
Tackling the Curse of Dimensionality with Pair-Copulas
Copulas and Conditional Distributions

With Sklar's theorem we get

f(x_2 | x_1) = c( F_1(x_1), F_2(x_2) ) f_2(x_2)

i.e. we can express conditional densities with copulas. Factorization allows us to write

f(x_1, x_2, ..., x_p) = f_1(x_1) · f(x_2 | x_1) · f(x_3 | x_1, x_2) · ... · f(x_p | x_1, ..., x_{p−1})
The Idea of Pair-Copulas

With A = {x_1, ..., x_{p−2}}, each conditional distribution is now written as

f(x_p | x_{p−1}, A) = c( F_p(x_p | A), F_{p−1}(x_{p−1} | A) | A ) f_p(x_p | A)

With pair-copulas we assume

c( F_p(x_p | A), F_{p−1}(x_{p−1} | A) | A ) = c( F_p(x_p | A), F_{p−1}(x_{p−1} | A) )

i.e. the conditioning set is omitted.
Estimating Pair-Copulas

In pair-copulas we need to estimate:
1. Bivariate copulas c(u_p, u_{p−1})
2. Univariate conditional marginal distributions F_j(x_j | A), j = 1, ..., p.

We focus on the first point in this talk.
Nonparametric Estimation of Bivariate Copulas

As above, we use penalized estimation and assume

c(u_p, u_{p−1}) ≈ Σ_{k_1=0}^K Σ_{k_2=0}^K φ_{k_1}(u_p) φ_{k_2}(u_{p−1}) b_{k_1 k_2} = {φ_K(u_p) ⊗ φ_K(u_{p−1})} b

with φ(·) as a basis of densities on [0, 1].
Bernstein Polynomials / Beta Distribution

We use Bernstein polynomials (which are Beta densities) as bases, i.e.

φ_k(u) = (K + 1) (K choose k) u^k (1 − u)^{K−k}

[Figure: the Bernstein polynomial basis functions on [0, 1].]
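Two properties of this basis are worth checking numerically: each φ_k is a Beta(k+1, K−k+1) density (so it integrates to 1), and equal weights b_k = 1/(K+1) reproduce the uniform margin exactly, by the binomial theorem. A small sketch (illustrative names, midpoint-rule integration):

```python
import numpy as np
from math import comb

def bernstein_density(u, k, K):
    """phi_k(u) = (K+1) * C(K,k) * u^k * (1-u)^(K-k): the Beta(k+1, K-k+1) density."""
    return (K + 1) * comb(K, k) * u**k * (1 - u) ** (K - k)

K, N = 6, 4000
u = (np.arange(N) + 0.5) / N                    # midpoint grid on [0, 1]
basis = np.array([bernstein_density(u, k, K) for k in range(K + 1)])

areas = basis.mean(axis=1)                      # midpoint-rule integral of each phi_k
flat = basis.mean(axis=0)                       # equal weights b_k = 1/(K+1)
```

The second property is exactly the uniform-margin constraint from the previous slides: it shows the constraint set is non-empty, with the independence copula as a feasible point.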
Penalization

We penalize the squared second-order derivatives:

∫∫ [ ( ∂²c(u_p, u_{p−1}) / ∂u_p² )² + ( ∂²c(u_p, u_{p−1}) / ∂u_{p−1}² )² ] du_p du_{p−1} = bᵀPb

a quadratic form in b. Assuming for the quadratic penalty b ~ N(0, λ⁻¹P⁻) we can again estimate λ as a parameter.
Linear Constraints
- Σ_{k_1} φ_{k_1}(u_1) b_{k_1} = 1 — margins are uniform.
- Σ_{k_1, k_2} b_{k_1 k_2} = 1 — the density c(·) integrates to 1.
- c(u_p, u_{p−1}) ≥ 0 — the resulting fit is a density.

This is again accommodated using quadratic programming.
Estimating Univariate Conditional Margins

For copulas one can show that

F(x_1 | x_2) = ∂C( F_1(x_1), F_2(x_2) ) / ∂F_2(x_2) = Σ_{k_1=0}^K Σ_{k_2=0}^K Φ_{k_1}(u_1) φ_{k_2}(u_2) b_{k_1 k_2}

with Φ(·) as Beta distribution function and φ(·) as Beta density. The Bernstein approach easily allows to calculate univariate conditional distributions.
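A sketch of this formula, assuming the Bernstein basis above and the independence-copula coefficients b_{k_1 k_2} = 1/(K+1)² as an illustrative special case (where F(x_1 | x_2) = u_1 must result). The integer-parameter Beta CDF has the closed binomial form used below; all names are illustrative:

```python
import numpy as np
from math import comb

K = 5

def phi(u, k):
    """Bernstein density phi_k(u) = (K+1) C(K,k) u^k (1-u)^(K-k)."""
    return (K + 1) * comb(K, k) * u**k * (1 - u) ** (K - k)

def Phi(u, k):
    """Beta(k+1, K-k+1) distribution function (closed form for integer parameters)."""
    return sum(comb(K + 1, j) * u**j * (1 - u) ** (K + 1 - j)
               for j in range(k + 1, K + 2))

# Coefficients of the independence copula: equal weights summing to 1.
b = np.full((K + 1, K + 1), 1.0 / (K + 1) ** 2)

def cond_cdf(u1, u2, b):
    """F(x1 | x2) = sum_{k1,k2} Phi_{k1}(u1) phi_{k2}(u2) b_{k1 k2}."""
    return sum(Phi(u1, k1) * phi(u2, k2) * b[k1, k2]
               for k1 in range(K + 1) for k2 in range(K + 1))
```

With fitted coefficients b from the penalized estimate, the same sum delivers the conditional margins needed at the next level of the pair-copula construction.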
Example

We look at the exchange rates of USD to Euro (EUR), British Pound (GBP) and Singapore Dollar (SIN). We model

f(EUR, GBP, SIN) = f(EUR) · f(GBP | EUR) · f(SIN | GBP, EUR)
Example

[Figure: fitted pair-copula densities — Euro against GBP (top left), GBP against SIN (top right), and Euro against SIN given GBP (bottom).]
Discussion
- Penalized splines are a flexible modelling tool.
- The link to mixed models allows for new, innovative statistical modelling.
- Penalized estimation easily extends to copula estimation.
- Quadratic programming is a useful alternative to classical Newton-Raphson.
- Penalization guarantees smoothness.
- ...
Thank you for your attention