GLAM Array Methods in Statistics

Iain Currie
Heriot-Watt University

Simon Fraser University, May 2009

A Generalized Linear Array Model (GLAM) is a low-storage, high-speed method for multidimensional smoothing, when
  data forms an array,
  model has a row and column structure, which allows it to be written as a Kronecker product.

Swedish male mortality data (HMD)
  Deaths: D
  Exposures: E
  D, E: $81 \times 101$

[Figure: raw mortality surface, by age and year.]

Generalized linear models

Structure
  Data: vectors $y$ of deaths and $e$ of exposures
  Model: a model matrix $B$ of B-splines, a parameter vector $\theta$ and a link function,
    $\mu = E(y), \quad \log \mu = \log e + B\theta$
  Error distribution: Poisson

Algorithm
  Scoring algorithm:
    $B' \tilde W_\delta B \hat\theta = B' \tilde W_\delta \tilde z$
  where $\tilde z = B\tilde\theta + \tilde W_\delta^{-1}(y - \tilde\mu)$ is the working vector and $\tilde W_\delta$ is a diagonal matrix of weights (a tilde denotes a current estimate).

[Figure: a single cubic B-spline.]

B-spline basis

A B-spline regression basis uses local basis functions.
  B-spline basis: $\{B_1(x), B_2(x), \ldots, B_c(x)\}$, where $B_1(x), B_2(x), \ldots, B_c(x)$ are B-splines.
  Model matrix: $B = [B_1(x), B_2(x), \ldots, B_c(x)]$, $n \times c$.

[Figure: B-spline basis.]
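As a rough illustration of the B-spline model matrix and the scoring algorithm described above, the following R sketch fits an unpenalized Poisson B-spline regression. The helper `bbase`, the simulated deaths and exposures, the number of segments and the convergence tolerance are all assumptions made for this example and are not taken from the slides. For the Poisson model with log link the weights are taken to be $W_\delta = \mathrm{diag}(\mu)$.

```r
library(splines)  # splineDesign(); the splines package is part of the standard R distribution

# Hypothetical helper: B-spline basis of degree 'deg' on equally spaced knots.
bbase <- function(x, xl = min(x), xr = max(x), nseg = 10, deg = 3) {
  dx <- (xr - xl) / nseg
  knots <- xl + dx * (-deg:(nseg + deg))
  splineDesign(knots, x, ord = deg + 1, outer.ok = TRUE)
}

# Simulated data (illustrative only): exposures e and Poisson deaths y.
set.seed(1)
n <- 101
x <- seq(0, 1, length.out = n)
e <- rep(1000, n)
y <- rpois(n, e * exp(-4 + 1.5 * x))

B <- bbase(x)                                 # model matrix, n x c
theta <- rep(log(sum(y) / sum(e)), ncol(B))   # start from a constant log rate

for (iter in 1:50) {
  eta <- drop(B %*% theta)
  mu <- e * exp(eta)                          # log mu = log e + B theta
  z <- eta + (y - mu) / mu                    # working vector
  # Solve B'WB theta = B'Wz with W = diag(mu).
  theta_new <- drop(solve(crossprod(B, mu * B), crossprod(B, mu * z)))
  converged <- max(abs(theta_new - theta)) < 1e-10
  theta <- theta_new
  if (converged) break
}

mu_hat <- e * exp(drop(B %*% theta))          # fitted deaths
```

At convergence the score equations $B'(y - \hat\mu) = 0$ hold, which gives a simple check on the loop.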

[Figure: Log mortality for Swedish males age 70. Observed mortality, B-spline regression and B-spline coefficients, with the B-spline basis.]

Penalties

Eilers & Marx (1996) imposed penalties on differences between adjacent coefficients:

  $(\theta_1 - 2\theta_2 + \theta_3)^2 + \ldots + (\theta_{c-2} - 2\theta_{c-1} + \theta_c)^2 = \theta' D_2' D_2 \theta$

where $D_2$ is a second order difference matrix. Estimation is via penalized likelihood

  $PL(\theta) = L(\theta) - \tfrac{1}{2} \lambda \theta' D_2' D_2 \theta$

where $\lambda$ is the smoothing parameter, which balances fit and smoothness.
  $\lambda = 0$: B-spline regression
  $\lambda = \infty$: linear (classical Gompertz) regression

Algorithm

Penalized scoring algorithm:

  $(B' \tilde W_\delta B + P)\hat\theta = B' \tilde W_\delta \tilde z, \quad P = \lambda D_2' D_2,$

where $P$ is a roughness penalty and $\tilde z$, $\tilde W_\delta$ are the working vector and weights of the scoring algorithm. This is Eilers and Marx's method of P-splines.

[Figure: Log mortality for Swedish males age 70. Observed mortality, B-spline regression, P-spline regression, B-spline coefficients and P-spline coefficients, with the B-spline basis.]
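The difference penalty and a single penalized scoring step can be sketched in R as follows. The function names `difference_penalty` and `penalized_step`, the coefficient values and the toy setting with $B = I$ and unit weights are assumptions chosen for illustration; they are not the author's code or data.

```r
# Penalty P = lambda * D'D, with D a difference matrix of the given order.
difference_penalty <- function(c_par, lambda, ord = 2) {
  D <- diff(diag(c_par), differences = ord)   # (c - ord) x c
  lambda * crossprod(D)
}

# One penalized scoring step: solve (B'WB + P) theta = B'Wz, W = diag(w).
penalized_step <- function(B, w, z, P) {
  drop(solve(crossprod(B, w * B) + P, crossprod(B, w * z)))
}

theta <- c(0.3, -1.2, 0.8, 2.1, 1.7, -0.4, 0.9, 1.1)   # arbitrary coefficients
P1 <- difference_penalty(length(theta), lambda = 1)

pen_direct <- sum(diff(theta, differences = 2)^2)      # sum of squared second differences
pen_quad <- drop(t(theta) %*% P1 %*% theta)            # theta' D2'D2 theta

# Toy limits with B = I and unit weights (not mortality data).
B <- diag(length(theta))
w <- rep(1, length(theta))
fit_none <- penalized_step(B, w, theta, difference_penalty(8, 0))
fit_heavy <- penalized_step(B, w, theta, difference_penalty(8, 1e8))
```

With $\lambda = 0$ the step returns the unpenalized solution, and with a very large $\lambda$ the second differences of the solution are driven towards zero, so the fit approaches a straight line, in line with the two limits quoted above.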

2-dimensional smoothing

Let $B_a$, $n_a \times c_a$, be a 1-d B-spline model matrix defined along age.
Let $B_y$, $n_y \times c_y$, be a 1-d B-spline model matrix defined along year.
The 2-d model matrix is given by the Kronecker product

  $B = B_y \otimes B_a, \quad n_a n_y \times c_a c_y.$

[Figure: 2-d B-spline basis.]

Amazing formula

Structure:
  $[B_y \otimes B_a]\theta, \; n_a n_y \times 1 \;\equiv\; B_a \Theta B_y', \; n_a \times n_y$

Generalized linear array models or GLAM:
  $\log E[D] = \log E + B_a \Theta B_y'$

Penalties in 2-d

Each regression coefficient is associated with the summit of one of the hills of the 2-d basis. Smoothness is ensured by penalizing the coefficients in rows and columns:

  $P = \lambda_a I_{c_y} \otimes D_a' D_a + \lambda_y D_y' D_y \otimes I_{c_a}$

Computational procedure with $B = B_y \otimes B_a$

  $B\theta \equiv B_a \Theta B_y'$
  $B' W_\delta B \equiv G(B_a)' W G(B_y)$

Definition: the row tensor of $X$, $n \times c$, is

  $G(X) = [X \otimes 1_c'] * [1_c' \otimes X], \quad n \times c^2,$

where $*$ denotes element-by-element multiplication.
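Both equivalences in the computational procedure can be checked numerically on a small example. In the R sketch below, random matrices stand in for the B-spline bases, the sizes are arbitrary, and the function name `rowtensor` is an assumption; the Kronecker product is formed only to verify the array results.

```r
# Row tensor G(X) = (X %x% 1') * (1' %x% X), n x c^2.
rowtensor <- function(X) {
  one <- matrix(1, nrow = 1, ncol = ncol(X))
  kronecker(X, one) * kronecker(one, X)
}

set.seed(2)
na <- 6; ny <- 5; ca <- 3; cy <- 2             # small illustrative sizes
Ba <- matrix(runif(na * ca), na, ca)           # stand-in for the age basis
By <- matrix(runif(ny * cy), ny, cy)           # stand-in for the year basis
Theta <- matrix(rnorm(ca * cy), ca, cy)        # coefficient array
W <- matrix(runif(na * ny), na, ny)            # array of weights

# Linear function: Kronecker form versus array form.
B <- kronecker(By, Ba)                         # na*ny x ca*cy, for checking only
eta_kron <- drop(B %*% as.vector(Theta))
eta_glam <- Ba %*% Theta %*% t(By)             # na x ny

# Inner product: brute force versus row tensors.
BWB_kron <- crossprod(B, as.vector(W) * B)     # ca*cy x ca*cy
A <- t(rowtensor(Ba)) %*% W %*% rowtensor(By)  # ca^2 x cy^2
dim(A) <- c(ca, ca, cy, cy)
BWB_glam <- matrix(aperm(A, c(1, 3, 2, 4)), nrow = ca * cy)
```

The array forms contain the same numbers as the Kronecker forms; only the final re-dimensioning and permutation are needed to recover the usual matrix layout of the inner product.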

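Computational details: the magic shuffle

Linear functions:
  $B\theta, \; n_a n_y \times 1 \;\equiv\; B_a \Theta B_y', \; n_a \times n_y$

Inner products:
  $B' W_\delta B, \; c_a c_y \times c_a c_y \;\equiv\; G(B_a)' W G(B_y), \; c_a^2 \times c_y^2$

Diagonal function:
  $\mathrm{diag}(B S_m B'), \; n_a n_y \times 1 \;\equiv\; G(B_a) S G(B_y)', \; n_a \times n_y$
  where $S_m = (B' W_\delta B)^{-1}$; this gives the SEs of fitted values.

Generalization to d-dimensions

  $(X_2 \otimes X_1)\theta \equiv (X_2 (X_1 \Theta)')'$
  $(X_3 \otimes X_2 \otimes X_1)\theta \equiv \rho(X_3, \rho(X_2, \rho(X_1, \Theta)))$

Definition: $X$, $n_1 \times c_1$ matrix; $A$, $c_1 \times c_2 \times c_3$ array.

  $\rho(X, A)$: $X A_{c_1 \times c_2 c_3} = A_{n_1 \times c_2 c_3} \to A_{n_1 \times c_2 \times c_3} \to A_{c_2 \times c_3 \times n_1}$

$\rho(X, A)$ is called the rotated H-transform.

Computation of $X\theta$ in d-dimensions

  $X_i$, $n_i \times c_i$, $i = 1, 2, 3$.
  $X = X_3 \otimes X_2 \otimes X_1$, $n_1 n_2 n_3 \times c_1 c_2 c_3$
  $\theta$, $c_1 c_2 c_3 \times 1$
  $\Theta$ is the corresponding array, $c_1 \times c_2 \times c_3$

  $X\theta, \; n_1 n_2 n_3 \times 1 \;\equiv\; \rho(X_3, \rho(X_2, \rho(X_1, \Theta))), \; n_1 \times n_2 \times n_3$

Computation of $X' W_\delta X$ in d-dimensions

  $X_i$, $n_i \times c_i$, $i = 1, 2, 3$.
  $X = X_3 \otimes X_2 \otimes X_1$, $n_1 n_2 n_3 \times c_1 c_2 c_3$
  $W_\delta$ is diagonal, $n_1 n_2 n_3 \times n_1 n_2 n_3$
  $W$ is the corresponding array, $n_1 \times n_2 \times n_3$

  $X' W_\delta X, \; c_1 c_2 c_3 \times c_1 c_2 c_3 \;\equiv\; \rho(G(X_3)', \rho(G(X_2)', \rho(G(X_1)', W))), \; c_1^2 \times c_2^2 \times c_3^2$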
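The rotated H-transform is short to write in R. The sketch below is one possible implementation, assuming the function name `rho` and using random matrices with arbitrary small dimensions; it checks the 3-d linear function against the explicit Kronecker product.

```r
# Rotated H-transform: multiply X into the first dimension of the array A,
# then rotate that dimension to the end.
rho <- function(X, A) {
  d <- dim(A)
  XA <- X %*% matrix(A, nrow = d[1])        # n1 x (c2 * c3 * ...)
  XA <- array(XA, c(nrow(X), d[-1]))        # n1 x c2 x c3 x ...
  aperm(XA, c(seq_along(d)[-1], 1))         # c2 x c3 x ... x n1
}

set.seed(3)
n <- c(4, 3, 5)                             # (n1, n2, n3), illustrative
k <- c(2, 3, 2)                             # (c1, c2, c3), illustrative
X1 <- matrix(rnorm(n[1] * k[1]), n[1], k[1])
X2 <- matrix(rnorm(n[2] * k[2]), n[2], k[2])
X3 <- matrix(rnorm(n[3] * k[3]), n[3], k[3])
Theta <- array(rnorm(prod(k)), k)           # coefficient array, c1 x c2 x c3

# Array computation: three small matrix products, no Kronecker product.
eta_glam <- rho(X3, rho(X2, rho(X1, Theta)))   # n1 x n2 x n3

# Brute-force check with the full model matrix.
X <- kronecker(X3, kronecker(X2, X1))          # n1*n2*n3 x c1*c2*c3
eta_kron <- drop(X %*% as.vector(Theta))
```

Each call to `rho` moves the dimension just processed to the end, so after $d$ calls the dimensions are back in their original order.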

Standard errors of $X\hat\theta$

We need

  $\mathrm{diag}\, X (X' W_\delta X)^{-1} X' = \mathrm{diag}\, X S_m X'$

where $S_m$ is $c_1 c_2 c_3 \times c_1 c_2 c_3$. Let $S$, $c_1^2 \times c_2^2 \times c_3^2$, be the array form of $S_m$. Then

  $\mathrm{diag}\, X S_m X', \; n_1 n_2 n_3 \times 1 \;\equiv\; \rho(G(X_3), \rho(G(X_2), \rho(G(X_1), S))), \; n_1 \times n_2 \times n_3.$

Inner product shuffles in R

  $X' W_\delta X, \; c_1 c_2 c_3 \times c_1 c_2 c_3 \;\equiv\; \rho(G(X_3)', \rho(G(X_2)', \rho(G(X_1)', W))), \; c_1^2 \times c_2^2 \times c_3^2$

In R,

  XWX = RH(t(RT3), RH(t(RT2), RH(t(RT1), W)))
  dim(XWX) = c(c1, c1, c2, c2, c3, c3)
  PermDims = aperm(XWX, c(1, 3, 5, 2, 4, 6))
  XWX = matrix(PermDims, nrow = c1 * c2 * c3)

Here RH computes $\rho$ and RT1, RT2, RT3 hold the row tensors $G(X_1)$, $G(X_2)$, $G(X_3)$, matching the expression above.

GLAM

  conceptually attractive
  low footprint
  very fast
  generalizes to d-dimensions

Examples of GLAMs

  Mortality shocks: Swedish data and the Spanish flu
  Joint modelling of mortality surfaces: insurance data by lives v amounts
  Density estimation: Old Faithful data
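The inner-product shuffle and the standard-error computation described above can be tried end to end on a small 3-d example. In this R sketch the names `rowtensor` and `rho` are assumptions standing in for the row tensor $G$ and the rotated H-transform, and the matrices and weights are random; the full Kronecker matrix is built only to confirm the array results.

```r
rowtensor <- function(X) {                   # G(X), n x c^2
  one <- matrix(1, nrow = 1, ncol = ncol(X))
  kronecker(X, one) * kronecker(one, X)
}
rho <- function(X, A) {                      # rotated H-transform
  d <- dim(A)
  XA <- array(X %*% matrix(A, nrow = d[1]), c(nrow(X), d[-1]))
  aperm(XA, c(seq_along(d)[-1], 1))
}

set.seed(4)
n <- c(4, 3, 5); k <- c(2, 3, 2)             # illustrative sizes
X1 <- matrix(rnorm(n[1] * k[1]), n[1], k[1])
X2 <- matrix(rnorm(n[2] * k[2]), n[2], k[2])
X3 <- matrix(rnorm(n[3] * k[3]), n[3], k[3])
W <- array(runif(prod(n)), n)                # weight array, n1 x n2 x n3

# Inner product X'WX as a c1^2 x c2^2 x c3^2 array, then shuffled to a matrix.
A <- rho(t(rowtensor(X3)), rho(t(rowtensor(X2)), rho(t(rowtensor(X1)), W)))
dim(A) <- c(k[1], k[1], k[2], k[2], k[3], k[3])
XWX_glam <- matrix(aperm(A, c(1, 3, 5, 2, 4, 6)), nrow = prod(k))

# diag(X S_m X'): reverse the shuffle to get S, then apply the row tensors.
S_m <- solve(XWX_glam)
S <- aperm(array(S_m, c(k, k)), c(1, 4, 2, 5, 3, 6))
dim(S) <- k^2                                # c1^2 x c2^2 x c3^2
var_glam <- rho(rowtensor(X3), rho(rowtensor(X2), rho(rowtensor(X1), S)))

# Brute-force versions for comparison.
X <- kronecker(X3, kronecker(X2, X1))
XWX_kron <- crossprod(X, as.vector(W) * X)
var_kron <- diag(X %*% S_m %*% t(X))
```

The permutation `c(1, 3, 5, 2, 4, 6)` gathers the row indices of the three factors before the column indices; `c(1, 4, 2, 5, 3, 6)` is its inverse and is used to put $S_m$ into array form.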

Modelling shocks

Additive model: smooth surface + smooth period shocks

  $[B_y \otimes B_a]\theta + [I_{n_y} \otimes \breve B_a]\breve\theta, \quad B = [B_y \otimes B_a : I_{n_y} \otimes \breve B_a], \quad 8181 \times 1346.$

Additive GLAM:

  $B_a \Theta B_y' + \breve B_a \breve\Theta$

Penalty matrix:

  $\begin{bmatrix} P & 0 \\ 0 & \breve P \end{bmatrix}$

  $P$ penalizes roughness in rows and columns
  $\breve P$ is a ridge penalty

[Figures: raw mortality surface; smooth + shocks; smooth.]
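A small R sketch of the additive shock model is given below. Random matrices stand in for the B-spline bases, the dimensions are far smaller than the $8181 \times 1346$ model matrix quoted above, and the smoothing parameters `lambda_a`, `lambda_y` and `lambda_s` are arbitrary assumed values. It checks that the regression form and the array form give the same linear predictor, and assembles the block-diagonal penalty.

```r
set.seed(5)
na <- 8; ny <- 6                             # ages and years (illustrative)
ca <- 5; cy <- 4; cs <- 3                    # basis sizes; cs is the shock basis
Ba <- matrix(runif(na * ca), na, ca)         # stand-in for the age basis
By <- matrix(runif(ny * cy), ny, cy)         # stand-in for the year basis
Bs <- matrix(runif(na * cs), na, cs)         # stand-in for the shock basis in age
Theta <- matrix(rnorm(ca * cy), ca, cy)      # smooth surface coefficients
Theta_s <- matrix(rnorm(cs * ny), cs, ny)    # one column of shock coefficients per year

# Regression form: B = [By (x) Ba : I (x) Bs].
B <- cbind(kronecker(By, Ba), kronecker(diag(ny), Bs))
eta_kron <- drop(B %*% c(as.vector(Theta), as.vector(Theta_s)))

# Array form: Ba Theta By' + Bs Theta_s.
eta_glam <- Ba %*% Theta %*% t(By) + Bs %*% Theta_s

# Penalty: roughness penalty P on the surface, ridge penalty on the shocks.
lambda_a <- 10; lambda_y <- 10; lambda_s <- 100   # assumed values
Da <- diff(diag(ca), differences = 2)
Dy <- diff(diag(cy), differences = 2)
P <- lambda_a * kronecker(diag(cy), crossprod(Da)) +
  lambda_y * kronecker(crossprod(Dy), diag(ca))
P_s <- lambda_s * diag(cs * ny)
P_full <- rbind(
  cbind(P, matrix(0, nrow(P), ncol(P_s))),
  cbind(matrix(0, nrow(P_s), ncol(P)), P_s)
)
```

The shock coefficients form a $\breve c_a \times n_y$ array, one smooth age profile per year, and the ridge penalty shrinks these profiles towards zero unless the data support a shock.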

Shocks

[Figure, four panels: Mortality shock 1918; Mortality shock 1919; Mortality shock 1923; Mortality shock 1944. Legend: Alpha = 0, Alpha = 1, Alpha = 3.5.]

Joint modelling of insurance data

Insurance data by lives and amounts.
Additive model: smooth 2d-surface + smooth age-dependent gaps
  Lives: $[B_y \otimes B_a]\theta$
  Amounts: $[B_y \otimes B_a]\theta + [1_{n_y} \otimes B_a]\breve\theta$
Additive GLAM
  Lives: $B_a \Theta B_y'$
  Amounts: $B_a \Theta B_y' + B_a \breve\Theta 1_{n_y}'$

Inner products in additive GLAMs

Let $X = [B_y \otimes B_a : 1_{n_y} \otimes B_a]$. The blocks of $X' W_\delta X$ and their array forms, with dimensions, are

  surface by surface: $c_a c_y \times c_a c_y \;\equiv\; G(B_a)' W G(B_y), \; c_a^2 \times c_y^2$
  surface by gap (and its transpose): $c_a c_y \times c_a \;\equiv\; G(B_a)' W B_y, \; c_a^2 \times c_y$
  gap by gap: $c_a \times c_a \;\equiv\; G(B_a)' W 1_{n_y}, \; c_a^2 \times 1$
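The off-diagonal and gap blocks of the additive inner product can be checked in the same way as the main block. The R sketch below uses random stand-ins for the bases and weights and an assumed helper `rowtensor`; the brute-force matrix `XWX` is computed only for comparison.

```r
rowtensor <- function(X) {                    # G(X), n x c^2
  one <- matrix(1, nrow = 1, ncol = ncol(X))
  kronecker(X, one) * kronecker(one, X)
}

set.seed(6)
na <- 6; ny <- 5; ca <- 3; cy <- 2            # illustrative sizes
Ba <- matrix(runif(na * ca), na, ca)          # stand-in for the age basis
By <- matrix(runif(ny * cy), ny, cy)          # stand-in for the year basis
W <- matrix(runif(na * ny), na, ny)           # array of weights
one_y <- matrix(1, ny, 1)

# Brute force: X = [By (x) Ba : 1 (x) Ba].
X <- cbind(kronecker(By, Ba), kronecker(one_y, Ba))
XWX <- crossprod(X, as.vector(W) * X)

Ga <- rowtensor(Ba)

# Surface-by-gap block: G(Ba)' W By is ca^2 x cy; reorder the indices
# (age, age, year) to (age, year, age) to get a ca*cy x ca matrix.
A12 <- t(Ga) %*% W %*% By
dim(A12) <- c(ca, ca, cy)
block12 <- matrix(aperm(A12, c(1, 3, 2)), nrow = ca * cy)

# Gap-by-gap block: G(Ba)' W 1 is ca^2 x 1; re-dimension to ca x ca.
block22 <- matrix(t(Ga) %*% W %*% one_y, nrow = ca)
```

Because the gap term uses the constant vector $1_{n_y}$ in place of a year basis, its row tensor is again a vector of ones, which is why the plain matrices $B_y$ and $1_{n_y}$ appear on the right of the array forms.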

[Figure, two panels: Log(mortality) for lives and amounts.] Observed, smoothed and forecast log mortality by lives and amounts.

2-d Density Estimation

  Form a fine 2-d grid of counts
  Apply 2-d P-spline smoothing with Poisson errors & log link
  Model matrix $B_2(x_2) \otimes B_1(x_1)$
  third order penalties

Example: Old Faithful Geyser Data

  272 data points
  $60 \times 217$ grid
  238 counts of 1, 17 of 2, and 12765 (98%!) counts of 0.

[Figure: normalized density. Axes: Duration (minutes), bin width = 1 sec; Waiting time (minutes), bin width = 1 min.]
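The first step of the density example, forming the fine grid of counts, might look as follows in R. This sketch uses the `faithful` data set shipped with R, which also has 272 observations; the bin edges are assumptions chosen to give one-minute and one-second bins on a $60 \times 217$ grid, so the individual cell counts need not match those quoted above.

```r
# Old Faithful data from the R 'datasets' package: 272 eruptions.
duration_sec <- faithful$eruptions * 60     # eruption duration in seconds
waiting_min <- faithful$waiting             # waiting time in minutes

# Assumed bin edges: 60 one-minute bins and 217 one-second bins.
waiting_edges <- 40:100
duration_edges <- 90:307

# Cross-tabulate the two binned variables; empty bins are kept as zeros.
counts <- table(
  cut(waiting_min, waiting_edges, right = FALSE),
  cut(duration_sec, duration_edges, right = FALSE)
)
Y <- matrix(as.vector(counts), nrow = length(waiting_edges) - 1)

grid_dim <- dim(Y)                          # 60 x 217
n_points <- sum(Y)                          # every observation falls in one cell
prop_zero <- mean(Y == 0)                   # proportion of empty cells
```

The array `Y` then plays the role of the Poisson response in a 2-d GLAM with a marginal basis for each variable, so the Kronecker product model matrix never has to be formed.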

[Figure: normalized density surface. Axes: Waiting time, Duration, Density.]

[Figure: histogram of waiting times, with the 2-d marginal density and the 1-d density. Axis: Waiting time (minutes), bin width 1 min.]

[Figure: histogram of duration times, with the 2-d marginal density and the 1-d density. Axis: Duration time (seconds), bin width 1 sec.]

References

  P-splines: Eilers & Marx (1996) Statistical Science, 11, 89-121.
  GLAM: Currie, Durban & Eilers (2006) Journal of the Royal Statistical Society, Series B, 68, 259-280.
        Eilers, Currie & Durban (2006) Computational Statistics & Data Analysis, 50, 61-76.
  Mortality shocks: Kirkby & Currie (2009) Statistical Modelling, to appear.
  Mortality data: Human Mortality Database www.mortality.org
  GLAM web page: www.ma.hw.ac.uk/~iain/research/glam.html