Joint models for classification and comparison of mortality in different countries.

Viani D. Biatat and Iain D. Currie

Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS.

Abstract: We propose a class of additive generalized linear array models (GLAMs) which facilitate the classification and comparison of mortality tables. Different mortality tables are modelled in terms of their distances (gaps) from a reference table. These gaps are smooth functions of age and/or time and provide a simple graphical summary of the differences between tables. In the paper we describe the models, discuss their computational demands and show how these demands are met with GLAM. We present the results, largely graphical, of applying our methods to various mortality tables taken from the Human Mortality Database and from the Continuous Mortality Investigation Bureau.

Keywords: Mortality classification, dispersion, joint modelling, P-splines, GLAM.

1 Introduction

We suppose that we have mortality data for $p$ populations, $p \ge 2$, consisting of death counts and exposures, arranged in $n_a \times n_t$ matrices $D^{[r]}$ and $E^{[r]}$, $r = 1, \ldots, p$, such that the rows and columns of $D^{[r]}$ and $E^{[r]}$ are classified respectively by ages $x_a$ and years $x_t$, each arranged in ascending order; their vector equivalents will be denoted by $d^{[r]} = \mathrm{vec}(D^{[r]})$ and $e^{[r]} = \mathrm{vec}(E^{[r]})$. For a single population, it is common and natural to suppose that there is a 2-dimensional smooth surface that drives the force of mortality. However, mortality data for two (or more) populations can be connected. Two typical examples are (a) mortality for females and males, where the latter is known to be heavier than the former, and (b) mortality by lives and by amounts (in life insurance), where the latter is known to be lighter than the former.
In addition, male and female mortality (for example) generally have some similarities in their dynamism. In general, how similar or how different can the dynamism of $p$ mortality tables be? Can we build a joint and economical model for mortality tables which are similar (in some way)? In this paper, we propose a class of additive models with different components for the economical modelling and comparison of such mortality tables: the first component describes a (common) two-dimensional smooth surface (viewed as the reference) and the remaining components describe the relative differences (gaps) between these tables. This class of models leads to the classification of populations into different categories.

2 Model specifications

In population $r$, $r = 1, \ldots, p$, we suppose that the number of deaths $D^{[r]}_{i,j}$ at age $i$ in year $j$ can be described approximately by the over-dispersed Poisson assumption with mean $E^{[r]}_{i,j}\,\mu^{[r]}_{i,j}$, where $\mu^{[r]}_{i,j}$ is the force of mortality; we assume that the Poisson variance in population $r$ is inflated by some positive factor $\phi_r$: $\mathrm{var}(D^{[r]}_{i,j}) = \phi_r E^{[r]}_{i,j}\,\mu^{[r]}_{i,j}$, where the $\phi_r$ are the dispersion parameters. In general, our models apply to any number of populations but, for simplicity, we present the work for two populations (1 and 2), with some discussion of the general situation of $p$ populations.

The key idea is the following: if the dynamism of the two populations is similar, then the relative variation of their forces of mortality can be captured by a moderate number of parameters, i.e., if we set (conceptually) a 2-dimensional smooth surface for the force of mortality in population 1 (viewed as the reference), then the smooth force of mortality for population 2 can be captured by adding a simple gap to this reference. We describe two populations as very similar if the gap (relative variation) between them is constant in age and time; they are similar in time (respectively, in age) if the gap is smooth (flexible) in age (respectively, in time) and constant in time (respectively, in age); we say that they are similar if the gap is additively smooth in both age and time; otherwise, they are different. Note that very similar populations are nested within similar in time/age populations, and similar in time/age populations are in turn nested within similar populations; hence, for space reasons, only the model for similar populations is detailed in this paper, with some discussion and illustration of the other two scenarios.
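The over-dispersion assumption above can be illustrated with a small numerical sketch. The text does not state which estimator of $\phi_r$ is used; the code below assumes the usual Pearson estimator (Pearson chi-square divided by the residual degrees of freedom), and the function name `pearson_dispersion`, the toy fitted values and the effective dimension are all hypothetical.

```python
import numpy as np

def pearson_dispersion(deaths, fitted, eff_dim):
    """Pearson-type estimate of a dispersion parameter phi_r.

    deaths  : observed death counts d in one population
    fitted  : fitted means e * mu_hat for the same cells
    eff_dim : effective dimension attributed to this population
    This estimator is an assumption made for illustration only.
    """
    deaths = np.asarray(deaths, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    pearson_chi2 = np.sum((deaths - fitted) ** 2 / fitted)
    return pearson_chi2 / (deaths.size - eff_dim)

# Toy numbers (not real data): every Pearson residual equals 2,
# so the Pearson chi-square is 4 * 4 = 16.
fitted = np.array([4.0, 9.0, 16.0, 25.0])
deaths = fitted + 2.0 * np.sqrt(fitted)
phi_hat = pearson_dispersion(deaths, fitted, eff_dim=2.0)
print(phi_hat)  # 16 / (4 - 2) = 8.0
```

A value of `phi_hat` well above 1 would indicate variance in excess of the Poisson assumption.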
The first component (reference) of our models uses 2-dimensional P-splines (Eilers and Marx, 1996; Currie et al., 2004). Let $B_a$, $n_a \times c_a$, and $B_t$, $n_t \times c_t$, be the marginal regression matrices (1-dimensional regression matrices of B-splines evaluated along age ($x_a$) and year ($x_t$) respectively); the Kronecker product $B_t \otimes B_a$ creates a 2-dimensional regression basis. If we denote by $y^{[r]} = d^{[r]}/e^{[r]}$ the vector of observed forces of mortality in population $r$ then, taking population 1 as the reference, the linear predictor of its force of mortality can be expressed as

$$\log\left(\mathrm{E}\left[y^{[1]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]}. \tag{1}$$

We use a rich basis of B-splines for age and year; a smooth surface is then obtained by marginal penalization, i.e., the coefficient vector $\theta^{[1]}$ is subject to the penalty

$$P^{[1]} = \lambda_a\, I_{c_t} \otimes \Delta_a'\Delta_a + \lambda_t\, \Delta_t'\Delta_t \otimes I_{c_a}, \tag{2}$$

where $\Delta_a$ and $\Delta_t$ are second order difference matrices (of appropriate size), $\lambda_a$ and $\lambda_t$ are smoothing parameters in the age and year directions, and $I_n$ is the identity matrix of size $n$. With this setting, if we assume that population 2 is similar to population 1, then we express the linear predictor of population 2 as

$$\log\left(\mathrm{E}\left[y^{[2]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes B_a)\,\theta^{[2,1]} + (B_t \otimes 1_{n_a})\,\theta^{[2,2]}, \tag{3}$$

where $1_n$ is the vector of ones of length $n$, and $\theta^{[2,1]}$ and $\theta^{[2,2]}$ are coefficient vectors quantifying the gaps. In (3), we require the second term on the right hand side to capture both the constant component and the smooth age dependent component of the gap, while the third term models only the smooth year dependent component of the gap. Hence we smooth $\theta^{[2,1]}$ and $\theta^{[2,2]}$ and, for identifiability reasons, we give preference to $\theta^{[2,1]}$ by additionally shrinking $\theta^{[2,2]}$ towards 0; this justifies the form of the block diagonal penalty matrix, $P$, in (4) below (with the smoothing gap parameters $\lambda_{2,1}$ and $\lambda_{2,2}$, and the shrinkage parameter $\tilde{\lambda}_{2,2}$).

We now introduce the joint vectors of death counts and exposures: $d = \mathrm{vec}(d^{[1]}, d^{[2]})$ and $e = \mathrm{vec}(e^{[1]}, e^{[2]})$; the coefficient vector $\theta = \mathrm{vec}(\theta^{[1]}, \theta^{[2,1]}, \theta^{[2,2]})$ is then estimated by the penalized GLM (or, more correctly, the penalized quasi-log-likelihood) for $d$ with regression matrix $B$, offset $\log(e)$, log link, quasi-Poisson error and penalty matrix $P$, where

$$B = \begin{bmatrix} B_t \otimes B_a & 0 & 0 \\ B_t \otimes B_a & 1_{n_t} \otimes B_a & B_t \otimes 1_{n_a} \end{bmatrix}, \qquad P = \mathrm{blockdiag}\left(P^{[1]},\; \lambda_{2,1}\,\Delta_a'\Delta_a,\; \lambda_{2,2}\,\Delta_t'\Delta_t + \tilde{\lambda}_{2,2}\, I_{c_t}\right). \tag{4}$$

The linear predictor (3) could be re-parameterized in the form

$$\log\left(\mathrm{E}\left[y^{[2]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes 1_{n_a})\,\theta^{[2]} + (1_{n_t} \otimes B_a)\,\theta^{[2,a]} + (B_t \otimes 1_{n_a})\,\theta^{[2,t]}, \tag{5}$$

where $(1_{n_t} \otimes 1_{n_a})\,\theta^{[2]}$, $(1_{n_t} \otimes B_a)\,\theta^{[2,a]}$ and $(B_t \otimes 1_{n_a})\,\theta^{[2,t]}$ represent respectively the constant component, the smooth age dependent component and the smooth year dependent component of the gap. Here $\theta^{[1]}$ is smoothed as before and there is no constraint on $\theta^{[2]}$; $\theta^{[2,a]}$ and $\theta^{[2,t]}$ are smoothed and shrunk towards zero. These three components give an economical comparison between mortality tables in similar populations. With this representation, the model corresponding to each scenario of similarity (defined earlier in this section) is derived from (5) by keeping the appropriate components and removing the others.
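As an illustration of how the regression matrix and penalty in (4) fit together for two populations, the following NumPy sketch assembles them on the age and year grid used later in the paper (ages 30 to 90, years 1960 to 2005). It is a sketch under stated assumptions, not the authors' code: the B-spline basis is built on equally spaced knots via differences of truncated powers (one common construction), the basis sizes are kept small, the smoothing and shrinkage values are arbitrary placeholders, and all function and variable names are hypothetical.

```python
import numpy as np
from math import factorial

def bspline_basis(x, n_seg, deg=3):
    """B-spline basis on equally spaced knots (truncated-power differences)."""
    x = np.asarray(x, dtype=float)
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_seg
    knots = xl + dx * np.arange(-deg, n_seg + deg + 1)
    diff = x[:, None] - knots[None, :]
    T = np.where(diff > 0, diff ** deg, 0.0)          # (x - knot)_+^deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * T @ D.T                # n_seg + deg columns

def diff_penalty(c, order=2):
    """Delta' Delta for a difference matrix Delta of the given order."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return D.T @ D

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

ages = np.arange(30, 91)                 # n_a = 61
years = np.arange(1960, 2006)            # n_t = 46
B_a = bspline_basis(ages, n_seg=6)       # 61 x 9 (deliberately small)
B_t = bspline_basis(years, n_seg=4)      # 46 x 7
(n_a, c_a), (n_t, c_t) = B_a.shape, B_t.shape

# Placeholder smoothing and shrinkage parameters (arbitrary values).
lam_a, lam_t, lam_21, lam_22, lam_22_shrink = 10.0, 10.0, 5.0, 5.0, 1.0

# Penalty (2) on the reference surface.
P1 = (lam_a * np.kron(np.eye(c_t), diff_penalty(c_a))
      + lam_t * np.kron(diff_penalty(c_t), np.eye(c_a)))

# Regression matrix (4): population 1 rows on top, population 2 below.
B_ref = np.kron(B_t, B_a)
G_age = np.kron(np.ones((n_t, 1)), B_a)      # 1_{n_t} (x) B_a
G_year = np.kron(B_t, np.ones((n_a, 1)))     # B_t (x) 1_{n_a}
B = np.block([
    [B_ref, np.zeros((n_a * n_t, c_a)), np.zeros((n_a * n_t, c_t))],
    [B_ref, G_age, G_year],
])

# Block diagonal penalty (4).
P = block_diag(P1,
               lam_21 * diff_penalty(c_a),
               lam_22 * diff_penalty(c_t) + lam_22_shrink * np.eye(c_t))
print(B.shape, P.shape)
```

Forming `B` explicitly like this is only practical for small problems; it is shown here to make the block structure visible.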

3 Computational aspects and applications

The joint model for similar populations presented in section 2 is computationally very demanding if fitted with the standard GLM procedure, especially as the number of populations increases. In the general situation of $p$ populations, we speed up the estimation as follows.

(i) First, observe that $B$ is partitioned as $B = [B_1 : B_2]$, with $B_1 = 1_p \otimes B_t \otimes B_a$ and $B_2 = [0 : \Lambda]$, where the zero block corresponds to the reference population and $\Lambda$ is a block diagonal matrix; exploiting this partition is efficient for solving the penalized iterative equations as well as for computing the diagonal elements of the hat matrix required for estimating the total effective dimension, the contribution of each population to the total effective dimension, and the dispersion parameters.

(ii) Second, the Kronecker structure of each component in this partition, together with the matrix structure of the data, allows us to express the model as a Generalized Linear Array Model (GLAM), a high speed, low storage framework (Currie et al., 2006).

Using (i) and (ii) simultaneously leads to very substantial gains in time. Finally, we choose the smoothing parameters by minimizing the scaled BIC; see Heuer (1997).

We now apply our approach to some mortality data taken from two sources: (a) the Human Mortality Database (HMD) and (b) the Continuous Mortality Investigation (CMI). We start with the HMD data and, for illustration, we consider ages 30 to 90 and years 1960 to 2005. The residuals from our model applied to male and female mortality in Japan show that the model fits well (profile views for ages 70 and 75 are shown in Figure 1); hence we conclude that the dynamisms of mortality in these two populations are similar. By the same procedure, the plots and residuals indicate that the dynamisms of mortality for males in Japan and in the Netherlands are different (see profile views for ages 70 and 75 in Figure 2).

We now consider the data from the CMI.
These data are of two types: data by lives and data by amounts. The first type consists of the number of claims (viewed as deaths by lives) and the number of policies at risk (viewed as exposure to risk by lives); the second type consists of the total amounts claimed (viewed as deaths by amounts) and the total amounts at risk (viewed as exposure to risk by amounts). These two types of data lead to the concepts of mortality by lives and mortality by amounts. The joint model applied to these data shows that the dynamisms in the mortality by lives and by amounts are similar in time (profile views for ages 70 and 75 are shown in Figure 3). Moreover, our joint model appropriately captures the well known fact that mortality by lives is heavier than mortality by amounts. Our model corresponding to the similar in time scenario is particularly important for forecasting in life insurance, since it ensures that the extrapolated trends in time for mortality by lives and by amounts do not cross each other at any age.
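The array form that underlies the GLAM formulation in point (ii) of this section can be checked numerically. The sketch below uses small random matrices as stand-ins for $B_a$ and $B_t$ (an assumption made purely for illustration) and confirms that each Kronecker-product term of the linear predictor can be computed from the marginal matrices alone, without forming the Kronecker product. It covers only the linear predictor; the array computation of the weighted inner products described in Currie et al. (2006) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, c_a, n_t, c_t = 6, 4, 5, 3            # toy sizes, not the paper's

# Random stand-ins for the marginal B-spline regression matrices.
B_a = rng.normal(size=(n_a, c_a))
B_t = rng.normal(size=(n_t, c_t))

def vec(M):
    """Stack the columns of a matrix (column-major order)."""
    return M.ravel(order="F")

# Reference surface: (B_t (x) B_a) vec(Theta) = vec(B_a Theta B_t').
Theta = rng.normal(size=(c_a, c_t))
eta_kron = np.kron(B_t, B_a) @ vec(Theta)
eta_array = vec(B_a @ Theta @ B_t.T)

# Age gap: (1_{n_t} (x) B_a) theta repeats B_a theta in every year column.
theta_age = rng.normal(size=c_a)
gap_age_kron = np.kron(np.ones((n_t, 1)), B_a) @ theta_age
gap_age_array = vec(np.outer(B_a @ theta_age, np.ones(n_t)))

# Year gap: (B_t (x) 1_{n_a}) theta repeats B_t theta in every age row.
theta_year = rng.normal(size=c_t)
gap_year_kron = np.kron(B_t, np.ones((n_a, 1))) @ theta_year
gap_year_array = vec(np.outer(np.ones(n_a), B_t @ theta_year))

print(np.allclose(eta_kron, eta_array),
      np.allclose(gap_age_kron, gap_age_array),
      np.allclose(gap_year_kron, gap_year_array))
```

The array versions never build an $n_a n_t \times c_a c_t$ matrix, which is the source of the storage savings.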

FIGURE 1: These profile views illustrate that the dynamisms in the male and female mortality in Japan are similar.

FIGURE 2: These profile views illustrate that the dynamisms in male mortality in the Netherlands and in Japan are different.

FIGURE 3: These profile views illustrate that the dynamisms in the CMI mortality by lives and by amounts are similar in time.

4 Concluding remarks

In this paper we have proposed a class of joint models for classifying mortality tables. When two (or more) populations turn out to be similar in some way, our joint models lead to simple comparisons of these mortality tables. An additional attractive feature of our models is that, once the components are built, the fitting is reduced to the penalized scoring algorithm (with appropriate components). Furthermore, the order of the populations in our approach is not important; indeed, taking population 2 (instead of population 1) as the reference leads to the same fit.

We have approached the analysis of multiple mortality tables by fitting nested models. This has allowed us to compare such models by residual and graphical methods. Hypothesis testing is a more rigorous approach to such comparisons and our models give a platform for the development of these testing procedures. One problem that will need to be addressed is the very large power that our extensive datasets would give to any such test. This suggests that a Bayesian approach would be appropriate.

References

Currie, I.D., Durban, M., and Eilers, P.H.C. (2004). Smoothing and forecasting mortality rates. Statistical Modelling, 4, 279-298.

Currie, I.D., Durban, M., and Eilers, P.H.C. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68, 259-280.

Eilers, P.H.C., and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89-121.

Heuer, C. (1997). Modelling of time trends and interactions in vital rates using restricted regression splines. Biometrics, 53, 161-177.