Joint models for classification and comparison of mortality in different countries.




Viani D. Biatat¹ and Iain D. Currie¹

¹ Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS.

Abstract: We propose a class of additive generalized linear array models (GLAMs) which facilitate the classification and comparison of mortality tables. Different mortality tables are modelled in terms of their distances (gaps) from a reference table. These gaps are smooth functions of age and/or time and provide a simple graphical summary of the differences between tables. In the paper we describe the models, discuss their computational demands and how GLAM meets them. We present the results, largely graphical, of applying our methods to various mortality tables taken from the Human Mortality Database and from the Continuous Mortality Investigation Bureau.

Keywords: Mortality classification, dispersion, joint modelling, P-splines, GLAM.

1 Introduction

We suppose that we have mortality data for $p$ populations, $p \ge 2$, consisting of death counts and exposures arranged in $n_a \times n_t$ matrices $D^{[r]}$ and $E^{[r]}$, $r = 1, \ldots, p$, such that the rows and columns of $D^{[r]}$ and $E^{[r]}$ are classified respectively by ages $x_a$ and years $x_t$, each arranged in ascending order; their vector equivalents are denoted by $d^{[r]} = \mathrm{vec}(D^{[r]})$ and $e^{[r]} = \mathrm{vec}(E^{[r]})$.

For a single population, it is common and natural to suppose that a 2-dimensional smooth surface drives the force of mortality. However, mortality data for two (or more) populations can be connected. Two typical examples are (a) mortality for females and males, where the latter is known to be heavier than the former, and (b) mortality by lives and by amounts (in life insurance), where the latter is known to be lighter than the former.
In addition, male and female mortality (for example) generally have similar dynamics. How similar or different can the dynamics of $p$ mortality tables be? Can we build a joint, economical model for mortality tables that are similar in some way? In this paper, we propose a class of additive models with different components for the economical modelling and comparison of such tables: the first component describes a common two-dimensional smooth surface (viewed as the reference), and the remaining components describe the relative differences (gaps) between these tables. This class of models leads to a classification of populations into different categories.

2 Model specifications

In population $r$, $r = 1, \ldots, p$, we suppose that the number of deaths $D^{[r]}_{i,j}$ at age $i$ in year $j$ is described approximately by an over-dispersed Poisson distribution with mean $E^{[r]}_{i,j}\mu^{[r]}_{i,j}$, where $\mu^{[r]}_{i,j}$ is the force of mortality; we assume that the Poisson variance in population $r$ is inflated by a positive factor $\phi_r$:

$$\mathrm{var}\big(D^{[r]}_{i,j}\big) = \phi_r\, E^{[r]}_{i,j}\,\mu^{[r]}_{i,j},$$

where the $\phi_r$ are the dispersion parameters. Our models apply to any number of populations but, for simplicity, we present the work for two populations (1 and 2), with some discussion of the general case of $p$ populations.

The key idea is the following: if the dynamics of the two populations are similar, then the relative variation of their forces of mortality can be captured by a moderate number of parameters. That is, if we set (conceptually) a 2-dimensional smooth surface for the force of mortality in population 1 (viewed as the reference), then the smooth force of mortality for population 2 can be captured by adding a simple gap to this reference. We describe two populations as very similar if the gap (relative variation) between them is constant in age and time; as similar in time (respectively, age) if the gap is smooth (flexible) in age (time) and constant in time (age); as similar if the gap is additively smooth in both age and time; otherwise, they are different. Very similar populations are nested within similar-in-time/age populations, which are in turn nested within similar populations; hence, for reasons of space, only the model for similar populations is detailed in this paper, with some discussion and illustration of the other scenarios.
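The dispersion parameters $\phi_r$ have to be estimated alongside the coefficients. Section 3 notes that this uses the diagonal of the hat matrix, but the estimator itself is not written out here. A common choice, and only an assumption on our part, is a Pearson-type moment estimator: the sum of squared Pearson residuals for population $r$ divided by its number of cells minus its share of the effective dimension. The function name `pearson_dispersion` and its arguments are hypothetical.

```python
import numpy as np

def pearson_dispersion(d, fitted, ed_r):
    """Pearson-type moment estimate of phi_r for one population (assumed form).

    d      : observed deaths d^[r]
    fitted : fitted deaths, e^[r] * mu_hat^[r]
    ed_r   : effective dimension attributed to population r, e.g. the sum of
             the hat-matrix diagonal elements belonging to its cells
    """
    d = np.asarray(d, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    pearson_chi2 = np.sum((d - fitted) ** 2 / fitted)  # sum of squared Pearson residuals
    return pearson_chi2 / (d.size - ed_r)

# toy check: residuals of +2 and -2 on four cells whose fitted value is 10
phi_hat = pearson_dispersion([12, 8, 10, 10], [10, 10, 10, 10], ed_r=1.0)  # 0.8 / 3
```

Values of $\phi_r$ above 1 indicate over-dispersion relative to the Poisson assumption; with real mortality data the estimate would be computed from a fitted joint model rather than from toy numbers.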
The first (reference) component of our models uses 2-dimensional P-splines (Eilers and Marx, 1996; Currie et al., 2004). Let $B_a$, $n_a \times c_a$, and $B_t$, $n_t \times c_t$, be the marginal regression matrices, i.e., the 1-dimensional B-spline regression matrices evaluated along age ($x_a$) and year ($x_t$) respectively; the Kronecker product $B_t \otimes B_a$ creates a 2-dimensional regression basis. If we denote by $y^{[r]} = d^{[r]}/e^{[r]}$ (element by element) the vector of observed forces of mortality in population $r$, then, taking population 1 as the reference, the linear predictor of its force of mortality is

$$\log\big(E[y^{[1]}]\big) = (B_t \otimes B_a)\,\theta^{[1]}. \qquad (1)$$

We use a rich basis of B-splines for age and year; a smooth surface is then obtained by marginal penalization, i.e., the coefficient vector $\theta^{[1]}$ is subject to the penalty

$$P^{[1]} = \lambda_a\, I_{c_t} \otimes \Delta_a'\Delta_a + \lambda_t\, \Delta_t'\Delta_t \otimes I_{c_a}, \qquad (2)$$

where $\Delta_a$ and $\Delta_t$ are second-order difference matrices of appropriate size, $\lambda_a$ and $\lambda_t$ are the smoothing parameters in the age and year directions, and $I_n$ is the identity matrix of size $n$.

With this setting, if we assume that population 2 is similar to population 1, then we express the linear predictor of population 2 as

$$\log\big(E[y^{[2]}]\big) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes B_a)\,\theta^{[2,1]} + (B_t \otimes 1_{n_a})\,\theta^{[2,2]}, \qquad (3)$$

where $1_n$ is the vector of ones of length $n$, and $\theta^{[2,1]}$ and $\theta^{[2,2]}$ are coefficient vectors quantifying the gaps. In (3), we require the second term on the right-hand side to capture both the constant and the smooth age-dependent components of the gap, while the third term models only the smooth year-dependent component. Hence we smooth both $\theta^{[2,1]}$ and $\theta^{[2,2]}$ and, for identifiability, give preference to $\theta^{[2,1]}$ by additionally shrinking $\theta^{[2,2]}$ towards 0; this explains the form of the block-diagonal penalty matrix $P$ in (4) below, with gap smoothing parameters $\lambda_{2,1}$ and $\lambda_{2,2}$ and shrinkage parameter $\tilde\lambda_{2,2}$.

We now introduce the joint vectors of death counts and exposures, $d = \mathrm{vec}(d^{[1]}, d^{[2]})$ and $e = \mathrm{vec}(e^{[1]}, e^{[2]})$; the coefficient vector $\theta = \mathrm{vec}(\theta^{[1]}, \theta^{[2,1]}, \theta^{[2,2]})$ is then estimated by the penalized GLM (more correctly, the penalized quasi-log-likelihood) for $d$ with regression matrix $B$, offset $\log(e)$, log link, quasi-Poisson error and penalty matrix $P$, where

$$B = \begin{bmatrix} B_t \otimes B_a & 0 & 0 \\ B_t \otimes B_a & 1_{n_t} \otimes B_a & B_t \otimes 1_{n_a} \end{bmatrix}, \qquad P = \mathrm{blockdiag}\big(P^{[1]},\ \lambda_{2,1}\Delta_a'\Delta_a,\ \lambda_{2,2}\Delta_t'\Delta_t + \tilde\lambda_{2,2} I_{c_t}\big). \qquad (4)$$

The linear predictor (3) can be re-parameterized in the form

$$\log\big(E[y^{[2]}]\big) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes 1_{n_a})\,\theta^{[2]} + (1_{n_t} \otimes B_a)\,\theta^{[2,a]} + (B_t \otimes 1_{n_a})\,\theta^{[2,t]}, \qquad (5)$$

where $(1_{n_t} \otimes 1_{n_a})\theta^{[2]}$, $(1_{n_t} \otimes B_a)\theta^{[2,a]}$ and $(B_t \otimes 1_{n_a})\theta^{[2,t]}$ represent respectively the constant, the smooth age-dependent and the smooth year-dependent components of the gap.
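To make the block structure of the two-population model concrete, the sketch below assembles the joint regression matrix $B$ and penalty $P$ of (4) with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the cubic B-spline basis on equally spaced knots is built by the Cox-de Boor recursion, the numbers of knot intervals (12 for age, 9 for year) and all smoothing parameters are hypothetical, and the shrinkage parameter is called `kappa_22` to keep it distinct from the smoothing parameter `lam_22`.

```python
import numpy as np

def bspline_basis(x, xl, xr, ndx, deg=3):
    """B-spline basis on ndx equal intervals of [xl, xr] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / ndx
    t = xl + dx * np.arange(-deg, ndx + deg + 1)       # equally spaced, extended knots
    # degree 0: indicator of the knot interval [t_j, t_{j+1}) containing each x
    B = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    for k in range(1, deg + 1):                         # raise the degree one step at a time
        left = (x[:, None] - t[:-k - 1]) / (t[k:-1] - t[:-k - 1])
        right = (t[k + 1:] - x[:, None]) / (t[k + 1:] - t[1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                            # n x (ndx + deg)

def diff_matrix(c, order=2):
    """Difference matrix Delta of the given order, (c - order) x c."""
    return np.diff(np.eye(c), n=order, axis=0)

def block_diag(*blocks):
    """Block-diagonal matrix built from square blocks."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def joint_design(Ba, Bt):
    """Regression matrix B of (4), with population 1 as the reference."""
    (na, ca), (nt, ct) = Ba.shape, Bt.shape
    ref = np.kron(Bt, Ba)                        # B_t ⊗ B_a
    gap_age = np.kron(np.ones((nt, 1)), Ba)      # 1_{n_t} ⊗ B_a
    gap_year = np.kron(Bt, np.ones((na, 1)))     # B_t ⊗ 1_{n_a}
    n = na * nt
    return np.block([[ref, np.zeros((n, ca)), np.zeros((n, ct))],
                     [ref, gap_age, gap_year]])

def joint_penalty(ca, ct, lam_a, lam_t, lam_21, lam_22, kappa_22):
    """Penalty P of (4); kappa_22 is the extra shrinkage on the year gap."""
    Da, Dt = diff_matrix(ca), diff_matrix(ct)
    P1 = lam_a * np.kron(np.eye(ct), Da.T @ Da) + lam_t * np.kron(Dt.T @ Dt, np.eye(ca))  # (2)
    return block_diag(P1, lam_21 * Da.T @ Da,
                      lam_22 * Dt.T @ Dt + kappa_22 * np.eye(ct))

ages, years = np.arange(30, 91), np.arange(1960, 2006)
Ba = bspline_basis(ages, 30, 90, ndx=12)         # 61 x 15
Bt = bspline_basis(years, 1960, 2005, ndx=9)     # 46 x 12
B = joint_design(Ba, Bt)                         # 5612 x 207
P = joint_penalty(15, 12, lam_a=10.0, lam_t=10.0, lam_21=10.0, lam_22=10.0, kappa_22=1.0)
```

Forming `np.kron(Bt, Ba)` explicitly is fine at this size, but it is exactly the object that GLAM avoids building when the number of populations or the data grid grows.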
Here $\theta^{[1]}$ is smoothed as before, $\theta^{[2]}$ is unconstrained, and $\theta^{[2,a]}$ and $\theta^{[2,t]}$ are smoothed and shrunk towards zero. These three components give an economical comparison between mortality tables for similar populations. With this representation, the model for each similarity scenario defined earlier in this section is obtained from (5) by keeping the appropriate components and dropping the others.
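Whichever components are kept, the estimation step is the same penalized GLM: log link, offset $\log(e)$, quasi-Poisson error and a quadratic penalty $\theta' P \theta$. A minimal sketch of penalized Fisher scoring (iteratively reweighted least squares) for that problem follows. It is generic in `B` and `P`; the optional `phi` argument, giving each cell the dispersion $\phi_r$ of its population, is our assumption about how the population-specific dispersions enter the working weights. The toy basis (a cubic polynomial in rescaled age), the penalty and the simulated Gompertz-like deaths are hypothetical stand-ins for the matrices of (4) and for real data.

```python
import numpy as np

def penalized_scoring(B, P, d, e, phi=None, max_iter=50, tol=1e-8):
    """Penalized IRLS for a log-link (quasi-)Poisson GLM with offset log(e)
    and penalty theta' P theta; a sketch, not the authors' code."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    phi = np.ones_like(d) if phi is None else np.asarray(phi, dtype=float)
    offset = np.log(e)
    mu = d + 0.5                                   # start from the data (avoids log 0)
    eta = np.log(mu)
    theta = np.zeros(B.shape[1])
    for _ in range(max_iter):
        w = mu / phi                               # working weights
        z = (eta - offset) + (d - mu) / mu         # working variable, offset removed
        BtW = B.T * w                              # B'W without forming diag(w)
        theta_new = np.linalg.solve(BtW @ B + P, BtW @ z)
        eta = offset + B @ theta_new
        mu = np.exp(eta)                           # fitted deaths e * mu_hat
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, mu

rng = np.random.default_rng(1)
ages = np.arange(30, 91)
e = np.full(ages.size, 10_000.0)                             # hypothetical exposures
d = rng.poisson(e * np.exp(-10.0 + 0.1 * ages)).astype(float)  # toy Gompertz-like deaths
s = (ages - 60) / 30                                         # rescaled age in [-1, 1]
B = np.vander(s, 4, increasing=True)                         # toy basis: 1, s, s^2, s^3
P = np.diag([0.0, 0.0, 10.0, 10.0])                          # hypothetical penalty, constant left free
theta, mu = penalized_scoring(B, P, d, e)
```

Because the constant column is unpenalized, the score equation forces the fitted deaths to reproduce the observed total at convergence, which is a convenient sanity check; the same holds for the joint model of (4), since its B-spline bases sum to one and second differences do not penalize a constant $\theta^{[1]}$.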

3 Computational aspects and applications

The joint model for similar populations presented in Section 2 is very computationally demanding if fitted with the standard GLM procedure, especially as the number of populations increases. In the general case of $p$ populations, we speed up the estimation as follows.

(i) First, observe that $B$ is partitioned as $B = [B_1 : B_2]$, with $B_1 = 1_p \otimes B_t \otimes B_a$ and $B_2 = \begin{bmatrix} 0 \\ \Lambda \end{bmatrix}$, where $\Lambda$ is a block-diagonal matrix. Exploiting this partition makes it efficient to solve the penalized iterative equations and to compute the diagonal elements of the hat matrix, which are required to estimate the total effective dimension, the contribution of each population to it, and the dispersion parameters.

(ii) Second, the Kronecker structure of each component in this partition, together with the matrix structure of the data, allows us to express the model as a generalized linear array model (GLAM), a high-speed, low-storage framework (Currie et al., 2006).

Using (i) and (ii) together leads to very substantial savings in computing time. Finally, we choose the smoothing parameters by minimizing the scaled BIC; see Heuer (1997).

We now apply our approach to mortality data from two sources: (a) the Human Mortality Database (HMD) and (b) the Continuous Mortality Investigation (CMI). We start with the HMD data and, for illustration, consider ages 30 to 90 and years 1960 to 2005. The residuals from our model applied to male and female mortality in Japan show that the model fits well (profile views for ages 70 and 75 are shown in Figure 1); hence we conclude that the dynamics of mortality in these two populations are similar. By the same procedure, the plots and residuals indicate that the dynamics of mortality for males in Japan and the Netherlands are different (see the profile views for ages 70 and 75 in Figure 2).

We now consider the data from the CMI.
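Point (ii) rests on two array identities from the GLAM framework of Currie et al. (2006): with $\theta = \mathrm{vec}(\Theta)$, the linear predictor $(B_t \otimes B_a)\theta$ equals $\mathrm{vec}(B_a \Theta B_t')$, and the weighted inner product $(B_t \otimes B_a)' \,\mathrm{diag}(\mathrm{vec}\, W)\, (B_t \otimes B_a)$ can be assembled from the row tensors $G(B_a)$ and $G(B_t)$, where column $k c + m$ of $G(X)$ holds the element-wise product of columns $k$ and $m$ of $X$. The sketch below checks both identities against the explicit Kronecker computation. The indexing and reshaping are our own, random matrices stand in for the B-spline bases, and the sizes (61 ages, 46 years, 15 and 12 basis functions) are hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major), so age varies fastest."""
    return M.reshape(-1, order="F")

def row_tensor(X):
    """Row tensor G(X): column k*c + m holds X[:, k] * X[:, m]."""
    return (X[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)

def glam_linear_predictor(Ba, Bt, Theta):
    """(B_t ⊗ B_a) vec(Theta), returned as the n_a x n_t array B_a Theta B_t'."""
    return Ba @ Theta @ Bt.T

def glam_inner_product(Ba, Bt, W):
    """(B_t ⊗ B_a)' diag(vec W) (B_t ⊗ B_a) without forming the Kronecker product."""
    ca, ct = Ba.shape[1], Bt.shape[1]
    A = row_tensor(Ba).T @ W @ row_tensor(Bt)     # (ca*ca) x (ct*ct)
    A = A.reshape(ca, ca, ct, ct)                  # A[k, m, l, n]
    # reorder to rows (l, k) and columns (n, m), matching the columns of B_t ⊗ B_a
    return A.transpose(2, 0, 3, 1).reshape(ct * ca, ct * ca)

rng = np.random.default_rng(0)
Ba, Bt = rng.random((61, 15)), rng.random((46, 12))   # stand-ins for B-spline bases
Theta = rng.standard_normal((15, 12))
W = rng.random((61, 46))                               # e.g. IRLS weights as an age x year array

B = np.kron(Bt, Ba)                                    # explicit basis, for checking only
eta_direct = B @ vec(Theta)
inner_direct = B.T @ (vec(W)[:, None] * B)
eta_glam = vec(glam_linear_predictor(Ba, Bt, Theta))
inner_glam = glam_inner_product(Ba, Bt, W)
```

The array versions never hold the $n_a n_t \times c_a c_t$ matrix in memory, which is where the storage and speed savings come from as the data grid grows.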
These data are of two types: data by lives and data by amounts. The first type consists of the number of claims (viewed as deaths by lives) and the number of policies at risk (viewed as exposure to risk by lives); the second consists of the total amounts claimed (viewed as deaths by amounts) and the total amounts at risk (viewed as exposure to risk by amounts). These two types of data lead to the concepts of mortality by lives and mortality by amounts. The joint model applied to these data shows that the dynamics of mortality by lives and by amounts are similar in time (profile views for ages 70 and 75 are shown in Figure 3). Moreover, our joint model correctly captures the well-known fact that mortality by lives is heavier than mortality by amounts. The model for the similar-in-time scenario is particularly important for forecasting in life insurance, since it ensures that the extrapolated time trends at each age for mortality by lives and by amounts do not cross.
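The non-crossing property follows from the structure of the similar-in-time model: it keeps only the constant and age components of the gap in (5), so $\log\mu^{[2]} - \log\mu^{[1]}$ takes the same value in every year at a given age. The sketch builds the gap columns of (5) for each scenario of Section 2 and checks that a similar-in-time gap, laid out as an age × year array, is constant along years. The dictionary `GAP_COMPONENTS` and the function `gap_design` are hypothetical names, and random matrices stand in for the B-spline bases.

```python
import numpy as np

# components of the gap in (5) kept under each scenario of Section 2
GAP_COMPONENTS = {
    "very similar":    ("constant",),
    "similar in time": ("constant", "age"),
    "similar in age":  ("constant", "year"),
    "similar":         ("constant", "age", "year"),
}

def gap_design(Ba, Bt, scenario):
    """Columns of the population-2 gap in (5) for the given scenario."""
    na, nt = Ba.shape[0], Bt.shape[0]
    one_a, one_t = np.ones((na, 1)), np.ones((nt, 1))
    parts = {
        "constant": np.kron(one_t, one_a),   # 1_{n_t} ⊗ 1_{n_a}
        "age": np.kron(one_t, Ba),           # 1_{n_t} ⊗ B_a
        "year": np.kron(Bt, one_a),          # B_t ⊗ 1_{n_a}
    }
    return np.hstack([parts[c] for c in GAP_COMPONENTS[scenario]])

rng = np.random.default_rng(2)
na, nt = 61, 46
Ba, Bt = rng.random((na, 15)), rng.random((nt, 12))   # stand-ins for B-spline bases
X_time = gap_design(Ba, Bt, "similar in time")        # constant + 15 age columns
gap = (X_time @ rng.standard_normal(X_time.shape[1])).reshape(na, nt, order="F")
X_full = gap_design(Ba, Bt, "similar")                # constant + age + year columns
```

Since every year column of `gap` is identical, extrapolating the reference surface in time shifts both tables by the same fixed amount at each age, so the forecast trends stay parallel. With genuine B-spline bases, whose rows sum to one, the constant column lies in the span of both the age and the year columns; shrinking $\theta^{[2,a]}$ and $\theta^{[2,t]}$ towards zero is what keeps $\theta^{[2]}$ identifiable.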

FIGURE 1: These profile views illustrate that the dynamics of male and female mortality in Japan are similar.

FIGURE 2: These profile views illustrate that the dynamics of male mortality in the Netherlands and in Japan are different.

FIGURE 3: These profile views illustrate that the dynamics of CMI mortality by lives and by amounts are similar in time.

4 Concluding remarks

In this paper we have proposed a class of joint models for classifying mortality tables. When two (or more) populations turn out to be similar in some way, our joint models lead to simple comparisons of their mortality tables. An additional attractive feature of our models is that, once the components are built, fitting reduces to the penalized scoring algorithm (with the appropriate components). Furthermore, the order of the populations in our approach is not important; indeed, taking population 2 instead of population 1 as the reference leads to the same fit.

We have approached the analysis of multiple mortality tables by fitting nested models. This has allowed us to compare such models by residual and graphical methods. Hypothesis testing is a more rigorous approach to such comparisons, and our models provide a platform for developing such testing procedures. One problem that will need to be addressed is the very large power that our extensive datasets would give to any such test. This suggests that a Bayesian approach would be appropriate.

References

Currie, I.D., Durban, M., and Eilers, P.H.C. (2004). Smoothing and forecasting mortality rates. Statistical Modelling, 4, 279-298.

Currie, I.D., Durban, M., and Eilers, P.H.C. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68, 259-280.

Eilers, P.H.C., and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89-121.

Heuer, C. (1997). Modelling of time trends and interactions in vital rates using restricted regression splines. Biometrics, 53, 161-177.