Joint models for classification and comparison of mortality in different countries.


Viani D. Biatat and Iain D. Currie
Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS.

Abstract: We propose a class of additive generalized linear array models (GLAMs) that facilitate the classification and comparison of mortality tables. Different mortality tables are modelled in terms of their distances (gaps) from a reference table. These gaps are smooth functions of age and/or time and provide a simple graphical summary of the differences between tables. We describe the models, discuss their computational demands and show how GLAM meets them. We present the results, largely graphical, of applying our methods to various mortality tables taken from the Human Mortality Database and from the Continuous Mortality Investigation Bureau.

Keywords: Mortality classification, dispersion, joint modelling, P-splines, GLAM.

1 Introduction

We suppose that we have mortality data for $p$ populations, $p \ge 2$, consisting of death counts and exposures arranged in $n_a \times n_t$ matrices $D^{[r]}$ and $E^{[r]}$, $r = 1, \ldots, p$, whose rows and columns are classified by age $x_a$ and year $x_t$ respectively, each arranged in ascending order; their vector equivalents are denoted by $d^{[r]} = \mathrm{vec}(D^{[r]})$ and $e^{[r]} = \mathrm{vec}(E^{[r]})$. For a single population, it is common and natural to suppose that a two-dimensional smooth surface drives the force of mortality. However, the mortality experiences of two (or more) populations are often connected. Two typical examples are (a) female and male mortality, where male mortality is known to be heavier, and (b) mortality by lives and by amounts in life insurance, where mortality by amounts is known to be lighter.
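As a small illustration of this data layout, the Python sketch below (using NumPy, with a made-up 3 × 2 table that is not taken from the paper) forms $d^{[r]} = \mathrm{vec}(D^{[r]})$ and $e^{[r]} = \mathrm{vec}(E^{[r]})$ by stacking columns, so that age varies fastest within each year, and then computes the observed forces of mortality cell by cell.

```python
import numpy as np

# Hypothetical toy table: 3 ages (rows, ascending) x 2 years (columns, ascending)
D = np.array([[10, 12], [20, 18], [40, 35]], dtype=float)           # deaths D^[r]
E = np.array([[1000, 1100], [950, 1000], [900, 880]], dtype=float)  # exposures E^[r]

# vec() stacks the columns: all ages for the first year, then the next year
d = D.ravel(order="F")
e = E.ravel(order="F")

# Observed forces of mortality, element by element
y = d / e
print(d)  # [10. 20. 40. 12. 18. 35.]
```

Column-major (`order="F"`) stacking is the ordering under which the rows of the Kronecker basis $B_t \otimes B_a$ in Section 2 line up with the data.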
Moreover, male and female mortality, for example, usually evolve in broadly similar ways over age and time. This raises two questions: to what extent can the dynamics of $p$ mortality tables be similar or different, and can we build a joint, economical model for tables that are similar in some way? In this paper, we propose a class of additive models with several components for the economical modelling and comparison of such tables: the first component describes a common two-dimensional smooth surface (viewed as the reference), and the remaining components describe the relative differences (gaps) between the tables. This class of models leads to a classification of populations into different categories.

2 Model specifications

In population $r$, $r = 1, \ldots, p$, we suppose that the number of deaths $D^{[r]}_{i,j}$ at age $i$ in year $j$ can be described approximately by an overdispersed Poisson distribution with mean $E^{[r]}_{i,j}\mu^{[r]}_{i,j}$, where $\mu^{[r]}_{i,j}$ is the force of mortality. We assume that the Poisson variance in population $r$ is inflated by a positive factor $\phi_r$:
$$\mathrm{var}\left(D^{[r]}_{i,j}\right) = \phi_r E^{[r]}_{i,j}\mu^{[r]}_{i,j},$$
where the $\phi_r$ are the dispersion parameters. Our models apply to any number of populations but, for simplicity, we present the work for two populations (1 and 2), with some discussion of the general case of $p$ populations.

The key idea is the following: if the dynamics of the two populations are similar, then the relative variation of their forces of mortality can be captured by a moderate number of parameters. That is, if we set (conceptually) a two-dimensional smooth surface for the force of mortality in population 1 (viewed as the reference), then the smooth force of mortality in population 2 can be captured by adding a simple gap to this reference. We describe two populations as very similar if the gap (relative variation) between them is constant in age and time; as similar in time (respectively, in age) if the gap is smooth in age (respectively, in time) and constant in time (respectively, in age); as similar if the gap is additively smooth in both age and time; and as different otherwise. Note that very similar populations are nested within similar-in-time and similar-in-age populations, which are in turn nested within similar populations. For reasons of space, only the model for similar populations is detailed in this paper, with some discussion and illustration of the other scenarios.
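The quasi-Poisson assumption fixes only the mean $E\mu$ and the variance $\phi_r E\mu$ of the death counts. One convenient way to simulate data with exactly this mean-variance relationship, used here purely for illustration and not a generative model proposed in the paper, is to scale a Poisson variable: if $Y \sim \mathrm{Poisson}(m/\phi)$ then $\phi Y$ has mean $m$ and variance $\phi m$. A minimal NumPy check with hypothetical values $m = 200$ and $\phi = 3$ (the helper `scaled_poisson` is a hypothetical name):

```python
import numpy as np


def scaled_poisson(mean, phi, size, rng):
    # phi * Poisson(mean / phi) has mean `mean` and variance phi * mean
    return phi * rng.poisson(mean / phi, size=size)


rng = np.random.default_rng(2024)
draws = scaled_poisson(mean=200.0, phi=3.0, size=200_000, rng=rng)
sample_mean = draws.mean()
dispersion = draws.var() / sample_mean  # should be close to phi = 3
print(f"mean ~ {sample_mean:.2f}, variance/mean ~ {dispersion:.3f}")
```

The simulated values are multiples of $\phi$ rather than integer counts, which is harmless for a quasi-likelihood fit since only the first two moments enter it.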
The first (reference) component of our models uses two-dimensional P-splines (Eilers and Marx, 1996; Currie et al., 2004). Let $B_a$ ($n_a \times c_a$) and $B_t$ ($n_t \times c_t$) be the marginal regression matrices, that is, the one-dimensional B-spline regression matrices evaluated along age $x_a$ and year $x_t$ respectively; the Kronecker product $B_t \otimes B_a$ then forms a two-dimensional regression basis. If we denote by $y^{[r]} = d^{[r]}/e^{[r]}$ (element-wise division) the vector of observed forces of mortality in population $r$, then, taking population 1 as the reference, the linear predictor of its force of mortality can be expressed as
$$\log\left(\mathbb{E}\left[y^{[1]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]}. \qquad (1)$$
We use a rich basis of B-splines for age and year; a smooth surface is then obtained by marginal penalization, i.e. the coefficient vector $\theta^{[1]}$ is subject to the penalty
$$P^{[1]} = \lambda_a\, I_{c_t} \otimes \Delta_a'\Delta_a + \lambda_t\, \Delta_t'\Delta_t \otimes I_{c_a}, \qquad (2)$$
where $\Delta_a$ and $\Delta_t$ are second-order difference matrices of appropriate size, $\lambda_a$ and $\lambda_t$ are the smoothing parameters in the age and year directions, and $I_n$ is the identity matrix of size $n$. With this setting, if we assume that population 2 is similar to population 1, we express the linear predictor of population 2 as
$$\log\left(\mathbb{E}\left[y^{[2]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes B_a)\,\theta^{[2,1]} + (B_t \otimes 1_{n_a})\,\theta^{[2,2]}, \qquad (3)$$
where $1_n$ is the vector of ones of length $n$, and $\theta^{[2,1]}$ and $\theta^{[2,2]}$ are coefficient vectors quantifying the gap. In (3), the second term on the right-hand side captures both the constant component and the smooth age-dependent component of the gap, while the third term models only the smooth year-dependent component. Hence we smooth both $\theta^{[2,1]}$ and $\theta^{[2,2]}$ and, for identifiability, we give preference to $\theta^{[2,1]}$ by additionally shrinking $\theta^{[2,2]}$ towards 0; this explains the form of the block-diagonal penalty matrix $P$ in (4) below, with gap smoothing parameters $\lambda_{2,1}$ and $\lambda_{2,2}$ and shrinkage parameter $\tilde{\lambda}_{2,2}$.

We now introduce the joint vectors of death counts and exposures, $d = \mathrm{vec}(d^{[1]}, d^{[2]})$ and $e = \mathrm{vec}(e^{[1]}, e^{[2]})$. The coefficient vector $\theta = \mathrm{vec}(\theta^{[1]}, \theta^{[2,1]}, \theta^{[2,2]})$ is then estimated by a penalized GLM (more correctly, by maximizing the penalized quasi-log-likelihood) for $d$ with regression matrix $B$, offset $\log(e)$, log link, quasi-Poisson error and penalty matrix $P$, where
$$B = \begin{bmatrix} B_t \otimes B_a & 0 & 0 \\ B_t \otimes B_a & 1_{n_t} \otimes B_a & B_t \otimes 1_{n_a} \end{bmatrix}, \quad P = \mathrm{blockdiag}\left(P^{[1]},\ \lambda_{2,1}\Delta_a'\Delta_a,\ \lambda_{2,2}\Delta_t'\Delta_t + \tilde{\lambda}_{2,2} I_{c_t}\right). \qquad (4)$$
The linear predictor (3) can be reparameterized in the form
$$\log\left(\mathbb{E}\left[y^{[2]}\right]\right) = (B_t \otimes B_a)\,\theta^{[1]} + (1_{n_t} \otimes 1_{n_a})\,\theta^{[2]} + (1_{n_t} \otimes B_a)\,\theta^{[2,a]} + (B_t \otimes 1_{n_a})\,\theta^{[2,t]}, \qquad (5)$$
where $(1_{n_t} \otimes 1_{n_a})\theta^{[2]}$, $(1_{n_t} \otimes B_a)\theta^{[2,a]}$ and $(B_t \otimes 1_{n_a})\theta^{[2,t]}$ represent respectively the constant, the smooth age-dependent and the smooth year-dependent components of the gap. Here $\theta^{[1]}$ is smoothed as before and $\theta^{[2]}$ is unconstrained, while $\theta^{[2,a]}$ and $\theta^{[2,t]}$ are both smoothed and shrunk towards zero. These three components give an economical comparison between the mortality tables of similar populations. With this representation, the model for each scenario of similarity defined earlier in this section is obtained from (5) by keeping the appropriate components and dropping the others.
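The construction of the penalty (2), the joint matrices in (4) and a penalized scoring fit can be sketched in Python with NumPy as below. This is an illustrative sketch under several assumptions that are not from the paper: equally spaced cubic B-splines built from differences of truncated powers (a construction described by Eilers and Marx), second-order differences for the gap penalties as well as for the reference, fixed smoothing parameters instead of BIC selection, dispersion set to 1 (plain Poisson weights), and simulated data on a hypothetical grid of ages 30 to 90 and years 1960 to 1999. The helper names `bspline_basis`, `joint_design_and_penalty` and `penalized_scoring` are hypothetical. The fit uses the explicit matrix $B$, i.e. the brute-force approach whose cost grows quickly with the size of the tables.

```python
import math

import numpy as np


def bspline_basis(x, xl, xr, ndx, deg=3):
    """Equally spaced B-spline basis on [xl, xr] with ndx intervals.

    Built from differences of truncated power functions; returns a
    len(x) x (ndx + deg) matrix whose rows sum to one on [xl, xr].
    """
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    diff_x = x[:, None] - knots[None, :]
    tpow = np.where(diff_x > 0, diff_x**deg, 0.0)  # (x - t)^deg for x > t, else 0
    dmat = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * (tpow @ dmat.T) / (math.factorial(deg) * dx**deg)


def second_diff(c):
    """(c - 2) x c matrix of second-order differences."""
    return np.diff(np.eye(c), n=2, axis=0)


def joint_design_and_penalty(Ba, Bt, lam_a, lam_t, lam21, lam22, lam22_shrink):
    """Joint regression matrix B and block-diagonal penalty P, as in (4)."""
    na, ca = Ba.shape
    nt, ct = Bt.shape
    Da, Dt = second_diff(ca), second_diff(ct)
    ref = np.kron(Bt, Ba)                     # B_t (x) B_a
    age_gap = np.kron(np.ones((nt, 1)), Ba)   # 1_{n_t} (x) B_a
    year_gap = np.kron(Bt, np.ones((na, 1)))  # B_t (x) 1_{n_a}
    za, zt = np.zeros((na * nt, ca)), np.zeros((na * nt, ct))
    B = np.block([[ref, za, zt], [ref, age_gap, year_gap]])

    c1 = ca * ct
    P = np.zeros((B.shape[1], B.shape[1]))
    # Equation (2): marginal penalty on the reference coefficients
    P[:c1, :c1] = (lam_a * np.kron(np.eye(ct), Da.T @ Da)
                   + lam_t * np.kron(Dt.T @ Dt, np.eye(ca)))
    # Smoothing penalty on the age-dependent gap
    P[c1:c1 + ca, c1:c1 + ca] = lam21 * Da.T @ Da
    # Smoothing plus ridge shrinkage on the year-dependent gap
    P[c1 + ca:, c1 + ca:] = lam22 * Dt.T @ Dt + lam22_shrink * np.eye(ct)
    return B, P


def penalized_scoring(B, P, d, e, tol=1e-8, max_iter=50):
    """Penalized IRLS for Poisson counts d with log link and offset log(e)."""
    log_e = np.log(e)
    mu = d + 1.0                            # crude starting fitted deaths
    eta = np.log(mu)
    theta = np.zeros(B.shape[1])
    for _ in range(max_iter):
        z = eta - log_e + (d - mu) / mu     # working response, offset removed
        BtW = B.T * mu                      # B' W with W = diag(mu)
        theta_new = np.linalg.solve(BtW @ B + P, BtW @ z)
        eta = B @ theta_new + log_e
        mu = np.exp(eta)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta


# Hypothetical simulated data for two populations
rng = np.random.default_rng(1)
ages = np.arange(30, 91, dtype=float)
years = np.arange(1960, 2000, dtype=float)
Ba = bspline_basis(ages, 30.0, 90.0, ndx=10)      # 13 columns
Bt = bspline_basis(years, 1960.0, 1999.0, ndx=8)  # 11 columns
A, T = np.meshgrid(ages, years, indexing="ij")    # n_a x n_t grids
log_mu1 = -10.0 + 0.1 * A - 0.015 * (T - 1960)
gap = 0.4 - 0.004 * (A - 30) + 0.003 * (T - 1960)  # smooth, additive gap
E = np.full(A.shape, 1e5)
D1 = rng.poisson(E * np.exp(log_mu1))
D2 = rng.poisson(E * np.exp(log_mu1 + gap))

d = np.concatenate([D1.ravel(order="F"), D2.ravel(order="F")]).astype(float)
e = np.concatenate([E.ravel(order="F"), E.ravel(order="F")])
B, P = joint_design_and_penalty(Ba, Bt, 10.0, 10.0, 10.0, 10.0, 1.0)
theta = penalized_scoring(B, P, d, e)
```

With these simulated data, the fitted gap for population 2, i.e. the gap columns of `B` times the gap coefficients of `theta`, should track the simulated linear gap closely. The ridge term on the year-gap coefficients is what keeps $B'WB + P$ nonsingular here: the B-spline rows sum to one, so both gap bases can reproduce a constant, and only the shrinkage decides how that constant is shared.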
3 Computational aspects and applications

The joint model for similar populations presented in Section 2 is computationally very demanding if fitted with the standard GLM procedure, especially as the number of populations increases. In the general case of $p$ populations, we speed up the estimation as follows. (i) First, observe that $B$ is partitioned as $B = [B_1 : B_2]$, with $B_1 = 1_p \otimes B_t \otimes B_a$ and
$$B_2 = \begin{bmatrix} 0 \\ \Lambda \end{bmatrix},$$
where $\Lambda$ is a block-diagonal matrix; exploiting this partition makes it efficient both to solve the penalized scoring equations and to compute the diagonal elements of the hat matrix, which are required to estimate the total effective dimension, the contribution of each population to it, and the dispersion parameters. (ii) Second, the Kronecker structure of each component of this partition, together with the matrix structure of the data, allows us to express the model as a generalized linear array model (GLAM), a high-speed, low-storage framework (Currie et al., 2006). Using (i) and (ii) together leads to very substantial savings in time. Finally, we choose the smoothing parameters by minimizing the scaled BIC; see Heuer (1997).

We now apply our approach to mortality data from two sources: (a) the Human Mortality Database (HMD) and (b) the Continuous Mortality Investigation (CMI). We start with the HMD data, restricted for illustration to ages 30 to 90 and years 1960 to …; the residuals from our model applied to male and female mortality in Japan show that the model fits well (profile views for ages 70 and 75 are shown in Figure 1), and hence we conclude that the dynamics of mortality in these two populations are similar. By the same procedure, the plots and residuals indicate that the dynamics of male mortality in Japan and in the Netherlands are different (see the profile views for ages 70 and 75 in Figure 2). We now consider the data from the CMI.
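Point (ii) rests on two array identities, sketched below in Python with NumPy on small random matrices (the sizes and values are arbitrary, and `row_tensor` is a hypothetical helper name). First, the linear predictor $(B_t \otimes B_a)\theta$ equals $\mathrm{vec}(B_a \Theta B_t')$ when $\theta = \mathrm{vec}(\Theta)$. Second, the weighted inner product $(B_t \otimes B_a)' W (B_t \otimes B_a)$ can be assembled from the row tensors $G(B_a)$ and $G(B_t)$ (row-wise Kronecker products) and a data-shaped array of weights, without forming the Kronecker product. This is a sketch of the GLAM idea, not the implementation of Currie et al. (2006).

```python
import numpy as np


def row_tensor(X):
    """Row-wise Kronecker product G(X): row i is kron(X[i], X[i])."""
    n, c = X.shape
    return (X[:, :, None] * X[:, None, :]).reshape(n, c * c)


rng = np.random.default_rng(0)
na, nt, ca, ct = 7, 5, 4, 3               # arbitrary small sizes
Ba = rng.random((na, ca))
Bt = rng.random((nt, ct))
Theta = rng.random((ca, ct))              # theta = vec(Theta)
W = rng.random((na, nt))                  # weights arranged like the data

# Linear predictor as an n_a x n_t array: B_a Theta B_t'
eta_glam = Ba @ Theta @ Bt.T

# Inner product from row tensors, rearranged into the (B_t (x) B_a) column order
M = row_tensor(Ba).T @ W @ row_tensor(Bt)  # (ca*ca) x (ct*ct)
XtWX_glam = (M.reshape(ca, ca, ct, ct)
             .transpose(2, 0, 3, 1)
             .reshape(ct * ca, ct * ca))

# Brute-force comparison with the explicit Kronecker product
X = np.kron(Bt, Ba)
w = W.ravel(order="F")                    # vec(W): age varies fastest
print(np.allclose(eta_glam.ravel(order="F"), X @ Theta.ravel(order="F")))  # True
print(np.allclose(XtWX_glam, X.T @ (w[:, None] * X)))                      # True
```

Only the marginal matrices and an $n_a \times n_t$ weight array enter the GLAM computations, which is the source of the low-storage, high-speed behaviour; the explicit `np.kron` is built here only to check the result.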
These data are of two types: data by lives and data by amounts. The first type consists of the number of claims (viewed as deaths by lives) and the number of policies at risk (viewed as exposure to risk by lives); the second consists of the total amounts claimed (viewed as deaths by amounts) and the total amounts at risk (viewed as exposure to risk by amounts). These two types of data lead to the concepts of mortality by lives and mortality by amounts. The joint model applied to these data shows that the dynamics of mortality by lives and by amounts are similar in time (profile views for ages 70 and 75 are shown in Figure 3). Moreover, our joint model correctly captures the well-known fact that mortality by lives is heavier than mortality by amounts. The model for the similar-in-time scenario is particularly important for forecasting in life insurance: because its gap is constant in time, it ensures that, at each age, the extrapolated time trends of mortality by lives and by amounts do not cross.
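The non-crossing property follows directly from (5). A short derivation, taking mortality by amounts as the reference (population 1) and mortality by lives as population 2 for concreteness, and writing $\mu^{[r]}_{i,j}$ for the fitted force of mortality at age $i$ in year $j$:

```latex
% Similar in time: keep only the constant and age-dependent terms of (5)
\log \mu^{[2]}_{i,j} = \log \mu^{[1]}_{i,j} + g_i,
\qquad g_i = \theta^{[2]} + \left(B_a \theta^{[2,a]}\right)_i .

% Hence, for every age i and any two years j and j'
\log \mu^{[2]}_{i,j} - \log \mu^{[1]}_{i,j}
  = \log \mu^{[2]}_{i,j'} - \log \mu^{[1]}_{i,j'} = g_i .
```

If the reference surface is extrapolated to future years and the same $g_i$ is added, the two forecast curves at age $i$ remain a constant distance $g_i$ apart on the log scale, so they cannot cross; with $g_i > 0$ (lives heavier than amounts) the ordering holds in every forecast year.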
FIGURE 1: These profile views illustrate that the dynamics of male and female mortality in Japan are similar.
FIGURE 2: These profile views illustrate that the dynamics of male mortality in the Netherlands and in Japan are different.
FIGURE 3: These profile views illustrate that the dynamics of CMI mortality by lives and by amounts are similar in time.

4 Concluding remarks

In this paper we have proposed a class of joint models for classifying mortality tables. When two (or more) populations turn out to be similar in some way, our joint models lead to simple comparisons of their mortality tables. An additional attractive feature of our models is that, once the components are built, fitting reduces to the penalized scoring algorithm applied with the appropriate components. Furthermore, the order of the populations is not important in our approach: taking population 2 instead of population 1 as the reference leads to the same fit.

We have approached the analysis of multiple mortality tables by fitting nested models, which allowed us to compare these models by residual and graphical methods. Hypothesis testing is a more rigorous approach to such comparisons, and our models provide a platform for developing such tests. One problem that will need to be addressed is the very large power that our extensive datasets would give to any such test; this suggests that a Bayesian approach would be appropriate.

References

Currie, I.D., Durban, M. and Eilers, P.H.C. (2004). Smoothing and forecasting mortality rates. Statistical Modelling, 4.
Currie, I.D., Durban, M. and Eilers, P.H.C. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68.
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11.
Heuer, C. (1997). Modelling of time trends and interactions in vital rates using restricted regression splines. Biometrics, 53.
More informationHandling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
More informationSummary of week 8 (Lectures 22, 23 and 24)
WEEK 8 Summary of week 8 (Lectures 22, 23 and 24) This week we completed our discussion of Chapter 5 of [VST] Recall that if V and W are inner product spaces then a linear map T : V W is called an isometry
More informationNonnested model comparison of GLM and GAM count regression models for life insurance data
Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANIKALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationBayesX  Software for Bayesian Inference in Structured Additive Regression
BayesX  Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, LudwigMaximiliansUniversity Munich
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gearanalytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationCanonical Correlation
Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More information1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form
1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and
More informationGeneralized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)
Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through
More informationPractical Guide to the Simplex Method of Linear Programming
Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationSolution. Area(OABC) = Area(OAB) + Area(OBC) = 1 2 det( [ 5 2 1 2. Question 2. Let A = (a) Calculate the nullspace of the matrix A.
Solutions to Math 30 Takehome prelim Question. Find the area of the quadrilateral OABC on the figure below, coordinates given in brackets. [See pp. 60 63 of the book.] y C(, 4) B(, ) A(5, ) O x Area(OABC)
More information(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.
Theorem.7.: (Properties of Triangular Matrices) (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product
More informationSimple Linear Regression One Binary Categorical Independent Variable
Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical
More informationIMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING
ABSTRACT: IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING Hakan Wiman Department of Photogrammetry, Royal Institute of Technology S  100 44 Stockholm, Sweden (email hakanw@fmi.kth.se) ISPRS Commission
More information171:290 Model Selection Lecture II: The Akaike Information Criterion
171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationNOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
More informationData Matching Optimal and Greedy
Chapter 13 Data Matching Optimal and Greedy Introduction This procedure is used to create treatmentcontrol matches based on propensity scores and/or observed covariate variables. Both optimal and greedy
More informationTOWARD BIG DATA ANALYSIS WORKSHOP
TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.0506 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)
More information