The zero-adjusted Inverse Gaussian distribution as a model for insurance claims
Gillian Heller (1), Mikis Stasinopoulos (2) and Bob Rigby (2)

(1) Dept of Statistics, Macquarie University, Sydney, Australia. gheller@efs.mq.edu.au
(2) STORM, London Metropolitan University. E-mails: d.stasinopoulos@londonmet.ac.uk and r.rigby@londonmet.ac.uk

Abstract: We introduce a method for modelling insurance claim sizes, including zero claims. A mixed discrete-continuous model, with a probability mass at zero and an Inverse Gaussian continuous component, is used. The Inverse Gaussian distribution accommodates the extreme right skewness of the claim distribution. The model explicitly specifies a logit-linear model for the occurrence of a claim, and log-linear models for the mean claim size and for the dispersion of claim sizes (both given that a claim has occurred). The method is illustrated on an Australian motor vehicle insurance data set.

Keywords: Inverse Gaussian model; zero-adjusted; insurance claims; gamlss.

1 Introduction

The purpose of modelling claim sizes on insurance policies is to price premiums accurately, and to estimate the risk of extreme claim events. In a fixed period, a policy will either experience a claim, which is a nonnegative amount typically having an extremely right-skewed distribution, or no claim, in which case the claim amount is identically zero. The distribution of the claim size is then mixed discrete-continuous: a continuous, right-skewed distribution mixed with a single probability mass at zero. In this respect the phenomenon is similar to rainfall, which is either identically zero on a dry day, or a continuous non-negative amount on a wet day.

1.1 Models for insurance claims

Much attention has been paid in the actuarial literature to alternative distributions for claim sizes (e.g. Hogg and Klugman (1984)) and some authors have developed regression models (usually generalized linear models) for explaining claim sizes as a function of risk factors (e.g.
Haberman and Renshaw (1996)). All of these are models for claim sizes in the subclass of policies which had a claim in the period of observation. Jørgensen and de Souza (1994) and Smyth and Jørgensen (2002) considered models for claim sizes, including the zero claims. These are based on
the Tweedie distribution, which may be characterised as a Poisson sum of Gamma random variates. A problem with the Tweedie model is that the probability at zero cannot be modelled explicitly as a function of explanatory variables; and, as we shall see in the example, the Gamma distribution is inadequate for modelling the extreme right skewness which is present in our data.

2 The zero-adjusted Inverse Gaussian model

Let \(y_i\) = size of claim on the \(i\)th policy, \(i = 1, \ldots, n\). We can write the distribution of \(y\) as a mixed discrete-continuous probability function:

\[
f(y) =
\begin{cases}
1 - \pi, & y = 0 \\
\pi\, g(y), & y > 0
\end{cases}
\qquad (1)
\]

where \(g(y)\) is the density of a continuous, right-skewed distribution and \(\pi\) is the probability of a claim.

2.1 Continuous part of the model

The extreme right skewness of claims distributions has been well documented. Candidate distributions within the exponential family are the Gamma and Inverse Gaussian distributions.

Motor vehicle insurance example

We illustrate the method on a class of motor vehicle insurance policies from an Australian insurance company in [...]. There were 67,856 policies, of which 4,624 (6.8%) had at least one claim in the period of observation. Of these, 4,333 policies (6.4%) had one claim, and the remaining 291 policies (0.4%) had between 2 and 4 claims. The maximum claim size was $56,000. A histogram of the non-zero claims, and the pdfs of the fitted Gamma and Inverse Gaussian distributions, are shown in Figure 1. (For clarity of display the horizontal axis has been truncated at $15,000; sixty-five observations were omitted.) The Gamma clearly does not reproduce the shape of the observed claim size distribution; the Inverse Gaussian looks to be a far better fit, accommodating both the mode near zero and the extremely long tail of the distribution. The density of the Inverse Gaussian is:

\[
g(y) = \frac{1}{\sqrt{2\pi y^{3}}\,\sigma}
\exp\left[-\frac{1}{2y}\left(\frac{y-\mu}{\mu\sigma}\right)^{2}\right],
\qquad y > 0,
\]

which has \(E(y) = \mu\) and \(\mathrm{Var}(y) = \sigma^{2}\mu^{3}\).
The use of the Inverse Gaussian distribution for modelling claim sizes has been recommended by, for example, Berg (1994).
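As a quick numerical check of the density and moments in Section 2.1, the following minimal sketch evaluates the Inverse Gaussian density in the (µ, σ) parametrisation and integrates it with a simple midpoint rule. The function names, the parameter values (µ = 2, σ = 0.5) and the integration limit are illustrative assumptions, not taken from the paper or from any fitted model.

```python
import math


def ig_pdf(y, mu, sigma):
    """Inverse Gaussian density with mean mu and dispersion sigma (Section 2.1)."""
    if y <= 0:
        return 0.0
    z = (y - mu) / (mu * sigma)
    return math.exp(-z * z / (2.0 * y)) / (math.sqrt(2.0 * math.pi * y ** 3) * sigma)


def ig_moments_numeric(mu, sigma, upper, n=200_000):
    """Midpoint-rule estimates of total mass, mean and variance on (0, upper]."""
    h = upper / n
    mass = first = second = 0.0
    for k in range(n):
        y = (k + 0.5) * h          # midpoint of the k-th cell
        w = ig_pdf(y, mu, sigma) * h
        mass += w
        first += y * w
        second += y * y * w
    return mass, first, second - first ** 2


# Illustrative (hypothetical) parameter values.
mu, sigma = 2.0, 0.5
mass, mean, var = ig_moments_numeric(mu, sigma, upper=60.0)
print("mass:", mass)
print("mean:", mean, "theory:", mu)
print("variance:", var, "theory:", sigma ** 2 * mu ** 3)
```

The numerical mean and variance should agree closely with µ and σ²µ³. The variance growing with the cube of the mean is what lets this distribution carry a much heavier right tail than the Gamma, whose variance grows with the square of the mean.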
FIGURE 1. Claim size distribution: motor vehicle insurance. Histogram of non-zero claims with the fitted Inverse Gaussian and Gamma densities f(y).

2.2 Discrete part of the model

The obvious model for the probability of a claim is the Bernoulli. Let \(w_i\) be a binary variable indicating the occurrence of at least one claim, and \(\pi_i\) be the probability of at least one claim, on policy \(i\). Note that the occurrence of more than one claim in the period of observation is rare. Then

\[
f(w_i) = \pi_i^{w_i}(1-\pi_i)^{1-w_i}, \qquad w_i = 0, 1.
\]

However, we have to correct for a typical feature of policy-level data: not all policies have been in force for the entire period of observation. Let \(t_i\) = exposure of policy \(i\), \(0 < t_i \le 1\). (Exposure is the proportion of the period of observation for which the policy has been in force.) We will be assuming that the \(t_i\) are known. If \(c_i\) is the number of claims in the period, and we assume a Poisson process with mean number of claims (per unit exposure time) \(\pi_i\), then

\[
c_i \mid t_i \sim \mathrm{Po}(t_i\pi_i), \qquad
P(c_i = 0 \mid t_i = 1) = e^{-\pi_i} \approx 1 - \pi_i
\]

and

\[
P(c_i = 0 \mid t_i) = e^{-t_i\pi_i} \approx 1 - t_i\pi_i,
\]

provided \(t_i\pi_i\) is small. This gives

\[
f(w_i) = (\pi_i^{*})^{w_i}(1-\pi_i^{*})^{1-w_i}, \qquad w_i = 0, 1,
\]

i.e. Bernoulli with \(\pi_i^{*} = t_i\pi_i\). We incorporate covariates through the logit link function on \(\pi_i\):

\[
\log\frac{\pi_i}{1-\pi_i} = \eta_i
\]
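i.e.

\[
\log\frac{\pi_i^{*}/t_i}{1-\pi_i^{*}/t_i} = \eta_i
\qquad (2)
\]

and the correction for differing periods of exposure enters the model through the modified link function (2). The predictor \(\eta_i\) is defined in the next section.

2.3 The mixture model

The zero-adjusted Inverse Gaussian (ZAIG) model is then

\[
f(y_i) =
\begin{cases}
1 - \pi_i, & y_i = 0 \\[4pt]
\pi_i\,\dfrac{1}{\sqrt{2\pi y_i^{3}}\,\sigma_i}
\exp\left[-\dfrac{1}{2y_i}\left(\dfrac{y_i-\mu_i}{\mu_i\sigma_i}\right)^{2}\right], & y_i > 0
\end{cases}
\]

which has \(E(y_i) = \pi_i\mu_i\) and \(\mathrm{Var}(y_i) = \pi_i\mu_i^{2}\left(1 - \pi_i + \mu_i\sigma_i^{2}\right)\).

Following Rigby and Stasinopoulos (2005), who specify generalized additive models for the location, scale and shape parameters of a variety of distributions, we specify the following models on the parameters \(\mu_i\), \(\sigma_i\) and \(\pi_i^{*}\):

\[
\begin{aligned}
\log(\mu_i) &= \mathbf{x}_{1\mu i}^{\top}\boldsymbol{\beta}_{\mu} + f_{\mu}(x_{2\mu i}) \\
\log(\sigma_i) &= \mathbf{x}_{1\sigma i}^{\top}\boldsymbol{\beta}_{\sigma} + f_{\sigma}(x_{2\sigma i}) \\
\log\frac{\pi_i^{*}/t_i}{1-\pi_i^{*}/t_i} &= \mathbf{x}_{1\pi i}^{\top}\boldsymbol{\beta}_{\pi} + f_{\pi}(x_{2\pi i})
\end{aligned}
\]

where \(\mathbf{x}_{1\mu i}\), \(x_{2\mu i}\), \(\mathbf{x}_{1\sigma i}\), \(x_{2\sigma i}\), \(\mathbf{x}_{1\pi i}\) and \(x_{2\pi i}\) are covariate vectors for \(\mu_i\), \(\sigma_i\) and \(\pi_i^{*}\), which may be different, the same, or may have some but not all elements in common; \(\boldsymbol{\beta}_{\mu}\), \(\boldsymbol{\beta}_{\sigma}\) and \(\boldsymbol{\beta}_{\pi}\) are the corresponding parameter vectors; and \(f_{\mu}\), \(f_{\sigma}\) and \(f_{\pi}\) are nonparametric functions, typically smoothing splines.

In order to correct for multiple claims in the period, we use the fact that, if \(y_j \sim \mathrm{IG}(\mu, \sigma)\), \(j = 1, \ldots, c\), independently, then the total \(t = \sum_j y_j\) has the distribution \(t \sim \mathrm{IG}(\mu^{*}, \sigma^{*})\), where \(\mu^{*} = c\mu\) and \(\sigma^{*} = \sigma/c\). As \(\log(\mu^{*}) = \log(\mu) + \log(c)\) and \(\log(\sigma^{*}) = \log(\sigma) - \log(c)\), we use \(\log(c_i)\) and \(-\log(c_i)\) as offsets in the models for \(\mu_i\) and \(\sigma_i\) respectively, where \(c_i\) is the number of claims on policy \(i\). (A doubtful assumption here is that multiple claim amounts on the same policy are independent.)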
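The pieces of Sections 2.2 and 2.3 can be put together in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it maps a logit-scale predictor and an exposure to a claim probability via the modified link (2), evaluates the ZAIG mean and variance, and obtains quantiles by bisection on the ZAIG distribution function. The closed-form Inverse Gaussian distribution function used here is the standard one with shape parameter λ = 1/σ², which matches Var(y) = σ²µ³. All function names and the parameter values in the demonstration are hypothetical.

```python
import math


def claim_probability(eta, exposure):
    """Modified link (2): pi* = t * pi, where logit(pi) = eta and 0 < t <= 1."""
    if not 0.0 < exposure <= 1.0:
        raise ValueError("exposure must lie in (0, 1]")
    pi_full = 1.0 / (1.0 + math.exp(-eta))   # claim probability for a full period
    return exposure * pi_full


def zaig_mean(p_claim, mu):
    return p_claim * mu


def zaig_variance(p_claim, mu, sigma):
    return p_claim * mu ** 2 * (1.0 - p_claim + mu * sigma ** 2)


def _std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ig_cdf(y, mu, sigma):
    """Inverse Gaussian distribution function, shape lambda = 1 / sigma**2.

    math.exp(2 * lam / mu) overflows when lam / mu is very large; this
    sketch does not guard against that case.
    """
    if y <= 0:
        return 0.0
    lam = 1.0 / sigma ** 2
    r = math.sqrt(lam / y)
    return (_std_normal_cdf(r * (y / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _std_normal_cdf(-r * (y / mu + 1.0)))


def zaig_cdf(y, p_claim, mu, sigma):
    """Mass 1 - p_claim at zero plus p_claim times the Inverse Gaussian part."""
    if y < 0:
        return 0.0
    return (1.0 - p_claim) + p_claim * ig_cdf(y, mu, sigma)


def zaig_quantile(p, p_claim, mu, sigma, tol=1e-10):
    """Smallest y with zaig_cdf(y) >= p, found by bisection."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if p <= 1.0 - p_claim:
        return 0.0                          # quantile falls in the point mass at zero
    target = (p - (1.0 - p_claim)) / p_claim
    lo, hi = 0.0, mu
    for _ in range(200):                    # expand the bracket until it covers target
        if ig_cdf(hi, mu, sigma) >= target:
            break
        hi *= 2.0
    for _ in range(200):                    # bisection
        if hi - lo <= tol * max(1.0, hi):
            break
        mid = 0.5 * (lo + hi)
        if ig_cdf(mid, mu, sigma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, chosen only for illustration.
p_claim, mu, sigma = 0.068, 2000.0, 0.05
print("expected claim:", zaig_mean(p_claim, mu))
print("variance:", zaig_variance(p_claim, mu, sigma))
print("median:", zaig_quantile(0.50, p_claim, mu, sigma))
print("99th percentile:", zaig_quantile(0.99, p_claim, mu, sigma))
```

Because most of the probability sits at zero, every quantile up to 1 − π is exactly zero; only the upper quantiles are informative about claim size, which is why the bisection step first rescales the probability onto the continuous component.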
3 Estimation

The ZAIG has been incorporated into the gamlss package in R (Stasinopoulos et al. (2006)). Maximum (penalised) likelihood estimation is used. The penalised log likelihood function of the model is maximised iteratively using either the RS or CG algorithm of Rigby and Stasinopoulos (2005), which in turn uses a back-fitting algorithm to perform each step of the Fisher scoring procedure. Both the RS and CG algorithms use the log likelihood of the data and its first derivatives (and optionally expected second derivatives) with respect to the distributional parameters, which in this case are µ, σ and ν = π. The CG algorithm, a generalization of the algorithm used by Cole and Green (1992), additionally uses the expected cross derivatives.

3.1 Motor vehicle insurance

The following covariates were available:

Variable                Range
Characteristics of policy holder:
  Age band              1, 2, 3, 4, 5, 6 (1 is youngest)
  Gender                male, female
  Area of residence     A, B, C, D, E, F
Characteristics of vehicle:
  Value                 $0-$350,000
  Make                  A, B, C, D
  Age                   1, 2, 3, 4 (1 is recent)
  Body type             bus, convertible, coupe, hatchback, hardtop, motorised caravan/combi, minibus, panel van, roadster, sedan, station wagon, truck, utility

Using the GAIC as model selection criterion, the following final model was selected:

  log(µ) = age band + gender + area + offset{log(claims)}
  log(σ) = area + offset{-log(claims)}
  log(π/(1 - π)) = age band + area + vehicle body + spline(vehicle value)

Comments on the model

Model for π: The model for the occurrence of a claim has terms for both policyholder and vehicle characteristics. Policyholder age, area and vehicle body are all categorical, so their form is not an issue; vehicle value is the only continuous covariate that we have, and it enters the model in a smoothing spline form. The reason becomes clear when we examine the scatterplot of claim/no claim, with a smoothing spline, in Figure 2.
The relationship is nonlinear; the probability of a claim is at a maximum for vehicle value around $40,000.
FIGURE 2. Occurrence of a claim (0/1) plotted against vehicle value (in $10,000 units), with smoothing spline.

Model for µ: This contains only policyholder characteristics, which is surprising. A more complicated model involving vehicle value, make and some interaction terms was a close second in the model selection. However, it was felt that this was too complex and difficult to interpret, so the simpler version was chosen.

Model for σ: Area is the only covariate for σ. The variation of the claim size distribution with area is shown in Figure 3: it can be seen that areas D, E and F have shapes which are different from A, B and C, reflected in lower values for σ. In fact areas D, E and F are rural whereas A, B and C are urban.

The explanatory variables age band and area appear in the model equations for both π and µ. It is of interest whether they affect the occurrence of a claim, and claim size, in the same way. Figure 4a shows the effect of age band, exp(β̂), on both π/(1 - π) and µ; Figure 4b shows the effect of area on both π/(1 - π) and µ. Note that age band = 3 and area = A are the reference categories. Age band 1 (the youngest drivers) increases both the odds of a claim and the mean claim size, to a similar extent; age bands 2 and 4 have a similar effect to age band 3; and age bands 5 and 6 (older drivers) decrease both the odds of a claim and the mean claim size, their effect being greater on the odds of a claim. The effect of area on the odds of a claim, and on mean claim size, is less clear: the only clear indication is that the mean claim size is increased in area F.
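To make the estimation and model-selection step of Section 3 more concrete, the sketch below writes out the (unpenalised) ZAIG log likelihood for given parameter values and a generalised Akaike information criterion of the form −2 log L + k × df. It is an illustration only: it performs no fitting, the RS and CG algorithms are not reproduced, the paper does not state which penalty k was used, and the data and parameter values in the demonstration are made up.

```python
import math


def zaig_loglik(y, p_claim, mu, sigma):
    """Log likelihood of the ZAIG model for per-policy parameter values.

    y, p_claim, mu and sigma are equal-length sequences; p_claim[i] is the
    probability that policy i has a claim, mu[i] and sigma[i] are the Inverse
    Gaussian parameters of its claim size given that a claim occurred.
    """
    total = 0.0
    for yi, p, m, s in zip(y, p_claim, mu, sigma):
        if yi == 0:
            total += math.log(1.0 - p)                   # point mass at zero
        else:
            z = (yi - m) / (m * s)
            total += (math.log(p)
                      - 0.5 * math.log(2.0 * math.pi * yi ** 3)
                      - math.log(s)
                      - z * z / (2.0 * yi))              # continuous component
    return total


def gaic(loglik, df, k=2.0):
    """Generalised AIC: -2 * loglik + k * df (k = 2 is the usual AIC)."""
    return -2.0 * loglik + k * df


# Made-up toy data: three policies without a claim and two with a claim.
y = [0.0, 0.0, 0.0, 850.0, 4200.0]
p_claim = [0.07] * 5
mu = [2000.0] * 5
sigma = [0.05] * 5

ll = zaig_loglik(y, p_claim, mu, sigma)
print("log likelihood:", ll)
print("GAIC (k=2, df=3):", gaic(ll, df=3))
```

In a comparison between candidate models, each model would supply its own fitted parameter values and effective degrees of freedom, and the model with the smallest criterion value would be preferred.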
FIGURE 3. Claim size distribution by area, with fitted means µ̂: A, 1909; B, 1860; C, 2030; D, 1837; E, 2251; F, 2864.

FIGURE 4. Effect of age category and area, exp(β̂), on occurrence of claim and claim size. Panel a: age band; panel b: area.

4 Conclusion

We introduce a method for modelling insurance claim sizes using a zero-adjusted Inverse Gaussian (ZAIG) model, which explicitly specifies a logit-linear model for the occurrence of a claim, and log-linear models for the mean claim size and for the dispersion of claim sizes (both given that a claim has occurred). These three models may incorporate different covariates, or some of the same covariates, and may depend on common covariates in different ways. The Inverse Gaussian distribution accommodates the extreme right skewness of the claim distributions.

Given the risk factors for a potential new policyholder, the expected claim size may easily be computed as the expected value of the ZAIG distribution, conditional on the covariate values; and quartiles of the claim size distribution may be calculated for each combination of covariate values.

The ZAIG distribution introduced here is a useful distribution for modelling data where the total amount per unit of time is observed but where zero amounts are possible. Rainfall data and smoking/drinking habits data are possible candidates for modelling using the ZAIG distribution.

References

Berg, P.T. (1994). Deductibles and the inverse Gaussian distribution. ASTIN Bulletin, 24.

Cole, T. and Green, P. (1992). Smoothing reference centile curves: The LMS method and penalized likelihood. Statist. in Med., 11.

Hogg, R.V. and Klugman, S.A. (1984). Loss Distributions. New York: Wiley.

Haberman, S. and Renshaw, A.E. (1996). Generalized Linear Models and Actuarial Science. The Statistician, 45 (4).

Jørgensen, B. and de Souza, M.C.P. (1994). Fitting Tweedie's compound Poisson model to insurance claims data. Scandinavian Actuarial Journal.

Rigby, R.A. and Stasinopoulos, D.M. (2005). Generalized Additive Models for Location, Scale and Shape (with discussion). Appl. Statist., 54, 1-38.

Smyth, G.K. and Jørgensen, B. (2002). Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin, 32(1).

Stasinopoulos, D.M., Rigby, R.A. and Akantziliotou, C. (2006). gamlss: A collection of functions to fit Generalized Additive Models for Location Scale and Shape. R package version 1.1-0, url = ac.uk/gamlss/.
Travelers Analytics: U of M Stats 8053 Insurance Modeling Problem
Travelers Analytics: U of M Stats 8053 Insurance Modeling Problem October 30 th, 2013 Nathan Hubbell, FCAS Shengde Liang, Ph.D. Agenda Travelers: Who Are We & How Do We Use Data? Insurance 101 Basic business
More informationGENERALIZED LINEAR MODELS IN VEHICLE INSURANCE
ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationOffset Techniques for Predictive Modeling for Insurance
Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the
More informationA LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationEMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA
EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA Andreas Christmann Department of Mathematics homepages.vub.ac.be/ achristm Talk: ULB, Sciences Actuarielles, 17/NOV/2006 Contents 1. Project: Motor vehicle
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationStatistical Analysis of Life Insurance Policy Termination and Survivorship
Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial
More informationModel Selection and Claim Frequency for Workers Compensation Insurance
Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationHierarchical Insurance Claims Modeling
Hierarchical Insurance Claims Modeling Edward W. (Jed) Frees, University of Wisconsin - Madison Emiliano A. Valdez, University of Connecticut 2009 Joint Statistical Meetings Session 587 - Thu 8/6/09-10:30
More informationSurvival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationBayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
More informationA revisit of the hierarchical insurance claims modeling
A revisit of the hierarchical insurance claims modeling Emiliano A. Valdez Michigan State University joint work with E.W. Frees* * University of Wisconsin Madison Statistical Society of Canada (SSC) 2014
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationMore Flexible GLMs Zero-Inflated Models and Hybrid Models
More Flexible GLMs Zero-Inflated Models and Hybrid Models Mathew Flynn, Ph.D. Louise A. Francis FCAS, MAAA Motivation: GLMs are widely used in insurance modeling applications. Claim or frequency models
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationWeb-based Supplementary Materials for. Modeling of Hormone Secretion-Generating. Mechanisms With Splines: A Pseudo-Likelihood.
Web-based Supplementary Materials for Modeling of Hormone Secretion-Generating Mechanisms With Splines: A Pseudo-Likelihood Approach by Anna Liu and Yuedong Wang Web Appendix A This appendix computes mean
More informationIntroduction to Predictive Modeling Using GLMs
Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationAssumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model
Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More information13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationExercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More information1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
More informationPredictive Modeling in Long-Term Care Insurance
Predictive Modeling in Long-Term Care Insurance Nathan R. Lally and Brian M. Hartman May 3, 2015 Abstract The accurate prediction of long-term care insurance (LTCI) mortality, lapse, and claim rates is
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationCombining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds
Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationSTT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables
Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationApproximation of Aggregate Losses Using Simulation
Journal of Mathematics and Statistics 6 (3): 233-239, 2010 ISSN 1549-3644 2010 Science Publications Approimation of Aggregate Losses Using Simulation Mohamed Amraja Mohamed, Ahmad Mahir Razali and Noriszura
More information5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
More informationReject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
More informationAutomated Biosurveillance Data from England and Wales, 1991 2011
Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationLogistic regression modeling the probability of success