Introduction to Predictive Modeling Using GLMs



Introduction to Predictive Modeling Using GLMs
Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group
Anand Khare, FCAS, MAAA, CPCU, Milliman

Antitrust Notice
The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings. Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding expressed or implied that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.

A New England vacation
- The weather
- The car insurance

A few words about overfitting
- The more data we have
- The more models we have
- The more sophisticated models become
- The potential for overfitting data never goes away!
- That's one big reason to validate models.

Prediction error in a nutshell
[Figure: prediction error (vertical axis, 0.05 to 0.25) plotted against model complexity (horizontal axis), with one curve for training data and one for test data]
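The training-versus-test pattern can be reproduced with a small simulation. The sketch below is an illustration only, not the data behind the slide: it assumes a made-up signal y = x plus noise and uses a piecewise-constant fit whose number of bins stands in for model complexity.

```python
import random

def fit_bins(xs, ys, n_bins):
    """Piecewise-constant fit on [0, 1): one mean per bin (more bins = more complex)."""
    overall = sum(ys) / len(ys)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = int(x * n_bins)
        sums[b] += y
        counts[b] += 1
    # Bins with no training data fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mse(xs, ys, means):
    """Mean squared error of the bin means on the given records."""
    n_bins = len(means)
    return sum((y - means[int(x * n_bins)]) ** 2 for x, y in zip(xs, ys)) / len(ys)

def simulate(n, rng):
    """Hypothetical data: the true signal is y = x, observed with noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = simulate(50, rng)
test_x, test_y = simulate(500, rng)

complexities = [1, 2, 4, 8, 16, 32, 64]
train_err, test_err = [], []
for n_bins in complexities:
    means = fit_bins(train_x, train_y, n_bins)
    train_err.append(mse(train_x, train_y, means))
    test_err.append(mse(test_x, test_y, means))
    print(f"bins={n_bins:3d}  train MSE={train_err[-1]:.3f}  test MSE={test_err[-1]:.3f}")
```

Because each bin count doubles the previous one, the training error can only fall as complexity grows; the test error is under no such guarantee, which is the gap validation is meant to expose.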

Overfitting U.S. Elections (courtesy of Randall Munroe, writer of xkcd and What If)
[Comic panels: 1876-1944 and 1948-Present]

Outline
- Overview of predictive modeling
- Predictive modeling in the actuarial world
- Simple linear models vs generalized linear models (GLMs)
- Specification of GLMs
- Interpretation of GLM output
- Frequency/severity vs pure premium modeling
- Model validation
- Modeling process and important considerations
- The next 100 years

What is Predictive Modeling?
- Model: an abstraction of reality, generally with a random or probabilistic component
  - Simplification of a real-world phenomenon
- Model types include:
  - Linear models: predict the target variable using a linear combination of predictor variables
  - Trees: split the dataset, one variable at a time, into subgroups that behave similarly
  - Neural networks: self-learning algorithms that adapt to best predict a quantity of interest

How Do Actuaries Use Modeling?
- Rating plans: model insurance loss data to build plans that charge actuarially fair rates
- Underwriting plans: knowing the relative riskiness of a policyholder can inform underwriting decisions
- Enterprise risk management: model correlations between lines of business or probability of ruin
- Product analytics: customer retention, conversion, lifetime value

Simple Linear Model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon$
- $Y$ is the target or response variable: it is what we are trying to predict (e.g. pure premium)
- $X_1$, $X_2$, etc. are the explanatory variables (e.g. age of driver, type of vehicle): we use them to predict $Y$
- $\varepsilon$ is the error or noise term: it is the portion of $Y$ that is unexplained by the $X$ variables
$\mu = E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
- In general, we are modeling the mean of $Y$
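As a rough illustration of how the coefficients of a simple linear model are estimated, the sketch below solves the least-squares normal equations in plain Python. The records, the two predictors and the helper names `solve` and `fit_linear_model` are all hypothetical, and the data are noise-free so the true coefficients are recovered.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear_model(rows, y):
    """Least-squares estimate of (b0, b1, ..., bn) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical records: (driver age, vehicle value in $000s) -> pure premium.
rows = [(20, 15), (30, 20), (40, 10), (50, 30), (60, 25), (35, 18)]
y = [500 - 4 * age + 6 * value for age, value in rows]  # noise-free for clarity
beta = fit_linear_model(rows, y)
print([round(b, 3) for b in beta])
```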

Simple Linear Model Assumptions
- Assumptions of simple linear models:
  - Target variable Y does not depend on the value of Y for any other record, only on the predictors
  - Y is normally distributed
  - Mean of Y depends on the predictors, but all records have the same variance
  - Y is related to the predictors through a simple linear function
- Unfortunately, these assumptions are often unrealistic
  - Target variables of interest, such as pure premium, frequency, and severity, are not normally distributed and have non-constant variance

Generalized Linear Models
- Generalized linear model: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
- Assumptions of generalized linear models:
  - Target variable Y does not depend on the value of Y for any other record, only on the predictors
  - Distribution of Y is a member of the exponential family of distributions
  - Variance of Y is a function of the mean of Y
  - $g(\mu)$ is linearly related to the predictors. The function $g$ is called the link function
- The exponential family of distributions includes the following: Normal, Poisson, Gamma, Binomial, Negative Binomial, Inverse Gaussian, Tweedie

GLM Variance Function
$\mathrm{Var}(Y) = \phi \, V(\mu) / w$
- $\phi$ is the dispersion coefficient, which is estimated by the GLM
- $w$ is the weight assigned to each record
  - GLMs calculate the coefficients that maximize likelihood, and $w$ is the weight that each record gets in that calculation
- $V(\mu)$ is the GLM variance function, and is determined by the distribution
  - Normal: $V(\mu) = 1$
  - Poisson: $V(\mu) = \mu$
  - Gamma: $V(\mu) = \mu^2$
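A small sketch of the variance formula, covering the three distributions listed. The function name `glm_variance` and the sample values of the mean, dispersion and weight are illustrative assumptions.

```python
# Each listed distribution has a variance function of the form V(mu) = mu ** p.
VARIANCE_POWER = {"normal": 0, "poisson": 1, "gamma": 2}

def glm_variance(mu, phi, weight=1.0, dist="poisson"):
    """Var(Y) = phi * V(mu) / w for the chosen distribution."""
    return phi * mu ** VARIANCE_POWER[dist] / weight

for dist in VARIANCE_POWER:
    variances = [glm_variance(mu, phi=2.0, weight=4.0, dist=dist) for mu in (1.0, 3.0)]
    print(dist, variances)
```

Tripling the mean leaves the Normal variance unchanged, triples the Poisson variance and multiplies the Gamma variance by nine, while a larger weight shrinks all three.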

GLM Link Function
$g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots$
- Common choices for link function:
  - Identity: $g(\mu) = \mu$
  - Log: $g(\mu) = \ln(\mu)$
  - Logit: $g(\mu) = \ln(\mu / (1 - \mu))$
- Log link commonly used to model rating plans because it produces multiplicative relativities:
  - $\ln(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
  - $\mu = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$
  - $\mu = \exp(\beta_0) \cdot \exp(\beta_1 X_1) \cdot \exp(\beta_2 X_2)$
- Logit link used to model the probability of an event occurring
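The multiplicative structure under a log link can be checked numerically. In this sketch the intercept, coefficients and predictor values are hypothetical; the point is only that the prediction equals a base rate times one relativity per variable, and that the inverse logit always returns a probability.

```python
import math

def predict_log_link(beta0, betas, xs):
    """Inverse of the log link: mu = exp(linear predictor)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))

def predict_logit_link(beta0, betas, xs):
    """Inverse of the logit link: mu = 1 / (1 + exp(-linear predictor))."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))

beta0, betas, xs = math.log(200.0), [0.52, -0.10], [1, 1]  # hypothetical values
mu = predict_log_link(beta0, betas, xs)
base = math.exp(beta0)
relativities = [math.exp(b * x) for b, x in zip(betas, xs)]
print(mu, base * relativities[0] * relativities[1])  # the two numbers agree
print(predict_logit_link(-2.0, betas, xs))           # always between 0 and 1
```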

Offsets
- An offset is an effect in a model that is fixed by the modeler
- Variable offsets fix the effect of variables that are not being modeled
  - Example: Constructing a rating plan and not modeling base territory rates
  - Solution: Offset for current territory rates
- Volume offsets reflect the fact that different records have different volumes of data and thus have different expected values
  - Example: Modeling claim counts. Some records have a single exposure, others have many exposures
  - Solution: Offset for exposure volume of each observation
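A volume offset can be sketched as a Poisson claim-count model with log link in which ln(exposure) enters the linear predictor with its coefficient fixed at 1. The fitting routine below is a minimal iteratively reweighted least squares loop for one predictor, written for illustration; the function name and the four records are assumptions, not material from the presentation.

```python
import math

def fit_poisson_with_offset(x, claims, exposure, n_iter=25):
    """IRLS for ln(mu) = ln(exposure) + b0 + b1 * x; returns (b0, b1)."""
    offset = [math.log(e) for e in exposure]
    b0, b1 = math.log(sum(claims) / sum(exposure)), 0.0  # start at the overall frequency
    for _ in range(n_iter):
        eta = [o + b0 + b1 * xi for o, xi in zip(offset, x)]
        mu = [math.exp(e) for e in eta]
        # Working response with the offset removed; the IRLS weights equal mu.
        z = [e - o + (y - m) / m for e, o, y, m in zip(eta, offset, claims, mu)]
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Hypothetical records: x = 1 marks youth drivers; exposures differ by record.
x = [0, 0, 1, 1]
claims = [1, 1, 2, 1]
exposure = [10.0, 5.0, 8.0, 2.0]
b0, b1 = fit_poisson_with_offset(x, claims, exposure)
print(f"base frequency: {math.exp(b0):.4f}")
print(f"frequency relativity with offset: {math.exp(b1):.2f}")
print(f"ratio of average claim counts, ignoring exposure: {(3 / 2) / (2 / 2):.2f}")
```

With the offset, the relativity compares claims per unit of exposure (3/10 against 2/15); ignoring exposure compares claims per record and understates the difference.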

Interpreting GLM Coefficients w/ Log Link
- Discrete variables: exponentiate the GLM coefficient
  - Example: coefficient for youth drivers is 0.52
  - Rating factor = exp(0.52) = 1.68
  - Youth drivers have a 68% surcharge relative to the base level of adult drivers (who, by definition, have a rating factor of 1.00)
- Continuous variables with no transformation
  - Example: modeling pure premium, and annual miles driven is a continuous variable
  - As miles driven increases by 1 unit, expected pure premium is scaled by a factor of $\exp(\beta)$, regardless of whether mileage goes from 1,000 to 2,000 or from 20,000 to 21,000
- Continuous variables with log transformation
  - $\text{Pure premium} \propto (\text{annual mileage})^{\beta}$
  - If $0 < \beta < 1$, then as mileage increases, pure premium increases at a decreasing rate
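The three cases can be worked through numerically. Only the youth-driver coefficient of 0.52 comes from the slide; the mileage coefficients below are made-up values, and mileage is assumed to be measured in thousands of miles.

```python
import math

# Discrete variable: exponentiate the coefficient to get the rating factor.
youth_factor = math.exp(0.52)
print(round(youth_factor, 2))  # 1.68

# Continuous variable, no transformation (hypothetical beta per 1,000 miles):
beta = 0.02
step_low = math.exp(beta * 2) / math.exp(beta * 1)     # 1,000 -> 2,000 miles
step_high = math.exp(beta * 21) / math.exp(beta * 20)  # 20,000 -> 21,000 miles
print(step_low, step_high)  # same factor, exp(beta), at both mileage levels

# Continuous variable with log transformation (hypothetical beta below 1):
def log_mileage_factor(miles_from, miles_to, beta_log):
    """Scaling of pure premium when pure premium ~ mileage ** beta_log."""
    return (miles_to / miles_from) ** beta_log

print(log_mileage_factor(1_000, 2_000, 0.5))    # large effect at low mileage
print(log_mileage_factor(20_000, 21_000, 0.5))  # small effect at high mileage
```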

Uncertainty in Parameter Estimates
- GLMs allow us to quantify uncertainty in parameter estimates
- Wald 95% confidence interval for a parameter = Parameter Estimate +/- 1.96*(Standard Error)
- Test for the significance of an individual parameter
  - Wald Chi Square = (Parameter Estimate/Standard Error)^2
  - Approximately follows a Chi Squared distribution with 1 degree of freedom
  - P-value is the probability of obtaining a Chi Square statistic at least this large by pure chance
  - Lower p-value: more significant
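These quantities are simple to compute once an estimate and its standard error are available. In the sketch below the standard error of 0.10 is a made-up value, and the p-value uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal.

```python
import math

def wald_summary(estimate, std_error):
    """Return the Wald 95% interval, chi-square statistic and p-value."""
    half_width = 1.96 * std_error
    chi_square = (estimate / std_error) ** 2
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return (estimate - half_width, estimate + half_width), chi_square, p_value

interval, chi_square, p_value = wald_summary(estimate=0.52, std_error=0.10)
print(f"95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"Wald chi-square: {chi_square:.2f}, p-value: {p_value:.2g}")
```

An estimate exactly 1.96 standard errors from zero gives a p-value of about 0.05, which ties the test back to the 95% interval.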

Two Modeling Approaches
- Pure Premium Approach: Build a single model for pure premium
  - Generally straightforward to implement
- Frequency-Severity Approach: Build one model for claim frequency and another for claim severity
  - Additional work for additional insight

Pure Premium Approach
- Advantages:
  - Only a single model needs to be built
  - No need to split variable offsets
  - Results often very similar to frequency-severity approach
- Disadvantages:
  - Yields less insight than frequency-severity
  - Tweedie distribution is the only good choice
    - Relatively new and mathematically complex
    - Includes implicit assumptions that may not hold

Frequency-Severity Approach
- Advantages:
  - May yield meaningful insights about data
  - Can choose from several well-known and well-understood distributions
- Disadvantages:
  - Two models to build, run, and validate
  - Requires splitting variable offsets
  - Often produces limited additional benefit

Tweedie Distribution
- Mixed Poisson-Gamma process: the number of claims follows a Poisson distribution, and the size of each claim follows a Gamma distribution
- The Tweedie is a 3-parameter distribution:
  - Mean ($\mu$), equal to the product of the means of the underlying Poisson and Gamma distributions
  - Power ($p$), which depends on the coefficient of variation of the underlying Gamma distribution
  - Dispersion ($\phi$), a measure of variance
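The Poisson-Gamma construction can be simulated directly and compared with the three Tweedie parameters. The sketch assumes the standard compound Poisson-Gamma mapping (Poisson mean lam, Gamma shape alpha and scale theta); those parameter names and values are illustrative and do not appear in the presentation.

```python
import math
import random

def tweedie_parameters(lam, alpha, theta):
    """Map Poisson mean lam and Gamma(shape alpha, scale theta) to (mu, p, phi)."""
    mu = lam * alpha * theta             # product of the Poisson and Gamma means
    p = (alpha + 2.0) / (alpha + 1.0)    # Gamma CV is 1/sqrt(alpha), so p depends only on the CV
    phi = lam ** (1.0 - p) * (alpha * theta) ** (2.0 - p) / (2.0 - p)
    return mu, p, phi

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, count, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_pure_premium(lam, alpha, theta, rng):
    """Total loss for one exposure: a Poisson number of Gamma-sized claims."""
    return sum(rng.gammavariate(alpha, theta) for _ in range(poisson_draw(lam, rng)))

rng = random.Random(1)
lam, alpha, theta = 0.1, 2.0, 500.0  # hypothetical frequency and severity parameters
mu, p, phi = tweedie_parameters(lam, alpha, theta)
sample = [simulate_pure_premium(lam, alpha, theta, rng) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
sample_var = sum((s - sample_mean) ** 2 for s in sample) / (len(sample) - 1)
print(f"mu={mu:.1f}  p={p:.3f}  phi={phi:.1f}")
print(f"sample mean={sample_mean:.1f}  phi*mu^p={phi * mu ** p:.0f}  sample variance={sample_var:.0f}")
```

The simulated mean and variance should land close to mu and phi * mu^p, consistent with the GLM variance structure Var(Y) = phi * V(mu) with V(mu) = mu^p.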

Frequency Distribution Options
- Poisson
  - The Coca Cola of claim count distributions
- Overdispersed frequency distributions
  - Overdispersed Poisson
  - Zero-Inflated Poisson
  - Negative Binomial
  - Zero-Inflated Negative Binomial

Severity Distribution Options
- Several reasonable distributions
- Criteria:
  - Member of the exponential family
  - $p \ge 2$, where $V(\mu) = \mu^p$
- In order of increasing variance:
  - Gamma ($p = 2$)
  - Tweedie ($2 < p < 3$)
  - Inverse Gaussian ($p = 3$)
  - Tweedie ($p > 3$)

Three Pillars of Model Validation
- Tests of Fit
- Tests of Lift
- Tests of Stability

Fit Statistics
- Traditional: Absolute/Squared Error
- Alternatives: Likelihood, Deviance, Pearson's Chi-Squared
- Penalized: AIC, BIC
- Per Observation: Residuals, Leverage

Absolute/Squared Error
- Only appropriate if data is normally distributed
- Inappropriate to use on disaggregate claim frequency, severity, or pure premium data
- Useful to assess model fit within buckets
  - Bucket data into percentiles, or similar quantiles, and calculate the squared difference between actual and predicted for each bucket
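The bucketing idea can be sketched as follows: sort records by prediction, split them into equal-count buckets, and compare bucket averages. The function name and the four records are hypothetical, and the sketch assumes there are at least as many records as buckets.

```python
def bucketed_squared_error(actual, predicted, n_buckets):
    """Squared difference between mean actual and mean predicted, per bucket."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i])
    errors = []
    for b in range(n_buckets):
        idx = order[b * len(order) // n_buckets:(b + 1) * len(order) // n_buckets]
        mean_actual = sum(actual[i] for i in idx) / len(idx)
        mean_predicted = sum(predicted[i] for i in idx) / len(idx)
        errors.append((mean_actual - mean_predicted) ** 2)
    return errors

actual = [0, 2, 1, 3]             # hypothetical claim counts
predicted = [1.0, 1.0, 2.0, 2.0]  # hypothetical model predictions
print(bucketed_squared_error(actual, predicted, n_buckets=2))  # [0.0, 0.0]
```

Every individual record here is off by 1, yet both bucket averages match exactly, which is why the bucketed comparison is informative where record-level squared error on noisy data is not.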

Better Alternatives to Squared Error
- Likelihood: chance of the observation, given the model
  - Never decreases as parameters are added to the model
- Deviance: twice the difference in log-likelihoods between the saturated and fitted models
  - GLMs are fit so as to minimize deviance
  - Accounts for the shape of the distribution
- Pearson's chi-squared: squared error divided by the variance function of the distribution
  - Accounts for the skew of the distribution
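For a Poisson model these statistics have short closed forms. The sketch below, on three made-up records, computes the deviance both from its closed form and as twice the log-likelihood gap to the saturated model (where each fitted mean equals the observation), and computes Pearson's chi-squared with V(mu) = mu.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood; a record with y = 0 and mu = 0 contributes 0."""
    return sum((yi * math.log(mi) - mi if mi > 0 else 0.0) - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Sum of unit deviances 2 * [y * ln(y / mu) - (y - mu)]."""
    return sum(2.0 * ((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi))
               for yi, mi in zip(y, mu))

def pearson_chi_squared(y, mu):
    """Squared error divided by the Poisson variance function V(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

y = [0, 1, 3]          # hypothetical observed claim counts
mu = [0.5, 1.0, 2.0]   # hypothetical fitted means
deviance = poisson_deviance(y, mu)
via_loglik = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))
pearson = pearson_chi_squared(y, mu)
print(deviance, via_loglik, pearson)
```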

Penalized Measures
- Akaike Information Criterion (AIC): Penalizes log-likelihood for additional model parameters
- Bayesian Information Criterion (BIC): Penalizes log-likelihood for additional model parameters, and this penalty increases as the number of records in the dataset increases
  - Can be too restrictive
- Used primarily for variable selection
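Using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, a quick comparison shows how the two criteria can disagree. The log-likelihoods, parameter counts and record count below are made-up numbers.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_records):
    """Bayesian Information Criterion: the penalty per parameter grows with ln(n)."""
    return n_params * math.log(n_records) - 2 * log_likelihood

n_records = 10_000
small = {"log_likelihood": -1000.0, "n_params": 5}  # hypothetical model A
large = {"log_likelihood": -996.0, "n_params": 8}   # hypothetical model B

print("AIC:", aic(**small), aic(**large))
print("BIC:", bic(**small, n_records=n_records), bic(**large, n_records=n_records))
```

Here AIC prefers the larger model while BIC prefers the smaller one, which illustrates why BIC can be too restrictive on large datasets.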

Per Observation
- Traditional residual: actual minus predicted
- Deviance residual: square root of weighted deviance times sign of actual minus predicted
  - Reflects the shape of the distribution
  - Plotting deviance residual against weight or any predictor should yield an uninformative cloud
  - Should be approximately normally distributed
- Leverage: used to identify extreme outliers
  - Does not necessarily measure impact
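A deviance residual can be sketched for a Gamma severity model, using the Gamma unit deviance 2 * [(y - mu) / mu - ln(y / mu)]. The choice of the Gamma distribution and the claim sizes are assumptions made for the example.

```python
import math

def gamma_unit_deviance(y, mu):
    """Contribution of one record to the Gamma deviance (requires y > 0)."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))

def deviance_residual(y, mu, weight=1.0):
    """Signed square root of the weighted unit deviance."""
    sign = 1.0 if y >= mu else -1.0
    # max() guards against a tiny negative value caused by rounding.
    return sign * math.sqrt(max(0.0, weight * gamma_unit_deviance(y, mu)))

predicted = 100.0  # hypothetical fitted severity
for observed in (50.0, 100.0, 150.0):
    raw = observed - predicted
    print(f"actual={observed:6.1f}  raw residual={raw:6.1f}  "
          f"deviance residual={deviance_residual(observed, predicted):.3f}")
```

The raw residuals at 50 and 150 are equal in size, but the deviance residual at 50 is larger in magnitude: under a right-skewed Gamma, falling 50 below the mean is more surprising than landing 50 above it.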

Model Lift
- Lift is meant to approximate economic value
  - Fit has no relationship with economic value
- Economic value is produced by comparative advantage in avoidance of adverse selection
- Lift is a comparative measure, i.e. the lift of one model over another, or the lift of a model over the status quo
- Lift should always be measured on holdout data

Lift Measures
- Simple Quantile Plot
- Double Lift Chart
- Loss Ratio Chart
- Gini Index

Model Lift: Simple Quantile Plots

Double Lift Chart

Loss Ratio Chart

Economic Gini Index

Gini Index of Rating Plan
- Model should differentiate lowest and highest loss cost policyholders
- Creation of Gini index:
  - Order policyholders by model prediction, from best to worst
  - X-axis is cumulative percent of exposures
  - Y-axis is cumulative percent of losses
- Had the model produced the Gini index in the prior slide, it would have identified 60% of exposures that contribute only 20% of losses
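The construction can be sketched in a few lines. The example assumes that "best" means the lowest predicted loss cost and takes the Gini index as twice the area between the line of equality and the resulting curve; the two-segment portfolio mirrors the 60%/20% example, and its other numbers are made up.

```python
def lorenz_curve(predicted, exposure, losses):
    """Cumulative (share of exposure, share of losses), ordered best to worst."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    total_exposure, total_losses = sum(exposure), sum(losses)
    points, cum_exposure, cum_losses = [(0.0, 0.0)], 0.0, 0.0
    for i in order:
        cum_exposure += exposure[i]
        cum_losses += losses[i]
        points.append((cum_exposure / total_exposure, cum_losses / total_losses))
    return points

def gini_index(points):
    """One minus twice the trapezoidal area under the curve."""
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area

# Hypothetical portfolio: the 60% of exposures predicted to be best carry 20% of losses.
predicted = [100.0, 400.0]  # predicted loss cost per exposure
exposure = [60.0, 40.0]
losses = [20.0, 80.0]
points = lorenz_curve(predicted, exposure, losses)
print(points)
print(gini_index(points))
```

A model with no ability to differentiate risks would trace the line of equality and score 0.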

Methods for Testing Model Stability
- Cross-validation
  - Split data into subsets (e.g. by time period)
  - Refit model on each subset
  - Compare model parameter estimates
- Bootstrapping
  - Refit model on many bootstrapped samples
  - Calculate variability of parameter estimates
- Deletion of influential records
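A bootstrap stability check might look like the sketch below. To keep it short, the "model" is a Poisson GLM with log link, an exposure offset and a single indicator variable, for which the fitted coefficient has a closed form (the log of the ratio of observed frequencies), so no iterative fitting is needed. The simulated data and the function name are assumptions.

```python
import math
import random
import statistics

def log_relativity(records):
    """Log frequency ratio (flagged vs base) for (flag, exposure, claims) records."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for flag, exposure, claims in records:
        totals[flag][0] += exposure
        totals[flag][1] += claims
    (e0, c0), (e1, c1) = totals[0], totals[1]
    if min(e0, e1, c0, c1) <= 0:
        return None  # the relativity is undefined for this sample
    return math.log((c1 / e1) / (c0 / e0))

rng = random.Random(42)
# Hypothetical data: base frequency 10%, flagged frequency 17%, one exposure each.
records = [(flag, 1.0, 1 if rng.random() < (0.17 if flag else 0.10) else 0)
           for flag in (0, 1) for _ in range(1000)]

estimates = []
for _ in range(200):
    resample = [rng.choice(records) for _ in records]  # sample with replacement
    estimate = log_relativity(resample)
    if estimate is not None:
        estimates.append(estimate)

print(f"point estimate: {log_relativity(records):.3f}")
print(f"bootstrap standard error: {statistics.stdev(estimates):.3f}")
```

A parameter whose bootstrap estimates swing widely, or change sign, is a candidate for removal or further investigation.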

Measures of Influence
- Cook's Distance: Statistical measure of the impact each record has on the overall model
  - Excellent tool for identifying errors or anomalies
  - Deletion of records with high Cook's Distance may significantly change model results, and so this procedure can be used to test stability
- DFBETA: Influence on a certain parameter
- Influence is not to be confused with leverage
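To make the distinction concrete, the sketch below computes residuals, leverage and Cook's Distance for an ordinary least-squares line with one predictor. This is the classical linear-model formula, used here as a stand-in for the GLM version, and the six data points are made up.

```python
def influence_measures(x, y):
    """Residuals, leverage and Cook's distance for the line y = b0 + b1 * x."""
    n, p = len(x), 2  # p counts the fitted parameters (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, leverage)]
    return resid, leverage, cooks

x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 0]  # the last record breaks the pattern at an extreme x
resid, leverage, cooks = influence_measures(x, y)
for i in range(len(x)):
    print(f"x={x[i]:2d}  residual={resid[i]:6.2f}  leverage={leverage[i]:.2f}  Cook's D={cooks[i]:.2f}")
```

In this example the record with the largest residual (x = 5) is not the most influential one; the record at x = 10 combines high leverage with a sizeable residual and dominates Cook's Distance.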

Predictive Modeling Process
- Collect Data
- Exploratory Data Analysis
  - Examine univariate distributions
  - Examine relationship of each variable to target
- Specify Model
- Evaluate Output
- Validate Model
- Productize Model
- Maintain Model
- Rebuild Model

Some important considerations
- What are you trying to predict?
- Which explanatory variables to use?
  - Legal concerns
  - Reputation risk
  - Explainability
  - Data integrity
  - Cost
- What components of the product are not included in the model?
- Does the model need regulatory approval?
- What do you expect to find?

Some important considerations
- Who will work on the model, and how will you set clearly defined roles for each member of the team?
- Does the model significantly outperform the existing one?
- How will you sell the results to each of the key stakeholders?

Predictive modeling: the next 100 years!
- As GLMs become ingrained in actuarial work, what will be the next development in actuarial predictive modeling?
- What challenges will we face in adopting new modeling frameworks?
  1. Gaining acceptance from key stakeholders
  2. Regulatory approval
  3. Conforming to actuarial standards
  4. Finding/developing the necessary talent

The future of actuarial predictive modeling
- Extended linear models
  - Generalized Linear Mixed Models (GLMMs): include both fixed and random effects
  - Generalized Additive Models (GAMs): linear predictor depends on smooth functions of the predictor variables
- Bayesian statistics
  - Parameters of the distribution are random and have an a priori distribution
  - Data is sampled to create the posterior distribution
  - Produces better estimates of uncertainty than traditional statistics
  - Potential to be heavily used in reserving
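The prior-to-posterior step can be illustrated with the simplest conjugate case: a Gamma prior on a Poisson claim frequency. This particular model and its numbers are assumptions chosen for the example, not something the presentation specifies.

```python
def posterior_frequency(prior_shape, prior_rate, claims, exposure):
    """Gamma(shape, rate) prior on a Poisson frequency gives a Gamma posterior."""
    return prior_shape + claims, prior_rate + exposure

prior_shape, prior_rate = 2.0, 20.0  # hypothetical prior with mean 2 / 20 = 0.10
claims, exposure = 3, 10.0           # hypothetical experience: observed frequency 0.30

shape, rate = posterior_frequency(prior_shape, prior_rate, claims, exposure)
posterior_mean = shape / rate
posterior_sd = shape ** 0.5 / rate   # a full distribution, so uncertainty comes for free

# The posterior mean is a credibility-weighted blend of experience and prior.
z = exposure / (exposure + prior_rate)
blend = z * (claims / exposure) + (1.0 - z) * (prior_shape / prior_rate)
print(posterior_mean, posterior_sd, blend)
```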

The black box
- Black box methods (e.g. machine learning) are gaining popularity among many statisticians
  - Little to no knowledge of the underlying data is required
- Use of such methods in actuarial work raises several questions:
  1. Can we accept an accurate but unexplainable answer?
  2. Do actuaries need to be involved in the construction of such models?

Questions?