# GLM I An Introduction to Generalized Linear Models

Save this PDF as:
Size: px
Start display at page:

## Transcription

1 GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0

2 ANTITRUST Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings. Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding expressed or implied that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy. 1

3 Outline Overview of Statistical Modeling Linear Models ANOVA Simple Linear Regression Multiple Linear Regression Categorical Variables Transformations Generalized Linear Models Why GLM? From Linear to GLM Basic Components of GLM s Common GLM structures References 2

4 Modeling Schematic Independent vars/ Predictors Loan Age Region Loan-to-Value (LTV) Credit Score Weights Claims Exposures Premium Dependent var/ Response Losses Claims Persistency Statistical Model Model Results Parameters Validation Statistics 3

5 General Steps in Modeling Goal: Explain how a variable of interest depends on some other variable(s). Once the relationship (i.e., a model) between the dependent and independent variables is established, one can make predictions about the dependent variable from the independent variables. 1. Collect/build potential models and data with which to test models 2. Parameterize models from observed data 3. Evaluate if observed data follow or violate model assumptions 4. Evaluate model fit using appropriate statistical tests Explanatory or predictive power Significance of parameters associated with independent variables 5. Modify model 6. Repeat 4

6 Basic Linear Model Structures ANOVA : Y ij = µ + ψ i + e ij Assumptions: errors are independent and follow N(0,σe 2 ) Normal distribution with mean of zero and constant variance σe 2 ψi = 0 i = 1,..,k (fixed effects model) ψi ~ N(0,σψ 2 ) (random effects model) Simple Linear Regression : y i = b o + b 1 x i + e i Assumptions: linear relationship errors are independent and follow N(0,σe 2 ) Multiple Regression : y i = b o + b 1 x 1i +.+ b n x ni + e i Assumptions: same as simple regression, but with n independent random variables (RV s) Transformed Regression : transform x, y, or both; maintain assumption that errors are N(0,σ e 2 ) yi = exp(xi) log(yi) = xi 5

7 One-way ANOVA Y ij = µ + ψ i + e ij Y ij is the j th observation on the i th treatment j= 1,.n i i= 1,.k treatments or levels µis the common effect for the whole experiment ψ i is the i th treatment effect e ij is random error associated with observation Y ij, e ij~ N(0,σ e 2 ) ANOVAs can be used to test whether observations come from different populations or from the same population Is there a statistically significant difference between two groups of claims? Is the frequency of default on subprime loans different than that for prime loans? Personal Auto: Is claim severity different for Urban vs Rural locations? 6

8 One-way ANOVA Y ij = µ + ψ i + e ij Assumptions of ANOVA model independent observations equal population variances Normally distributed errors with mean of 0 Balanced sample sizes (equal # of observations in each group) Prediction: (observation) Y = common mean + treatment effect. Null hypothesis is no treatment effect. Can use contrasts or categorical regression to investigate treatment effects. 7

9 One-way ANOVA Potential Assumption violations: Implicit factors: lack of independence within sample (e.g., serial correlation) Lack of independence between samples (e.g., samples over time onsame subject) Outliers: apparent non-normality by a few data points Unequal population variances Unbalanced sample sizes How to assess: Evaluate experimental design -- how was data generated? (independence) Graphical plots (outliers, normality) Equality of variances test (Levene s test) 8

10 Simple Regression Model: Y i = b o + b 1 X i + e i Y is the dependent variable explained by X, the independent variable Y: mortgage claim frequency depends on X: Seriousness of delinquency Y: claim severity depends on X: Accident year Want to estimate how Y depends on X using observed data Prediction: Y= b o + b 1 x* for some new x* (usually with some confidence interval) 9

11 Simple Regression Model: Y i = b o + b 1 X i + e i Assumptions: 1) model is correct (there exists a linear relationship) 2) errors are independent 3) variance of e i constant 4) e i ~ N(0,σ 2 e ) In terms of robustness, 1) is most important, 4) is least important Parameterize: Fit b o and b 1 using Least Squares: minimize: [y i (b o + b 1 x i )] 2 10

12 Simple Regression The method of least squares is a formalization of best fitting a line through data with a ruler and a pencil Based on a correlative relationship between the independent and dependent variables Mortgage Insurance Average Claim Paid Trend 70,000 60,000 50,000 Severity 40,000 30,000 20,000 10, Accident Year Severity Predicted Y Note: All data in this presentation are for illustrative purposes only 11 β N ( Yi Y)( Xi X) i= 1 =, a = Y βx N 2 ( X X) i= 1 i Slope Intercept

13 Simple Regression ANOVA df SS Regression 1 2,139,093,999 Residual ,480,781 Total 19 2,330,574,780 How much of the sum of squares is explained by the regression? SS = Sum Squared Errors SSTotal = SSRegression + SSResidual (Residual also called Error) SSTotal = (y i y ) 2 SSRegression = b 1 est *[ x i y i -1/n( x i )( y i )] SSResidual = (y i y iest ) 2 = SSTotal SSRegression 12

14 Simple Regression ANOVA df SS MS F Significance F Regression 1 2,139,093,999 2,139,093, Residual ,480,781 10,637,821 Total 19 2,330,574,780 MS = SS divided by degrees of freedom R 2 : (SS Regression/SS Total) percentage of variance explained by linear relationship F statistic: (MS Regression/MS Residual) significance of regression: tests Ho: b1=0 v. HA: b1 0 13

15 Simple Regression Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept -3,552, , ,083,531-3,021,441 Accident Year 1, ,528 2,059 T statistics: (b i est H o (b i )) / s.e.(b iest ) significance of coefficients T 2 = F for b 1 in simple regression 14

16 Simple Regression p-values test the null hypotheses that the parameters b 1 =0 or b o = 0. If b 1 is 0, then there is no linear relationship between the independent variable Y (severity) and the dependent variable X (accident year). If b o is 0, then the intercept is 0. SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 20 92% of the variance is explained by the regression The probability of observing this data given that b1 =0 is < , the significance of F Both parameters are significant F = T 2 for X Variable = ( ) 2 ANOVA df SS MS F Significance F Regression 1 2,139,093,999 2,139,093, Residual ,480,781 10,637,821 Total 19 2,330,574,780 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept -3,552, , ,083,531-3,021,441 Accident Year 1, ,528 2,059 15

17 Residuals Plot Looks at (y obs y pred ) vs. y pred Can assess linearity assumption, constant variance of errors, and look for outliers Residuals should be random scatter around 0, standard residuals should lie between -2 and 2 With small data sets, it can be difficult to asess 4 Plot of Standardized Residuals Standard Residuals Standardized Residual , ,000 20,000 25,000 30,000 35,000 40,000 45,000 50, Predicted Severity 16

18 Normal Probability Plot Can evaluate assumption e i ~ N(0,σ e 2 ) Plot should be a straight line with intercept µ and slope σ e 2 Can be difficult to assess with small sample sizes Normal Probability Plot of Residuals 4 Standard Residuals Standardized Residual Theoretical z Percentile 17

19 Residuals If absolute size of residuals increases as predicted value increases, may indicate nonconstant variance May indicate need to transform dependent variable May need to use weighted regression May indicate a nonlinear relationship Plot of Standardized Residuals 3 Standard Residuals 2 Standardized Residual ,000 15,000 20,000 25,000 30,000 35,000 40,000 45,000 50, Predicted Severity 18

20 Distribution of Observations Average claim amounts for Rural drivers is normally distributed as are average claim amounts for Urban drivers Mean for Urban drivers is twice that of Rural drivers The variance of the observations is equal for Rural and Urban The total distribution of average claim amounts is not Normally distributed here it is bimodal Distribution of Individual Observations Rural Urban 19

21 Distribution of Observations The basic form of the regression model is Y = b o + b 1 X + e µ i = E[Y i ] = E[b o + b 1 X i + e i ] = b o + b 1 X i + E[e i ] = b o + b 1 X i The mean value of Y, rather than Y itself, is a linear function of X The observations Y i are normally distributed about their mean µ i Y i ~ N(µ i, σ e 2 ) Each Y i can have a different mean µ i but the variance σ e 2 is the same for each observation Y Line Y = b o + b 1 X b o + b 1 X 2 b o + b 1 X 1 X 20 X 1 X 2

22 Predicting with regression MSResiduals (also called s e2 )is an estimate of σ e2, the variability of the errors Estimated Y has a lower standard error than Predicted Y, but both have the same point estimate - µ i Y pred = b o + b 1 x* for some new x* Y est = b o + b 1 x* for some new x* standard error for both use s e informula Y pred standard error accounts for random variability around the line in addition to the uncertainty of the line Typically give a confidence interval around the point estimate (e.g. 95%) (Y pred ±se(y pred )*T 0.025, DF ) Predictions should only be made within the range or slightly outside of observed data. Extrapolation can lead to erroneous predictions 21

23 Multiple Regression Y = β 0 + β 1 X 1 + β 2 X β n X n + ε E[Y] = β X Same assumptions as simple regression 1) model is correct (there exists a linear relationship) 2) errors are independent 3) variance of e i constant 4) e i ~ N(0,σ 2 e ) Added assumption the n variables are independent 22

24 Multiple Regression Uses more than one variable in regression model R-sq always goes up as add variables Adjusted R-Square puts models on more equal footing Many variables may be insignificant Approaches to model building Forward Selection -Add in variables, keep if significant Backward Elimination -Start with all variables, remove if not significant Fully Stepwise Procedures Combination of Forward and Backward 23

25 Multiple Regression Goal : Find a simple model that explains things well with assumptions met Model assumes all predictor variables independent of one another as add more, they may not be (multicollinearity strong linear relationships among the X s) As you increase the number of parameters (one for each variable in regression) you lose degrees of freedom want to keep df as high as possible for general predictive power problem of over-fitting 24

26 Multiple Regression Multicollinearity arises when there are strong linear relationships among the x s May see: High pairwise correlations amongst the x s Large changes in coefficients when another variable added or deleted Large change in coefficients when data point added or deleted Large standard deviations of the coefficients Some solutions to combat overfitting and multicollinearity Stepwise Regression (Forwards, Backwards, Exhaustive) -- Order matters Drop one or more highly correlated variables Use Factor Analysis or Principle Components Analysis to combine correlated variables into a smaller number of new uncorrelated variables 25

27 Multiple Regression F significant and Adj R-sq high Degrees of freedom ~ # observations - # parameters Any parameter with a t-stat with absolute value less than 2 is not significant SUMMARY OUTPUT Regression Statistics Multiple R 0.97 R Square 0.94 Adjusted R Square 0.94 Standard Error 0.05 Observations 586 ANOVA df SS MS F Significance F Regression < Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept ltv ltv ltv ltv ss ss ss ss ssfcl HPA

28 Multiple Regression Residuals Plot Standard Residual vs Predicted Claim Rate Standard Residuals Standard Residual Predicted Claim Rate 27

29 Multiple Regression Normal Probability Plot 3 Normal Probability Plot Standard Residual Theoretical z Percentile 28

30 Categorical Variables Explanatory variables can be discrete or continuous Discrete variables generally referred to as factors Values each factor takes on referred to as levels Discrete variables also called Categorical variables In the multiple regression example given, all variables were discrete except HPA (embedded home price appreciation) 29

31 Categorical Variables Assign each level a Dummy variable A binary valued variable X=1 means member of category and 0 otherwise Always a reference category defined by being 0 for all other levels If only one factor in model, then reference level will be intercept of regression If a category is not omitted, there will be linear dependency Intrinsic Aliasing 30

32 Categorical Variables 31 Example: Loan To Value (LTV) Grouped for premium 5 Levels <=85%, LTV % - 90%, LTV % - 95%, LTV % - 97%, LTV97 >97% Reference Generally postively correlated with claim frequency Allowing each level it s own dummy variable allows for the possiblity of non-monotonic relationship Each modeled coefficient will be relative to reference level X1 X2 X3 X4 Loan # LTV LTV85 LTV90 LTV95 LTV

33 Transformations A possible solution to nonlinear relationship or unequal variance of errors Transform predictor variables, response variable, or both Examples: Y = log(y) X = log(x) X = 1/X Y = Y Substitute transformed variable into regression equation Maintain assumption that errors are N(0,σ 2 e ) 32

34 Why GLM? What if the variance of the errors increases with predicted values? More variability associated with larger claim sizes What if the values for the response variable are strictly positive? assumption of normality violates this restriction If the response variable is strictly non-negative, intuitively the variance of Y tends to zero as the mean of X tends to zero Variance is a function of the mean What if predictor variables do not enter additively? Many insurance risks tend to vary multiplicatively with rating factors 33

35 Classic Linear Model to Generalized Linear Model LM: X is a matrix of the independent variables Each column is a variable Each row is an observation β is a vector of parameter coefficients ε is a vector of residuals LM Y = β X+ ε E[Y] = β X E[Y] = µ = η ε ~ N(0,σ 2 e ) GLM: X, β mean same as in LM ε is still vector of residuals g is called the link function GLM g (µ) = η = β X E[Y] = µ = g -1 (η) Y = g -1 (η) + ε ε ~ exponential family 34

36 Classic Linear Model to Generalized Linear Model LM: 35 1) Random Component : Each component of Y is independent and normally distributed. The mean µ i allowed to differ, but all Y i have common variance σ e 2 2) Systematic Component : The n covariates combine to give the linear predictor η = β X 3) Link Function : The relationship between the random and systematic components is specified via a link function. In linear model, link function is identity fnc. GLM: E[Y] = µ = η 1) Random Component : Each component of Y is independent and from one of the exponential family of distributions 2) Systematic Component : The n covariates are combined to give the linear predictor η = β X 3) Link Function : The relationship between the random and systematic components is specified via a link function g, that is differentiable and monotonic E[Y] = µ = g -1 (η)

37 Linear Transformation versus a GLM Linear transformation uses transformed variables GLM transforms the mean GLM not trying to transform Y in a way that approximates uniform variability Y Linear The error structure X 1 X 2 X Linear transformation retains assumption Y i ~ N(µ i, σ e2 ) GLM relaxes normality GLM allows for non-uniform variance Variance of each observation Y i is a function of the mean E[Y i ] = µ i 36

38 The Link Function Example: the log link function g(x) = ln (x) ; g -1 (x) = e x Suppose Premium (Y) is an multiplicative function of Policyholder Age (X 1 ) and Rating Area (X 2 ) with estimated parameters β 1, β 2 η i = β 1 X 1 + β 2 X 2 g(µ i ) = η i E[Y i ] = µ i = g -1 (η i ) E[Y i ] = exp (β 1 X 1 + β 2 X 2 ) E[Y] = g -1 (β X) E[Y i ] = exp (β 1 X 1 ) exp(β 2 X 2 ) = µ i g(µ i ) = ln [exp (β 1 X 1 ) exp(β 2 X 2 ) ] = η i = β 1 X 1 + β 2 X 2 The GLM here estimates logs of multiplicative effects 37

39 Examples of Link Functions Identity g(x) = x g -1 (x) = x Reciprocal g(x) = 1/x g -1 (x) = 1/x Log g(x) = ln(x) g -1 (x) = e x Logistic g(x) = ln(x/(1-x)) g -1 (x) = e x /(1+ e x ) 38

40 Error Structure Exponential Family Distribution completely specified in terms of its mean and variance The variance of Y i is a function of its mean Members of the Exponential Family Normal (Gaussian) -- used in classic regression Poisson (common for frequency) Binomial Negative Binomial Gamma (common for severity) Inverse Gaussian Tweedie (common for pure premium) 39

41 General Examples of Error/Link Combinations Traditional Linear Model response variable: a continuous variable error distribution: normal link function: identity Logistic Regression response variable: a proportion error distribution: binomial link function: logit Poisson Regression in Log Linear Model response variable: a count error distribution: Poisson link function: log Gamma Model with Log Link response variable: a positive, continuous variable error distribution: gamma link function: log 40

42 Specific Examples of Error/Link Combinations Observed Response Link Fnc Error Structure Variance Fnc Claim Frequency Log Poisson µ Claim Severity Log Gamma µ 2 Pure Premium Log Tweedie µ p (1<p<2) Retention Rate Logit Binomial µ(1-µ) 41

43 References Anderson, D.; Feldblum, S; Modlin, C; Schirmacher, D.; Schirmacher, E.; and Thandi, N., A Practitioner s Guide to Generalized Linear Models (Second Edition), CAS Study Note, May Devore, Jay L. Probability and Statistics for Engineering and the Sciences 3 rd ed., Duxbury Press. McCullagh, P. and J.A. Nelder. Generalized Linear Models, 2 nd Ed., Chapman & Hall/CRC SAS Institute, Inc. SAS Help and Documentation v

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

### A Deeper Look Inside Generalized Linear Models

A Deeper Look Inside Generalized Linear Models University of Minnesota February 3 rd, 2012 Nathan Hubbell, FCAS Agenda Property & Casualty (P&C Insurance) in one slide The Actuarial Profession Travelers

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### HLM software has been one of the leading statistical packages for hierarchical

Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### UBI-5: Usage-Based Insurance from a Regulatory Perspective

UBI-5: Usage-Based Insurance from a Regulatory Perspective Carl Sornson, FCAS Managing Actuary Property/Casualty NJ Dept of Banking & Insurance 2015 CAS RPM Seminar March 10, 2015 Antitrust Notice The

### Analysis of Variance. MINITAB User s Guide 2 3-1

3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

### Week 5: Multiple Linear Regression

BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

### One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Munich Re INSURANCE AND REINSURANCE. May 6 & 7, 2010 Dave Clark Munich Reinsurance America, Inc 4/26/2010 1

Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Simple Methods and Procedures Used in Forecasting

Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

### 1 Simple Linear Regression I Least Squares Estimation

Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

### The Do s & Don ts of Building A Predictive Model in Insurance. University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D.

The Do s & Don ts of Building A Predictive Model in Insurance University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D. Agenda Travelers Broad Overview Actuarial & Analytics Career

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

### Offset Techniques for Predictive Modeling for Insurance

Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the

### Notes on Applied Linear Regression

Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### GENERALIZED LINEAR MODELS FOR INSURANCE RATING Mark Goldburd, FCAS, MAAA Anand Khare, FCAS, MAAA, CPCU Dan Tevet, FCAS, MAAA

CAS MONOGRAPH SERIES NUMBER 5 GENERALIZED LINEAR MODELS FOR INSURANCE RATING Mark Goldburd, FCAS, MAAA Anand Khare, FCAS, MAAA, CPCU Dan Tevet, FCAS, MAAA CASUALTY ACTUARIAL SOCIETY Casualty Actuarial

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### Jinadasa Gamage, Professor of Mathematics, Illinois State University, Normal, IL, e- mail: jina@ilstu.edu

Submission for ARCH, October 31, 2006 Jinadasa Gamage, Professor of Mathematics, Illinois State University, Normal, IL, e- mail: jina@ilstu.edu Jed L. Linfield, FSA, MAAA, Health Actuary, Kaiser Permanente,

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### **BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data By LEVON BARSEGHYAN, JEFFREY PRINCE, AND JOSHUA C. TEITELBAUM I. Empty Test Intervals Here we discuss the conditions

### Extended Warranty, Availability, Power Guarantee, Design defect: How to Develop and Price the Alternative Energy Opportunities

Extended Warranty, Availability, Power Guarantee, Design defect: How to Develop and Price the Alternative Energy Opportunities Antitrust Notice n n n The Casualty Actuarial Society is committed to adhering

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### R-3: What Makes a Good Rate Filing?

R-3: What Makes a Good Rate Filing? Carl Sornson, FCAS Managing Actuary Property/Casualty NJ Dept of Banking & Insurance 2012 CAS Ratemaking and Product Management Seminar March 19-21, 2012 Antitrust Notice

### Predictive Modeling for Life Insurers

Predictive Modeling for Life Insurers Application of Predictive Modeling Techniques in Measuring Policyholder Behavior in Variable Annuity Contracts April 30, 2010 Guillaume Briere-Giroux, FSA, MAAA, CFA

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Lecture 8: Gamma regression

Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

### Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Causal Forecasting Models

CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental

### Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

### An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA

ABSTRACT An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA Often SAS Programmers find themselves in situations where performing