GLM III: Advanced Modeling Strategy 2005 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide


 Lindsey Edith Jennings
 1 years ago
 Views:
Transcription
1 GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M
2 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
3 "A Practitioner's Guide to GLMs" 24 CAS Discussion Paper Program Discusses testing the link function the Tweedie distribution aliasing / near aliasing combining models across claim types restricted models Copies available here
4 Generalized linear models E[Y i ] = µ i = g 1 (ΣX ij β j + ξ i ) Var[Y i ] = φ.v(µ i )/ω i Consider all factors simultaneously Provide statistical diagnostics Allow for nature of random process Robust and transparent Increasingly a global industry standard
5 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Var[Y] = φ.v(µ)/ω
6 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Link function Yvariate Design matrix Parameter estimates Offset term
7 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Observed thing (data) Some function (user defined) Some matrix based on data (user defined) Parameters to be estimated (the answer!) Known effects
8 Generalized linear models Var[Y] = φ.v(µ)/ω Scale parameter Variance function Prior weights Usually assume exponential family, eg φ = σ 2 (estimated), V(x) = 1 Var[Y i ] = σ 2 Normal φ = 1 (specified), V(x) = x Var[Y i ] = µ i Poisson φ = k (estimated), V(x) = x 2 Var[Y i ] = kµ i 2 Gamma
9 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
10 Link function g(x) Offset ξ Error Structure V(µ) Linear Predictor Form X.β = α i + β j + γ k + δ l Data Y Scale Parameter φ Prior Weights ω Numerical MLE Parameter Estimates α, β, γ, δ Diagnostics
11 Model testing Use only those factors which are predictive standard errors of parameter estimates F tests / χ 2 tests on deviances stepwise approach (helpful if used with care) consistency over time human intuition Make sure the model is reasonable variance function: residual plots (histograms / QQ / residual vs fitted value etc) outliers: leverage / Cook's distance link function: BoxCox
12 BoxCox link function investigation GLM structure is E[Y] = µ = g 1 (X.β + ξ) Var[Y] = φ.v(µ) / ω Box Cox transforms defines g(x) = ( x λ 1 ) / λ for λ, ln(x) for λ= λ = 1 g(x) = x 1 additive (with base level shift) λ g(x) ln(x) multiplicative (via maths) λ = 1 g(x) = 1 1/x inverse (with base level shift) Try different values of λ and measure goodness of fit to see which fits experience best
13 BoxCox link function investigation Auto third party property damage frequencies Likelihood Inverse Multiplicative Additive λ
14 BoxCox link function investigation Auto third party property damage average amounts Likelihood Inverse Multiplicative Additive λ
15 BoxCox link function investigation Comparing fitted values of different link functions 35 Not much 3 25 Count of records Ratio of fitted values for Lambda = (mult) to Lambda =.3
16 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
17 Standard approach BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5
18 .2 Tweedie distributions Incurred losses have a point mass at zero and then a continuous distribution Poisson and gamma not suited to this Tweedie distribution can have point mass at zero and parameters which can alter the shape to be like Poisson and gamma above zero f Y ( y ; θ, λ, α ) = n = 1 1 α n {( λω ) κ α ( 1 / y )}. exp { λω [ θ y κ ( θ )]} Γ ( n α ) n! y α for y > { ( θ )} p( Y = ) = exp λωκ α
19 Tweedie distributions Tweedie: φ = k, V(x) = x p Var[Y] = kµ p p=1 corresponds to Poisson, p=2 to gamma Defines a valid distribution for p<, 1<p<2, p>2 Can be considered as Poisson/gamma process for 1<p<2 Need to estimate both k and p when fitting models often estimate a where p = (2a)/(1a) Typical values of p for insurance incurred claims around, or just under, 1.5
20 Generalised linear models Var[Y] = φ.v(µ) / ω Normal: φ = σ 2, V(x) = 1 Var[Y] = σ 2.1 Poisson: φ = 1, V(x) = x Var[Y] = µ Gamma: φ = k, V(x) = x 2 Var[Y] = kµ 2 Tweedie: φ = k, V(x) = x p Var[Y] = kµ p
21 Example 1: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 2 Frequency % 446% 1 Log of multiplier % 128% 94% 96% 44% % 23% 15% 21% 17% Exposure (years) % A B C D E F G H I J K L M Bonus Malus Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 12/12
22 Example 1: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 6 Amounts Log of multiplier % 26% 6% 1% 5% 3% 7% % 2% 3% 2% 4% 1% Number of claims A B C D E F G H I J K L M Bonus Malus EXCLUDED FACTOR Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value = 5.9% Rank 4/12
23 Example 1: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 2 Tweedie Models % 568% 443% 446% 1 Log of multiplier % 128% 94% 96% 44% % 23% 15% 21% 17% Exposure (years) % A B C D E F G H I J K L M Bonus Malus Tw eedie SE Tw eedie RPSE RP
24 Example 2: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 1 Frequency Log of multiplier % 84% 69% 75% 68% 67% 61% 58% 58% 45% 48% 54% 47% 38% Exposure (years).1 % >13 Age of vehicle Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 12/12
25 Example 2: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 5 Amounts Log of multiplier % 1% 11% 17% 14% 11% 24% 8% 11% 14% 13% 1% 4% Number of claims % 1 5% >13 Age of vehicle Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 5/7
26 Example 2: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models 1 1 Log of multiplier % 83% 17% 11% 9% 87% 16% 15% 94% 92% 9% 86% 14% 99% 73% 75% 71% 67% 7% 73% 65% 67% 7% 55% 53% 34% 31% Exposure (years) % >13 Age of vehicle Tw eedie SE Tw eedie RPSE RP Numbers Amounts
27 Example 3: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models.1 45 Log of multiplier % 9% % Exposure (years) Category 1 Category 2 Occupation Tw eedie SE Tw eedie RPSE RP Numbers Amounts
28 Example 4: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 1 Frequency.4 8 Log of multiplier % 6% 2% % 2% 2% 5% 8% 3% 8% 8% 3% 17% 2% Exposure (years) < Unknown 39% 1 Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 5/12
29 Example 4: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 5 Amounts.18 16% 6 Log of multiplier % 3% 2% % 1% 1% 6% 9% 5% 3% 5% % 2% 4% Number of claims < Unknown Age of driver EXCLUDED FACTOR Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value = 5.6% Rank 4/9
30 Example 4: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models Log of multiplier % 21% 11% 6% 2% 1% % 4% 2% 2% % 13% 5% 2% 8% 3% 1% 16% 8% 15% 8% 2% 3% 14% 17% 16% 2% 28% Exposure (years).4 39% 1.6 < Unknown Age of driver Tw eedie SE Tw eedie RPSE RP Numbers
31 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
32 Spline definition 1 A series of polynomial functions, with each function defined over a short interval Intervals are defined by k+2 knots two exterior knots at extremes of data variable number (k) of interior knots At each interior knot the two functions must join "smoothly"
33 Cubic splines Each polynomial is a cubic a + bx + cx 2 + dx 3 "Smoothness" at interior knots is defined as: continuous continuous first derivative continuous second derivative
34 Regression splines The position of the knots is specified by the user Standard GLMs can be used by careful definition of variates Pros fits easily into existing structures no complex resampling needed Cons position of knots can effect final answer
35 Smoothing splines One knot at each unique data value Additional curvature penalty prevents over fitting Curvature penalty selected by repeatedly sampling subsets and optimising generalised goodness of fit measure such as AIC Pros allows data to guide final result Cons 1s of knots required optimisation process is timeconsuming difficult to produce new fitted values
36 "Easy" regression splines Fit a cubic over the whole range simply define x, x 2 and x 3 as variates and include in the model Fit additional cubic "correction" variates for each interval, defined as if x<k r ((x k r+1 )/(k r k r+1 )) 3 otherwise
37 "Easy" regression splines
38 "Easy" regression splines
39 "Easy" regression splines
40 "Easy" regression splines "Correction" variates get large quickly In practice GLM process can struggle with these large numbers Alternate basis is clearly desirable
41 BSplines Set of basis functions usually covering four segments (defined by five knots) Each function is itself a cubic spline Each basis function has the same shape, except for the three basis functions at each extreme which occupy fewer than four segments
42 BSplines
43 BSplines
44 BSplines
45 BSplines
46 Example Factor
47 Example Factor Polynomial
48 Example Spline Factor Polynomial
49 Example Spline Factor Spline SE Spline SE
50 Extrapolation Can combine the three boundary basis functions to achieve linear extrapolation Sum of three functions tends to 1 at the exterior knot, and continues as 1 when extrapolated Can combine the last two functions to give function that tends to a straight line at the exterior knot, and can be extrapolated as such
51 BSplines
52 BSplines
53 BSplines
54 Example Spline Factor Spline SE Spline SE Polynomial
55 Further example 1.2 Comparison of factor with spline Urban density 28% % 5 81%.6 44% 53% 4 Log of multiplier.3 9% 15% 2% % 1% 18% 3 Exposure (years).3 33% 37% 34% 35% 35% 34% 28% 25% 28% Population density Factor SE Factor P value =.% Rank 7/11
56 1.2 Further example Comparison of factor with spline Urban density Log of multiplier Exposure (years) Population density Factor SE Factor Spline P value =.% Rank 7/11
57 1.2 Further example Comparison of factor with spline Urban density Log of multiplier Exposure (years) Population density Spline P value =.% Rank 7/11
58 Further example 1.2 Comparison of factor with spline Urban density 182% % 139% % Log of multiplier.3 15% 8% 3% % 5% 17% 4% 4 3 Exposure (years).3 38% 37% 37% 36% 35% 33% 31% 29% 24% Population density Factor Spline SE Spline P value =.% Rank 7/11
59 Splines Practical way of modeling continuous variables Often better than polynomials Increases complexity, therefore best used when it is important that rates vary continuously with a variable when modeling elasticity to be used in price optimization analyses
60 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
61 Standard approach BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5
62 Binomial reference models Amt = Cost 1 TP Freq BI / PD? Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5
63 Offset reference model
64 Offset reference model
65 Offset reference model
66 Offset reference model
67 Offset reference model
68 Offset reference model
69 Offset reference model
70 Offset reference model
71 Testing the reference model approach (1) Fit to BI claims on all data the "correct answer" 1% of large company 1% Random sample to emulate small company (2) Model BI claims with standard approach (3) Model BI claims referencing PD experience on this small sample
72 Example of reference model method working Example PI models Green: BI 1% ("correct answer") Blue: Standard BI GLM on 1% Purple: Referencing PD on 1% 27% Log of multiplier % 14% 19% 1 8 Exposure (years) 6.6 % Standard BI GLM factor rejected Level 1 Level 2 Level 3 Factor X Approx 2 s.e. from estimate Full model Unsmoothed estimate Full model PD model
73 Example of reference model method working Example Green: BI PI 1% models ("correct answer") % 182% Blue: Standard BI GLM on 1% Purple: Referencing PD on 1% Log of multiplier.6.3 1% 54% 37% 9% 66% 54% 38% 42% 29% 24% 34% 2% 12% 12% 11% 6% 34% 14% 1% 1% 3% 38% 6% 7% % Exposure (years) 2.3 Level A Level B Level C Level D Level E Level F Level G Level H Level I Level J Factor Y Approx 2 s.e. from estimate Full model Unsmoothed estimate Full model Unsmoothed estimate 1% model Approx 2 s.e. from estimate 1% model PD model
74 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
75 Aliasing and "near aliasing" Aliasing the removal of unwanted redundant parameters Intrinsic aliasing occurs by the design of the model Extrinsic aliasing occurs "accidentally" as a result of the data
76 Example Suppose we wanted a model of the form: µ = α + β if age < β if age β if age > γ if sex male 1 + γ if sex female 2
77 Form of X.β in this case Age Sex <3 34 >4 M F α β β β γ γ
78 Example Suppose we wanted a model of the form: µ = α + β if age < β if age "Base levels" + β if age > γ if sex male 1 + γ if sex female 2
79 X.β having adjusted for base levels Age Sex <3 34 >4 M F α β β β γ γ
80 X.β having adjusted for base levels Age Sex <3 >4 F α β 1 β 3 γ 2
81 Intrinsic aliasing Example job Run 16 Model 3 Small interaction Third party material damage, Numbers % 138%.8 2 Log of multiplier % 28% 24% 15 1 Exposure (years).2 % 19% 6% 11% Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate
82 Extrinsic aliasing If a perfect correlation exists, one factor can alias levels of another Eg if doors declared first: Selected base Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 Further aliasing Unknown 3,242 This is the only reason the order of declaration can matter (fitted values are unaffected)
83 Extrinsic aliasing Example job Run 16 Model 3 Small interaction Third party material damage, Numbers 1 135% 6 Log of multiplier % 39% 45% 57% 82% 72% 7% 91% 13% Exposure (years) % % Malus No claim discount Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate
84 "Near aliasing" If two factors are almost perfectly, but not quite aliased, convergence problems can result and/or results can become hard to interpret Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 Eg if the 2 black, unknown doors policies had no claims, GLM would try to estimate a very large negative number for unknown doors, and a very large positive number for unknown color Selected base Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 2 Unknown 3,242
85 "Near aliasing" solution 1. Spot it 2. Fix the data! Exposure: # Doors Unknown Colour Red 13,234 12,343 13,432 13,432 Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 2 Unknown 3,242
86 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
87 Combining claim elements I BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 Multiply factors for frequencies and amounts Calculate risk premium as sum of claim elements COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5
88 Combining claim elements II Consider current exposure BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5 Calculate expected frequency and amount for each claim type for each record Combine to give expected total cost of claims for each record Fit model to this expected value
89 Calculation of risk premium TPPD TPPD TPBI TPBI Numbers Amounts Numbers Amounts Intercept 32% 1 12% 486 Sex Male Female Area Town Country Policy Sex Area WWNUM1 WWAMT1 WWNUM2 WWAMT2 WWCC1 WWCC2 WWRSKPRM M T 32% 1 12% F T 24% 12 8% M C 4% 7 9% F C 3% 84 6%
90 Risk premium standard errors Risk premium model standard errors are small owing to the smoothness of the expected value It is possible to approximate standard error of risk premium parameter estimates based on standard errors of parameter estimates in underlying models Care needed in interpreting such approximations since they do not reflect model error, eg deciding to exclude a marginal factor
91 Risk premium standard errors failings Numbers Amounts Risk premium
92 Risk premium standard errors failings Numbers Amounts Risk premium
93 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models
94 Restricted models E[Y] = µ = g 1 ( X.β + ξ ) Offset Offset term used for known effects, eg exposure in a numbers model Can also be used to constrain model (eg claim free years / payment frequency / amount of cover) Other factors adjusted to compensate
95 Restricted models Age Sex <3 >4 F α β 1 β 3 γ 2
96 Restricted models E[Y] = µ = g 1 ( X.β ) Age Sex <3 >4 F α β 1 β 3 γ 2
97 Restricted models E[Y] = µ = g 1 ( X.β + ξ ) Age <3 > α β 1 β
98 Restricted models Age Sex <3 >4 F α β 1 β 3.1
99 Restricted models Log of multiplier Restriction Log of multiplier BonusMalus Vehicle group Log of multiplier.5 1 Log of multiplier A B C D E F G H I J K L M N O Geographical zone Age of driver
100 Restricted models Log of multiplier Restriction Log of multiplier BonusMalus Vehicle group Log of multiplier Log of multiplier 1.5 Model compensates (as best it can) to allow for restriction A B C D E F G H I J K L M N O Geographical zone Age of driver
101 Testing the effectiveness of restrictions 12 1 Effect of restricting Bonus Malus Other factors not very correlated with Bonus Malus 8 Count of records Ratio A B C D E F G H I J K L M N O P Q R S
102 Testing the effectiveness of restrictions 12 1 Effect of restricting Bonus Malus Adding in a measure of claim free years 8 Count of records Ratio A B C D E F G H I J K L M N O P Q R S
103 Restrictions Only use to "get around" restrictions A commercial smoothing is a commercial smoothing Apply at risk premium stage
104 GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M
GLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationIntroduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.
Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationLocation matters. 3 techniques to incorporate geospatial effects in one's predictive model
Location matters. 3 techniques to incorporate geospatial effects in one's predictive model Xavier Conort xavier.conort@gearanalytics.com Motivation Location matters! Observed value at one location is
More informationA LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
More informationMultiple Regression  Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation
Multiple Regression  Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent
More information1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 23, Probability and Statistics, March 2015. Due:March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 3, Probability and Statistics, March 05. Due:March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More informationGeneralized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)
Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through
More informationCombining Linear and NonLinear Modeling Techniques: EMB America. Getting the Best of Two Worlds
Combining Linear and NonLinear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with
More informationThe zeroadjusted Inverse Gaussian distribution as a model for insurance claims
The zeroadjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationWebbased Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Webbased Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationNotes on the Negative Binomial Distribution
Notes on the Negative Binomial Distribution John D. Cook October 28, 2009 Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between
More informationIntroduction to Predictive Modeling Using GLMs
Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed
More informationFitting Subjectspecific Curves to Grouped Longitudinal Data
Fitting Subjectspecific Curves to Grouped Longitudinal Data Djeundje, Viani HeriotWatt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK Email: vad5@hw.ac.uk Currie,
More informationBonusmalus systems and Markov chains
Bonusmalus systems and Markov chains Dutch car insurance bonusmalus system class % increase new class after # claims 0 1 2 >3 14 30 14 9 5 1 13 32.5 14 8 4 1 12 35 13 8 4 1 11 37.5 12 7 3 1 10 40 11
More informationThe University of Kansas
All Greek Summary Rank Chapter Name Total Membership Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 4 Kappa Kappa Gamma 3.28 *5 Pi Beta Phi 3.27 *5 Gamma Phi Beta 3.27 *7 Alpha
More informationModels for Count Data With Overdispersion
Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extrapoisson variation and the negative binomial model, with brief appearances
More information12.5: CHISQUARE GOODNESS OF FIT TESTS
125: ChiSquare Goodness of Fit Tests CD121 125: CHISQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability
More informationα α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationFraternity & Sorority Academic Report Spring 2016
Fraternity & Sorority Academic Report Organization Overall GPA Triangle 1717 1 Delta Chi 88 12 100 2 Alpha Epsilon Pi 77 3 80 3 Alpha Delta Chi 28 4 32 4 Alpha Delta Pi 190190 4 Phi Gamma Delta 85 3
More information3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
More informationLecture  32 Regression Modelling Using SPSS
Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture  32 Regression Modelling Using SPSS (Refer
More informationHypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...
Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More information, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (
Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we
More informationFraternity & Sorority Academic Report Fall 2015
Fraternity & Sorority Academic Report Organization Lambda Upsilon Lambda 11 1 Delta Chi 77 19 96 2 Alpha Delta Chi 30 1 31 3 Alpha Delta Pi 134 62 196 4 Alpha Sigma Phi 37 13 50 5 Sigma Alpha Epsilon
More information1. χ 2 minimization 2. Fits in case of of systematic errors
Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationThe University of Kansas
Fall 2011 Scholarship Report All Greek Summary Rank Chapter Name Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 *4 Gamma Phi Beta 3.28 4 Kappa Kappa Gamma 3.28 6 Pi Beta Phi
More informationExam C, Fall 2006 PRELIMINARY ANSWER KEY
Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14
More informationMATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More informationNormality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com
More informationCHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.
Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,
More informationAn Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX
An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX Phil Gibbs Advanced Analytics Manager SAS Technical Support November 22, 2008 UC Riverside What We Will Cover Today What is PROC
More informationGLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,
Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN Linear Algebra Slide 1 of
More information5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
More informationProbability Calculator
Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that
More informationErrata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page
Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8
More informationParametric Models Part I: Maximum Likelihood and Bayesian Density Estimation
Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015
More informationSOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS
SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationSYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation
SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 10 Boundary Value Problems for Ordinary Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at UrbanaChampaign
More informationJoint models for classification and comparison of mortality in different countries.
Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationDevelopment Period 1 2 3 4 5 6 7 8 9 Observed Payments
Pricing and reserving in the general insurance industry Solutions developed in The SAS System John Hansen & Christian Larsen, Larsen & Partners Ltd 1. Introduction The two business solutions presented
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK  michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More information7. Tests of association and Linear Regression
7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.
More informationPredictive Modeling. Age. Sex. Y.o.B. Expected mortality. Model. Married Amount. etc. SOA/CAS Spring Meeting
watsonwyatt.com SOA/CAS Spring Meeting Application of Predictive Modeling in Life Insurance JeanFelix Huet, ASA June 18, 28 Predictive Modeling Statistical model that relates an event (death) with a number
More information4. Joint Distributions of Two Random Variables
4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More information7 Hypothesis testing  one sample tests
7 Hypothesis testing  one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationRegression III: Advanced Methods
Lecture 5: Linear leastsquares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression
More informationUsing pivots to construct confidence intervals. In Example 41 we used the fact that
Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement
More informationSeasonality and issues of the season
General Insurance Pricing Seminar James Tanser and Margarida Júdice Seasonality and issues of the season 2010 The Actuarial Profession www.actuaries.org.uk Agenda Issues of the season New claim trends
More informationPredictive Modeling for Life Insurers
Predictive Modeling for Life Insurers Application of Predictive Modeling Techniques in Measuring Policyholder Behavior in Variable Annuity Contracts April 30, 2010 Guillaume BriereGiroux, FSA, MAAA, CFA
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationPoisson Regression or Regression of Counts (& Rates)
Poisson Regression or Regression of (& Rates) Carolyn J. Anderson Department of Educational Psychology University of Illinois at UrbanaChampaign Generalized Linear Models Slide 1 of 51 Outline Outline
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More informationMath 425 (Fall 08) Solutions Midterm 2 November 6, 2008
Math 425 (Fall 8) Solutions Midterm 2 November 6, 28 (5 pts) Compute E[X] and Var[X] for i) X a random variable that takes the values, 2, 3 with probabilities.2,.5,.3; ii) X a random variable with the
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gearanalytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationHeteroskedasticity and Weighted Least Squares
Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationTwo Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
More informationVISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density
More informationOffset Techniques for Predictive Modeling for Insurance
Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the
More informationHierarchical Insurance Claims Modeling
Hierarchical Insurance Claims Modeling Edward W. (Jed) Frees, University of Wisconsin  Madison Emiliano A. Valdez, University of Connecticut 2009 Joint Statistical Meetings Session 587  Thu 8/6/0910:30
More information1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...
MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability
More informationCredit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationGENERALIZED LINEAR MODELS IN VEHICLE INSURANCE
ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationInteraction Detection in GLM a Case Study
Interaction Detection in GLM a Case Study Chun Li, PhD ISO Innovative Analytics T H E S C I E N C E O F R I S K SM March 2012 1 Agenda Case study Approaches Proc Genmod, GAM in R, Proc Arbor Details Summary
More informationGLAM Array Methods in Statistics
GLAM Array Methods in Statistics Iain Currie Heriot Watt University A Generalized Linear Array Model is a lowstorage, highspeed, GLAM method for multidimensional smoothing, when data forms an array,
More informationA Deeper Look Inside Generalized Linear Models
A Deeper Look Inside Generalized Linear Models University of Minnesota February 3 rd, 2012 Nathan Hubbell, FCAS Agenda Property & Casualty (P&C Insurance) in one slide The Actuarial Profession Travelers
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More informationNonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
More information3. The Multivariate Normal Distribution
3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data
More informationLecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationMULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationChiSquare Test. Contingency Tables. Contingency Tables. ChiSquare Test for Independence. ChiSquare Tests for GoodnessofFit
ChiSquare Tests 15 Chapter ChiSquare Test for Independence ChiSquare Tests for Goodness Uniform Goodness Poisson Goodness Goodness Test ECDF Tests (Optional) McGrawHill/Irwin Copyright 2009 by The
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationChapter 2, part 2. Petter Mostad
Chapter 2, part 2 Petter Mostad mostad@chalmers.se Parametrical families of probability distributions How can we solve the problem of learning about the population distribution from the sample? Usual procedure:
More information