GLM III: Advanced Modeling Strategy 2005 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "GLM III: Advanced Modeling Strategy 2005 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide"

Transcription

1 GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M

2 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

3 "A Practitioner's Guide to GLMs" 24 CAS Discussion Paper Program Discusses testing the link function the Tweedie distribution aliasing / near aliasing combining models across claim types restricted models Copies available here

4 Generalized linear models E[Y i ] = µ i = g 1 (ΣX ij β j + ξ i ) Var[Y i ] = φ.v(µ i )/ω i Consider all factors simultaneously Provide statistical diagnostics Allow for nature of random process Robust and transparent Increasingly a global industry standard

5 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Var[Y] = φ.v(µ)/ω

6 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Link function Yvariate Design matrix Parameter estimates Offset term

7 Generalized linear models E[Y] = µ = g 1 (X.β + ξ) Observed thing (data) Some function (user defined) Some matrix based on data (user defined) Parameters to be estimated (the answer!) Known effects

8 Generalized linear models Var[Y] = φ.v(µ)/ω Scale parameter Variance function Prior weights Usually assume exponential family, eg φ = σ 2 (estimated), V(x) = 1 Var[Y i ] = σ 2 Normal φ = 1 (specified), V(x) = x Var[Y i ] = µ i Poisson φ = k (estimated), V(x) = x 2 Var[Y i ] = kµ i 2 Gamma

9 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

10 Link function g(x) Offset ξ Error Structure V(µ) Linear Predictor Form X.β = α i + β j + γ k + δ l Data Y Scale Parameter φ Prior Weights ω Numerical MLE Parameter Estimates α, β, γ, δ Diagnostics

11 Model testing Use only those factors which are predictive standard errors of parameter estimates F tests / χ 2 tests on deviances stepwise approach (helpful if used with care) consistency over time human intuition Make sure the model is reasonable variance function: residual plots (histograms / QQ / residual vs fitted value etc) outliers: leverage / Cook's distance link function: BoxCox

12 BoxCox link function investigation GLM structure is E[Y] = µ = g 1 (X.β + ξ) Var[Y] = φ.v(µ) / ω Box Cox transforms defines g(x) = ( x λ 1 ) / λ for λ, ln(x) for λ= λ = 1 g(x) = x 1 additive (with base level shift) λ g(x) ln(x) multiplicative (via maths) λ = 1 g(x) = 1 1/x inverse (with base level shift) Try different values of λ and measure goodness of fit to see which fits experience best

13 BoxCox link function investigation Auto third party property damage frequencies Likelihood Inverse Multiplicative Additive λ

14 BoxCox link function investigation Auto third party property damage average amounts Likelihood Inverse Multiplicative Additive λ

15 BoxCox link function investigation Comparing fitted values of different link functions 35 Not much 3 25 Count of records Ratio of fitted values for Lambda = (mult) to Lambda =.3

16 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

17 Standard approach BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5

18 .2 Tweedie distributions Incurred losses have a point mass at zero and then a continuous distribution Poisson and gamma not suited to this Tweedie distribution can have point mass at zero and parameters which can alter the shape to be like Poisson and gamma above zero f Y ( y ; θ, λ, α ) = n = 1 1 α n {( λω ) κ α ( 1 / y )}. exp { λω [ θ y κ ( θ )]} Γ ( n α ) n! y α for y > { ( θ )} p( Y = ) = exp λωκ α

19 Tweedie distributions Tweedie: φ = k, V(x) = x p Var[Y] = kµ p p=1 corresponds to Poisson, p=2 to gamma Defines a valid distribution for p<, 1<p<2, p>2 Can be considered as Poisson/gamma process for 1<p<2 Need to estimate both k and p when fitting models often estimate a where p = (2a)/(1a) Typical values of p for insurance incurred claims around, or just under, 1.5

20 Generalised linear models Var[Y] = φ.v(µ) / ω Normal: φ = σ 2, V(x) = 1 Var[Y] = σ 2.1 Poisson: φ = 1, V(x) = x Var[Y] = µ Gamma: φ = k, V(x) = x 2 Var[Y] = kµ 2 Tweedie: φ = k, V(x) = x p Var[Y] = kµ p

21 Example 1: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 2 Frequency % 446% 1 Log of multiplier % 128% 94% 96% 44% % 23% 15% 21% 17% Exposure (years) % A B C D E F G H I J K L M Bonus Malus Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 12/12

22 Example 1: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 6 Amounts Log of multiplier % 26% 6% 1% 5% 3% 7% % 2% 3% 2% 4% 1% Number of claims A B C D E F G H I J K L M Bonus Malus EXCLUDED FACTOR Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value = 5.9% Rank 4/12

23 Example 1: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 2 Tweedie Models % 568% 443% 446% 1 Log of multiplier % 128% 94% 96% 44% % 23% 15% 21% 17% Exposure (years) % A B C D E F G H I J K L M Bonus Malus Tw eedie SE Tw eedie RPSE RP

24 Example 2: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 1 Frequency Log of multiplier % 84% 69% 75% 68% 67% 61% 58% 58% 45% 48% 54% 47% 38% Exposure (years).1 % >13 Age of vehicle Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 12/12

25 Example 2: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 5 Amounts Log of multiplier % 1% 11% 17% 14% 11% 24% 8% 11% 14% 13% 1% 4% Number of claims % 1 5% >13 Age of vehicle Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 5/7

26 Example 2: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models 1 1 Log of multiplier % 83% 17% 11% 9% 87% 16% 15% 94% 92% 9% 86% 14% 99% 73% 75% 71% 67% 7% 73% 65% 67% 7% 55% 53% 34% 31% Exposure (years) % >13 Age of vehicle Tw eedie SE Tw eedie RPSE RP Numbers Amounts

27 Example 3: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models.1 45 Log of multiplier % 9% % Exposure (years) Category 1 Category 2 Occupation Tw eedie SE Tw eedie RPSE RP Numbers Amounts

28 Example 4: frequency Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 1 Frequency.4 8 Log of multiplier % 6% 2% % 2% 2% 5% 8% 3% 8% 8% 3% 17% 2% Exposure (years) < Unknown 39% 1 Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value =.% Rank 5/12

29 Example 4: amounts Comparison of Tweedie model with traditional frequency/amounts approach Run 7 Model 5 Amounts.18 16% 6 Log of multiplier % 3% 2% % 1% 1% 6% 9% 5% 3% 5% % 2% 4% Number of claims < Unknown Age of driver EXCLUDED FACTOR Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate P value = 5.6% Rank 4/9

30 Example 4: traditional RP vs Tweedie Comparison of Tweedie model with traditional frequency/amounts approach Run 11 Model 1 Tweedie Models Log of multiplier % 21% 11% 6% 2% 1% % 4% 2% 2% % 13% 5% 2% 8% 3% 1% 16% 8% 15% 8% 2% 3% 14% 17% 16% 2% 28% Exposure (years).4 39% 1.6 < Unknown Age of driver Tw eedie SE Tw eedie RPSE RP Numbers

31 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

32 Spline definition 1 A series of polynomial functions, with each function defined over a short interval Intervals are defined by k+2 knots two exterior knots at extremes of data variable number (k) of interior knots At each interior knot the two functions must join "smoothly"

33 Cubic splines Each polynomial is a cubic a + bx + cx 2 + dx 3 "Smoothness" at interior knots is defined as: continuous continuous first derivative continuous second derivative

34 Regression splines The position of the knots is specified by the user Standard GLMs can be used by careful definition of variates Pros fits easily into existing structures no complex resampling needed Cons position of knots can effect final answer

35 Smoothing splines One knot at each unique data value Additional curvature penalty prevents over fitting Curvature penalty selected by repeatedly sampling subsets and optimising generalised goodness of fit measure such as AIC Pros allows data to guide final result Cons 1s of knots required optimisation process is timeconsuming difficult to produce new fitted values

36 "Easy" regression splines Fit a cubic over the whole range simply define x, x 2 and x 3 as variates and include in the model Fit additional cubic "correction" variates for each interval, defined as if x<k r ((x k r+1 )/(k r k r+1 )) 3 otherwise

37 "Easy" regression splines

38 "Easy" regression splines

39 "Easy" regression splines

40 "Easy" regression splines "Correction" variates get large quickly In practice GLM process can struggle with these large numbers Alternate basis is clearly desirable

41 BSplines Set of basis functions usually covering four segments (defined by five knots) Each function is itself a cubic spline Each basis function has the same shape, except for the three basis functions at each extreme which occupy fewer than four segments

42 BSplines

43 BSplines

44 BSplines

45 BSplines

46 Example Factor

47 Example Factor Polynomial

48 Example Spline Factor Polynomial

49 Example Spline Factor Spline SE Spline SE

50 Extrapolation Can combine the three boundary basis functions to achieve linear extrapolation Sum of three functions tends to 1 at the exterior knot, and continues as 1 when extrapolated Can combine the last two functions to give function that tends to a straight line at the exterior knot, and can be extrapolated as such

51 BSplines

52 BSplines

53 BSplines

54 Example Spline Factor Spline SE Spline SE Polynomial

55 Further example 1.2 Comparison of factor with spline Urban density 28% % 5 81%.6 44% 53% 4 Log of multiplier.3 9% 15% 2% % 1% 18% 3 Exposure (years).3 33% 37% 34% 35% 35% 34% 28% 25% 28% Population density Factor SE Factor P value =.% Rank 7/11

56 1.2 Further example Comparison of factor with spline Urban density Log of multiplier Exposure (years) Population density Factor SE Factor Spline P value =.% Rank 7/11

57 1.2 Further example Comparison of factor with spline Urban density Log of multiplier Exposure (years) Population density Spline P value =.% Rank 7/11

58 Further example 1.2 Comparison of factor with spline Urban density 182% % 139% % Log of multiplier.3 15% 8% 3% % 5% 17% 4% 4 3 Exposure (years).3 38% 37% 37% 36% 35% 33% 31% 29% 24% Population density Factor Spline SE Spline P value =.% Rank 7/11

59 Splines Practical way of modeling continuous variables Often better than polynomials Increases complexity, therefore best used when it is important that rates vary continuously with a variable when modeling elasticity to be used in price optimization analyses

60 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

61 Standard approach BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5

62 Binomial reference models Amt = Cost 1 TP Freq BI / PD? Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5

63 Offset reference model

64 Offset reference model

65 Offset reference model

66 Offset reference model

67 Offset reference model

68 Offset reference model

69 Offset reference model

70 Offset reference model

71 Testing the reference model approach (1) Fit to BI claims on all data the "correct answer" 1% of large company 1% Random sample to emulate small company (2) Model BI claims with standard approach (3) Model BI claims referencing PD experience on this small sample

72 Example of reference model method working Example PI models Green: BI 1% ("correct answer") Blue: Standard BI GLM on 1% Purple: Referencing PD on 1% 27% Log of multiplier % 14% 19% 1 8 Exposure (years) 6.6 % Standard BI GLM factor rejected Level 1 Level 2 Level 3 Factor X Approx 2 s.e. from estimate Full model Unsmoothed estimate Full model PD model

73 Example of reference model method working Example Green: BI PI 1% models ("correct answer") % 182% Blue: Standard BI GLM on 1% Purple: Referencing PD on 1% Log of multiplier.6.3 1% 54% 37% 9% 66% 54% 38% 42% 29% 24% 34% 2% 12% 12% 11% 6% 34% 14% 1% 1% 3% 38% 6% 7% % Exposure (years) 2.3 Level A Level B Level C Level D Level E Level F Level G Level H Level I Level J Factor Y Approx 2 s.e. from estimate Full model Unsmoothed estimate Full model Unsmoothed estimate 1% model Approx 2 s.e. from estimate 1% model PD model

74 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

75 Aliasing and "near aliasing" Aliasing the removal of unwanted redundant parameters Intrinsic aliasing occurs by the design of the model Extrinsic aliasing occurs "accidentally" as a result of the data

76 Example Suppose we wanted a model of the form: µ = α + β if age < β if age β if age > γ if sex male 1 + γ if sex female 2

77 Form of X.β in this case Age Sex <3 34 >4 M F α β β β γ γ

78 Example Suppose we wanted a model of the form: µ = α + β if age < β if age "Base levels" + β if age > γ if sex male 1 + γ if sex female 2

79 X.β having adjusted for base levels Age Sex <3 34 >4 M F α β β β γ γ

80 X.β having adjusted for base levels Age Sex <3 >4 F α β 1 β 3 γ 2

81 Intrinsic aliasing Example job Run 16 Model 3 Small interaction Third party material damage, Numbers % 138%.8 2 Log of multiplier % 28% 24% 15 1 Exposure (years).2 % 19% 6% 11% Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

82 Extrinsic aliasing If a perfect correlation exists, one factor can alias levels of another Eg if doors declared first: Selected base Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 Further aliasing Unknown 3,242 This is the only reason the order of declaration can matter (fitted values are unaffected)

83 Extrinsic aliasing Example job Run 16 Model 3 Small interaction Third party material damage, Numbers 1 135% 6 Log of multiplier % 39% 45% 57% 82% 72% 7% 91% 13% Exposure (years) % % Malus No claim discount Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

84 "Near aliasing" If two factors are almost perfectly, but not quite aliased, convergence problems can result and/or results can become hard to interpret Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 Eg if the 2 black, unknown doors policies had no claims, GLM would try to estimate a very large negative number for unknown doors, and a very large positive number for unknown color Selected base Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 2 Unknown 3,242

85 "Near aliasing" solution 1. Spot it 2. Fix the data! Exposure: # Doors Unknown Colour Red 13,234 12,343 13,432 13,432 Green 4,543 4,543 13,243 2,345 Blue 6,544 5,443 15,654 4,565 Black 4,643 1,235 14,565 4,545 2 Unknown 3,242

86 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

87 Combining claim elements I BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 Multiply factors for frequencies and amounts Calculate risk premium as sum of claim elements COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5

88 Combining claim elements II Consider current exposure BI Freq x Amt = Cost 1 PD Freq x Amt = Cost 2 MED Freq x Amt = Cost 3 COL Freq x Amt = Cost 4 OTC Freq x Amt = Cost 5 Calculate expected frequency and amount for each claim type for each record Combine to give expected total cost of claims for each record Fit model to this expected value

89 Calculation of risk premium TPPD TPPD TPBI TPBI Numbers Amounts Numbers Amounts Intercept 32% 1 12% 486 Sex Male Female Area Town Country Policy Sex Area WWNUM1 WWAMT1 WWNUM2 WWAMT2 WWCC1 WWCC2 WWRSKPRM M T 32% 1 12% F T 24% 12 8% M C 4% 7 9% F C 3% 84 6%

90 Risk premium standard errors Risk premium model standard errors are small owing to the smoothness of the expected value It is possible to approximate standard error of risk premium parameter estimates based on standard errors of parameter estimates in underlying models Care needed in interpreting such approximations since they do not reflect model error, eg deciding to exclude a marginal factor

91 Risk premium standard errors failings Numbers Amounts Risk premium

92 Risk premium standard errors failings Numbers Amounts Risk premium

93 Agenda Introduction Testing the link function The Tweedie distribution Splines Reference models Aliasing / near aliasing Combining models across claim types Restricted models

94 Restricted models E[Y] = µ = g 1 ( X.β + ξ ) Offset Offset term used for known effects, eg exposure in a numbers model Can also be used to constrain model (eg claim free years / payment frequency / amount of cover) Other factors adjusted to compensate

95 Restricted models Age Sex <3 >4 F α β 1 β 3 γ 2

96 Restricted models E[Y] = µ = g 1 ( X.β ) Age Sex <3 >4 F α β 1 β 3 γ 2

97 Restricted models E[Y] = µ = g 1 ( X.β + ξ ) Age <3 > α β 1 β

98 Restricted models Age Sex <3 >4 F α β 1 β 3.1

99 Restricted models Log of multiplier Restriction Log of multiplier BonusMalus Vehicle group Log of multiplier.5 1 Log of multiplier A B C D E F G H I J K L M N O Geographical zone Age of driver

100 Restricted models Log of multiplier Restriction Log of multiplier BonusMalus Vehicle group Log of multiplier Log of multiplier 1.5 Model compensates (as best it can) to allow for restriction A B C D E F G H I J K L M N O Geographical zone Age of driver

101 Testing the effectiveness of restrictions 12 1 Effect of restricting Bonus Malus Other factors not very correlated with Bonus Malus 8 Count of records Ratio A B C D E F G H I J K L M N O P Q R S

102 Testing the effectiveness of restrictions 12 1 Effect of restricting Bonus Malus Adding in a measure of claim free years 8 Count of records Ratio A B C D E F G H I J K L M N O P Q R S

103 Restrictions Only use to "get around" restrictions A commercial smoothing is a commercial smoothing Apply at risk premium stage

104 GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures. Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

More information

Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation

Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the Best Regression Equation Multiple Regression - Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component) Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

More information

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

More information

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

More information

Dongfeng Li. Autumn 2010

Dongfeng Li. Autumn 2010 Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Notes on the Negative Binomial Distribution

Notes on the Negative Binomial Distribution Notes on the Negative Binomial Distribution John D. Cook October 28, 2009 Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between

More information

Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Bonus-malus systems and Markov chains

Bonus-malus systems and Markov chains Bonus-malus systems and Markov chains Dutch car insurance bonus-malus system class % increase new class after # claims 0 1 2 >3 14 30 14 9 5 1 13 32.5 14 8 4 1 12 35 13 8 4 1 11 37.5 12 7 3 1 10 40 11

More information

The University of Kansas

The University of Kansas All Greek Summary Rank Chapter Name Total Membership Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 4 Kappa Kappa Gamma 3.28 *5 Pi Beta Phi 3.27 *5 Gamma Phi Beta 3.27 *7 Alpha

More information

Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Fraternity & Sorority Academic Report Spring 2016

Fraternity & Sorority Academic Report Spring 2016 Fraternity & Sorority Academic Report Organization Overall GPA Triangle 17-17 1 Delta Chi 88 12 100 2 Alpha Epsilon Pi 77 3 80 3 Alpha Delta Chi 28 4 32 4 Alpha Delta Pi 190-190 4 Phi Gamma Delta 85 3

More information

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

More information

Lecture - 32 Regression Modelling Using SPSS

Lecture - 32 Regression Modelling Using SPSS Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

More information

Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test... Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients ( Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

More information

Fraternity & Sorority Academic Report Fall 2015

Fraternity & Sorority Academic Report Fall 2015 Fraternity & Sorority Academic Report Organization Lambda Upsilon Lambda 1-1 1 Delta Chi 77 19 96 2 Alpha Delta Chi 30 1 31 3 Alpha Delta Pi 134 62 196 4 Alpha Sigma Phi 37 13 50 5 Sigma Alpha Epsilon

More information

1. χ 2 minimization 2. Fits in case of of systematic errors

1. χ 2 minimization 2. Fits in case of of systematic errors Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

The University of Kansas

The University of Kansas Fall 2011 Scholarship Report All Greek Summary Rank Chapter Name Chapter GPA 1 Beta Theta Pi 3.57 2 Chi Omega 3.42 3 Kappa Alpha Theta 3.36 *4 Gamma Phi Beta 3.28 4 Kappa Kappa Gamma 3.28 6 Pi Beta Phi

More information

Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Exam C, Fall 2006 PRELIMINARY ANSWER KEY Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is. Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,

More information

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX Phil Gibbs Advanced Analytics Manager SAS Technical Support November 22, 2008 UC Riverside What We Will Cover Today What is PROC

More information

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x, Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Probability Calculator

Probability Calculator Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that

More information

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

More information

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

More information

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 10 Boundary Value Problems for Ordinary Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign

More information

Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

Development Period 1 2 3 4 5 6 7 8 9 Observed Payments

Development Period 1 2 3 4 5 6 7 8 9 Observed Payments Pricing and reserving in the general insurance industry Solutions developed in The SAS System John Hansen & Christian Larsen, Larsen & Partners Ltd 1. Introduction The two business solutions presented

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

7. Tests of association and Linear Regression

7. Tests of association and Linear Regression 7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

More information

Predictive Modeling. Age. Sex. Y.o.B. Expected mortality. Model. Married Amount. etc. SOA/CAS Spring Meeting

Predictive Modeling. Age. Sex. Y.o.B. Expected mortality. Model. Married Amount. etc. SOA/CAS Spring Meeting watsonwyatt.com SOA/CAS Spring Meeting Application of Predictive Modeling in Life Insurance Jean-Felix Huet, ASA June 18, 28 Predictive Modeling Statistical model that relates an event (death) with a number

More information

4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

More information

Using pivots to construct confidence intervals. In Example 41 we used the fact that

Using pivots to construct confidence intervals. In Example 41 we used the fact that Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement

More information

Seasonality and issues of the season

Seasonality and issues of the season General Insurance Pricing Seminar James Tanser and Margarida Júdice Seasonality and issues of the season 2010 The Actuarial Profession www.actuaries.org.uk Agenda Issues of the season New claim trends

More information

Predictive Modeling for Life Insurers

Predictive Modeling for Life Insurers Predictive Modeling for Life Insurers Application of Predictive Modeling Techniques in Measuring Policyholder Behavior in Variable Annuity Contracts April 30, 2010 Guillaume Briere-Giroux, FSA, MAAA, CFA

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Poisson Regression or Regression of Counts (& Rates)

Poisson Regression or Regression of Counts (& Rates) Poisson Regression or Regression of (& Rates) Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Generalized Linear Models Slide 1 of 51 Outline Outline

More information

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ 1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

More information

Math 425 (Fall 08) Solutions Midterm 2 November 6, 2008

Math 425 (Fall 08) Solutions Midterm 2 November 6, 2008 Math 425 (Fall 8) Solutions Midterm 2 November 6, 28 (5 pts) Compute E[X] and Var[X] for i) X a random variable that takes the values, 2, 3 with probabilities.2,.5,.3; ii) X a random variable with the

More information

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully

More information

Heteroskedasticity and Weighted Least Squares

Heteroskedasticity and Weighted Least Squares Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

Offset Techniques for Predictive Modeling for Insurance

Offset Techniques for Predictive Modeling for Insurance Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the

More information

Hierarchical Insurance Claims Modeling

Hierarchical Insurance Claims Modeling Hierarchical Insurance Claims Modeling Edward W. (Jed) Frees, University of Wisconsin - Madison Emiliano A. Valdez, University of Connecticut 2009 Joint Statistical Meetings Session 587 - Thu 8/6/09-10:30

More information

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles... MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability

More information

Credit Risk Models: An Overview

Credit Risk Models: An Overview Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Interaction Detection in GLM a Case Study

Interaction Detection in GLM a Case Study Interaction Detection in GLM a Case Study Chun Li, PhD ISO Innovative Analytics T H E S C I E N C E O F R I S K SM March 2012 1 Agenda Case study Approaches Proc Genmod, GAM in R, Proc Arbor Details Summary

More information

GLAM Array Methods in Statistics

GLAM Array Methods in Statistics GLAM Array Methods in Statistics Iain Currie Heriot Watt University A Generalized Linear Array Model is a low-storage, high-speed, GLAM method for multidimensional smoothing, when data forms an array,

More information

A Deeper Look Inside Generalized Linear Models

A Deeper Look Inside Generalized Linear Models A Deeper Look Inside Generalized Linear Models University of Minnesota February 3 rd, 2012 Nathan Hubbell, FCAS Agenda Property & Casualty (P&C Insurance) in one slide The Actuarial Profession Travelers

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Chapter 2, part 2. Petter Mostad

Chapter 2, part 2. Petter Mostad Chapter 2, part 2 Petter Mostad mostad@chalmers.se Parametrical families of probability distributions How can we solve the problem of learning about the population distribution from the sample? Usual procedure:

More information