GLM, insurance pricing & big data: paying attention to convergence issues.




Michaël NOACK (michael.noack@addactis.com), Senior Consultant & Manager of ADDACTIS Pricing
Copyright 2014 ADDACTIS Worldwide. All Rights Reserved

INTRODUCTION

A GLM tries to find and express the relationship between a random variable Y (the response variable) and a set of predictor variables X_1, ..., X_p. In the pricing process the GLM is the market standard; it is used to model response variables Y such as the number of claims (frequency), the average cost of a claim, the cost of the risk, or the propensity for large claims.

The distribution of the response variable Y must belong to the exponential family. Let μ be the mean of Y; the model can then be written as

g(μ) = Xβ, or equivalently* μ = g^(-1)(Xβ)

* equivalence satisfied under certain conditions on the link function g.

Finding a solution means finding a maximum of the log-likelihood (maximum likelihood estimation). Generally the solution to this equation must be calculated by iterative methods. Iterative solutions to non-linear equations follow this algorithm:

1. A starting value is selected (initial guess for β).
2. Using a polynomial approximation of the likelihood, a second guess is obtained.
3. The difference C between guesses i and i+1 is calculated: C = β^(i+1) - β^(i).
4. Once the difference C < k, where k is the convergence criterion (say 0.0001), the estimate is β = β^(i+1).

Note: when β is a vector, the difference β^(i+1) - β^(i) yields a vector of c_i's, where c_i is the convergence measure for the i-th element of β. Convergence can be declared either when all c_i < k or when the sum of the c_i is below a threshold. But the iteration does not converge in all cases. What should be done when there is no convergence?

A robust algorithm - one that converges in almost every situation - seems very desirable, even if it requires more explanation. To evaluate which properties of the GLM algorithm really matter, we first give some more detail on GLMs (chapters 1 and 2) and then explain convergence issues and how our solution, ADDACTIS Pricing, handles these situations (chapter 3).
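The four steps above can be sketched as a generic iteration loop. The helper below and the toy fixed-point problem it is applied to are illustrative assumptions, not ADDACTIS Pricing's implementation:

```python
import numpy as np

def iterate_to_convergence(beta0, update, k=1e-4, max_iter=50):
    """Generic iteration with the element-wise stopping rule above:
    stop once every |c_i| = |beta^(i+1) - beta^(i)| is below k."""
    beta = np.asarray(beta0, dtype=float)
    for step in range(max_iter):
        beta_next = update(beta)        # step 2: obtain the next guess
        c = np.abs(beta_next - beta)    # step 3: vector of c_i's
        beta = beta_next
        if np.all(c < k):               # step 4: individual criterion
            return beta, step + 1
    raise RuntimeError("no convergence within %d iterations" % max_iter)

# toy usage: fixed point of cos, i.e. the solution of x = cos(x)
root, n_steps = iterate_to_convergence([0.5], np.cos)
```

The same loop skeleton applies whether the update comes from Newton-Raphson or Fisher scoring; only the `update` function changes.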

1. MAIN METHODS

In general, there are two popular iterative methods for estimating the parameters of non-linear likelihood equations:

- the Newton-Raphson method,
- Fisher's scoring method (also known as the iteratively reweighted least squares algorithm).

Both take the same general form and differ only in the variance structure: the Newton-Raphson method corresponds to the Wald test (non-null standard error) and Fisher's scoring method to the score test (null standard error). These algorithms iteratively find a point x where f(x) = 0, for any function f that satisfies the appropriate conditions. The iterative solution can be written as

x^(t+1) = x^(t) - f(x^(t)) / f'(x^(t))

One of the most common methods is the Newton-Raphson method, which is based on successive approximations to the solution, using Taylor's theorem to approximate the equation.

Using matrices, the Newton-Raphson algorithm can then be written as follows:

β^(t+1) = β^(t) - [H^(t)]^(-1) S^(t)

where:
- β is the vector (of size p) containing the parameters to estimate,
- S is the gradient of the log-likelihood computed at the point β^(t),
- H is the Hessian matrix of the log-likelihood computed at the point β^(t). H has the form H = -(X^T V X), where V is a diagonal (n, n) matrix.

Fisher's scoring method is identical to Newton-Raphson, except that in place of the Hessian matrix (the observed information matrix) its expectation is used (the expected information matrix). Generally Newton-Raphson converges faster, but it is more sensitive to the starting point: in some situations the Hessian matrix fails to be negative definite when the current iterate is not close enough to the final estimate. The Fisher scoring method is less dependent on individual Y_i values and provides more stable convergence, but the calculation of the expected information matrix is not always computationally feasible.
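As a sketch of this matrix form, the following implements the update for the canonical Poisson/log-link case, where S = X^T(y - μ) and V = diag(μ). The function name and the simulated data are assumptions for illustration only:

```python
import numpy as np

def newton_raphson_poisson(X, y, tol=1e-4, max_iter=25):
    """Newton-Raphson sketch for a Poisson GLM with log link. For this
    canonical link S = X'(y - mu) and H = -(X'VX) with V = diag(mu),
    so the update is beta^(t+1) = beta^(t) + (X'VX)^-1 X'(y - mu)."""
    beta = np.zeros(X.shape[1])              # crude starting point
    for _ in range(max_iter):
        mu = np.exp(X @ beta)                # fitted means
        score = X.T @ (y - mu)               # gradient S
        XtVX = X.T @ (mu[:, None] * X)       # -H = X'VX
        step = np.linalg.solve(XtVX, score)  # -(H^-1) S
        beta = beta + step
        if np.max(np.abs(step)) < tol:       # individual criterion
            return beta
    raise RuntimeError("Newton-Raphson did not converge")

# usage on simulated claim-frequency-style data (made-up parameters)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=2000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 - 0.3 * x))
beta_hat = newton_raphson_poisson(X, y)
```

Because the link is canonical here, replacing the observed information by its expectation (Fisher scoring) would yield exactly the same iteration.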

2. ADDITIONAL PARAMETERS

It can be shown that with a canonical link function there is no difference between the two methods. So in the most decisive case in the insurance sector - the frequency model with a Poisson error distribution and a log link function - both methods coincide.

Once the method for solving the GLM calculation has been selected, further parameters have to be defined for the iteration process. The solution and the execution time also depend on:

- Selection of the starting point. An adequate selection of starting points can avoid convergence problems and reduce the number of iteration steps.

- Convergence criterion. As we have seen, a convergence criterion can be based either on the individual differences between the estimates in two iteration steps or on the difference in the error sum. The two choices can give the user quite different results. If the problem converges by the individual criterion, it will also converge by the criterion on the aggregated difference; but it is possible to obtain convergence by the aggregated criterion yet non-convergence by the individual one. In which cases can this occur? If the overall error is stable but the individual estimates are not, there must be at least two estimates with (compensating) fluctuations. From the ADDACTIS Pricing point of view this is an unstable solution that the user should not accept without further analysis. Normally the affected estimates can be identified by a large confidence interval and a low exposure.
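A minimal numeric illustration of these compensating fluctuations, with made-up figures: with two (nearly) collinear multiplicative factors a and b, the fitted rate depends only on the product a*b, so an aggregated error criterion sees no change between (a, b) and (a/2, 2b) even though the individual coefficients are still moving.

```python
# Two (nearly) collinear multiplicative factors a and b: the fitted
# rate depends only on the product a*b, so an error-sum criterion
# cannot tell (a, b) apart from (a/2, 2b), while the individual
# coefficients keep fluctuating.
base = 100.0
a, b = 1.20, 0.80
rate_before = base * a * b
rate_after = base * (a / 2) * (b * 2)              # identical fitted value

fit_change = abs(rate_after - rate_before)         # aggregated view: "converged"
coef_change = max(abs(a / 2 - a), abs(b * 2 - b))  # individual view: large
```

This is exactly the pattern the individual criterion is designed to catch: `fit_change` is zero while `coef_change` stays far above any reasonable tolerance.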

- Handling of special situations. In most cases we obtain convergence and the result does not depend on the selected method and parameters. But there are some more problematic special cases, such as:
  - the matrix is not invertible,
  - unobserved modalities in a main factor,
  - unobserved modalities in a marginal interaction,
  - non-convergence,
  - checking whether the point found really is a maximum.
  In these cases it is especially important that the user gets full information about the iteration process in order to make the correct decision.

3. ADDACTIS Pricing

In this section we give the details of the method integrated in ADDACTIS Pricing, and explain and demonstrate with an example why it matters to give the user full information on the calculations.

3.1 The method

Our software works with the Newton-Raphson method: a fast-converging way to solve the problem. Possible non-convergence is in most cases avoided by the selection of the starting points. Documented examples of non-convergence come from biostatistics studies and are due to the big impact of single observations in small samples; for insurance data this should not be a problem.

3.2 Starting points

ADDACTIS Pricing takes as starting point the ordinary least squares estimator:

β^(0) = (X^T X)^(-1) X^T y

In the improbable case of non-convergence, the user has the option to start instead with the weighted least squares estimator (which is theoretically closer to the maximum). If neither choice leads to convergence, that simply means that neither the ordinary nor the weighted least squares estimator is close to the maximum. In such a case we would not recommend switching to a different method that might converge (are we really interested in such a solution?), but rather changing the design matrix (including/excluding parameters or changing the mapping of parameters).
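The OLS starting point can be sketched as follows. This is a hypothetical helper, using a least-squares solver for numerical stability rather than forming the matrix inverse explicitly:

```python
import numpy as np

def ols_start(X, y):
    """OLS starting point beta^(0) = (X'X)^-1 X'y, computed via lstsq
    (mathematically equivalent for a full-rank X, numerically safer)."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta0

# toy design: y is exactly 1 + 2*x, so the OLS start recovers (1, 2)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
beta0 = ols_start(X, np.array([1.0, 3.0, 5.0]))
```

The result would then seed the Newton-Raphson iteration in place of a zero or arbitrary initial β.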

3.3 Convergence criterion

The convergence criterion is based on the difference between β^(t+1) and β^(t):

max_j | β_j^(t+1) - β_j^(t) |

As soon as the maximum of these values is lower than the target accuracy defined by the user (10^-4 by default), ADDACTIS Pricing stops and the algorithm has converged. When it converges, it is fast; generally a solution is found within 5 steps.

Other software bases its convergence criterion on the sum of errors (deviance). This criterion normally converges faster, but in situations of near-collinearity between two parameters a and b we can find cases where the pairs (a, b) and (a/2, b*2), etc., give very similar fitted values: the sum of errors converges, but at the individual level there is an unstable, non-converging situation. In that case, too, we suggest that it is better to change the design matrix than to accept convergence based on the sum of errors.

3.4 Handling of special situations

In special situations ADDACTIS Pricing always gives as much information as possible to help the user find a solution.

- Matrix not invertible (collinearities detected). The design matrix is not of full rank. Every matrix has a unique (row-)reduced echelon form, which can be obtained by Gaussian elimination (thanks to this uniqueness, every elimination method produces the same result). ADDACTIS Pricing performs the matrix reduction this way and calculates the dependencies between the parameters. With this information the user can see exactly which factors are the source of the problem.

- Unobserved modalities in a main factor. If there are unobserved modalities, ADDACTIS Pricing does not start the iteration process and shows the user which modalities have no observations. It is not compulsory to have observations for each modality; we could execute the GLM and simply ignore these modalities (as other applications do). In that case the unobserved modalities have no rate (or a rate equal to 1, like the reference value), which means the rate depends on the selection of the reference value - and the user should know about it. This is why ADDACTIS Pricing makes the user aware of the situation and helps them decide how to join these modalities with adequate ones.
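This dependency detection can be sketched with a small, self-contained Gaussian elimination. The function and its report format are illustrative, not ADDACTIS Pricing's actual code:

```python
import numpy as np

def rref_dependencies(X, tol=1e-10):
    """Reduce X to (row-)reduced echelon form by Gaussian elimination
    and return the pivot columns plus, for each dependent column, its
    expression as a combination of the pivot columns."""
    R = np.array(X, dtype=float)
    n, p = R.shape
    pivots = []
    row = 0
    for col in range(p):
        if row >= n:
            break
        piv = row + np.argmax(np.abs(R[row:, col]))  # partial pivoting
        if abs(R[piv, col]) < tol:
            continue                                 # no pivot: dependent column
        R[[row, piv]] = R[[piv, row]]
        R[row] /= R[row, col]
        for r in range(n):
            if r != row:
                R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    dependent = {j: {pivots[i]: R[i, j]
                     for i in range(len(pivots)) if abs(R[i, j]) > tol}
                 for j in range(p) if j not in pivots}
    return pivots, dependent

# example: column 2 equals column 0 + column 1 (a perfect collinearity)
pivots, dependent = rref_dependencies([[1.0, 0.0, 1.0],
                                       [1.0, 1.0, 2.0],
                                       [0.0, 1.0, 1.0]])
```

Here the report identifies column 2 as `1 * col0 + 1 * col1`, i.e. exactly the kind of message that lets the user see which factors cause the singular matrix.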

- Unobserved modalities in a marginal interaction. For a main factor it is very uncommon to have unobserved modalities, but for marginal interactions it is quite frequent. We illustrate the difference with an example. Take some fictitious policy and claim data where the variables of interest are age and vehicle. The one-way analysis shows the expected behaviour.

In our case there are no observations for the age x vehicle combination [20,22[ x M (perhaps due to underwriting rules), which can be seen in the two-way analysis. When we now include both variables with a marginal interaction, ADDACTIS Pricing does not converge; depending on the selected reference value for vehicle, it delivers the message "unobserved modalities" or "collinearities detected". The user consequently has to decide how to re-group the age variable. Once this is done, the new model converges without problem. But what impact can this distinct handling have in practice?
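Detecting such empty cells before fitting can be sketched as a simple cross-tabulation. The record layout and the data below are hypothetical, mimicking the [20,22[ x M gap in the example:

```python
from collections import Counter
from itertools import product

def unobserved_combinations(records, var_a, var_b):
    """Cross-tabulate two rating factors and list every combination
    of observed levels that has zero exposure."""
    counts = Counter((r[var_a], r[var_b]) for r in records)
    levels_a = sorted({r[var_a] for r in records})
    levels_b = sorted({r[var_b] for r in records})
    return [(a, b) for a, b in product(levels_a, levels_b)
            if counts[(a, b)] == 0]

# fictitious policies: no exposure for ("[20,22[", "M")
policies = [{"age": "[18,20[", "vehicle": "M"},
            {"age": "[18,20[", "vehicle": "L"},
            {"age": "[20,22[", "vehicle": "L"},
            {"age": "[22,24[", "vehicle": "M"},
            {"age": "[22,24[", "vehicle": "L"}]
missing = unobserved_combinations(policies, "age", "vehicle")
```

Running such a check on every marginal interaction before the iteration starts is what allows the software to warn the user instead of silently dropping a parameter.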

A standard software package like SAS gives no warning about unobserved modalities: you simply get the parameter list for the frequency calculation. Notice that the line age*vehicle B.[20,22[ M is omitted and the reference value is changed to L. The user can then calculate the best frequency estimate for all elements in this critical segment.

In ADDACTIS Pricing the user was forced to re-group the age variable, for instance as A.[18,21[ and B.[21,24[, and the best frequency estimate follows from the re-grouped model. For age = 20 and vehicle = M the SAS premium would be about one third of the ADDACTIS Pricing premium; for age = 21, approximately one half. This example shows the impact of forced convergence. In ADDACTIS Pricing we prefer not to force convergence, and instead leave the user in the driver's seat to solve the convergence issue, if any.

SUMMARY

Convergence is a decisive issue when using GLMs, which rely on iterative methods. In most cases, with the selection of adequate starting points and enough data, GLMs do converge - but not always. ADDACTIS Pricing assumes that when Newton-Raphson does not converge, the GLM is probably incorrectly specified; in that case, modifications of the design matrix (parameter selection and/or parameter grouping) will help achieve convergence. But in order to make a valid and documented decision, the user must have as much complete information as possible on the calculations. Consequently, convergence should not be artificially forced by the GLM method, as happens in some software: non-convergence is information of the utmost importance for the user, and the primary condition for evaluating the correctness of the solution. Moreover, this transparency of calculations and information is in tune with the requirements of control over calculations at the heart of Solvency 2.

13-15 Boulevard de la Madeleine, F-75001 PARIS | +33 (0)4 81 92 13 00 | contact@addactis.com
Worldwide actuarial software. European expertise. Local solutions.
www.addactis.com