Insurance Fraud Detection: MARS versus Neural Networks?
Louise A. Francis, FCAS, MAAA
Louise_francis@msn.com
Objectives
- Introduce a relatively new data mining method that can be used as an alternative to neural networks
- Compare the method to neural networks
- Apply the methods to fraud data
MARS
- Acronym for Multivariate Adaptive Regression Splines
- In many ways it is similar to regression, but:
- It can deal with data complexities that ordinary linear regression has difficulty with
Data Complexities
- Nonlinear functions
- Interactions
- Missing data
The Fraud Study Data
- 1993 Automobile Insurers Bureau closed Personal Injury Protection claims
- Dependent variable: suspicion score, an expert assessment of the likelihood of fraud or abuse
- Predictor variables: red flag indicators and claim file variables
Example: Nonlinear Function
[Figure: neural network fit of suspicion score vs. provider bill]
MARS Fit to Nonlinear Function
[Figure: MARS fit of suspicion score vs. provider bill]
How MARS Fits a Nonlinear Function
- MARS fits a piecewise regression
- BF1 = max(0, 2185 - X)
- Y = 4.29 - 0.002 * BF1
- BF1 is a basis function
- MARS uses statistical optimization to find the best basis functions
- A basis function is similar to a dummy variable in regression
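As an illustration of the piecewise fit above, here is a minimal sketch (not the author's code) that builds the hinge basis function BF1 = max(0, 2185 - X) and estimates its coefficient by ordinary least squares; the provider_bill and suspicion arrays are hypothetical stand-ins for the study data.

```python
import numpy as np

def hinge(knot, x):
    """MARS-style basis function: max(0, knot - x)."""
    return np.maximum(0.0, knot - x)

# Hypothetical data standing in for provider bill and suspicion score
provider_bill = np.array([500.0, 1500.0, 2500.0, 4000.0, 5500.0, 7000.0])
suspicion = np.array([0.8, 1.5, 2.6, 3.4, 3.9, 4.1])

bf1 = hinge(2185.0, provider_bill)
design = np.column_stack([np.ones_like(bf1), bf1])     # intercept + BF1
coef, *_ = np.linalg.lstsq(design, suspicion, rcond=None)
print(coef)  # [intercept, slope], analogous to Y = 4.29 - 0.002 * BF1 on the slide
```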
Interactions
- The effect of a predictor variable on the dependent variable depends on the values of one or more other variables
[Figure: neural network predicted suspicion score vs. provider bill, paneled by injury type (inj.type 01-05)]
Interactions: MARS Fit
[Figure: MARS fitted suspicion score vs. provider bill, paneled by injury type (inj.type 01-06)]
Interactions: The Basis Functions
- Injury type 4 (neck sprain) and type 5 (back sprain) increase faster and have higher scores than the other injury types
- BF1 = max(0, 2185 - X)
- BF2 = (INJTYPE = 4 OR INJTYPE = 5)
- BF3 = max(0, X - 159) * BF2
- Y = 2.815 - 0.001 * BF1 + 0.685 * BF2 + 0.360E-03 * BF3
- where X is the provider bill and INJTYPE is the injury type
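A minimal sketch (not the MARS software output itself) of evaluating the fitted interaction model above for a given provider bill and injury type; it simply codes the three basis functions and the regression equation from this slide.

```python
def predicted_suspicion(x, injtype):
    """Evaluate the interaction model from the slide.

    x       -- provider bill
    injtype -- injury type code (4 = neck sprain, 5 = back sprain)
    """
    bf1 = max(0.0, 2185.0 - x)
    bf2 = 1.0 if injtype in (4, 5) else 0.0       # sprain indicator
    bf3 = max(0.0, x - 159.0) * bf2               # bill x sprain interaction
    return 2.815 - 0.001 * bf1 + 0.685 * bf2 + 0.360e-03 * bf3

# Sprain claims grow faster with the provider bill than other injury types
print(predicted_suspicion(3000.0, 4), predicted_suspicion(3000.0, 1))
```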
Missing Data
- Occurs frequently in insurance data
- There are some sophisticated methods for addressing this (e.g., the EM algorithm)
- MARS uses basis functions to find surrogates for variables with missing values
Missing Data Example: Health Insurance (Claimant Has Health Insurance)

Value     Frequency   Percent   Cumulative Percent
No          457         32.6          32.6
Missing     208         14.9          47.5
Missing Data Example
- BF1 = max(0, MP_BILL - 2885)
- BF2 = max(0, 2885 - MP_BILL)
- BF3 = (HEALTHIN NE MISSING)
- BF4 = (HEALTHIN = MISSING)
- BF5 = (HEALTHIN = N)
- BF7 = max(0, MP_BILL - 2262) * BF5
- BF8 = max(0, 2262 - MP_BILL) * BF5
- BF9 = max(0, MP_BILL - 98) * BF4
- BF10 = max(0, 98 - MP_BILL) * BF4
- BF11 = max(0, MP_BILL - 710) * BF3
- BF13 = max(0, MP_BILL - 35483)
- BF15 = BF3 * BF2
- Y = -0.754 - 0.002 * BF1 + 0.967 * BF3 + 1.389 * BF5 - 0.808E-04 * BF7 - 0.624E-03 * BF8 + 0.001 * BF9 + 0.016 * BF10 + 0.001 * BF11 + 0.114E-03 * BF13 + 0.376E-03 * BF15
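A minimal sketch of how the missing-value basis functions above work: the "missing" and "no" indicators (BF4 and BF5) act as surrogates, and the hinge terms let the medical bill (MP_BILL) take a different slope within each group. Only a few of the slide's basis functions are reproduced, with illustrative inputs.

```python
def health_insurance_basis(mp_bill, healthin):
    """Compute a subset of the slide's basis functions.

    mp_bill  -- medical provider bill
    healthin -- "Y", "N", or None when the health insurance field is missing
    """
    bf4 = 1.0 if healthin is None else 0.0          # HEALTHIN = MISSING
    bf5 = 1.0 if healthin == "N" else 0.0           # HEALTHIN = N
    bf7 = max(0.0, mp_bill - 2262.0) * bf5          # bill slope when no insurance
    bf9 = max(0.0, mp_bill - 98.0) * bf4            # bill slope when field missing
    return bf4, bf5, bf7, bf9

print(health_insurance_basis(2500.0, None))
print(health_insurance_basis(2500.0, "N"))
```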
More Complex Example
- Dependent variable: expert's assessment of the likelihood that the claim is legitimate
- A classification application
- Predictor variables: a combination of claim file variables (age of claimant, legal representation) and red flag variables (injury is strain/sprain only, claimant has a history of previous claims)
More Complex Example
- BF1 = (LEGALREP = 1)
- BF2 = (LEGALREP = 2)
- BF3 = (TRTLAG = missing)
- BF4 = (TRTLAG NE missing)
- BF5 = (INJ01 = 1) * BF2
- BF7 = (ACC04 = 1) * BF4
- BF9 = (ACC14 = 1)
- BF11 = (PARTDIS = 1) * BF4
- BF15 = max(0, AGE - 36) * BF4
- BF16 = max(0, 36 - AGE) * BF4
- BF18 = max(0, 55 - AMBUL) * BF15
- BF20 = max(0, 10 - RPTLAG) * BF4
- BF21 = (CLT02 = 1)
- BF23 = POLLAG * BF21
- BF24 = (ACC15 = 1) * BF16
- Y = 0.580 - 0.174 * BF1 - 0.414 * BF3 + 0.196 * BF5 - 0.234 * BF7 + 0.455 * BF9 + 0.131 * BF11 - 0.011 * BF15 - 0.006 * BF16 + 0.135E-03 * BF18 - 0.013 * BF20 + 0.286E-03 * BF23 + 0.010 * BF24
Evaluating Predictor Variables: Generalized Cross-Validation

GCV = [ (1/N) * sum_{i=1}^{N} ( y_i - f_hat(x_i) )^2 ] / [ 1 - k/N ]^2

where
- N is the number of observations
- y is the dependent variable
- x is the independent variable(s)
- k is the effective number of parameters or degrees of freedom in the model
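A minimal sketch of the GCV criterion as written above: the in-sample mean squared error divided by a penalty term that grows with the effective number of parameters k. The inputs are illustrative only.

```python
import numpy as np

def gcv(y, y_hat, k):
    """Generalized cross-validation: MSE / (1 - k/N)^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - k / n) ** 2

print(gcv([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], k=2))
```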
Variable Importance Ranking (MARS Ranking of Variables)

Rank   Variable   Description
1      LEGALREP   Legal representation
2      TRTMIS     Treatment lag missing
3      ACC04      Single vehicle accident
4      INJ01      Injury consisted of strain or sprain only
5      AGE        Claimant age
6      PARTDIS    Claimant partially disabled
7      ACC14      Property damage was inconsistent with accident
8      CLT02      Had a history of previous claims
9      POLLAG     Policy lag
10     RPTLAG     Report lag
11     AMBUL      Ambulance charges
12     ACC15      Very minor impact collision
Methods of Assessing Fit
- Cross-validation
- Confusion matrix
- Sensitivity/specificity
- ROC curve
- Area under the ROC curve
Cross-Validation

Four-Fold Cross-Validation
Technique        R^2    Percent Correct
MARS             0.35        0.77
Neural Network   0.39        0.79
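For readers who want to reproduce this kind of comparison, here is a minimal sketch of four-fold cross-validation, assuming scikit-learn is available and using an ordinary linear regression as a stand-in for MARS or a neural network (the slides do not name the software actually used); the data are simulated.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # simulated predictors
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=200)

r2_scores = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    r2_scores.append(model.score(X[test_idx], y[test_idx]))    # out-of-fold R^2
print(np.mean(r2_scores))
```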
Confusion Matrix: MARS Predicted vs. Actual

                Actual No   Actual Yes   Total
Predicted No        738         160        898
Predicted Yes       157         344        501
Total               895         505
Sensitivity/Specificity
- Sensitivity: percent of targets correctly predicted
- Specificity: percent of non-targets correctly predicted

Model            Sensitivity   Specificity
MARS                68.3          82.5
Neural Network      74.8          83.4
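As a check on the two tables above, a minimal sketch computing sensitivity and specificity directly from the MARS confusion matrix (rows are predicted, columns are actual):

```python
# Counts from the MARS confusion matrix on the previous slide
tn, fn = 738, 160      # predicted No:  actual No, actual Yes
fp, tp = 157, 344      # predicted Yes: actual No, actual Yes

sensitivity = tp / (tp + fn)   # share of targets correctly predicted
specificity = tn / (tn + fp)   # share of non-targets correctly predicted
print(round(100 * sensitivity, 1), round(100 * specificity, 1))  # approx. 68.3, 82.5
```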
ROC Curve
[Figure: ROC curves (sensitivity vs. 1 - specificity) for the neural network, MARS, and a baseline]
Area Under the ROC Curve

Statistics for Area Under the ROC Curve
Test Result Variable   Area   Std Error   Asymptotic Sig   Lower 95% Bound   Upper 95% Bound
MARS Probability       0.85      0.01          0.000            0.834             0.873
Neural Probability     0.88      0.01          0.000            0.857             0.893
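A minimal sketch of how an ROC curve and its area can be computed, assuming scikit-learn and simulated labels and scores; the AUC values in the table above (0.85 for MARS, 0.88 for the neural network) come from the study itself, not from this code.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                        # simulated 0/1 outcomes
scores = y_true * 0.6 + rng.normal(scale=0.4, size=500)      # simulated model scores

fpr, tpr, thresholds = roc_curve(y_true, scores)             # 1 - specificity, sensitivity
print(roc_auc_score(y_true, scores))                         # area under the ROC curve
```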
Which One Is Better?
- It depends on the application
- MARS handles missing values better
- MARS clusters categories of nominal variables with many categories
- MARS can be explained more easily
- On applications where the analyst believes neural networks will outperform MARS, use them
- Hybrid models can also be used to improve performance
Using the Model Results
- Both claim file variables and red flag variables appear to be significant in predicting fraud
- Other research supports the value of using statistical and data mining models to predict fraud
- Derrig (Journal of Risk and Insurance, 2002) advocates using analytic models to sort claims: pay claims with low scores, and devote investigative resources to claims with high scores