SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting




SOA 2013 Life & Annuity Symposium, May 6-7, 2013
Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting
Moderator: Barry D. Senensky, FSA, FCIA, MAAA
Presenters: Jonathan P. Polon, FSA; Qichun (Richard) Xu, FSA, Ph.D.
Primary Competency: Technical Skills and Analytical Problem Solving

Predictive Modeling for Life and Annuity Pricing and Underwriting
Life and Annuity Symposium, Session 30, May 6, 2013
Jonathan Polon, FSA

Overview
- One Key Takeaway
- Traditional Actuarial Techniques vs. Predictive Modeling
- Credibility vs. Validation
- Benefits of Predictive Modeling
- Data Considerations
- Performing the Analysis

One Key Takeaway
- Mortality experience takes several years to develop
- Your ability to analyze internal mortality experience in the future will be affected by the way you capture data today
- Invest the time and resources today to develop and implement a data collection strategy:
  - What data to collect
  - Where to collect the data from
  - How to structure the data storage to facilitate analysis

Data Quality vs. Data Quantity
- The Law of Diminishing Marginal Returns applies
- Select the data elements that you believe are most important
- Focus on accuracy, completeness and structure for the key data elements rather than simply maximizing the number of data elements; this will:
  - Minimize the cost of data storage and capture
  - Greatly decrease the time needed to create models
  - Increase the interpretability of the models
  - Reduce the risk of overfitting
  - Probably improve model accuracy

Traditional Actuarial Techniques vs. Predictive Modeling

Traditional Actuarial Techniques
- The purpose is not to classify individual risks; rather, it is to determine the average cost for each class of risk
- The objective is to be accurate at the class level in aggregate, not at the level of the individual case
- Typically applied in a low number of dimensions (e.g., age, gender, smoking, underwriting class, duration)
- Techniques include tables and classical statistics

Predictive Modeling
- The purpose is to make predictions at the individual case level
- The output for each case could be a risk class, a number of debits or a qx vector
- Each case is unique and has its own combination of characteristics
- Typically applied in a high number of dimensions
- Techniques include machine learning and numerical analysis, iteratively improving the fit of the model to the historic data

Credibility vs. Validation

Credibility
- Actuaries apply credibility theory to ensure their mortality analysis is based upon a sufficient number of observations
- From the CIA Educational Note on Expected Mortality, July 2002:
  - The goal of credibility theory is to provide a framework for combining data from different sources: typically company data, which may not be fully credible, and industry data, which is assumed to be fully credible
  - The Normalized Method is the preferred credibility method, and 3,007 is the suggested number of deaths needed for full credibility
- "Credibility is as much an art as it is a science" (Barry Senensky, FSA, FCIA, MAAA, April 2013)
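The blend of company and industry data described above can be sketched with a limited-fluctuation (square-root) credibility factor against the 3,007-death full-credibility standard. Note that the square-root rule is used here only as a common illustration, not as the Normalized Method itself, and the death count and mortality rates below are made up:

```python
import math

def credibility_weight(deaths, full_credibility_deaths=3007):
    """Square-root (limited-fluctuation) credibility factor Z in [0, 1]."""
    return min(1.0, math.sqrt(deaths / full_credibility_deaths))

def blended_rate(company_rate, industry_rate, deaths):
    """Credibility-weighted blend of company and industry mortality rates."""
    z = credibility_weight(deaths)
    return z * company_rate + (1.0 - z) * industry_rate

# Hypothetical: 750 company deaths, company qx 0.0045, industry qx 0.0050.
z = credibility_weight(750)                    # about 0.5
qx = blended_rate(0.0045, 0.0050, 750)         # lands between the two rates
```

With only a quarter of the full-credibility deaths, the company rate gets roughly half weight, which matches the intuition that partially credible company data should pull the industry assumption only part of the way.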

Validation
- Credibility is not independent of dimensionality:
  - If the number of observations is small, you can still model the primary predictors
  - As the number of observations increases, models can be expanded to include predictors of secondary and tertiary importance
- The danger of using predictive modeling is overfitting: modeling noise rather than signal
- Validation is applied to protect against overfitting
- Validate using out-of-sample data to ensure models are robust: withhold 10-20% of the data until the models are deemed complete
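The 10-20% withhold described above can be sketched as a simple random split; the 15% fraction, seed and function name here are illustrative choices, not from the presentation:

```python
import random

def train_validation_split(records, holdout_frac=0.15, seed=42):
    """Withhold a fraction of the data as an out-of-sample validation set;
    it stays untouched until the models are deemed complete."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1.0 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]

train, validation = train_validation_split(list(range(1000)))
```

Every record lands in exactly one of the two sets, so the validation data really is unseen during model training.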

Benefits of Predictive Modeling
The potential benefits of predictive modeling vs. traditional techniques:
1. Improved accuracy
2. Reduced time to decision
3. Lower expense

Improved Accuracy
- Improved accuracy is most important for larger risks, e.g., large face amounts or impaired lives
- These cases are typically fully underwritten, and a lot of information is generated (e.g., application, lab results, APS, financial underwriting)
- Predictive models can provide more accurate estimates of the risk of each applicant
- This can be used as a decision-support tool for the underwriting decision

Reduced Time to Decision and Lower Expense
- Quicker turnaround and lower cost of underwriting decisions can be especially important in the middle markets
- Can increase close rates and reduce issue expenses
- Why can't an insurance policy be sold in real time, whether at the agent/broker office or direct online?
  - No medical exam or fluid samples required
  - Base the underwriting decision on other sources of information, such as prescription drug history

Data Considerations

Big Data
- "Big data" is the catchphrase of 2013
- Not only are we creating data in new ways (social media, cell phone GPS, web clicks); the information created by our interactions with the world is also being stored electronically to a greater and greater extent
- Sources of data for life pricing and underwriting:
  - An insurer's internal data
  - Data for sale from external data aggregators
  - Webscraping

Internal Data Sources
- Application for insurance
- Lab results
- Attending physician statements
- Underwriter's notes
- Data generated from other product lines

External Data Sources
- MIB
- MVR
- Prescription drug history
- Credit score
- Public records
- Consumer data
There may be regulatory, legal and reputational risk involved with the use of some external data sources; be sure to research before using.

Webscraping
- Crawling the internet to uncover information about an entity
- More difficult to perform for an individual than for a business:
  - Names are unlikely to be unique
  - Personal Facebook, Twitter and other social media accounts are often set to private
- There may be greater reputational risk to an insurer that is webscraping for information about an individual, as opposed to searching for information about a business

Vital Status
- Analysis should probably include all applicants for insurance, not just written cases
- May need to determine the vital status of non-written cases
- In the US, the SS DMF is a good start, but not complete
- Companies that aggregate public records may be of help
- Can validate these data sources against insured lives and develop assumptions to account for the missing deaths

Performing the Analysis

Steps
1. Define objective
2. Identify data sources
3. Acquire and clean data
4. Analyze data and train models
5. Validate models

Define Objective
- Sounds trivial but is of critical importance: it will drive all other steps of the modeling process
- What should the model output be? For example:
  - Replicate underwriting decisions (doesn't require vital status)
  - Risk classification (e.g., preferred, standard, substandard)
  - Number of debits to apply to the base table
  - Applicant-specific mortality rates (qx) for the first several years

Identify Data Sources
- Must have the target (dependent) variable available in the historic data
- What predictor data is available?
  - Internal sources
  - External sources
  - Webscraping

Acquire and Clean Data
- May be 80% of the total effort required for the project
- Data collected from different sources must be linked
- Raw data is seldom in a form appropriate for modeling:
  - Text-mine documents, such as the APS or underwriter notes
  - Perform some basic calculations, such as age or BMI
  - Some data elements will need to be transformed to optimize modeling, depending on the modeling techniques to be applied
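The basic calculated fields mentioned above (age, BMI) can be sketched as simple transformations on a linked raw record; the field names, units and as-of date are illustrative, not from the presentation:

```python
from datetime import date

def derive_fields(record, as_of=date(2013, 5, 6)):
    """Add basic calculated fields (issue age, BMI) to a raw applicant record."""
    out = dict(record)
    dob = record["date_of_birth"]
    # Age last birthday at the as-of date.
    out["age"] = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    # BMI = weight (kg) / height (m)^2.
    out["bmi"] = round(record["weight_kg"] / record["height_m"] ** 2, 1)
    return out

applicant = {"date_of_birth": date(1970, 6, 1), "height_m": 1.80, "weight_kg": 81.0}
cleaned = derive_fields(applicant)
```

In practice each such derived field would be computed once during the cleaning step so every downstream model sees the same definition.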

Analyze Data and Train Models
- Can begin with an analysis of the current basis: identify the types of applications where actual outcomes are similar to, or different from, expected outcomes
- Train the new models: an iterative process that will require testing various modeling techniques and data transformations
- Evaluate the new models, typically on a hold-out sample of testing data
- Consider: goodness-of-fit metrics, univariate analysis, model complexity vs. interpretability, and consistency with expectations

Model Validation
- Final test on out-of-sample ("validation") data
- Can really only be performed once; after that, the data is no longer unseen
- Goodness-of-fit should, at a minimum, be improved relative to the current basis
- Requires a goodness-of-fit metric, such as mean squared error
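The mean-squared-error comparison described above can be sketched directly; the outcomes and predictions below are made up purely to show the mechanics:

```python
def mean_squared_error(actual, predicted):
    """Average squared deviation between actual outcomes and model predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual        = [1.0, 0.0, 1.0, 0.0]
current_basis = [0.5, 0.5, 0.5, 0.5]   # e.g., a flat aggregate assumption
new_model     = [0.9, 0.2, 0.8, 0.1]   # hypothetical individual-level predictions

# The new model should, at a minimum, beat the current basis on unseen data.
improved = mean_squared_error(actual, new_model) < mean_squared_error(actual, current_basis)
```

Because validation data can be used only once, this comparison is the final gate, not a tuning loop.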

Predictive Modeling Applications Case Study
Richard Xu, Global R&D, RGA
LAS, May 2013

Contents
- UW Model: GLM
- Experience Study: GLM
- Pricing Model: CART
- Client Segmentation: Clustering

PM Applications
- Underwriting: identify the best risks; be fast and consistent; prioritize cases; reduce not-taken rates
- Claims: predict claim frequency; identify claim severity; prioritize resources; identify claims most likely to be fraudulent or rescinded
- Pricing/Reserves: improve pricing accuracy; identify deviation of pricing variables; make reserves more accurate; compute reserve variance
- Experience Analysis: identify drivers in experience; handle low-credibility data; create own mortality/lapse tables
- Sales & Marketing: make effective campaigns; recommend products; select new agents; monitor existing agents
- In-Force Business: client segmentation; predict lapses; design retention strategies; offer other products

Generalized Linear Model (GLM)
- Structure: random component, systematic component and link function; OLS (the linear model) assumes a normal distribution, while the GLM allows various distributions
- Includes most distributions related to insurance data: normal, binomial, Poisson, gamma, inverse Gaussian, etc.
- Ordinary Least Squares (OLS) is a special case of the GLM
- Great flexibility in variance structure; weights and offsets add further flexibility
- A multiplicative model is intuitive and consistent with insurance practice
- Easy to understand and communicate
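As a minimal sketch of the binomial GLM with a logit link used in the case studies, here is a one-predictor logistic fit by gradient ascent on the log-likelihood. Production GLM software fits by iteratively reweighted least squares, and the standardized ages and outcomes below are made up:

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-predictor binomial GLM with a logit link by gradient ascent
    on the log-likelihood (a toy stand-in for IRLS in real GLM software)."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # score contribution for the intercept
            g1 += (y - p) * x    # score contribution for the slope
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

# Hypothetical data: probability of a standard (STD) decision falls with age.
ages = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]   # standardized age
std  = [1, 1, 1, 1, 1, 0, 1, 0, 0]                          # 1 = standard decision

b0, b1 = fit_logistic(ages, std)
predict = lambda x: 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```

A negative fitted slope reproduces the "Age_At_Entry is negative" pattern in the key-variables table: the modeled probability of a standard decision declines as age rises.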

Case Study 1: UW Model
- Goal: to predict UW decisions on existing customers
- Bancassurance in Asia with a large customer pool, but low penetration in life products
- Identify certain pre-qualified existing customers, and offer guaranteed issue (GI) or simplified issue (SI) without medical UW
- Acquisition costs will be significantly reduced; market penetration will be deeper, and sales will increase
- Bancassurance is unique for PM: financial and demographic information about customers is on hand
- Major challenge: very limited data
  - A total of about 8k-9k full UW cases
  - Target variable: the UW decision, with very few declined/rated cases (~3.0%)
  - Many missing values, especially for sub-standard cases, because the data is old
  - Not all information was collected at the time of UW

Key Variables
- GLM with a binomial distribution and logistic link function
- About a dozen predictor variables that are statistically significant for prediction and readily available in the client database
- Key predictor variables ("positive" means the probability of a standard (STD) decision increases as the value goes up; otherwise "negative"):

Name             | Type        | Note
Age_At_Entry     | Numeric     | Negative; less likely to qualify for STD as age goes up
Branch           | Categorical | Proxy for geographic location
AUM              | Numeric     | Positive; more likely to qualify for STD with large AUM
Customer_Segment | Categorical | Positive for premier, negative for non-premier
Nationality      | Categorical | Positive for domestic; negative for certain others

Model Results: In Sample
[Lift plot for in-sample results: non-STD (declined and rated) rate by decile of sorted model output; average non-STD rate 3.0%, falling to roughly 0.2%-0.6% in the best deciles]
- In-sample results show model performance under optimal conditions and may over-fit the data
- 0.5% of sub-std cases in the top 30% of model output

Model Results: Validation
[Lift plot for validation results: non-STD rate by decile of sorted model output; average non-STD rate 3.0%, roughly 0.4%-0.8% in the best deciles]
- Validation results are a better test of model performance in real business
- 0.6% sub-std in the top 30% of model outputs: about an 80% reduction

Model Results: Gain Curve
[Gain curve: cumulative non-STD % against sorted model output, for in-sample results, validation results and a random baseline]
- The gain curve is another way to understand the model's ability to differentiate STD from sub-std
- The best 30% of model outputs contains about 5% of the total non-std; the lowest 30% captures about 75% of the bad risks

Model Implementation
- Results delivered to the client; in the final implementation stage
- Final control over offers rests with the insurer
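The lift plots above sort cases by model output and compare the non-STD rate in each decile. A minimal sketch of that calculation, on made-up scores and outcomes:

```python
def decile_lift(scores, bad_flags, n_buckets=10):
    """Bucket cases into deciles by model score (best scores first) and
    report the 'bad' (non-standard) rate in each bucket, as in a lift plot."""
    ranked = sorted(zip(scores, bad_flags), key=lambda t: -t[0])
    size = len(ranked) // n_buckets
    rates = []
    for i in range(n_buckets):
        bucket = ranked[i * size:(i + 1) * size]
        rates.append(sum(flag for _, flag in bucket) / len(bucket))
    return rates

# Hypothetical: 100 cases, non-STD outcomes concentrated among the worst scores.
scores = list(range(100, 0, -1))                 # model output, best first
bad = [1 if s <= 10 else 0 for s in scores]      # 10% non-STD overall
rates = decile_lift(scores, bad)
```

A well-discriminating model shows near-zero bad rates in the top deciles and the bulk of the bad risks piled into the bottom ones, which is exactly the shape of the case-study plots.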

Case Study 2: Experience Study
- PM vs. the traditional actuarial approach:
  - A true multivariate approach vs. univariate analysis, which can under- or over-estimate
  - Captures the impact of interaction terms on the target
  - More efficient use of the data, and able to handle low-credibility data
  - Establish your own assumptions based on experience data
- Types of studies: mortality, lapse, claim severity, incidence rate, continuance table
- Major challenge: data, data, data!
  - Understanding the business; cleaning data; mapping data; legacy data; missing values; timing; etc.
  - Either not enough credible data, or too much data, into big-data territory

Term Tail Lapse Rates
[Charts: post-level lapse rates vs. duration, and post-level lapse rate vs. premium jump]
- Tail lapse rates for a 10-year term product
- Predictors: duration, premium jump, face amount, UW class, issue age, gender, etc.
- Formula-based results, with uncertainty estimated
- Business insights

CART Model
- Classification And Regression Tree (CART)
- Handles both classification and regression
- A non-parametric approach (requires no insight into the data structure)
- A CART tree is generated by repeated partitioning of the data set:
  - The data is split into two partitions (a binary partition)
  - Partitions can in turn be split into sub-partitions (recursive)
  - Splitting stops when the data in each end node (leaf) is more or less homogeneous
- Results are very intuitive: identify specific groups that deviate in the target variable
- Yet the algorithm is very sophisticated

Case Study 3: LTD Pricing
- Business: US group Long-Term Disability (LTD)
- About 13k policies, with lives per policy ranging from 10 to 30k
- Current pricing variables: about 30-40
- Experience data for the past 5 years, with >80 variables
- Major pricing variables: age, gender, industry, location, benefit structure
- Objectives:
  - Determine additional pricing variables and possible interaction terms (for pricing)
  - Identify groups with experience deviating from pricing assumptions (for UW)
- The client has experience with PM, so minimal effort was needed on business and data understanding
- Profit margin as the target variable
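The core step that CART repeats recursively, one binary split chosen to make the two child nodes as homogeneous as possible, can be sketched for a regression target (such as profit margin) using variance reduction; the data below is made up:

```python
def variance(vals):
    """Population variance of a list of target values."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def best_split(x, y):
    """Find the single binary split on predictor x that most reduces the
    weighted variance of target y: the step CART applies recursively."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_threshold, best_score = None, float("inf")
    for k in range(1, len(x)):
        left = [y[order[i]] for i in range(k)]
        right = [y[order[i]] for i in range(k, len(x))]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(x)
        if score < best_score:
            best_threshold = (x[order[k - 1]] + x[order[k]]) / 2.0
            best_score = score
    return best_threshold

# Hypothetical group sizes with a profit-margin shift between small and large groups.
group_size = [1, 2, 3, 4, 10, 11, 12, 13]
margin     = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
split_at = best_split(group_size, margin)
```

Applying this split search recursively to each resulting partition, and stopping when a node is homogeneous, yields the intuitive tree of deviating groups the case study describes.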

CART Model Results
- Easy to develop, interpret and understand; provides business insights
- But: not efficient for linear relationships; sensitive to noise; prone to over-fitting
- Results improve profit margin and pricing accuracy
- A useful tool for both the pricing and UW of group LTD business
- Model implementation: the client is very interested in the model results; approved by the management team; implemented in Q1 2013

Quartile | # of cases | Actual EPM | Model-Predicted EPM
1        | 3230       | (0.28)     | (0.32)
2        | 3230       | (0.088)    | (0.060)
3        | 3230       | 0.063      | 0.020
4        | 3230       | 0.017      | 0.14

Data Clustering
- Find similarities in the data according to its features, and group similar objects into clusters
- Unsupervised (no pre-defined classes), non-parametric classification
- Requires a measure of similarity/dissimilarity, e.g., distance, for numeric, categorical and ordinal variables
- Algorithms: partitioning (k-means), hierarchical, density-based, etc.

Case Study 4: Client Segmentation
- The existing client segmentation is based on geographic location: an approach that serves the company's own organization rather than the market and client needs
- Objectives:
  - Better understand the client base, identifying knowledge gaps
  - Capture tacit knowledge; create structured data on clients, and a tool for client analysis and strategic decision-making on an ongoing basis
  - Identify opportunities to better serve clients' needs and grow the business
  - Help better optimize resourcing requirements
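The k-means partitioning named above can be sketched in one dimension: assign each point to its nearest center, move each center to the mean of its assigned points, and repeat. The points and starting centers below are made up, and real segmentation data (dominated by categorical variables, as the case study notes) would need a different dissimilarity measure:

```python
def kmeans(points, centers, iters=20):
    """Minimal 1-D k-means: alternate nearest-center assignment with
    recomputing each center as the mean of its assigned points."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        # Keep a center unchanged if no points were assigned to it.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Two obvious groups of points; the algorithm recovers their means.
final_centers = kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```

The number of centers is the free parameter the case study mentions: rerunning with a different number of starting centers produces a different segmentation, and choosing it well is part of the analysis.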

Client Segmentation Data
- Business-team survey data, in three main categories:
  - Description of the clients
  - Behavior when facing risks
  - Needs in dealing with risks
- Clustering algorithm plus principal component analysis
- The algorithms find clusters such that clients in the same cluster are more similar to each other than to those in other clusters
- An unsupervised algorithm, with no target variable
- The data is dominated by categorical variables

Clustering Model Results
- Results on 5 clusters; the number of clusters is a free parameter
- Example: identifying opportunity

Clustering Model Example: Two High-Level Clusters
- Direct distribution, and living-benefits-related products
- Data quality is very important; prefer objective variables to subjective variables

Cluster (% by NB volume) | Theme
5 (12%)                  | Want direct
32 (25%)                 | Want direct
8 (7%)                   | Want direct e-sales
16 (5%)                  | Want direct traditional
17 (16%)                 | Want living-benefits-related
13 (11%)                 | Want living benefits
4 (5%)                   | Want combination
3 (1%)                   | Want direct to in-force

Conclusion
- PM is a skill for actuaries of the future
- PM is about finding knowledge in data so that we can understand it and gain an advantage
- "Everything should be made as simple as possible, but not simpler." (Albert Einstein)
