Employer Health Insurance Premium Prediction

Elliott Lui




1 Introduction

The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than wages or inflation. Employers spend $500 billion per year on health premiums for their employees, and one question they must always consider is whether the coverage they are getting is worth what they are paying. Health insurance providers have proprietary actuarial methods and complex models for setting these premiums, but those models are hidden from the public. The purpose of this project is to explore the use of machine learning algorithms to predict the price of an annual health insurance premium given the specifications of the contract and the company's demographics. That is, given a health insurance contract and information about a company's employees, can we accurately predict how much the contract will cost per year? Using an SVM, multinomial Naïve Bayes, and a decision tree classifier, we try to predict the premium cost of a contract, and we use feature selection to determine the optimal set of features.

2 Data

2.1 Dataset

The dataset used for this project covers over 5,500 anonymous companies representing over 3 million lives. It comes from the Kaiser Family Foundation, which conducts an annual survey to collect this data for its annual Health Benefits Report. Each company, or data point, is represented by a vector of 1,199 variables. For context, these features include industry, size of firm, percentage of workforce age 26 or lower, copayment amount for prescription drugs, coinsurance rate for hospital visits, and percent of workforce earning $21,000 or less, all of which seem conceptually relevant. On the other hand, there are features such as "Does HDP use Copay, Coinsurance, or Both for Generic Drugs" and "Firm offers this PPO Plan Last Year" that seem irrelevant to premium pricing.

2.2 Training Data

Because there are three different types of health insurance plans, and each type of plan has slightly different features, we built three sets of training data to predict three sets of prices: the annual premium a company pays for its PPO, HMO, and HDP plans. Some companies offer more than one type of plan; because the feature sets differ slightly, we treated such a company as a separate data point in each plan's dataset. For example, if company A offered both PPO and HMO plans, we put the relevant feature vectors of A in both the PPO training data and the HMO training data.
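As a concrete illustration of this split, the sketch below builds the three per-plan training sets from a single survey table. It is a minimal sketch assuming a pandas DataFrame with hypothetical column names (offers_ppo, ppo_annual_premium, and so on); the actual Kaiser survey uses different variable codes.

    import pandas as pd

    # Hypothetical column names; the real survey variables are coded differently.
    PLANS = {
        "PPO": {"offered": "offers_ppo", "target": "ppo_annual_premium"},
        "HMO": {"offered": "offers_hmo", "target": "hmo_annual_premium"},
        "HDP": {"offered": "offers_hdp", "target": "hdp_annual_premium"},
    }

    def build_training_sets(survey: pd.DataFrame):
        """Return one (features, target) pair per plan type.

        A company offering several plan types contributes one row to each
        corresponding training set, as described in Section 2.2.
        """
        flag_and_target_cols = ([v["offered"] for v in PLANS.values()]
                                + [v["target"] for v in PLANS.values()])
        datasets = {}
        for plan, cols in PLANS.items():
            rows = survey[survey[cols["offered"]] == 1]
            X = rows.drop(columns=flag_and_target_cols)
            y = rows[cols["target"]]
            datasets[plan] = (X, y)
        return datasets
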

2.3 Feature Selection

To quantitatively determine which features are most relevant, we ran forward search, first using the SVM classifier with a Gaussian kernel to maximize 10-fold cross-validation accuracy, and then using the other classification algorithms. Interestingly, though perhaps not too surprisingly, the top features made sense conceptually, consisting of variables representing coinsurance amounts for surgeries, hospital visits, and drugs.

Figure 1. Variables added to the feature set through forward search

As Figure 1 shows, improvements in accuracy stopped after just 8 iterations for the SVM, so forward search produced a set of just 8 features, down from the 1,199 in the raw data (Figure 2). Furthermore, after the initial run with the SVM, running forward search with multinomial Naïve Bayes and with the decision tree classifier yielded nearly the same top 8 features, differing by at most three.

Figure 2. From raw data to the relevant feature set
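The following is a minimal sketch of this greedy forward search, assuming scikit-learn and NumPy arrays; the paper's own implementation is not shown, so treat the names and parameters here as illustrative.

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def forward_search(X, y):
        """Greedily add the feature that most improves 10-fold CV accuracy.

        Stops when no remaining feature helps. With 1,199 candidate
        features this loop is expensive, which is why it matters that it
        terminated after only 8 additions in the paper.
        """
        selected, best_score = [], 0.0
        remaining = list(range(X.shape[1]))
        while remaining:
            scores = {
                j: cross_val_score(SVC(kernel="rbf"),
                                   X[:, selected + [j]], y, cv=10).mean()
                for j in remaining
            }
            best_j = max(scores, key=scores.get)
            if scores[best_j] <= best_score:
                break  # accuracy stopped improving
            selected.append(best_j)
            remaining.remove(best_j)
            best_score = scores[best_j]
        return selected
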
3 Predicting Insurance Premiums

3.1 Bucketing Target Values

Since premium prices are continuous, and we wanted to leverage the power of the SVM and other classification algorithms, it was necessary to bucket these values. To keep the predictions meaningful, we limited the maximum error to what is acceptable in practical situations: year-to-year changes in insurance premiums max out at about 30%, so we bucketed our data such that each of the N buckets spans a maximum price differential of 30%. For example, a bucket with a minimum value of $10,000 has a maximum value of $13,000.
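One way to realize this bucketing is with geometrically growing bucket edges, so that each bucket's maximum is 1.3 times its minimum. The paper does not give its exact discretization, so the sketch below is an assumption:

    import numpy as np

    def bucket_prices(prices, ratio=1.3):
        """Map each premium to a bucket spanning at most a 30% differential.

        Bucket k covers [lo * ratio**k, lo * ratio**(k + 1)), where lo is
        the smallest premium, e.g. $10,000 up to $13,000 when ratio = 1.3.
        """
        prices = np.asarray(prices, dtype=float)
        lo = prices.min()
        return np.floor(np.log(prices / lo) / np.log(ratio)).astype(int)
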
3.2 SVM

We trained a multi-class SVM to classify which of the N buckets a given data point belongs in, with one SVM trained per type of plan. When training the SVMs, we experimented with Gaussian and linear kernels; the results are shown in Figure 4. The Gaussian kernel is $K(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$, whereas the linear kernel is $K(x, z) = x^T z$. The parameter $\sigma$ for the Gaussian kernel was tuned with a grid search to maximize 10-fold cross-validation accuracy. We made sure not to overestimate this parameter, which would make the kernel perform like a linear one; underestimating it, on the other hand, would make the model too sensitive to noise in the training data. We built the multi-class SVM by training one SVM for the upper and lower limit of each bucket: each SVM predicts whether the price of a contract should be greater or less than that bucket boundary, and the final classification is made by a majority vote over the SVMs. In addition to trying different kernels, we experimented with bucket sizes of 15%, 30%, and 45% to see their effect on accuracy.
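The sketch below shows this threshold-vote construction with scikit-learn's SVC. Reading the majority vote as "count the boundaries the contract is predicted to exceed" is our interpretation, since the paper does not spell out the rule; note that scikit-learn's RBF kernel $\exp(-\gamma \|x - z\|^2)$ matches the Gaussian kernel above when $\gamma = 1/(2\sigma^2)$.

    import numpy as np
    from sklearn.svm import SVC

    class ThresholdVoteSVM:
        """One binary SVM per bucket boundary; their votes pick the bucket.

        Classifier k answers "is this premium above boundary k?", and the
        predicted bucket is the number of 'above' votes. Assumes every
        boundary splits the training prices into two non-empty classes.
        """

        def __init__(self, boundaries, gamma="scale"):
            self.boundaries = list(boundaries)  # ascending bucket edges
            self.svms = [SVC(kernel="rbf", gamma=gamma) for _ in self.boundaries]

        def fit(self, X, prices):
            prices = np.asarray(prices)
            for svm, b in zip(self.svms, self.boundaries):
                svm.fit(X, (prices > b).astype(int))
            return self

        def predict(self, X):
            votes = np.stack([svm.predict(X) for svm in self.svms])
            return votes.sum(axis=0)  # bucket index for each sample
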
3.3 Multinomial Naïve Bayes

Multinomial Naïve Bayes models the distribution of feature values as a multinomial, under the Naïve Bayes assumption that each feature is generated independently of every other. Using the same methodology described above, we trained multiple binary multinomial Naïve Bayes classifiers. This kept the approach consistent with our multi-class SVM construction, and it performed much better than a single multi-class Naïve Bayes classifier on its own.

3.4 Decision Tree

Though it wasn't explicitly taught in class, decision tree learning seemed like a good model for this data. One hypothesis for the low accuracy of the other models is the mix of continuous and categorical variables: of the 8 features in our feature set, 5 are categorical, and even copay amounts were discretized into ranges. Decision tree classifiers handle categorical variables especially well, so it made sense to train one.
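For completeness, here is a sketch of the other two base learners on toy stand-in data (the real inputs are the 8 selected survey features). The paper plugs binary Naïve Bayes classifiers into the threshold-vote scheme above; the direct multi-class fit shown here is only for brevity, and multinomial Naïve Bayes assumes non-negative, count-like features.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier

    # Toy stand-in data: 8 count-like features, 6 premium buckets.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(200, 8)).astype(float)
    buckets = rng.integers(0, 6, size=200)

    # Multinomial NB fits per-class multinomials over the feature values.
    nb = MultinomialNB().fit(X, buckets)

    # The decision tree is trained directly on the bucketed labels; its
    # binary "<=" splits cope naturally with discretized features.
    tree = DecisionTreeClassifier(max_depth=6).fit(X, buckets)
    print(nb.score(X, buckets), tree.score(X, buckets))
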
4 Results

The results of our three models are summarized in Figure 3. Each model was run on the three datasets corresponding to PPO, HMO, and HDP plans; within each category we measure the accuracy of predicting two values, the annual premium for family coverage and the annual premium for an individual. The PPO training data contained 696 rows, the HMO 247, and the HDP 213.

Algorithm                   Overall Accuracy
SVM                         .6317
Multinomial Naïve Bayes     .245
Decision Tree               .62

Figure 3. Overall Performance of Models

4.1 SVM Results

With the 30% bucket size accepted as the standard for this project, the SVM predicted premium prices with the highest accuracy among the three models, at a rate of .632 (+/- .02). Increasing the bucket size improves classification accuracy; intuitively, it makes sense that it is easier to classify data into larger ranges.

Bucket Size    Linear Kernel    Gaussian Kernel
15%            .449             .508
30%            .593             .632
45%            .765             .738

Figure 4. SVM accuracy by bucket size and kernel

The two kernels, on the other hand, did not display any significant difference in performance. Though we tuned the Gaussian parameter σ so that the kernel would not behave linearly, it still did not produce a substantial difference.

4.2 Multinomial Naïve Bayes Results

At an average accuracy of .245, multinomial Naïve Bayes exhibited the worst performance among our models. For all three categories (PPO, HMO, HDP), Naïve Bayes was the only algorithm to pick the company's industry as a feature in forward search. Conceptually this makes sense: a company in a labor-intensive industry like construction would command higher premium prices than a company with young, highly skilled workers, like a tech company. At first glance, the poor performance could be attributed to a violation of the Naïve Bayes independence assumption; however, the features it selected were all independent from a conceptual standpoint.

4.3 Decision Tree Results

The decision tree classifier performed almost as well as the SVM overall, with an accuracy rate of .62. Figure 5 shows a sub-tree of the decision tree ultimately constructed from our training data (the whole tree is too large to display).

Figure 5. Sub-tree of the decision tree classifier

The full tree contained 130 nodes, each representing a possible decision, the number of training samples reaching that decision, and its error. Because these decisions are binary <= operations, the model coped well with the categorical nature of our features.

5 Discussion

Our models did not predict insurance premium costs very well. While the accuracy rates were not abysmally low, they left much to be desired. The low accuracy across all models suggests that the feature vectors do not encompass all pertinent information. In particular, the data does not account for the strength of the health plan's network. Two otherwise identical data points can have very different physician networks: one could include a top-notch institution like Stanford Hospital while the other includes only small clinics with fewer doctors. The former would obviously command a higher premium, but such information is not in the dataset, presumably because it is extremely difficult to quantify.

The forward search component of the analysis was quite successful at choosing the most important features. That is, without supervision, it selected features, such as copays for drugs and hospital visits, that make sense in determining the price of health insurance premiums. With key additional information like network strength, we could likely have predicted premium prices with much higher accuracy.