Application in Predictive Analytics. FirstName LastName. Northwestern University




Prepared for: Dr. Nethra Sambamoorthi, Ph.D.
Author Note: Final Assignment PRED 402 Sec 55

Contents
Introduction
Company Background
Survey Population
Hypothesis
Sample Frame
Sampling Size
Sampling Method
Strategy to reduce bias
Strategy for missing or poor quality data
Analysis methods
Modeling Plans
Questionnaire
Check List
References

Introduction

Insurance companies are facing a tough economy, rising operating costs, and increasingly demanding customers who want a personal and seamless customer experience. Insurance is a unique industry in which the service is delivered only when a claim is filed. When a customer files a claim, he or she has lost something personal. One bad claims experience is enough for that customer to switch and possibly never return to the same insurance company. Claims customer service and experience are therefore an important part of the services an insurance company offers. In this case study, we will design a customer satisfaction survey for X Insurance Company.

Company Background

(Due to client confidentiality, the name X insurance company is used.) X insurance company is North America's leading personal auto insurance company (by premiums). It is also the leading home insurer and offers non-medical health and life insurance through its subsidiary companies. Its products are marketed via more than 18,000 agents in the US and Canada. X insurance company's diversified businesses also include a federal savings bank that offers consumer and business loans, savings and checking accounts, and investment products. X insurance company services on average 3,000 claims per day, and reducing claims operation cost and improving customer service are a core part of its strategy.

Survey Population

For our auto claims satisfaction survey, the total population is all claimants who have settled a claim with X insurance company in the last 3 months. The insurance company has settled over 320,000 claims in the last 3 months.

Hypothesis

Claimant experience affects loyalty and every insurance company's profitability. It is therefore very important for every insurance company to provide an exceptional and personalized customer experience at the time of a claim. Our hypothesis is that claimants who are "delighted" with their claims experience "definitely will" renew with X insurance company and "definitely will" recommend the company to others; claimants who are "pleased" with their experience "may" renew their policy; and claimants who have a bad claims experience have a high probability of switching insurance companies at their next renewal, or possibly sooner.

Sample Frame

We are using simple random sampling, so the sample frame is the list from which claimants are drawn; sample frames are also widely used to improve the design of stratified sampling. Traditional surveys build their frames from sources such as the electoral register, telephone directories, or public databases, but X insurance company has all the necessary information readily available in its internal claims database. The sample frame will consist of individuals who have filed a claim for an amount greater than $2,000 with X insurance company, and will exclude claimants whose vehicle incurred only glass/windshield damage or was stolen, or who filed only roadside assistance claims.

An extract of the sample frame is shown below:

Claim Number  Policy Number  Customer Name  Customer Age  Claim Amount
1001          1123432        J Smith        23            2000
1023          1123433        T Moody        43            2120
1076          1123434        J Lopez        18            3001
1022          1123435        A Lee          65            3200
1009          1123436        G Patel        33            3600
1043          1123437        Y Upad         54            3800
1007          1123438        T yao          22            4100
1056          1123439        F Lonzie       53            4400
1023          1123440        J Jimmy        19            4700
1010          1123441        h Jiines       24            5000
1011          1123442        G Jines        33            5300
1012          1123443        K Deapn        32            5600
1013          1123444        F rosie        60            5900
1014          1123445        L Kumar        51            6200
1015          1123446        F sharma       23            6500
1016          1123447        P Shrmae       44            6800
1057          1123448        I Raofd        47            7100
1032          1123449        R Martini      55            7400
1001          1123432        J Smith        23            2000

Sampling Size

The required sample size increases as the margin of error decreases and as the confidence level increases. We have chosen a margin of error of 2% with a 95% confidence level. Based on the formula

$$n = \frac{T^2 \, p(1-p)}{M^2}$$

with T = 1.96 (the critical value for 95% confidence), p = 0.5 (the most conservative assumed proportion), and M = 0.02, X insurance company will require a minimum of 2,401 responses. A larger sample size leads to increased precision when estimating a population value, so for our survey we have decided to target more than 5,000 auto insurance customers who have filed a claim within the past 3 months.
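The sample-size arithmetic can be checked with a short sketch. It assumes T = 1.96 and p = 0.5, which the text does not state but which reproduce the figure of 2,401. The function names, the frame of sequential claim numbers, and the seed are hypothetical and chosen only for illustration. The second function shows one way a simple random sample of the target size might be drawn from a frame without replacement.

```python
import math
import random
from fractions import Fraction


def required_sample_size(t: str, p: str, m: str) -> int:
    """Minimum n from n = T^2 * p * (1 - p) / M^2, rounded up.

    Inputs are decimal strings so the arithmetic is exact (no float rounding).
    """
    t_, p_, m_ = Fraction(t), Fraction(p), Fraction(m)
    return math.ceil(t_ ** 2 * p_ * (1 - p_) / m_ ** 2)


def draw_srs(frame_ids, n, seed=None):
    """Simple random sample of n distinct IDs, drawn without replacement."""
    unique_ids = sorted(set(frame_ids))  # drop duplicate frame entries first
    return random.Random(seed).sample(unique_ids, n)


# Assumed inputs: 95% confidence (T = 1.96), p = 0.5, 2% margin of error
n_min = required_sample_size("1.96", "0.5", "0.02")
print(n_min)  # 2401

# Hypothetical frame: one sequential ID per settled claim
frame = range(1, 320_001)
sample = draw_srs(frame, 5000, seed=42)
print(len(sample))  # 5000
```

Removing duplicates before drawing matters because a record listed twice in the frame would otherwise have twice the chance of selection.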

Sampling Method

We have chosen simple random sampling (SRS) so that every subset of the frame of a given size has an equal probability of being selected. This minimizes bias and simplifies the analysis of results. The SRS method is less vulnerable to selection bias because it is possible to identify every member of the population from the claims database. Because selection is random, a sample of around 5,000 claimants should reflect the makeup of the population of 320,000 claimants.

Strategy to reduce bias

Design and data collection bias

The following guidelines will be followed to reduce this issue:
- Design easy-to-understand questions so that the entire sample understands them in the same way.
- Keep questions simple and free of presuppositions.
- Keep survey questions applicable to all respondents.
- Do not ask questions whose answer the respondent may not know.
- Keep answer options balanced and easily interpretable.
- Avoid technical jargon in the questions.

Nonresponse bias

Nonresponse bias can be a major source of total survey error; it is very common and can skew survey results considerably. It occurs when individuals chosen for the sample are unwilling or unable to participate in the survey. In our case, we are targeting claimants who have filed a claim within the last 3 months. It is possible that the majority of claimants who had a good claims experience will not bother to respond. To deal with this issue, X insurance company has decided to offer a 10% premium discount on the next policy renewal to every survey participant. This incentive gives claimants an additional motive to reply to the survey.

Planning/voluntary bias

Voluntary response bias can occur when sample members are self-selected volunteers. The resulting sample tends to overrepresent individuals who have strong opinions. This will be a very common influence in our claims survey, as respondents who had a bad claims experience are more likely to respond. We will address this issue through random sampling, in which every claimant in the frame is equally likely to be selected.

Interviewer bias

Fortunately, our survey avoids this issue because it is administered via the web right after the claim is settled.

Strategy for missing or poor quality data

Please note: by design, the web survey has 10 questions and all answers are mandatory. This will take care of many missing data issues.

Missing data is part of almost all survey and data collection methodologies. The strategy for missing data depends on the nature of the missing data itself, as there are several reasons why data may be missing.

Missing completely at random (MCAR)

This type of missing data is not a major issue, since the analysis remains unbiased. We will use the most common means of dealing with missing data, which is listwise deletion. If the data are missing completely at random, listwise deletion does not add any bias, but it does decrease the power of the analysis by decreasing the effective sample size. We have taken a large sample of 5,000 claim respondents, so this should not be an issue.
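Listwise deletion as described above can be illustrated with a minimal sketch. The record layout, the field names (`q3_intake`, `q8_overall`), and the values are hypothetical and are not taken from the survey data; a missing answer is represented here as `None`.

```python
def listwise_delete(records, fields):
    """Keep only records that have a non-missing value for every listed field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Hypothetical survey responses; None marks a missing answer
responses = [
    {"claim_number": 1001, "q3_intake": 4, "q8_overall": 5},
    {"claim_number": 1023, "q3_intake": None, "q8_overall": 3},
    {"claim_number": 1076, "q3_intake": 2, "q8_overall": None},
    {"claim_number": 1022, "q3_intake": 5, "q8_overall": 4},
]

complete = listwise_delete(responses, ["q3_intake", "q8_overall"])
print(len(complete), "of", len(responses), "records kept")  # 2 of 4 records kept
```

The drop from four records to two shows the cost noted in the text: the estimate stays unbiased under MCAR, but the effective sample size shrinks.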

Missing not at random (MNAR)

Data that are MNAR are a problem. For our survey we will use the following techniques to address missing data issues:
- Imputation from a similar record: a missing value can be imputed from a randomly selected similar record.
- Mean imputation: a missing value can be replaced with the mean of that variable for all other cases, which has the benefit of not changing the sample mean for that variable.
- Regression imputation: a regression model is estimated to predict observed values of a variable based on other variables, and that model is then used to impute values in cases where that variable is missing.
- Propensity score method: a propensity score is generated for each variable with missing values to indicate the probability of the observation being missing. The observations are then grouped based on these propensity scores, and an approximate Bayesian bootstrap imputation is applied to each group.
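The first two techniques in the list can be sketched in a few lines. This is an illustration under assumptions, not the survey's actual procedure: "similar" is taken here to mean "same age band", and the field names and values are hypothetical. Regression imputation and the propensity score method are not shown.

```python
import random
from statistics import mean


def impute_mean(records, field):
    """Replace missing values of `field` with the mean of the observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill} if r[field] is None else dict(r) for r in records]


def impute_from_similar(records, field, group_field, seed=None):
    """Replace a missing value with one taken at random from a record in the same group."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[field] is not None:
            donors.setdefault(r[group_field], []).append(r[field])
    imputed = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        if r[field] is None and donors.get(r[group_field]):
            r[field] = rng.choice(donors[r[group_field]])
        imputed.append(r)
    return imputed


# Hypothetical responses: age band and an overall rating (None = missing)
responses = [
    {"age_band": "18-32", "rating": 4},
    {"age_band": "18-32", "rating": None},
    {"age_band": "32-50", "rating": 2},
    {"age_band": "32-50", "rating": 3},
    {"age_band": "32-50", "rating": None},
]

by_mean = impute_mean(responses, "rating")
by_donor = impute_from_similar(responses, "rating", "age_band", seed=1)
print([r["rating"] for r in by_mean])
print([r["rating"] for r in by_donor])
```

Mean imputation leaves the sample mean unchanged, as the text notes, but it also shrinks the variance of the variable, which is worth keeping in mind before running tests on the imputed data.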

Analysis methods

We will use several graphical means to analyze and report the data to our stakeholders. Since this assignment is focused on the survey design itself, survey analysis and reporting are discussed only briefly, using graphics and cross-tabulation reports for the first 3 questions.

1. Approximately, how long did it take to settle the claim?

[Figure: pie chart "Claim Time" with slices for 2 wks, 2-4 wks, 4-6 wks, 6-8 wks, and over 2 months; slice labels in the source are 10%, 15%, 20%, 25%, and 30%.]

2. What is the most important feature you like in an insurance company?

Preference                     Percentage Score  Age Demographics
a. Flexible Products           10%               50-60
b. Superior customer service   16%               32-50
c. Technology & ease of use    30%               18-32
d. Premium Discounts           44%               All Ages

3. How would you rate the claim intake process for the FNOL (First Notice of Loss)?

[Figure: bar chart of ratings (Excellent, Good, Average, Poor, Very Disappointed) for three claim amount groups (over 5000; between 2000 and 5000; under 2000); the value axis runs from 0 to 6.]

Modeling Plans

X insurance company wants to predict the impact of the customer's claims experience on customer retention. For this the company will use:

1- Hypothesis testing using F and t tests. Several hypotheses are being developed. For example, customer interaction with the agent is directly correlated with the level of customer service. Another hypothesis is that the amount of time it takes to settle a claim is correlated with the rating of the claims experience. We will test several other hypotheses on the survey data.
2- ANOVA (analysis of variance)
3- Cluster analysis
4- Text analytics
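As an illustration of the F test and ANOVA items above, the following sketch computes a one-way ANOVA F statistic for overall ratings grouped by settlement time. The grouping, the ratings, and the function name are hypothetical; the paper does not specify which variables will be compared. Only the statistic and its degrees of freedom are computed here; obtaining a p-value would need an F distribution from a statistics package such as SciPy (`scipy.stats.f_oneway`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical overall ratings (1-5) grouped by time to settle the claim
ratings_by_settle_time = {
    "under 2 weeks": [5, 4, 5, 4, 5],
    "2-4 weeks": [4, 3, 4, 4, 3],
    "over 2 months": [2, 1, 3, 2, 2],
}

f_stat, df_b, df_w = one_way_anova_f(list(ratings_by_settle_time.values()))
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the critical value for the stated degrees of freedom would suggest that mean ratings differ across settlement-time groups. Survey ratings are ordinal, so treating them as numeric is itself an assumption of this sketch.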

Questionnaire

Please note: X insurance company has all the demographic data (age, sex, address, etc.), behavioral data, and claims data available in other source systems, which will have to be integrated with this survey data before the analysis. The survey is conducted via the web and all fields are mandatory in order to avoid missing data issues. Also remember that X insurance company offers a 10% premium discount in order to improve the response rate and thus the overall effectiveness of this claims satisfaction survey.

1. Approximately, how long did it take to settle the claim?
   a. Less than 2 weeks
   b. 2-4 weeks
   c. 4-6 weeks
   d. 6-8 weeks
   e. More than 2 months

2. What is the most important feature you like in an insurance company?
   a. Flexible Products
   b. Superior customer service
   c. Technology and ease of use
   d. Premium Discounts

All the questions below are on a scale from 1 (Very Disappointed) to 5 (Excellent):

3. How would you rate the claim intake process for the FNOL (First Notice of Loss)?
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

4. How would you rate the explanation of the claim process given to you by the assigned claim rep.?
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

5. How would you rate our communication and overall attentiveness to your needs during your claims process?
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

6. The insurance appraiser/adjuster was friendly and courteous.
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

7. From start to finish, the insurance appraiser/adjuster processed your claim in an efficient manner.
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

8. How would you rate your overall claims experience?
   1. Very Disappointed  2. Poor  3. Average  4. Good  5. Excellent

9. Would you recommend us to your friends and family?
   1. Definitely Yes  2. Maybe  3. Definitely No

10. If you could change one thing about the claims process, what would it be? (Free text)


Check List

Survey Meta Data: Definition
Sponsor, broad objective to achieve: Claims Vice President. X insurance company services nearly 5,000 claims per day. Improving customer service is a core part of the strategy; the company would therefore like to capture the customer claims experience through a claims satisfaction survey so that it can predict which customers are more likely to leave the company, and so improve customer retention.
Third party collaboration: No. Internal to X insurance company only.
Purpose: The purpose of the survey is to capture the claims satisfaction of the customer.
Survey Time Period: Open for 2 months
Survey Design: One time panel
Target population: Customers that have a closed claim in the last 6 months. 65,000 approximately.
Sampling Method: Simple Random Sampling
Sample Size: 5000 claimants
Use of Interviewer?: No
Mode of administration: Web-based survey offered immediately after claim settlement
Type of computer assistance: Not Applicable

Reporting/Respondent: 1 person
Frequency of survey: Once, after the claim is settled
Interview per round of survey: Not Applicable
Levels of observations: Personal claimant who files the claim
Link to the survey: Not Applicable. It will be sent automatically once the claim is settled.
