# Modeling Lifetime Value in the Insurance Industry

Save this PDF as:

Size: px
Start display at page:

## Transcription

2 Weight Campaign Sample Non Resp/Non Pd Resp 929,075 37, Responders/Paid 37,781 37,781 1 Total 966,856 74,944 The non-responders and non-paid responders are grouped together since our target is paid responders. This gives us a manageable sample size of 74,944. name is on fewer databases and hence may have received fewer pieces of direct mail. This will often lead to better response rates. The following code is used to replace missing values: IF INCOME =. THEN INC_MISS = 1; ELSE INC_MISS = 0; IF INCOME =. THEN INCOME = 49; THE DATA CLEAN-UP To check data quality, a simple data mining procedure like PROC UNIVARIATE can provide a great deal of information. In addition to other details, it calculates three measures of central tendency: mean, median and mode. It also calculates measures of spread such as the variance and standard deviation and it displays quantile measures and extreme values. It is good practice to do a univariate analysis of all continuous variables under consideration. The following code will perform a univariate analysis on the variable income: PROC UNIVARIATE DATA=LIB.DATA; VAR INCOME; The output is displayed in Appendix B. The variable INCOME is in units of \$1000. N represents the sample size of 74,944. The mean value of is suspicious. With further scrutiny, we see that the highest value for INCOME is It is probably a data entry error and should be deleted. The two values representing the number of values greater than zero and the number of values not equal to zero are the same at 74,914. This implies that 30 records have missing values for income. We choose to replace the missing value with the mean. First, we must delete the observation with the incorrect value for income and rerun the univariate analysis. The results from the corrected data produce more reasonable results (see Appendix C). With the outlier deleted, the mean is in a reasonable range at a value of 49. This value is used to replace the missing values for income. Some analysts prefer to use the median to replace missing values. Even further accuracy can be obtained using cluster analysis to calculate cluster means. This technique is beyond the scope of this paper. Because a missing value can be indicative of other factors, it is advisable to create a binary variable, which equals 1 if the value is missing and 0 otherwise. For example, income is routinely overlaid from an outside source. Missing values often indicate that a name didn t match the outside data source. This can imply that the MODEL DEVELOPMENT The first component of the LTV, the probability of a paid sale, is based on a binary outcome, which is easily modeled using logistic regression. Logistic regression uses continuous values to predict the odds of an event happening. The log of the odds is a linear function of the predictors. The equation is similar to the one used in linear regression with the exception of the use of a log transformation to the independent variable. The equation is as follows: log(p/(1-p)) = B 0 + B 1 X 1 + B 2 X B n X n Variable Preparation - Dependent To define the dependent variable, create the variable PAIDSALE defined as follows: IF PREMIUM > 0 THEN PAIDSALE = 1; ELSE PAIDSALE = 0; Variable Preparation - Independent: Categorical Categorical variables need to be coded with numeric values for use in the model. Because logistic regression reads all independent variables as continuous, categorical variables need to be coded into n-1 binary (0/1) variables, where n is the total number of categories. The following example deals with four geographic regions: north, south, midwest, west. The following code creates three new variables: IF REGION = EAST THEN EAST = 1; ELSE EAST = 0; IF REGION = MIDWEST THEN MIDWEST = 1; ELSE MIDWEST = 0; IF REGION = SOUTH THEN SOUTH = 1; ELSE SOUTH = 0; If the value for REGION is WEST, then the values for the three named variables will all have a value of 0.

3 Variable Preparation - Independent: Continuous Since, logistic regression looks for a linear relationship between the independent variables and the log of the odds of the dependent variable, transformations can be used to make the independent variables more linear. Examples of transformations include the square, cube, square root, cube root, and the log. Some complex methods have been developed to determine the most suitable transformations. However, with the increased computer speed, a simpler method is as follows: create a list of common/favorite transformations; create new variables using every transformation for each continuous variable; perform a logistic regression using all forms of each continuous variable against the dependent variable. This allows the model to select which form or forms fit best. Occasionally, more than one transformation is significant. After each continuous variable has been processed through this method, select the one or two most significant forms for the final model. The following code demonstrates this technique for the variable AGE: PROC LOGISTIC LIB.DATA: WEIGHT SMP_WGT; MODEL PAIDSALE = AGE AGE_MISS AGE_SQR AGE_CUBE AGE_LOG / SELECTION=STEPWISE; The logistic model output (see Appendix D) shows two forms of AGE to be significant in combination: AGE_MISS and AGE_CUBE. These forms will be introduced into the final model. Partition Data The data are partitioned into two datasets, one for model development and one for validation. This is accomplished by randomly splitting the data in half using the following SAS code: DATA LIB.MODEL LIB.VALID; SET LIB.DATA; IF RANUNI(0) <.5 THEN OUTPUT LIB.MODEL; ELSE OUTPUT LIB.VALID; If the model performs well on the model data and not as well on the validation data, the model may be over-fitting the data. This happens when the model memorizes the data and fits the models to unique characteristics of that particular data. A good, robust model will score with comparable performance on both the model and validation datasets. As a result of the variable preparation, a set of candidate variables has been selected for the final model. The next step is to choose the model options. The backward selection process is favored by some modelers because it evaluates all of the variables in relation to the dependent variable while considering interactions among the independent or predictor variables. It begins by measuring the significance of all the variables and then removing one at a time until only the significant variables remain. A reasonable significance level is the default value of.05. If too many variables end up in the final model, the signifiance level can be lowered to.01,.001, or The sample weight must be included in the model code to recreate the original population dynamics. If you eliminate the weight, the model will still produce correct ranking-ordering but the actual estimates for the probability of a paid-sale will be incorrect. Since our LTV model uses actual estimates, we will include the weights. The following code is used to build the final model. PROC LOGISTIC LIB.MODEL: WEIGHT SMP_WGT; MODEL PAIDSALE = AGE_MISS AGE_CUBE EAST MIDWEST SOUTH INCOME INC_MISS LOG_INC MARRIED SINGLE POPDENS MAIL_ORD// SELECTION=BACKWARD; The resulting model has 7 predictors. (See Appendix E) The parameter estimate is multiplied times the value of the variable to create the final probability. The strength of the predictive power is distributed like a chi-square so we look to that distribution for significance. The higher the chi-square, the lower the probability of the event occurring randomly (pr > chi-square). The strongest predictor is the variable MAIL_ORD. This has a value of 1 if the individual has a record of a previous mail order purchase. This may imply that the person is comfortable making purchases through the mail and is therefore a good mailorder insurance prospect. The following equation shows how the probability is calculated, once the parameter estimates have been calculated: prob = exp(b 0 + B 1 X 1 + B 2 X B n X n ) (1+ exp(b 0 + B 1 X 1 + B 2 X B n X n )) This creates the final score, which can be evaluated using a gains table (see Appendix F). Sorting the dataset by the score and dividing it into 10 groups of equal volume creates the gains table. The validation dataset is also scored and evaluated in a gains table (see Appendix G).

5 APPENDIX A MALE FEM ALE M S D W M S D W < APPENDIX B Variable=INCOME Univariate Analysis Moments Quantiles Extremes Low High N 74, % Max Mean % Q Std Dev % Med Num ^= 0 74,914 25% Q Num > 0 74,914 0% Min APPENDIX C Variable=INCOME Univariate Analysis Moments Quantiles Extremes Low High N 74, % Max Mean 49 75% Q Std Dev % Med Num ^= 0 74,913 25% Q Num > 0 74,913 0% Min APPENDIX D The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standardized Estimate.. Parameter Standard Wald Pr > Odds Variable DF Estimate Error Chi-Square Chi-Square Ratio INTERCPT

6 AGE AGE_MISS AGE_CUBE AGE_LOG AGE_SQR APPENDIX E The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standardized Estimate Parameter Standard Wald Pr > Odds Variable DF Estimate Error Chi-Square Chi-Square Ratio INTERCPT AGE_CUBE MIDWEST LOG_INC INC_MISS MARRIED POPDENS MAIL_ORD Association of Predicted Probabilities and Observed Response Concordant = 57.1% Somers D = Discordant = 36.2% Gamma = Tied = 6.6% Tau-a = ( pairs) c = 0.604

7 APPENDIX F Model Data NUMBER OF PREDICTED % ACTUAL % OF NUMBER OF CUM ACTUAL % DECILE ACCOUNTS OF PAID SALES PAID SALES PAID SALES OF PAID SALES 1 48, % 11.36% 5, % 2 48, % 8.63% 4, % 3 48, % 5.03% % 4 48, % 1.94% % 5 48, % 0.95% % 6 48, % 0.28% % 7 48, % 0.11% % 8 48, % 0.08% % 9 48, % 0.00% % 10 48, % 0.00% % APPENDIX G Validation Data NUMBER OF PREDICTED % ACTUAL % OF NUMBER OF CUM ACTUAL % DECILE ACCOUNTS OF PAID SALES PAID SALES PAID SALES OF PAID SALES 1 48, % 10.12% 4, % 2 48, % 8.16% 3, % 3 48, % 5.76% % 4 48, % 2.38% 1, % 5 48, % 1.07% % 6 48, % 0.56% % 7 48, % 0.23% % 8 48, % 0.05% % 9 48, % 0.01% % 10 48, % 0.00% % APPENDIX H Financial Analysis NUMBER OF PREDICTED % RISK PRODUCT AVERAGE CUM AVERAGE SUM CUM DECILE ACCOUNTS OF PAID SALES INDEX PROFITABILITY LTV LTV LTV 1 96, % 0.94 \$553 \$58.27 \$58.27 \$5,633,985

8 2 96, % 0.99 \$553 \$46.47 \$52.37 \$10,126, , % 0.98 \$553 \$26.45 \$43.73 \$12,684, , % 0.96 \$553 \$9.49 \$35.17 \$13,602, , % 1.01 \$553 \$4.55 \$29.05 \$14,042, , % 1.00 \$553 \$0.74 \$24.33 \$14,114, , % 1.03 \$553 (\$0.18) \$20.83 \$14,096, , % 0.99 \$553 (\$0.34) \$18.18 \$14,063, , % 1.06 \$553 (\$0.76) \$16.08 \$13,990, , % 1.10 \$553 (\$0.77) \$14.39 \$13,915,946

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

### Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing

### Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS

Paper 114-27 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT

### Statistics, Data Analysis & Econometrics

Using the LOGISTIC Procedure to Model Responses to Financial Services Direct Marketing David Marsh, Senior Credit Risk Modeler, Canadian Tire Financial Services, Welland, Ontario ABSTRACT It is more important

### Segmentation For Insurance Payments Michael Sherlock, Transcontinental Direct, Warminster, PA

Segmentation For Insurance Payments Michael Sherlock, Transcontinental Direct, Warminster, PA ABSTRACT An online insurance agency has built a base of names that responded to different offers from various

### Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

### PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC. A gotcha

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Alex Vidras, David Tysinger. Merkle Inc.

Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### A Property & Casualty Insurance Predictive Modeling Process in SAS

Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

### 4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

### Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions

Myth or Fact: The Diminishing Marginal Returns of Variable in Data Mining Solutions Data Mining practitioners will tell you that much of the real value of their work is the ability to derive and create

### ABSTRACT INTRODUCTION

Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel

### Scoring of Bank Customers for a Life Insurance Campaign

Scoring of Bank Customers for a Life Insurance Campaign by Brian Schwartz and Jørgen Lauridsen Discussion Papers on Business and Economics No. 5/2007 FURTHER INFORMATION Department of Business and Economics

### IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

### Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

### A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND

Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression

### Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

### Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

### A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa

A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa ABSTRACT Predictive modeling is the technique of using historical information on a certain

### WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Linear Regression in SPSS

Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number

### Data Mining Techniques Chapter 4: Data Mining Applications in Marketing and Customer Relationship Management

Data Mining Techniques Chapter 4: Data Mining Applications in Marketing and Customer Relationship Management Prospecting........................................................... 2 DM to choose the right

### Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT

Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety

### ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### SUGI 29 Statistics and Data Analysis

Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

### Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Paper AA08-2013 Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT

### Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

### Using the Profitability Factor and Big Data to Combat Customer Churn

WHITE PAPER Using the Profitability Factor and Big Data to Combat Customer Churn Succeed. Transform. Compute. Perform. Succeed. Transform. Compute. Perform. Using the Profitability Factor and Big Data

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

### Customer Profiling for Marketing Strategies in a Healthcare Environment MaryAnne DePesquo, Phoenix, Arizona

Paper 1285-2014 Customer Profiling for Marketing Strategies in a Healthcare Environment MaryAnne DePesquo, Phoenix, Arizona ABSTRACT In this new era of healthcare reform, health insurance companies have

### Introduction: Laurent Lo de Janvry

-- Mining Your Data To Maximize Your Fundraising Potential Laurent (Lo) de Janvry UC Berkeley Haas School of Business CASE VII Tarak Shah UC Berkeley University Relations Introduction: Laurent Lo de Janvry

### Can Annuity Purchase Intentions Be Influenced?

Can Annuity Purchase Intentions Be Influenced? Jodi DiCenzo, CFA, CPA Behavioral Research Associates, LLC Suzanne Shu, Ph.D. UCLA Anderson School of Management Liat Hadar, Ph.D. The Arison School of Business,

### ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

### Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon

The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,

### Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

### 4 Other useful features on the course web page. 5 Accessing SAS

1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu

### Credit Risk Analysis Using Logistic Regression Modeling

Credit Risk Analysis Using Logistic Regression Modeling Introduction A loan officer at a bank wants to be able to identify characteristics that are indicative of people who are likely to default on loans,

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### !"!!"#\$\$%&'()*+\$(,%!"#\$%\$&'()*""%(+,'-*&./#-\$&'(-&(0*".\$#-\$1"(2&."3\$'45"

!"!!"#\$\$%&'()*+\$(,%!"#\$%\$&'()*""%(+,'-*&./#-\$&'(-&(0*".\$#-\$1"(2&."3\$'45"!"#"\$%&#'()*+',\$\$-.&#',/"-0%.12'32./4'5,5'6/%&)\$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:

### Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

### An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG

Paper 3140-2015 An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG Iván Darío Atehortua Rojas, Banco Colpatria

### Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

### Easily Identify Your Best Customers

IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Free Trial - BIRT Analytics - IAAs

Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### A SAS White Paper: Implementing a CRM-based Campaign Management Strategy

A SAS White Paper: Implementing a CRM-based Campaign Management Strategy Table of Contents Introduction.......................................................................... 1 CRM and Campaign Management......................................................

### New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### A Guide for a Selection of SPSS Functions

A Guide for a Selection of SPSS Functions IBM SPSS Statistics 19 Compiled by Beth Gaedy, Math Specialist, Viterbo University - 2012 Using documents prepared by Drs. Sheldon Lee, Marcus Saegrove, Jennifer

### USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique

### A Tutorial on Logistic Regression

A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each

### SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting

SOA 2013 Life & Annuity Symposium May 6-7, 2013 Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting Moderator: Barry D. Senensky, FSA, FCIA, MAAA Presenters: Jonathan

### Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

### Risk pricing for Australian Motor Insurance

Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### A Property and Casualty Insurance Predictive Modeling Process in SAS

Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly

### Binary Logistic Regression

Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

### An Overview and Evaluation of Decision Tree Methodology

An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM

Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional

CoolaData Predictive Analytics 9 3 6 About CoolaData CoolaData empowers online companies to become proactive and predictive without having to develop, store, manage or monitor data themselves. It is an

### Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki R. Wooten, PhD, LISW-CP 2,3, Jordan Brittingham, MSPH 4

1 Paper 1680-2016 Using GENMOD to Analyze Correlated Data on Military System Beneficiaries Receiving Inpatient Behavioral Care in South Carolina Care Systems Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki

### Customer Life Time Value

Customer Life Time Value Tomer Kalimi, Jacob Zahavi and Ronen Meiri Contents Introduction... 2 So what is the LTV?... 2 LTV in the Gaming Industry... 3 The Modeling Process... 4 Data Modeling... 5 The

### Data mining and statistical models in marketing campaigns of BT Retail

Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120

### Cool Tools for PROC LOGISTIC

Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT

### A fast, powerful data mining workbench designed for small to midsize organizations

FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

### Official SAS Curriculum Courses

Certificate course in Predictive Business Analytics Official SAS Curriculum Courses SAS Programming Base SAS An overview of SAS foundation Working with SAS program syntax Examining SAS data sets Accessing

### Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

### Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

### The Predictive Data Mining Revolution in Scorecards:

January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo

Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable

### IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

### 6 Variables: PD MF MA K IAH SBS

options pageno=min nodate formdlim='-'; title 'Canonical Correlation, Journal of Interpersonal Violence, 10: 354-366.'; data SunitaPatel; infile 'C:\Users\Vati\Documents\StatData\Sunita.dat'; input Group

### Logistic regression diagnostics

Logistic regression diagnostics Biometry 755 Spring 2009 Logistic regression diagnostics p. 1/28 Assessing model fit A good model is one that fits the data well, in the sense that the values predicted