# USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

Save this PDF as:

Size: px
Start display at page:

Download "USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA"

## Transcription

1 USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique used to model the probability of discrete (i.e., binary or multinomial) outcomes. When properly applied, logistic regression analyses yield very powerful insights in to what attributes (i.e., variables) are more or less likely to predict event outcome in a population of interest. These models also show the extent to which changes in the values of the attributes may increase or decrease the predicted probability of event outcome. This approach to pattern recognition or data mining is particularly well suited to applied statistical analyses of consumer behavior. Logistic regression models are frequently employed to assess the chance that a customer will: a) re-purchase a product, b) remain a customer, or c) respond to a direct mail or other marketing stimulus. This paper discusses how logistic regression, and its implementation in the SAS/STAT module of the SAS System, can be employed in these situations. We will focus on concepts and implementation, rather than the statistical theory underlying logistic regression; the reader is encourage to consult the sources listed at the end of the paper for more information about the statistical theory and other details of logistic regression modeling. What is Logistic Regression? Logistic regression refers to statistical models where the dependent, or outcome variable, is categorical, rather than continuous. The logistic function maps or translates changes in the values of the continuous or dichotomous independent variables on the right-hand side of the equation to increasing or decreasing probability of the event modeled by the dependent, or left-hand-side, variable. These statements highlight the key difference between logistic regression and regular (more formally, ordinary least squares) regression: in logistic regression, the predicted value of the dependent variable being generated by operations on the right-hand-side variables is a probability. But, in ordinary least squares regression we are predicting the population mean value of the dependent variable at given levels (i.e., values) of the independent variable(s) in the model. Economists frequently call logistic regression a qualitative choice model, and for obvious reasons: a logistic regression model helps us assess probability which qualities or outcomes will be chosen (selected) by the population under analysis. When proper care is taken to create an appropriate dependent variable, logistic regression is often a superior (both substantively and statistically) alternative to other tools available to model event outcomes. The SAS System s implementation of logistic regression techniques include a wealth of tools for the analyst to use to first construct a model, and then test its ability to perform well in the population from which the data under analysis are assumed to be a random sample. Before proceeding further a clarification is warranted as to the nature of the dependent variable. In many analytic scenarios the customer (or other unit of analysis ) may have more than one choice available to them. For example, during open enrollment a health plan participant may be offered a chance to: maintain their current health care plan, choose one with a richer benefits mix, select one with fewer benefits, or select a plan from a competing provider. In this situation a multinomial logistic regression model might be employed, rather than a binary or dichotomous logistic regression model. While multinomial models can be implemented in both PROC LOGISTIC and PROC GENMOD, they are beyond the scope of this discussion, which will be limited to models where a binary (i.e., one outcome or the other ) outcome is to be analyzed. Why is Logistic Regression Often a Superior Alternative to Other Methods to Analyze Event Outcome? Other approaches besides logistic regression are often proposed to develop models of event outcome or qualitative choice. These include: using ordinary least squares regression with a binary (zero/one) dependent variable, discriminant analysis, and probit analysis. While each of these competing methods are applicable to the discrete choice modeling scenario, the logistic regression model is most often a superior approach for a number of statistical (as well as substantive reasons), including: The logistic regression equation limits generation of the predicted values of the dependent variable to lie in the interval between zero and one; whereas OLS regression often results in values of the dependent variable take on values of less than zero or greater than one,

2 which are substantively irrelevant and have no interpretative value. A simple transformation (exponentiation) of the logistic regression model s parameters leads to an easily interpretable and explainable quantity: the odds ratio. In more recent releases of SAS/STAT software the odds ratio is printed for each independent variable (parameter) in the model. A number of useful tests for assessing model adequacy and fit are available for logistic regression models. These include: measures similar to the coefficient of determination in OLS regression; a generalized test (Hosmer- Lemeshow) for lack of model fit, and the ability to develop tables which show the proportion of cases under analysis which have been correctly classified; false positive and negative rates; and the sensitivity and specificity of the model. Tests are also available for influential and other ill-fitting observations. Parameter estimates generated from a logistic regression model can be applied in a simple Data Step to the population of interest, this scoring, or creating a probability of event outcome for each member of the population. This score can then be used to select subsets of the population for various treatments as may be appropriate to the substantive issue under analysis. Implementation of logistic regression models in the SAS System in PROC LOGISTIC is very similar to how OLS regression models are implemented in PROCs REG and GLM. If you are already familiar with how to perform OLS regression in PROC REG then learning how to use PROC LOGISTIC for binary outcome modeling is a straightforward task. As with PROCs REG and GLM, separate output data sets containing predicted values and parameter estimates can be created for subsequent analysis or scoring. Construction of the Dependent Variable: a Critical Consideration In many analytic situations where logistic regression is the method of choice the analyst has several or more independent variables to use in the modeling process. These variables may be the result of some naturally occurring or observable aspect of the behavior of the units under analysis, or be constructed from other variables. For example, an analyst developing a model predicting re-enrollment in a health insurance plan may have data for each member s interaction with both the health plans administrative apparatus and health care utilization in the prior plan year. The analyst can, with appropriate prior knowledge of the substantive relevance of the data at hand, as well as knowledge of how to use the SAS System s Data Step to operate on variables in a SAS data set, obtain or construct variables which can be employed in the modeling process. These variables may include: number of times member called the health plan for information, number of physician office visits, whether or not the member changed primary care physicians during the previous plan year, and answers to a customer satisfaction survey. Using this example, it is clear that the analyst has an array of attributes (variables) from which to choose in developing models. What is not yet available is the variable which can employed as the outcome or dependent variable. In logistic regression analyses it is often the analyst s responsibility to construct the dependent variable based on an agreed-upon definition of what constitutes the event of interest which is being modeled. From a statistical standpoint, the dichotomous, nominal level dependent variable must be both discriminating and exhaustive. The variable has two levels (i.e., it is dichotomous), and the values of the variable can choose (or discriminate ) between observations in the population of interest. Finally, the dependent variable is considered exhaustive because each individual observation can be assigned to one, and only one, of its values. In most consumer behavior studies the dependent variable represents the analyst s operationalization of the construct of interest, rather than a phenomenon which naturally occurs in the environment from which the values of independent variables measured. Creation of the dependent variable therefore depends on the analyst s own understanding of the consumer phenomenon of interest, as well as any internal rules their organization follows when defining the outcome of interest. Again returning to the health care re-enrollment example from above, a health plan s management team may define attrition or failure to re-enroll as situations where a member fails to return the re-enrollment card within 30 days of its due date. Or, in a response modeling scenario, a direct mail firm may define non-response to an advertisement as failure to respond within 45 days of mailout. These rules are carefully implemented by the analyst during construction of the dependent variable in a Data Step. This often requires clear communication between the analyst and others in their organization as to how the rules will be translated in to SAS programming language statements that will construct the dependent

3 variable. In many applied situations construction of the dependent variable, once the rules for it have been agreedupon, requires extensive Data Step manipulation of variables and data sets. Failure to respond to a mail circular may, for example, be determined only by matchmerging two data sets, one with containing customer information for all persons to whom the mailing was sent, and another with just those who responded to it within the agreed-upon time frame. If a customer is found in the mailed to data set, but not found in the responses data set, the analyst may then use appropriate Data Step coding to classify the customer as a non respondent. Similarly, a customer who has been found on both data sets will be classified as a respondent to the mail campaign. From an applied standpoint, appropriate construction of the dependent variable cannot be overemphasized. Failure on the analyst s part to correctly understand both the construct (i.e., event) whose probability is to be modeled, as well as how to code the rules which operationalize the construct may result in models with limited substantive as well as statistical usefulness. Implementing and Interpreting a Logistic Regression Model We now turn to the implementation and interpretation of a logistic regression model. PROC LOGISTIC, in the SAS/STAT module, contains the tools necessary to apply a logistic regression model to a data set and assess its results. As with its ordinary least squares counterpart, PROC REG, the MODEL Statement in PROC LOGISTIC contains the dependent variable to the left of an equals sign (=) and the name(s) of the independent variable(s) on the right hand side of the equals sign. Options are placed to the right of a slash or stroke mark (/) following the last independent variable in the model. As with PROC REG, the optional OUTPUT statement in PROC LOGISTIC can be used to create an output SAS data set. This topic will be explored in detail below. Output from successful execution of a PROC LOGISTIC step in a SAS Software program yields a wealth of information about both the overall fit of the model, as well as the statistical significance of each of its parameters. The 2LOGL test is analogous to the global F test in OLS regression, and is used to determine the overall impact of the independent variables on predicting increasing or decreasing probability of event outcome. The local chi-square tests are the counterparts to the local t-tests in OLS regression, as the test whether or not the associated parameter is statistically significant. As mentioned previously, exponentiation of the parameter estimates yielded by PROC LOGISTIC yields a very useful and informative value: the odds ratio. This quantity represents the increasing or decreasing probability of event outcome per unit change in the value of the independent variable. The odds ratio is therefore a very valuable substantive, as well as statistical, outcome of applying a logistic regression model to a customer retention or other qualitative choice scenario. The odds ratio is: a) easily calculated (and printed by default by PROC LOGISTIC in recent releases of SAS/STAT software), and, b) easily interpreted by non-quantitative personnel who might use the results of a logistic regression model to plan a marketing or other campaign Since customer retention efforts also have an economic component (e.g., how much will it cost to keep a client? ), articulation of the results of a model in the context of the parameter s odds ratios makes it much easier for all decision makers in the customer retention effort to grasp the substantive impact of the parameters on the phenomenon under study. In my own experience, telling a financial institution s marketing manager if we add a credit card to a household s product portfolio, in addition to their checking and saving accounts, the probability we will keep them as a customer for another year increases by x percent, or being able to explain to an HMO s sales team that for every additional year the member has been with our HMO, the chance they will re-enroll increases by y percent permits the substantive relevance of a logistic regression model to be readily understood by those who will use its results to plan or modify strategy. Customized odds ratios, as well as confidence intervals for odds ratios are available from PROC LOGISTIC using the UNITS and RISKLIMITS options. By default, invocation of the RISKLIMITS option generates 95% confidence intervals around the odds ratios; users can specify other intervals using the ALPHA option in conjunction with the RISKLIMITS option. Selecting the Optimal Subset of Independent Variables In many modeling situations the analyst may have many variables to consider for inclusion in their model(s). In addition to relying on the substantive significance of the variables from which to choose, PROC LOGISTIC permits implementation of automated variable subset selection tools such as forward, backward and stepwise generation of optimal subsets. As with any other type of modeling effort, the results selected as best via an automated subset selection process implemented in PROC LOGISTIC should be carefully examined in both the statistical and substantive relevance of the variables chosen. In addition, both multicollinearity and influential observations are issues in logistic regression just as they are with ordinary least squares regression. Various measures for both conditions are available by specifying the INFLUENCE and IPLOTS options in PROC LOGISTIC. The reader is directed to pp

4 of SAS Institute publication SAS/STAT Software: Changes and Enhancements Through Release 6.12 (full citation below) for more information on diagnostic tests implemented via specification of these options. Choosing from Competing Models As with other forms of statistical modeling, the logistic regression modeling effort frequently yields more than one competing model of the phenomenon under study. Fortunately, PROC LOGISTIC provides several tests which help assess: a) how well the independent variables fit the model; and, b) the model s ability to correctly predict the outcome modeled by the dependent variable. Among the measures of model fit available in PROC LOGISTIC are the Hosmer-Lemeshow goodness of fit test (generated by the LACKFIT option in the MODEL statement) and several measures similar to the coefficient of determination obtained in ordinary least squares regression. These r-square like statistics are generated when the RSQUARE option is invoked. These measures generalize the coefficient of determination to the categorical modeling situation and help assess how well the independent variables fit the model. Another approach is to assess the sensitivity (proportion of observations with the condition of interest which are predicted by the model to have the condition of interest) and/or specificity (proportion of cases without the condition of interest which are predicted by the model to NOT have the condition of interest). In addition, the false positive and false negative rates of several competing models may also be of interest. These and other common measures of model adequacy are portrayed in classification tables generated by PROC LOGISTIC when the CTABLE option is specified. If the analyst has a specific (or range) of prior probabilities of event occurrence for which they want a classification table generated, the PPROB option results in limiting generation of the classification table to just those desired by the analyst. Implementing the Results of a Logistic Regression Model Once a model predicting customer retention has been selected (usually on the basis of a combination of statistical and substantive considerations), attention usually turns to the implementation of the model in the organization. In my experience, there are often two distinct aspects to how an organization applies the results of a model. The first aspect is dissemination of the model s results to others in the organization. Employees may be told, for example, about the results of the model (in a nonstatistical fashion) via internal communication channels. Managers may be tasked to implement the results of the model via changes in strategy or on going programs. For example, the credit card marketing department of a financial institution may decide to increase its offers of credit cards to (creditworthy) customers if the results of a model suggest that adding this product to a customer s portfolio increases the chance of retention. In one organization where I have consulted, management compensation plans were changed to provide financial incentives to those line managers who were successful in maintaining or improving customer retention rates. As part of the compensation plan s implementation managers attended training sessions where the retention model was discussed in the context of the drivers of customer retention and how these managers were responsible for employee performance on these drivers. A second, and very common, aspect to model implementation is to score or apply the selected model s parameters to every customer of the organization commissioning the model. Most customer retention modeling efforts take place with a (hopefully) random sample of a larger population of interest. Once a model has been chosen, its parameters are applied to the entire population of interest, thereby obtaining a probability to stay (or leave) score for each customer. Once generated, these scores can then be used to identify groups of customers who are more (or less) likely to stay (or leave), and marketing efforts tailored accordingly. The OUTEST option in the PROC LOGISTIC statement will generate an output SAS data set containing model parameters. The variables in this data set can be used in subsequent Data Steps to implement the scoring process. Summary and Conclusions Logistic regression is a very powerful tool in building models of customer retention. SAS System software implements this tool in both PROC LOGISTIC in the SAS/STAT module and the forthcoming Enterprise Miner product. When applied properly, logistic regression models can yield powerful insights in to why some customers leave and others stay. These insights can then be employed to modify organizational strategies and/or assess the impact of the implementation of these strategies. As with any other form of statistical modeling, care must be taken to carefully construct variables representing the phenomenon of interest and to select candidate independent variables that are substantively relevant to the issues under analysis. Finally, the dynamic nature of most systems of customer choice suggests that periodic re-assessment of the selected model is highly appropriate. As customer

5 demands and needs, as well as the environment in which the organization competes, change, so must the models the organization employs to predict customer retention. Note: SAS, SAS/STAT and Enterprise Miner are trademarks of SAS Institute, Cary, NC. USA. References Berry, Michael J.A., and Gordon Linoff, Data Mining Techniques For Marketing, Sales and Customer Support, Wiley, 1997 Groth. Robert, Data Mining: a Hands-On Approach for Business Professionals, Prentice-Hall, 1998 Hosmer and Lemeshow, Applied Logistic Regression, Wiley:, 1989 Kennedy, Ruby L, et,al, Solving Data Mining Problems Through Pattern Recognition, Prentice-Hall, 1997 SAS Institute, Inc: SAS/STAT Software, Volume 2: the LOGISTIC Procedure SAS Institute, Inc.: SAS/STAT Technical Report P-229, SAS/STAT Software: Changes and Enhancements, Release 6.07 SAS Institute, Inc.: SAS/STAT Software: Changes and Enhancements Release 6.10 SAS Institute, Inc.: Logistic Regression Examples Using the SAS System (1995) SAS Institute, Inc: SAS/STAT Software: Changes and Enhancements through Release 6.12 (1997) Weiss, Sholom M., and Nitin Indurkhya, Predictive Data Mining: A Practical Guide, Morgan Kaufmann, 1998 The author can be contacted at: Sierra Information Services, Inc Webster Street Suite 1308 San Francisco, California / AOL.COM

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Application of Odds Ratio and Logistic Models in Epidemiology and Health Research

Application of Odds Ratio and Logistic Models in Epidemiology and Health Research By: Min Qi Wang, PhD; James M. Eddy, DEd; Eugene C. Fitzhugh, PhD Wang, M.Q., Eddy, J.M., & Fitzhugh, E.C.* (1995). Application

### Some Issues in Using PROC LOGISTIC for Binary Logistic Regression

Some Issues in Using PROC LOGISTIC for Binary Logistic Regression by David C. Schlotzhauer Contents Abstract 1. The Effect of Response Level Ordering on Parameter Estimate Interpretation 2. Odds Ratios

### Alex Vidras, David Tysinger. Merkle Inc.

Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT

### Credit Risk Analysis Using Logistic Regression Modeling

Credit Risk Analysis Using Logistic Regression Modeling Introduction A loan officer at a bank wants to be able to identify characteristics that are indicative of people who are likely to default on loans,

### Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc

ABSTRACT Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc Logistic regression may be useful when we are trying to model a

### Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

Indexing and Compressing SAS Data Sets: How, Why, and Why Not Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Many users of SAS System software, especially those working

### The general form of the PROC GLM statement is

Linear Regression Analysis using PROC GLM Regression analysis is a statistical method of obtaining an equation that represents a linear relationship between two variables (simple linear regression), or

### Logistic Regression With SAS

Logistic Regression With SAS Please read my introductory handout on logistic regression before reading this one. The introductory handout can be found at. Run the program LOGISTIC.SAS from my SAS programs

### Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.

Paper 264-26 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### %DISCIT Macro: Pre-screening continuous variables for subsequent binary logistic regression analysis through visualization

ABSTRACT Paper 92 %DISCIT Macro: Pre-screening continuous variables for subsequent binary logistic regression analysis through visualization Mohamed S. Anany, Videa (Cox Media Group subsidiary) In binary

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous

### A Property & Casualty Insurance Predictive Modeling Process in SAS

Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

### Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing

### Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

### Logistic Regression. Introduction. The Purpose Of Logistic Regression

Logistic Regression...1 Introduction...1 The Purpose Of Logistic Regression...1 Assumptions Of Logistic Regression...2 The Logistic Regression Equation...3 Interpreting Log Odds And The Odds Ratio...4

### A Tutorial on Logistic Regression

A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each

### A Property and Casualty Insurance Predictive Modeling Process in SAS

Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly

### PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC. A gotcha

### Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS

Paper 114-27 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT

### 1/2/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

### Get to Know the IBM SPSS Product Portfolio

IBM Software Business Analytics Product portfolio Get to Know the IBM SPSS Product Portfolio Offering integrated analytical capabilities that help organizations use data to drive improved outcomes 123

### How to get more value from your survey data

IBM SPSS Statistics How to get more value from your survey data Discover four advanced analysis techniques that make survey research more effective Contents: 1 Introduction 2 Descriptive survey research

Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA

### SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY

SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in

### Figure 1. IBM SPSS Statistics Base & Associated Optional Modules

IBM SPSS Statistics: A Guide to Functionality IBM SPSS Statistics is a renowned statistical analysis software package that encompasses a broad range of easy-to-use, sophisticated analytical procedures.

### 5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits

Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. Hosmer-Lemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional

### Customer Profiling for Marketing Strategies in a Healthcare Environment MaryAnne DePesquo, Phoenix, Arizona

Paper 1285-2014 Customer Profiling for Marketing Strategies in a Healthcare Environment MaryAnne DePesquo, Phoenix, Arizona ABSTRACT In this new era of healthcare reform, health insurance companies have

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### SUGI 29 Statistics and Data Analysis

Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

### Chapter 6: Answers. Omnibus Tests of Model Coefficients. Chi-square df Sig Step Block Model.

Task Chapter 6: Answers Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Binary Logistic Regression

Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

### UNIVERSITY OF WISCONSIN - MILWAUKEE HELENBADERSCHOOL OF SOCIAL WELFARE, DOCTORAL PROGRAM IN SOCIAL WORK. Statistical MethodsII (SW 962, Sec001)

UNIVERSITY OF WISCONSIN - MILWAUKEE HELENBADERSCHOOL OF SOCIAL WELFARE, DOCTORAL PROGRAM IN SOCIAL WORK Statistical MethodsII (SW 962, Sec001) Term: Spring, 2012 Instructor: Michael J. Brondino Office:

### Regression Analysis. Pekka Tolonen

Regression Analysis Pekka Tolonen Outline of Topics Simple linear regression: the form and estimation Hypothesis testing and statistical significance Empirical application: the capital asset pricing model

### Paper Beyond Breslow-Day: Homogeneity Across R x C Tables ABSTRACT INTRODUCTION SAMPLE DATA K 2 2 TABLES

Paper 74949 Beyond Breslow-Day: Homogeneity Across R x C Tables Ginny P. Lai, David R. Mink, David J. Pasta, ICON Late Phase & Outcomes Research, San Francisco, CA ABSTRACT In the epidemiological world,

### Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

### 2015 Workshops for Professors

SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

### Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

### USING ANALYTICS TO MEASURE THE VALUE OF EMPLOYEE REFERRAL PROGRAMS

USING ANALYTICS TO MEASURE THE VALUE OF EMPLOYEE REFERRAL PROGRAMS Evolv Study: The benefits of an employee referral program significantly outweigh the costs Using Analytics To Measure The Value Of Employee

### Semester 1 Statistics Short courses

Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical

### Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

### Official SAS Curriculum Courses

Certificate course in Predictive Business Analytics Official SAS Curriculum Courses SAS Programming Base SAS An overview of SAS foundation Working with SAS program syntax Examining SAS data sets Accessing

### (Logistic Regression) 2 *! %\$%!&2.!0(%%##%#\$" 2 \$ (dichotomy) %%!0(%-"! #\$""\$1 \$%,#*"./'(0(6 *-16*7" 22% "1 / 4"2'(0(-17")* "(4(*:#5* \$" \$2

(Logistic Regression)!"#\$ %!&'()* *%!+,-*("./.\$ Multiple Regression +%!& %%!*"0(%-"! %\$..(/" + 2 *! %\$%!&2.!0(%%##%#\$" 2 \$ (dichotomy) %%!0(%-"! #\$""\$ '()*\$%#\$"#0( 2 \$(4( 54( \$#%)#*" \$"-\$(*( \$%,#*"./'(0(6

### College Students Beliefs and Values (CSBV) Pilot Survey Methodology With a goal of developing an institutional sample that reflected diversity in

College Students Beliefs and Values (CSBV) Pilot Survey Methodology With a goal of developing an institutional sample that reflected diversity in type, control, selectivity, and geographical region, between

### Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Sensitivity Analysis in Multiple Imputation for Missing Data

Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

### Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

### Master of Science in Marketing Analytics (MSMA)

Master of Science in Marketing Analytics (MSMA) COURSE DESCRIPTION The Master of Science in Marketing Analytics program teaches students how to become more engaged with consumers, how to design and deliver

### ABSTRACT INTRODUCTION

Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel

### A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND

Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression

### BOOST YOUR CONFIDENCE (INTERVALS) WITH SAS

BOOST YOUR CONFIDENCE (INTERVALS) WITH SAS Brought to you by: Peter Langlois, PhD Birth Defects Epidemiology & Surveillance Branch, Texas Dept State Health Services Background Confidence Interval Definition

### TEXT ANALYTICS INTEGRATION

TEXT ANALYTICS INTEGRATION A TELECOMMUNICATIONS BEST PRACTICES CASE STUDY VISION COMMON ANALYTICAL ENVIRONMENT Structured Unstructured Analytical Mining Text Discovery Text Categorization Text Sentiment

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

### It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION

### Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Paper AA08-2013 Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT

### Joseph Twagilimana, University of Louisville, Louisville, KY

ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim

### Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008

### University of Oklahoma

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

### Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT

Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety

### Improving SAS Global Forum Papers

Paper 3343-2015 Improving SAS Global Forum Papers Vijay Singh, Pankush Kalgotra, Goutam Chakraborty, Oklahoma State University, OK, US ABSTRACT Just as research is built on existing research, the references

### INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. AGENDA Overview/Introduction to Data Mining

### Easily Identify Your Best Customers

IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Understanding Characteristics of Caravan Insurance Policy Buyer

Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended

### An Exploration of Typing Tool Development Techniques

An Exploration of Typing Tool Development Techniques By Kevin Knight, Digital Research Inc. John Leggett, Digital Research Inc. E-Mail: kevin.knight@digitalresearch.com or John.Leggett@digitalresearch.com

### Weight of Evidence Module

Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Introduction to Fixed Effects Methods

Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed

### ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

### JOB SATISFACTION AMONG FEMALE TEACHERS IN PUBLIC AND PRIVATE SECTORS (A CROSS-SECTIONAL STUDY)

Volume 2, 2013, Page 88-97 JOB SATISFACTION AMONG FEMALE TEACHERS IN PUBLIC AND PRIVATE SECTORS (A CROSS-SECTIONAL STUDY) Naila Amjad, Shazia Qasim Department of Statistics, Lahore College for Women University,

### Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are

### USING LOGIT MODEL TO PREDICT CREDIT SCORE

USING LOGIT MODEL TO PREDICT CREDIT SCORE Taiwo Amoo, Associate Professor of Business Statistics and Operation Management, Brooklyn College, City University of New York, (718) 951-5219, Tamoo@brooklyn.cuny.edu

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

### COMP9417: Machine Learning and Data Mining

COMP9417: Machine Learning and Data Mining A note on the two-class confusion matrix, lift charts and ROC curves Session 1, 2003 Acknowledgement The material in this supplementary note is based in part

### Performance Test Suite Results for SAS 9.1 Foundation on the IBM zseries Mainframe

Performance Test Suite Results for SAS 9.1 Foundation on the IBM zseries Mainframe A SAS White Paper Table of Contents The SAS and IBM Relationship... 1 Introduction...1 Customer Jobs Test Suite... 1

### Data Mining Techniques for Mortality at Advanced Age

Data Mining Techniques for Mortality at Advanced Age Lijia Guo, Ph.D., A.S.A. and Morgan C. Wang, Ph.D. University of Central Florida Abstract This paper addresses issues and techniques for advanced age

### Data Mining Applications in Higher Education

Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

### Dealing with Missing Data for Credit Scoring

ABSTRACT Paper SD-08 Dealing with Missing Data for Credit Scoring Steve Fleming, Clarity Services Inc. How well can statistical adjustments account for missing data in the development of credit scores?

### The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon

The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Riku Mäkeläinen & Sakari Forslund TeliaSonera Finland / Consumer Marketing

Increasing Profitability of MMS Activation Campaigns Traditional Modelling Methods vs. Two-Stage Modelling SAS Forum International 2004 - Copenhagen 15.-17.6.2004 Riku Mäkeläinen & Sakari Forslund TeliaSonera

### Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

### USING SAS/STAT SOFTWARE'S REG PROCEDURE TO DEVELOP SALES TAX AUDIT SELECTION MODELS

USING SAS/STAT SOFTWARE'S REG PROCEDURE TO DEVELOP SALES TAX AUDIT SELECTION MODELS Kirk L. Johnson, Tennessee Department of Revenue Richard W. Kulp, David Lipscomb College INTRODUCTION The Tennessee Department

### Questionnaire Design. Outline. Introduction

Questionnaire Design Jean S. Kutner, MD, MSPH University of Colorado Health Sciences Center Outline Introduction Defining and clarifying survey variables Planning analysis Data collection methods Formulating

### DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING

DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four

### Statistics and Data Analysis

NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models

### Multinomial Logistic Regression

Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a

Chapter 39 The LOGISTIC Procedure Chapter Table of Contents OVERVIEW...1903 GETTING STARTED...1906 SYNTAX...1910 PROCLOGISTICStatement...1910 BYStatement...1912 CLASSStatement...1913 CONTRAST Statement.....1916