Statistics and Data Analysis
NESUG 2007

PROC LOGISTIC: The Logistics Behind Interpreting Categorical Variable Effects

Taylor Lewis, U.S. Office of Personnel Management, Washington, DC

ABSTRACT

The goal of this paper is to demystify how SAS models (a.k.a. parameterizes) categorical variables in PROC LOGISTIC. Specifically, readers will become more familiar with the commonly used effect and reference parameterizations. In conjunction with these two parameterizations and associated options, this paper touches on issues such as why SAS needs to create dummy variables for the k distinct categories and why the output displays estimates for only k - 1 parameters. At the conclusion of the paper, readers should feel more confident interpreting a categorical variable's effect on the response as well as testing for significance, by way of the odds ratios computed from the output or via the CONTRAST statement. Discussion uses real-world data from the U.S. Office of Personnel Management, collected for a multiple logistic regression model project whereby the likelihood of a promotion for Federal civilian employees was modeled using personnel data.

BACKGROUND

PROC LOGISTIC is the SAS/STAT procedure which allows users to model and analyze factors affecting the outcome of a dichotomous response variable, one in which an event or nonevent can occur. After some initial derivations to linearize this modeling process (the details of which are not a concern of this paper), the end result involves computing the log-odds, or logits, and producing a logit function, $L(X)$, modeled as follows:

$$L(X) = \log\left[\frac{P(\text{event} \mid x)}{P(\text{nonevent} \mid x)}\right] = \beta_0 + \beta_1 x$$

In the instance of a continuous variable, $\beta_1$ has the interpretation of the increase in the log-odds, given a one-unit increase in the variable $x$. Exponentiate this model parameter estimate, $\exp(\beta_1)$, and you have the more readily interpretable change in the odds themselves (no more logarithms), given that one-unit increase in $x$.
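As a small illustration of the exponentiation step just described, the DATA step below converts a slope on the log-odds scale into an odds ratio. It is only a sketch: the value of beta1 is a made-up number chosen for illustration, not an estimate from the promotion data, and the variable names are hypothetical.

```sas
/* Sketch only: beta1 is an assumed value, not taken from the paper's data. */
data _null_;
    beta1      = 0.25;                     /* hypothetical estimate of beta_1 (log-odds scale) */
    odds_ratio = exp(beta1);               /* multiplicative change in the odds per one-unit increase in x */
    pct_change = 100 * (odds_ratio - 1);   /* the same change expressed as a percentage */
    put 'Odds ratio per one-unit increase in x: ' odds_ratio 8.4;
    put 'Percent change in the odds:            ' pct_change 8.2;
run;
```

The results are written to the SAS log. A negative beta1 would give an odds ratio below 1, that is, a decrease in the odds.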
The plot thickens, however, when the predictor variable of interest is categorical in nature, rather than continuous. A series of design, or dummy, variables must be created for the different levels of the categorical variable, and interpretations and tests of significance can quickly become more involved. Lucky for us, PROC LOGISTIC performs a lot of the nitty-gritty modeling work behind the scenes, but it is imperative to first understand the varying SAS parameterization schemes available before utilizing the PROC's options and output to guide SAS in producing exactly what is desired.

EFFECT CODING: THE DEFAULT PARAMETERIZATION

Through the course of this paper, we will consider a personnel data extract of nearly 6, Federal employees used to model the likelihood of promotion over a one-year period. The SAS data set PROM contains, for each employee, the variable PROMOTION given as 1 if a promotion occurred, 0 if not. The predictor variable to be investigated is education level attainment, EDLEVEL, consisting of four groups of employees: A = high school diploma or equivalent; B = bachelor's degree; C = master's degree; and D = Ph.D. To initially model education, we invoke PROC LOGISTIC with the following syntax

    PROC LOGISTIC data=prom descending;
      CLASS edlevel;
      MODEL promotion = edlevel;
    RUN;

A note about the descending option in the PROC LOGISTIC statement: by default, SAS will first try to model the probability that the variable PROMOTION = 0. Recall that our data has a promotion indicated by a 1, and discussion makes more sense when talking about likelihood of promotion as opposed to likelihood of not being promoted. This option is a quick way to reverse the SAS default.

We immediately note from the Analysis of Maximum Likelihood Estimates section of the output that parameter estimates are given for EDLEVEL A, B, and C but not D

    Analysis of Maximum Likelihood Estimates

                            Standard
    Parameter    DF  Estimate  Error  Chi-Square  Pr > ChiSq
    Intercept                                     <.0001
    EDLEVEL A                                     <.0001
    EDLEVEL B                                     <.0001
    EDLEVEL C

We also note there is a Class Level Information section with a curious matrix of 1s, 0s and -1s.

    Class Level Information

    Class     Value   Design Variables
    EDLEVEL   A         1    0    0
              B         0    1    0
              C         0    0    1
              D        -1   -1   -1

This parameterization scheme is PROC LOGISTIC's default effect coding of dummy variables. SAS sorts the class variable's value list and assigns dummy variables for one less than the number of distinct values, omitting the last category. The number of columns under the Design Variables heading indicates the count of dummy variables created. An initial roadblock with this scheme is that the parameter estimates of the dummy variables are not directly interpretable; they are a measure of the difference between the classification level's effect and the average effect across all levels. Notice, however, there is an Odds Ratio Estimates section in the output

    Odds Ratio Estimates

                      Point         95%
    Effect            Estimate      Confidence Limits
    EDLEVEL A vs D
    EDLEVEL B vs D
    EDLEVEL C vs D

For any logistic regression model without interaction terms, SAS computes a series of odds ratios and confidence limits for each class variable. It is important to review how these odds ratios are computed, since SAS will not output all possible comparisons of interest.

From the Design Variables section of Class Level Information, the first, second, and third columns correspond to the dummy variables for groups A, B, and C, all such dummy variables in the model. Each row can be thought of as the sequence of coefficients to be placed in front of the dummy variable parameter estimates to arrive at a logit function estimate for that particular level. For instance, the row of -1s for the last group, D, corresponds to a logit function of $\beta_0 + (-1)\beta_A + (-1)\beta_B + (-1)\beta_C$, or $\beta_0 - \beta_A - \beta_B - \beta_C$. Assume we want to investigate the odds of promotion between groups A and D.
Our log-odds difference of interest is

$$L(A) - L(D) = (\beta_0 + \beta_A) - (\beta_0 + (-\beta_A - \beta_B - \beta_C)) = 2\beta_A + \beta_B + \beta_C = 0.7568$$

And the odds ratio turns out to be exp(0.7568) = 2.13, exactly as seen in the first row of the Odds Ratio Estimates output. This says the odds of promotion for those educated at the high school level are more than double the odds for those at the Ph.D. level.

Knowing how the odds ratios are calculated gives us greater flexibility to compare, say, two levels within a classification variable that do not happen to be listed in the Odds Ratio Estimates output. For instance, we may wish to investigate a statistical difference between group A, high school graduates, and group B, bachelor's degrees. We note from the output how close the maximum likelihood parameter estimates for the two groups are and further reason the model could be simplified if we could collapse groups A and B into one group. For the two groups, we take coefficients from the first and second rows of the Class Level Information matrix to arrive at the following

$$L(A) - L(B) = (\beta_0 + \beta_A) - (\beta_0 + \beta_B) = \beta_A - \beta_B$$

We observe this logit difference is approximately zero, and exp(0) = 1. With an odds ratio of 1, the probabilities of promotion between the two groups are roughly the same, so it is not necessary for the model to distinguish between them. It may prove easier to collapse groups A and B together into one category covering all employees who have attained a bachelor's degree or less.

REFERENCE CODING: AN ALTERNATIVE PARAMETERIZATION

While there are situations where effect coding is preferable, SAS allows users to change this setting to other parameterizations. A second useful coding scheme is called reference coding, where one level of the classification variable is designated as the reference level to which parameter estimates for the remaining levels are directly comparable. Under this coding scheme, the exponentiated parameter estimate of a level is interpreted as the odds ratio between that level and the reference level. Hence, it would make sense to assign as the reference level any particular level we wanted to pit against all others.

Suppose we were interested in reporting the effect of education level on promotion likelihood and wanted to compare, individually, those who had obtained a bachelor's, master's, and Ph.D. with those holding the high school diploma. We can use additional CLASS statement options to reference parameterize EDLEVEL with group A as the reference category

    PROC LOGISTIC data=prom desc;
      CLASS edlevel (param=ref ref='A');
      MODEL promotion = edlevel;
    RUN;

In parentheses after the listed CLASS variable, param=ref overrides the default param=effect and ref='A' designates the high school level to be the reference.
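To make the reference parameterization more concrete, the sketch below builds the same kind of 0/1 dummy variables by hand in a DATA step and passes them to PROC LOGISTIC without a CLASS statement. It is an illustration, not the author's code: the data set PROM and the variables PROMOTION and EDLEVEL come from the paper, while PROM_REF, ED_B, ED_C, and ED_D are hypothetical names, and EDLEVEL is assumed to be a character variable holding the values 'A' through 'D'.

```sas
/* Sketch: hand-built reference-coded dummies with group A as the reference. */
/* Assumes EDLEVEL is character with values 'A', 'B', 'C', 'D'.              */
data prom_ref;                              /* hypothetical output data set */
    set prom;
    where edlevel in ('A', 'B', 'C', 'D');  /* drop missing or unexpected levels */
    ed_b = (edlevel = 'B');                 /* 1 for bachelor's degree, 0 otherwise */
    ed_c = (edlevel = 'C');                 /* 1 for master's degree, 0 otherwise */
    ed_d = (edlevel = 'D');                 /* 1 for Ph.D., 0 otherwise */
run;

/* Group A has all three dummies equal to 0, so it is absorbed into the intercept. */
proc logistic data=prom_ref descending;
    model promotion = ed_b ed_c ed_d;
run;
```

Under these assumptions, the three slope estimates should play the same role as the EDLEVEL B, C, and D estimates from the CLASS statement approach, each measured relative to group A.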
Other ref options are LAST, the default, which sorts the distinct variable levels and sets the last level as the reference, and FIRST, which sorts and sets the first value in the list as the reference. Interestingly, the ref option in the CLASS statement is also available under the effect parameterization; it determines what level gets the -1 row of dummy variable coefficients and, thus, what group is compared to all others in the Odds Ratio Estimates portion of the output.

Looking at the output, we note some differences in the Analysis of Maximum Likelihood Estimates and Class Level Information matrix from what we initially saw under the effect parameterization

    Analysis of Maximum Likelihood Estimates

                            Standard
    Parameter    DF  Estimate  Error  Chi-Square  Pr > ChiSq
    Intercept                                     <.0001
    EDLEVEL B
    EDLEVEL C                                     <.0001
    EDLEVEL D                                     <.0001

    Class Level Information

    Class     Value   Design Variables
    EDLEVEL   A         0    0    0
              B         1    0    0
              C         0    1    0
              D         0    0    1

In terms of the parameter estimates, notice how no dummy variable is created for the reference group A, as the three other groups' estimates are interpreted as the difference in the log-odds from that first group. The .7 parameter estimate for EDLEVEL group B suggests a small, nearly zero increase in the log-odds compared to group A. This is precisely the conclusion we drew under the effect coding. This should serve as an affirmation that PROC LOGISTIC can take more than one path to arrive at a given conclusion. The analyst can choose whichever path is most comfortable. Rest assured, we are still able to compute odds ratios by hand from the Class Level Information matrix by plugging in the appropriate dummy variables

$$L(A) - L(B) = (\beta_0) - (\beta_0 + \beta_B) = -\beta_B$$

Recall that our model parameter estimates under the reference coding have a new interpretation involving odds ratios related to the reference level, but they are still reported in the output as log-odds differences. To quickly convert these to odds ratios sans logarithms, we have the EXPB option available in the MODEL statement

    MODEL promotion = edlevel / expb;

This adds a column to the end of the parameter estimates output

    Analysis of Maximum Likelihood Estimates

                            Standard
    Parameter    DF  Estimate  Error  Chi-Square  Pr > ChiSq  Exp(Est)
    Intercept                                     <.0001
    EDLEVEL B
    EDLEVEL C                                     <.0001
    EDLEVEL D                                     <.0001

Again, this last column is simply the Estimate column exponentiated for quick reference. We observe how this agrees with the Odds Ratio Estimates section of the output, which is still created

    Odds Ratio Estimates

                      Point         95%
    Effect            Estimate      Confidence Limits
    EDLEVEL B vs A
    EDLEVEL C vs A
    EDLEVEL D vs A

THE CONTRAST STATEMENT

We have seen how we can compute basic odds ratios by hand. The limitation of these is that they lack confidence intervals on the estimates. We often want to check that the odds ratio estimate's confidence interval does not contain 1, for example.
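For the simplest case, an odds ratio that depends on a single parameter (such as one level versus the reference level under reference coding), a confidence interval can be sketched by hand from the estimate and its standard error. The DATA step below assumes a Wald-type interval and uses made-up values for the estimate and standard error; all variable names are hypothetical. Comparisons involving several parameters would also need their covariances, which this sketch does not address.

```sas
/* Sketch: Wald-type 95% confidence interval for a single-parameter odds ratio. */
/* est and se are assumed values, not output from the promotion model.          */
data _null_;
    est = 0.40;                               /* hypothetical log-odds estimate */
    se  = 0.15;                               /* hypothetical standard error */
    z   = probit(0.975);                      /* normal quantile for a 95% interval */
    odds_ratio   = exp(est);
    lower        = exp(est - z * se);         /* exponentiate the log-odds limits */
    upper        = exp(est + z * se);
    excludes_one = (lower > 1 or upper < 1);  /* 1 if the interval does not contain 1 */
    put odds_ratio= 8.3 lower= 8.3 upper= 8.3 excludes_one=;
run;
```

The interval is computed on the log-odds scale and then exponentiated, which is why it is not symmetric around the odds ratio.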
The Odds Ratio Estimates output will contain confidence intervals, but only for the levels of a categorical variable compared to one particular reference level. Though we could re-run PROC LOGISTIC with differing reference levels to get additional odds ratio estimates and confidence intervals, we are still restricted to a one-to-one comparison. It may be prudent to investigate a difference between the average of two EDLEVEL groups compared with a reference group, as we will explore momentarily, or any other relevant combination of levels. To solve this dilemma, we can make use of the CONTRAST statement. It is in constructing these statements that familiarity with the Class Level Information matrix and the effect versus reference parameterizations pays off. The general syntax of the CONTRAST statement is

    CONTRAST 'label' var-name dummy-coeff-1 <... dummy-coeff-n> </ options>;

After providing a label (required, since more than one CONTRAST statement is allowed), we define the variable name for which we are interested in constructing odds ratios. Immediately after that, we assign dummy coefficients by consulting the Class Level Information matrix.

Just as we did by hand, we can use the CONTRAST statement in a simple, one-to-one comparison to test the logit function difference between EDLEVEL A and D. Recall that under effect coding we had

$$L(A) - L(D) = (\beta_0 + \beta_A) - (\beta_0 + (-\beta_A - \beta_B - \beta_C)) = 2\beta_A + \beta_B + \beta_C$$

The CONTRAST statement syntax would then be

    CONTRAST 'EDLEVEL A vs. D' EDLEVEL 2 1 1 / estimate=both;

    Contrast Test Results

    Contrast            DF   Chi-Square   Pr > ChiSq
    EDLEVEL A vs. D                       <.0001

    Contrast Rows Estimation and Testing Results

                                             Standard
    Contrast          Type  Row  Estimate    Error  Alpha  Confidence Limits  Chi-Square
    EDLEVEL A vs. D   PARM
    EDLEVEL A vs. D   EXP

With no options in the CONTRAST statement, the only output is the global test given the null hypothesis that the difference in the logit functions is zero. We see here that the test statistic is large and so we have a significant result, but we do not know in which direction the odds are favored. The estimate=both option in the CONTRAST statement adds the value of the logit function difference in both log-odds terms (Type=PARM line) and exponentiated odds ratio terms (Type=EXP line). This is the same odds ratio we have calculated twice earlier, and the 95% confidence interval (1.817, ) matches what was seen in the Odds Ratio Estimates section of the output.

Relating this to the reference parameterization with A as the reference level, we reason that the parameter on the third dummy variable SAS created for EDLEVEL is the log-odds difference of group D vs. group A. To invert this comparison and make it comparable to the contrast above, testing -1 times this estimate produces the desired group A vs. group D odds ratio.

    CONTRAST 'EDLEVEL A vs. D' EDLEVEL 0 0 -1 / estimate=both;

Though we refrain from reprinting it, the syntax above produces the exact same contrast output as does the syntax under the effect parameterization of EDLEVEL.

We saw there was very little difference in the odds of promotion between EDLEVEL groups A and B, suggesting we could collapse the two groups to simplify the model. We could also employ the CONTRAST statement to jointly test whether groups A/B and C/D could be collapsed, respectively. One can separate two parts, or rows, of a contrast with a comma. Staying with reference coding and A as the reference level, to test A vs B you would have

$$L(A) - L(B) = \beta_0 - (\beta_0 + \beta_B) = -\beta_B$$

Furthermore, to test C vs D you would have

$$L(C) - L(D) = (\beta_0 + \beta_C) - (\beta_0 + \beta_D) = \beta_C - \beta_D$$

So we painlessly determined the dummy variable coefficients necessary for the CONTRAST statement. This time we apply a few more options. The first is the estimate=exp option, which outputs only the exponentiated logit function (odds ratio); the second is the e option, which outputs the vector of coefficients and corresponding dummy variables. This is good practice to double-check that the contrast being calculated is what the analyst intended. Needless to say, changes to the reference level or parameterization scheme can quickly change what a sequence of coefficients is actually testing.

    CONTRAST 'Joint A/B & C/D' edlevel -1 0 0, edlevel 0 1 -1 / e estimate=exp;

produces the following output

    Coefficients of Contrast Joint A/B & C/D

    Parameter     Row1   Row2
    Intercept       0      0
    EDLEVEL B      -1      0
    EDLEVEL C       0      1
    EDLEVEL D       0     -1

    Contrast Test Results

    Contrast            DF   Chi-Square   Pr > ChiSq
    Joint A/B & C/D                       <.0001

    Contrast Rows Estimation and Testing Results

                                             Standard
    Contrast          Type  Row  Estimate    Error  Alpha  Confidence Limits
    Joint A/B & C/D   EXP   1
    Joint A/B & C/D   EXP   2

After acknowledging the Coefficients of Contrast as what we intended, we note that the Contrast Test Results section yields a test statistic which strongly suggests the contrast is not equal to zero. Virtually all of the deviation from zero is clearly coming from the second part of the contrast, between group C and group D, as the odds ratio for that comparison is significantly greater than 1 (1.6117), while the group A vs. group B odds ratio is not significantly different from 1. At this point, we conclude that we cannot jointly collapse groups A with B and C with D.

CONCLUSION

This paper outlined two parameterization schemes for a logistic regression model in which the predictor variable is categorical. There are other parameterizations available within SAS for this PROC, but practice and experience have dictated to the author that the effect and reference parameterizations are utilized most frequently.
At an initial glance of the unabridged output from a PROC LOGISTIC invocation, the sheer amount of output can make interpretation and analysis appear a daunting task. Yet after a little work picking out the relevant sections and tweaking the SAS code with a few added options, the task at hand can be quickly simplified, especially once one realizes how the various sections are interrelated.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Taylor Lewis
    U.S. Office of Personnel Management (OPM)
    1900 E St., NW, Room 7439
    Washington, DC 20415
    Work Phone: (202)
    Fax: (202)
    Taylor.Lewis@opm.gov

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationThis can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.
One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.
More informationPearson's Correlation Tests
Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation
More informationMULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationElements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationLecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationPredicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables
Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student
More informationOracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper October 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Generate a PL/SQL script for workflow deployment Denny Wong Oracle Data Mining Technologies 10 Van de Graff Drive Burlington,
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationKSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationTaming the PROC TRANSPOSE
Taming the PROC TRANSPOSE Matt Taylor, Carolina Analytical Consulting, LLC ABSTRACT The PROC TRANSPOSE is often misunderstood and seldom used. SAS users are unsure of the results it will give and curious
More informationGerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationINTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationUSING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA
USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique
More informationInteraction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015
Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,
More informationNew SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
More informationAutomated Statistical Modeling for Data Mining David Stephenson 1
Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationSimple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
More information9.2 Summation Notation
9. Summation Notation 66 9. Summation Notation In the previous section, we introduced sequences and now we shall present notation and theorems concerning the sum of terms of a sequence. We begin with a
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More informationCREDIT SCORING MODEL APPLICATIONS:
Örebro University Örebro University School of Business Master in Applied Statistics Thomas Laitila Sune Karlsson May, 2014 CREDIT SCORING MODEL APPLICATIONS: TESTING MULTINOMIAL TARGETS Gabriela De Rossi
More informationASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
More informationLinear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.
Steven J Zeil Old Dominion Univ. Fall 200 Discriminant-Based Classification Linearly Separable Systems Pairwise Separation 2 Posteriors 3 Logistic Discrimination 2 Discriminant-Based Classification Likelihood-based:
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationOne-Way Analysis of Variance
One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationSome Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
More informationStatistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY
Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship
More informationXPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables
XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables Contents Simon Cheng hscheng@indiana.edu php.indiana.edu/~hscheng/ J. Scott Long jslong@indiana.edu
More information