SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY
ABSTRACT

The purpose of this paper is to investigate several SAS procedures that are used for linear predictive models in SAS/STAT. The primary focus is on the correct choice of model given the designated outcome variable and the combination of input variables. Procedures discussed include GLM, LOGISTIC, GENMOD, MIXED, and GLIMMIX. PROC GLIMMIX is a relatively new SAS procedure, although it has been available as a macro for some time.

There are three main types of variables used in linear models: nominal, ordinal, and interval. Nominal data are categorical (such as gender); ordinal data are categorical and can be ordered from least to most (such as employee evaluation rank); interval data are numeric values that can define ratios. While all of the models discussed can include all three types of input variables, the model choice differs depending on whether the outcome variable is interval or nominal.

Another consideration for model choice is whether the input variables are fixed effects or random effects. Fixed effects are definitive and will not change regardless of how the sample data are collected. Random effects can change when the experiment is replicated. Examples of random effects include subjects in a drug study, the choice of items to compare between retail stores for market basket price differences, and classrooms in an education study. Examples will be discussed.

INTRODUCTION

An inappropriate model will provide inappropriate results. For those users of SAS who know SAS/STAT and PROC GLM, there are other models that may be more appropriate to the collected data. It is necessary to fit the model to the data, not the data to the model (to a man with a hammer, everything looks like a nail). If regression is not appropriate because the assumptions are violated, change the model. There are several models readily available in SAS/STAT (Figure 1).

Figure 1.
Linear Models Available in SAS/STAT

   Generalized linear mixed model    PROC GLIMMIX
   Linear mixed model                PROC MIXED
   Generalized linear model          PROC GENMOD
   General linear model              PROC GLM
   ANOVA                             PROC ANOVA
   Regression                        PROC REG
   Logistic regression               PROC LOGISTIC

Each model serves a different purpose and should be used with different types of data. The purpose of this paper is to focus on model choice; it is not intended to provide all details concerning the use of each model. Once the investigator has chosen one of the models, details are available in the online documentation.

Items that must be considered in model choice are:
1. Type of outcome variable: nominal, ordinal, or interval
2. Type of input variable: nominal, ordinal, or interval
3. Type of input variable: fixed or random effect
4. Choice of covariance matrix format for random effects
5. Choice of link function for non-normal residuals

As the complexity of the data increases, so, too, does the complexity of the model. Choices must be made, and those choices affect model outcomes. Consider Table 1, which gives some indication as to how the models should be used.

Table 1. Outline of Model Choice

   Model      Outcome Variable        Types of Inputs                               Assumptions
   ANOVA      Interval                Categorical; fixed effects only               Normality
   REG        Interval                Interval; fixed effects only                  Normality
   LOGISTIC   Binary                  Categorical, interval; fixed effects only     Binomial (logit link)
   GLM        Interval                Categorical, interval; fixed effects only     Normality
   GENMOD     Categorical, interval   Categorical, interval; fixed effects only     Exponential family
   MIXED      Interval                Categorical, interval; random effects         Normality
   GLIMMIX    Categorical, interval   Categorical, interval; random effects         Exponential family

This paper will discuss the different models and how to define outcomes and inputs, along with a consideration of the assumptions listed in Table 1.

PROC ANOVA and PROC REG

PROC ANOVA should only be used for a balanced design, in which the observations are divided equally among the categorical levels. If there are three treatments, then each treatment should have exactly the same number of observations. This procedure requires less computing time than PROC GLM. However, since a completely balanced design almost never happens with large samples, there is really no need to use ANOVA instead of GLM.

PROC REG can only use interval or ordinal variables as inputs. In order to include nominal data, dummy variables need to be created, and a large number of nominal inputs requires considerable programming effort. Essentially, for each level of a nominal variable, PROC REG fits a new regression line that is parallel to the regression lines for all other levels of the same variable.
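As a minimal sketch of the dummy-variable coding described above, the DATA step below creates indicator variables for a three-level nominal input before calling PROC REG. The data set WORK.STUDY and the variables treatment, trt_b, trt_c, x, and y are hypothetical names used only for illustration; they do not come from the paper's data.

```sas
/* Hypothetical data: WORK.STUDY holds a nominal input TREATMENT
   (levels 'A', 'B', 'C'), an interval input X, and an interval outcome Y. */
data work.study_dummy;
   set work.study;
   /* Level A is the reference level, so it needs no indicator. */
   trt_b = (treatment = 'B');   /* 1 if treatment is B, otherwise 0 */
   trt_c = (treatment = 'C');   /* 1 if treatment is C, otherwise 0 */
run;

proc reg data=work.study_dummy;
   /* Each indicator shifts the intercept, which gives one line per
      treatment level, all sharing the same slope in X. */
   model y = x trt_b trt_c;
run;
quit;
```

With k levels, k-1 indicators are needed. PROC GLM avoids this step: placing treatment in a CLASS statement produces an equivalent fit, although the procedure chooses its own reference level.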
While PROC REG has diagnostics that are of value, the same diagnostics have now been incorporated into PROC GLM. For this reason, it is better to use PROC GLM for all standard analyses.

PROC GLM

In the past, PROC GLM was the most sophisticated procedure for performing a linear models analysis. It can use both interval and categorical variables as inputs; it now contains all of the diagnostic elements provided by PROC REG; and it does not require a balanced design. In addition, PROC GLM uses the Type III sum of squares to examine multiple types of treatments simultaneously.

The one problem with PROC GLM is that it was never intended to be used with random effects. Special cases of random effects, such as nested designs and split plot designs, have been developed for use with PROC GLM. Repeated measures can also be examined using PROC GLM, provided that few subjects drop out in the later time measurements. Nevertheless, PROC GLM has become the model of choice, and very little consideration is usually given to whether the inputs are fixed or random effects.

Repeated measures represent a random effect since the choice of time points at which to collect measurements is somewhat arbitrary on the part of the investigator. Inputs such as age that are divided into blocks are also random effects since the blocks are arbitrary. For the same reason, Likert scales are random effects since it is somewhat arbitrary whether a 4-point or a 5-point scale is used. In many cases, however, these inputs are entered into PROC GLM as if they were fixed effects. As is true in the special cases of split plots and nested effects, assuming the effects are fixed when they are random will increase the size of the random error, which will decrease the overall size of the F-statistics. As a result, the model will have non-significant F-statistics that should be significant.
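The split plot special case mentioned above can be sketched as follows. This is an illustration under assumed names, not code from the paper: WORK.SPLITPLOT, block, a (whole-plot factor), b (subplot factor), and y are all hypothetical.

```sas
/* Hypothetical split plot data: BLOCK is a random blocking factor,
   A is applied to whole plots, B to subplots, and Y is the outcome. */
proc glm data=work.splitplot;
   class block a b;
   model y = block a block*a b a*b;
   /* Declare which terms are random. The TEST option requests F-tests
      whose error terms follow from the expected mean squares rather
      than using the residual error for every effect. */
   random block block*a / test;
run;
quit;
```

Without the RANDOM statement, every effect is tested against the residual error, which is the fixed-effects treatment the text warns about. In PROC MIXED, the same design would move block and block*a out of the MODEL statement and into a RANDOM statement.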
Consider the following question: should ordinal variables be defined as quantitative variables or as classification variables in PROC GLM? Since ANOVA assumes class levels (i.e., nominal data) and regression assumes interval data, there is no real provision for ordinal variables. If an ordinal variable is defined as a class variable, many degrees of freedom will be used, but post-hoc tests can be made. If it is defined as interval, only one degree of freedom is used in the model, but post-hoc tests are unavailable. Depending on the choice, model results can differ. Sample GLM code is listed below:
   PROC GLM DATA=WORK.SORT7659;
      CLASS CourseLevel expectknownever;
      MODEL hours = CourseLevel expectknownever / SS3 SOLUTION SINGULAR=1E-07;
      LSMEANS CourseLevel / PDIFF=ALL;
      LSMEANS CourseLevel expectknownever / PDIFF=ALL;
   RUN;
   QUIT;

PROC LOGISTIC

PROC LOGISTIC is very similar to PROC GLM, although it has a binary outcome variable rather than an interval outcome. If the outcome is ordinal, PROC LOGISTIC can also be used, but with a complementary log-log link function instead of the more standard logit function. Both PROC LOGISTIC and PROC GLM can place ordinal inputs either as class or as quantitative variables. Again, the degrees of freedom and the need for post-hoc tests should be considered before deciding where to place the ordinal inputs.

Frequently, logistic regression is used to divide a population into high risk/low risk. However, this dichotomous outcome is contrived. There could just as easily be 5 or 10 categories of risk. It is not necessary to reduce the number of outcomes to 2 just to fit the results into a logistic model.

Logistic regression also defines odds ratios for the input variables. However, the default does not provide confidence limits for them, so the user should always use the option to print confidence limits. In addition, the user should examine the c statistic, which is comparable to $R^2$ for the general linear model. If the outcome variable has only two levels, logistic regression can also print a classification table and a receiver operating characteristic (ROC) curve. They can be used to define a cut-point that divides the population into the high/low categories.
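The options referred to above can be sketched as follows for a binary outcome. The data set WORK.RISK and the variables highrisk, group, and x are hypothetical; the MODEL options shown (CLODDS=, CTABLE, PPROB=, OUTROC=) are standard PROC LOGISTIC options, but the particular cut-point grid is only an assumption for illustration.

```sas
/* Hypothetical data: HIGHRISK is a 0/1 outcome, GROUP is a nominal
   input, and X is an interval input. */
proc logistic data=work.risk;
   class group / param=effect;
   model highrisk(event='1') = group x
         / clodds=wald                      /* confidence limits for the odds ratios   */
           ctable pprob=(0.1 to 0.9 by 0.1) /* classification table at each cut-point  */
           outroc=work.roc;                 /* data set holding ROC curve coordinates  */
run;
```

The c statistic appears in the association table of predicted probabilities and observed responses. Comparing the rows of the classification table across the PPROB= values is one way to choose the cut-point.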
Standard code is given below:

   PROC LOGISTIC DATA=WORK.SORT7975;
      CLASS BS (PARAM=EFFECT) workhabits (PARAM=EFFECT);
      MODEL CourseLevel = BS workhabits hours
            / SELECTION=NONE LINK=PROBIT CLPARM=WALD CLODDS=WALD ALPHA=0.05;
      OUTPUT OUT=SASUSER.PRED3492(LABEL="Logistic regression predictions and statistics for SASUSER.QURY0181")
             PREDPROBS=INDIVIDUAL;
   RUN;

For ordinal outcomes (or nominal outcomes with more than two levels), the code used is

   PROC LOGISTIC DATA=WORK.SORT1118;
      CLASS BS (PARAM=EFFECT) workhabits (PARAM=EFFECT);
      MODEL CourseLevel = BS workhabits
            / SELECTION=NONE LINK=CLOGLOG CLPARM=WALD CLODDS=WALD ALPHA=0.05;
      OUTPUT OUT=SASUSER.PRED1881(LABEL="Logistic regression predictions and statistics for SASUSER.QURY0181")
             PREDPROBS=INDIVIDUAL;
   RUN;

Some cautions are in order concerning logistic regression. Logistic regression will always inflate results, especially if the group sizes are very different and one of the groups represents a rare event. For example, if one group makes up 95% of the sample and the other 5%, then a single classification rule (put all subjects in class A) will be 95% accurate.
Poisson regression should be used for rare events instead. If possible, fresh data should be used to examine the inflation rate of results.

PROC MIXED

The model in PROC MIXED has two components, a fixed part and a random part: $y = \alpha x + \gamma z + \varepsilon$. If $\gamma = 0$, then the mixed model is identical to the general linear model. If $\gamma \neq 0$, then there is some randomness in the model and some covariance between inputs. Special cases of the mixed model are repeated measures, nested designs, and split plot designs. Before the introduction of PROC MIXED, these three special cases were handled using PROC GLM, but with some changes to the error terms. PROC MIXED is a superior method for these cases.

In order to use PROC MIXED, the covariance must be estimated in some way. If the investigator has no knowledge of how the input random effects correlate, an unstructured matrix (TYPE=UN) is the safest choice; note that the procedure's default is variance components (TYPE=VC). PROC MIXED has a number of other covariance matrix designs that can be used, but only if the user has a good idea of the structure of the matrix. Standard code is

   PROC MIXED DATA=WORK.SORT5396 METHOD=REML;
      CLASS CourseLevel Applied Statistics;
      MODEL hours_modified = Applied CourseLevel Statistics
            / HTYPE=3 DDFM=CONTAIN
              OUTPM=WORK._PRE6476(LABEL="Predicted means..")
              OUTP=WORK._PRE937(LABEL="Predicted values");
      RANDOM CourseLevel / G TYPE=VC;
      LSMEANS Applied CourseLevel Statistics / PDIFF=ALL;
   RUN;

PROC GENMOD

PROC GENMOD generalizes PROC LOGISTIC by allowing for more than binary outcomes. For the general linear model (GLM), the model equation takes the form $Y = \alpha + \beta X + \varepsilon$, so that the estimate is $\hat{y} = X\beta$. The residual error, $\varepsilon$, is assumed normally distributed with mean zero and constant variance. For the generalized linear model, the estimate changes to $g(\hat{y}) = X\beta$, where $g$ is called a link function. If $g(\hat{y}) = \log\left(\frac{\hat{y}}{1-\hat{y}}\right)$ and the outcome is binary, then the model is the special case of logistic regression and PROC LOGISTIC can be used. If the outcome variable consists of count data, then the link function $g(\hat{y}) = \log(\hat{y})$ can be used.
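To make the role of the link function concrete, the two links above can be inverted to give the fitted value on the original scale. This is standard algebra, shown here only as a worked step; it is not specific to the SAS procedures.

```latex
\begin{align*}
\log\!\left(\frac{\hat{y}}{1-\hat{y}}\right) = X\beta
  &\;\Longrightarrow\; \hat{y} = \frac{e^{X\beta}}{1+e^{X\beta}},
  \qquad 0 < \hat{y} < 1, \\
\log(\hat{y}) = X\beta
  &\;\Longrightarrow\; \hat{y} = e^{X\beta},
  \qquad \hat{y} > 0 .
\end{align*}
```

The logit link therefore keeps fitted values inside the unit interval, as a probability requires, and the log link keeps them positive, as a count or a positive continuous outcome requires.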
With count data and the log link, the assumption is that the residuals have a Poisson distribution. However, this same link function can be used under the assumption that the residuals are interval data. In this case, the residuals are assumed to follow a gamma distribution, which also includes the special case of the exponential distribution. There are a number of other distributions that can be used as well. The problem is that the residual distribution of $g(\hat{y}) = X\beta$ depends upon the model, and that model depends upon the choice of the link function. Possible link functions are given in Table 2.

Table 2. Examples of Link Functions in PROC GENMOD

   Outcome                    Distribution   Link Function
   Binary                     Binomial       Logit
   Binary (rare occurrence)   Poisson        Natural log
   Ordinal                    Multinomial    Complementary logit
   Count                      Poisson        Natural log
   Continuous                 Normal         Identity
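As a minimal sketch of the count row of Table 2, a Poisson model with a natural log link could be requested as follows. The data set WORK.EVENTS and the variables n_events, clinic, and age are hypothetical names chosen for illustration.

```sas
/* Hypothetical data: N_EVENTS is a count outcome, CLINIC is a nominal
   input, and AGE is an interval input. */
proc genmod data=work.events;
   class clinic;
   /* DIST= names the assumed outcome distribution and LINK= the link
      function; TYPE3 requests Type 3 tests for each input. */
   model n_events = clinic age / dist=poisson link=log type3;
run;
```

Changing only DIST= and LINK= moves between the rows of the table; for example, DIST=NORMAL with LINK=IDENTITY corresponds to the continuous row.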
If the investigator has domain knowledge that supports a particular link function, that function should be used. If the investigator cannot determine the function in that way, another approach is to estimate $Y = \alpha + \beta X$ first using PROC GLM while saving the residuals in a data set. The residuals can then be passed to PROC KDE to estimate the form of the distribution, and the investigator can choose the link function that comes closest to the kernel density estimate. The kernel density can be examined using the code listed below. Figure 2 gives an example kernel density estimator.

   PROC KDE DATA=sasuser.qury0181;
      UNIVAR hours / GRIDL=0 GRIDU=25 OUT=sasuser.kdehours;
   RUN;

   PROC GPLOT DATA=sasuser.kdehours;
      PLOT density * value / VAXIS=AXIS1 HAXIS=AXIS2 FRAME;
   RUN;
   QUIT;

Figure 2. Results of PROC KDE

Standard code for PROC GENMOD is given below:

   PROC GENMOD DATA=WORK.SORT4864;
      CLASS Applied Statistics workhabits;
      MODEL hours = Applied Statistics workhabits
            / LINK=LOG DIST=GAMMA TYPE3 CORRB LRCI CL ALPHA=0.05;
      LSMEANS Applied Statistics workhabits / ALPHA=0.05;
      OUTPUT OUT=WORK.TEMP6816 PREDICTED=_predicted1 RESDEV=_resdev1 RESCHI=_reschi1;
   RUN;
   QUIT;

PROC GLIMMIX

This procedure generalizes the MIXED procedure to include error terms that are not normally distributed. It also generalizes the GENMOD procedure to allow for random effects in the model. However, the random effects must be
normal. The general format for GLIMMIX is

   PROC GLIMMIX;
      CLASS block a b;
      MODEL y = a b a*b / DDF=#;
      RANDOM block a*block;
      LSMEANS a b a*b / DIFF;
   RUN;

Unlike PROC MIXED, PROC GLIMMIX does not have a REPEATED statement; repeated measures are specified in the RANDOM statement. Possible link functions are given in Table 3.

Table 3. Link Functions for PROC GLIMMIX

   Outcome       Distribution        Link Function
   Beta          Beta                Logit
   Binary        Binary              Logit
   Binomial      Binomial            Logit
   Exponential   Exponential         Log
   Gamma         Gamma               Log
   Gaussian      Normal              Identity
   Geometric     Geometric           Log
   Invgauss      Inverse Gaussian    Inverse squared
   Lognormal     Log-normal          Identity
   Multinomial   Multinomial         Cumulative logit
   Negbinomial   Negative binomial   Log
   Poisson       Poisson             Log
   Tcentral      t                   Identity

Sample code is given below:

   PROC GLIMMIX DATA=sasuser.qury0181;
      CLASS CourseLevel Applied Statistics;
      MODEL hours_modified = Applied CourseLevel Statistics
            / HTYPE=3 DDFM=CONTAIN DIST=GAMMA;
      RANDOM CourseLevel / G TYPE=VC;
      LSMEANS Applied CourseLevel Statistics / PDIFF=ALL;
   RUN;
   QUIT;

EXAMPLES

Consider the following examples:
1. A test to compare the effectiveness of CT scans to x-ray in the detection of lung cancer. Each patient is randomized to receive x-ray only or CT only. The sample contains 10,000 patients and is limited to high-risk patients. The outcome variable is the occurrence of lung cancer.
2. A randomized clinical trial to compare treatment of osteomyelitis (MRSA) with vancomycin and Zyvox. Patients are treated according to protocol, with follow-up at 1, 2, 6, and 12 months after the end of treatment.

What if the study is observational rather than randomized?

In the first example, the occurrence of lung cancer is rare. Therefore, a Poisson distribution would fit the study better than a logistic regression. In the second, the measure of recurrence is a repeated measure. While it can also be
examined using survival analysis, the fact that measurements are taken at fixed intervals rather than continuously will also allow for a mixed models design.

CONCLUSION

While it is possible to use PROC GLIMMIX, the most complex of the models, for every analysis, doing so is not advisable. Even with GLIMMIX, choices as to random versus fixed effects, link function, and covariance matrix still have to be made. Therefore, the investigator should use the simplest procedure that will accommodate the variable choices.

CONTACT

Patricia Cerrito
University of Louisville
Department of Mathematics
Louisville, KY
pcerrito@louisville.edu

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
More informationStatistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationOffset Techniques for Predictive Modeling for Insurance
Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the
More informationIntroduction to proc glm
Lab 7: Proc GLM and one-way ANOVA STT 422: Summer, 2004 Vince Melfi SAS has several procedures for analysis of variance models, including proc anova, proc glm, proc varcomp, and proc mixed. We mainly will
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
More informationMissing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationList of Examples. Examples 319
Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.
More informationHURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
More informationConcepts of Experimental Design
Design Institute for Six Sigma A SAS White Paper Table of Contents Introduction...1 Basic Concepts... 1 Designing an Experiment... 2 Write Down Research Problem and Questions... 2 Define Population...
More informationPoisson Regression or Regression of Counts (& Rates)
Poisson Regression or Regression of (& Rates) Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Generalized Linear Models Slide 1 of 51 Outline Outline
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More information7 Generalized Estimating Equations
Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationLocal classification and local likelihoods
Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More informationPaper PO06. Randomization in Clinical Trial Studies
Paper PO06 Randomization in Clinical Trial Studies David Shen, WCI, Inc. Zaizai Lu, AstraZeneca Pharmaceuticals ABSTRACT Randomization is of central importance in clinical trials. It prevents selection
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationAnalysis of Variance. MINITAB User s Guide 2 3-1
3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationOverview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS
Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationUsing Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
More informationStatistics and Pharmacokinetics in Clinical Pharmacology Studies
Paper ST03 Statistics and Pharmacokinetics in Clinical Pharmacology Studies ABSTRACT Amy Newlands, GlaxoSmithKline, Greenford UK The aim of this presentation is to show how we use statistics and pharmacokinetics
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationDevelopment Period 1 2 3 4 5 6 7 8 9 Observed Payments
Pricing and reserving in the general insurance industry Solutions developed in The SAS System John Hansen & Christian Larsen, Larsen & Partners Ltd 1. Introduction The two business solutions presented
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More information