# Model Fitting in PROC GENMOD

Jean G. Orelien, Analytical Sciences, Inc.


## Abstract

There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS System are familiar with procedures such as PROC REG and PROC GLM for fitting general linear models. However, PROC GENMOD can handle these general linear models as well as more complex ones such as logistic models, loglinear models, or models for count data. In addition, the main advantage of PROC GENMOD is that it can accommodate the analysis of correlated data. In this paper, we discuss the use of PROC GENMOD to analyze simple as well as more complex statistical models. When other procedures are available to perform the same analysis, we highlight the options from these procedures that are missing in PROC GENMOD but might be of interest to the user. An example shows how PROC GENMOD is used to analyze various types of endpoints (continuous and count data) from a toxicology experiment. The material in this paper should be accessible even to users with limited data analysis skills.

## 1. Introduction

Generalized linear models (GLMs) include the most common statistical models used in statistics. This class includes general linear models and logistic models. General linear models in turn include ANOVA models as well as regression analysis. Complex models such as those arising from correlated data (repeated measures, clustered data) can also be fitted with GLMs. The form of a GLM is given by:

$$f(y) = X\beta + \varepsilon \qquad (1)$$

The function $f$ is known as the link function. For ANOVA, the link function is the identity. Other possible link functions include the logit for logistic regression and the log for count data. Whereas general linear models require the assumption that the errors are independent, have equal variances, and are normally distributed, none of these assumptions is necessary in GLMs.
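As a worked illustration of equation (1), the three link functions named above can be written out explicitly. The symbols $p$ (a proportion) and $\mu$ (the mean of a count response) are notation assumed for this sketch only:

$$
\begin{aligned}
\text{identity (ANOVA, regression):}\quad & f(y) = y \\
\text{logit (logistic regression):}\quad & f(p) = \log\frac{p}{1-p} \\
\text{log (count data):}\quad & f(\mu) = \log \mu
\end{aligned}
$$

For instance, a logit of $0$ corresponds to $p = 0.5$, and a logit of $\log 3 \approx 1.10$ corresponds to odds of 3 to 1, that is, $p = 0.75$.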
In the SAS System, GLMs can be fitted with PROC GENMOD, but separate procedures exist for certain subclasses of GLMs such as logistic regression or general linear models. For example, PROC GLM can fit general linear models. Although regression models can be fitted with PROC GLM, PROC REG is more specific to this type of analysis. Similarly, PROC ANOVA is specific, as the name indicates, to ANOVA models. Another SAS procedure for analyzing a subclass of GLMs is PROC LOGISTIC.

In this paper, we provide an overview of some of the models that can be fitted with PROC GENMOD. When these models can be fitted by other SAS procedures, we outline the differences between these procedures and GENMOD that the user needs to be aware of. Section 2 discusses the fitting of general linear models in GENMOD and other procedures. The fitting of logistic models is discussed in section 3. Models involving link functions other than the logit and the identity, as well as models for correlated data, are discussed in section 4. In section 5, we provide an example showing how we have used PROC GENMOD to analyze different types of endpoints from the National Toxicology Program (NTP).

## 2. Fitting of General Linear Models in GENMOD and Other Procedures

There are many procedures besides PROC GENMOD in the SAS System for fitting general linear models. The three most commonly used are PROC ANOVA, PROC REG, and PROC GLM. PROC GLM is the most comprehensive of the three procedures: any analysis of general linear models can be performed with it. However, the other two procedures are more efficient or offer more options for certain subclasses of general linear models. PROC ANOVA handles analysis of variance models with balanced designs, and for these models it is faster than PROC GLM. For regression models, PROC REG provides many options that are not found in other procedures: the user can fit several models with one call of the procedure, and there are more options for model selection and more diagnostic tools for detecting multicollinearity. Multicollinearity occurs when the independent variables in the model are correlated among themselves. This can lead to large variances for the estimated coefficients and affect the interpretation of these coefficients.

Although PROC GENMOD can fit any general linear model, there are many useful options that it does not provide. For example, in an ANOVA model with random effects, one may be interested in estimating the variance components. This is not possible in PROC GENMOD. With ANOVA models, one may also want to compare the means of the response at several levels of an independent variable. As of version 7.0, there is an LSMEANS statement in PROC GENMOD, but unlike in PROC GLM, this statement produces only the least squares means; no statistical comparisons of the means can be made. It should also be noted that while the other procedures mentioned in this section use least squares methods to estimate the coefficients, PROC GENMOD uses maximum likelihood methods.
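How the two estimation methods relate can be sketched for the normal-errors case. The notation here ($n$ observations, rows $x_i^\top$ of the design matrix $X$, $k$ coefficients, error variance $\sigma^2$) is assumed for illustration. The log-likelihood is

$$
\ell(\beta, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2
$$

For any fixed $\sigma^2$, only the second term depends on $\beta$, so maximizing $\ell$ over $\beta$ amounts to minimizing the residual sum of squares $\sum_i (y_i - x_i^\top\beta)^2$. The variance estimate is where the two approaches can part ways: the maximum likelihood estimate of $\sigma^2$ divides the residual sum of squares by $n$, while the usual unbiased estimate divides by $n - k$.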
For general linear models, the maximum likelihood and the least squares methods yield the same estimates. There are instances where the data analyst is familiar with the data and wants to use PROC GENMOD so that the output from general linear models is uniform with the output from more complex models that can only be analyzed in PROC GENMOD. An example is given in section 5. In general, we suggest that GENMOD be used for general linear models only when the analyst wishes to obtain coefficient estimates and their variances. In these cases, the analyst should know from prior experience with similar data that the data are well behaved and that no general linear model assumptions are violated. We give below some examples of the use of PROC GENMOD for the analysis of general linear models.

### 2.1 Example of a Regression Analysis

Suppose sales for a company in a district can be predicted as a function of the target population and the per capita discretionary income. The data come from Applied Linear Statistical Models by Neter, Wasserman and Kutner (page 249). Regression coefficients can be obtained with either of the following two syntaxes:

    proc genmod data=salesdata;
      model sales = target_population discretionary_income;
    run;

or

    proc reg data=salesdata;
      model sales = target_population discretionary_income;
    run;

### 2.2 Example of an Analysis of Variance

Suppose a research laboratory develops a new compound for the relief of severe hay fever and wants to compare the effect of the ingredients on the outcome. The outcome being measured is the number of hours of relief. This hypothetical example is also taken from Neter, Wasserman and Kutner (page 722). We could perform this analysis in PROC GENMOD with the following syntax:

    proc genmod data=compounds_data;
      class ingredient1 ingredient2;
      model hours_of_relief = ingredient1 ingredient2;
    run;

The same analysis could also be performed in PROC GLM:

    proc glm data=compounds_data;
      class ingredient1 ingredient2;
      model hours_of_relief = ingredient1 ingredient2;
    run;

## 3. Fitting of Logistic Models in PROC GENMOD and PROC LOGISTIC

Logistic models are of the form:

$$\log\frac{p}{1-p} = X\beta + \varepsilon \qquad (2)$$

These models are appropriate for modeling proportions. As in ordinary regression, a logistic model can be used to predict the proportion $p$ that will be obtained for given values of the independent variables. A logistic model can also be used to determine whether an independent variable significantly affects the variation of the dependent variable. In these cases, we are interested in knowing whether the odds of having the outcome are the same for all levels of the independent variable(s). For example, we may be interested in determining whether the odds of having a given disease are the same for smokers and nonsmokers.

If one is interested in building a model to predict a proportion as a function of independent variables, then PROC LOGISTIC would seem to be the clear choice because of the options it provides for model selection. On the other hand, if one is only interested in finding out whether an independent variable has a significant effect on a proportion, then the analyst can use either PROC LOGISTIC or PROC GENMOD. In earlier versions of the SAS System (at least up to version 6.12), there were no CLASS or CONTRAST statements in PROC LOGISTIC.
Thus, for the analysis of categorical variables one might have preferred PROC GENMOD over PROC LOGISTIC in those earlier versions, since the categorical variables had to be recoded in a DATA step before calling the LOGISTIC procedure. We give here an example of the use of PROC GENMOD for the analysis of binary data.

### Example of a Logistic Regression Analysis with Binary Data

Bliss (1935) reports the proportion of beetles killed after 5 hours of exposure at various concentrations of gaseous carbon disulphide. To obtain the regression coefficients that model the proportion of beetles killed as a function of dosage, the following SAS code can be used:

    proc genmod data=beetle_data;
      model number_killed/number_of_beetles = dosage / link=logit dist=binomial;
    run;

PROC LOGISTIC could also be used:

    proc logistic data=beetle_data;
      model number_killed/number_of_beetles = dosage;
    run;

Notice that in PROC GENMOD we needed to specify the LINK and DIST options, since the defaults (identity link and normal distribution) would not be appropriate for the analysis of binary data. PROC LOGISTIC uses the logit link by default and has no DIST option, so neither is specified there. The outcome under study is expressed as a ratio of two variables. Alternatively, we could use a dichotomous variable taking values 0 or 1 to indicate whether or not an individual beetle was killed, and model that variable as a function of the dosage of carbon disulphide.

## 4. Analysis of Count Data and Correlated Data

The main advantage of PROC GENMOD compared to other data analysis procedures is that it can fit complex models that cannot be fitted in other procedures for linear models such as GLM or LOGISTIC. In this section, we discuss the use of PROC GENMOD for the analysis of count data and correlated data.

PROC GENMOD can fit data arising from a number of distributions. If a distribution is not available as an option, the user can even specify that distribution. One of the distributions available in PROC GENMOD is the Poisson distribution, which is generally used for count data. Other available distributions include the gamma, inverse Gaussian, and negative binomial.

Correlated data arise when repeated measurements are taken on the same subjects or when subjects belong to the same cluster. The cluster can be a geographical region, a clinical site in a multi-site study, or a litter in a toxicity study. Failure to account for the correlation in the data can result in underestimating the variance, which leads to artificially low p-values. Several methods can be used to analyze correlated data, including general linear multivariate models (GLMMs) and linear mixed models. GLMMs can be fitted in PROC GLM, and linear mixed models can be fitted using PROC MIXED. Generalized estimating equations (GEE), the method GENMOD uses to account for correlated data, may in many situations be preferred over the methods mentioned above for various reasons (such as missing data or non-normality). For correlated data, the analyst must specify a working correlation matrix. This working correlation matrix reflects the analyst's assumption about the correlation structure between observations from the same cluster.
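To make concrete what is being specified, consider a cluster of three observations; the cluster size and the symbols $\rho_{jk}$ are illustrative assumptions. The working correlation matrix is then a symmetric $3 \times 3$ matrix whose $(j,k)$ entry is the assumed correlation between observations $j$ and $k$ of the same cluster:

$$
R = \begin{pmatrix}
1 & \rho_{12} & \rho_{13} \\
\rho_{12} & 1 & \rho_{23} \\
\rho_{13} & \rho_{23} & 1
\end{pmatrix}
$$

Left unrestricted, this matrix has three free parameters, and the number grows quickly with cluster size. Named structures reduce that number by constraining the entries, for example by setting some of them equal to each other or to zero.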
The correlation structure can take many forms. One of the most commonly made assumptions is that the correlation within a cluster is exchangeable, that is, the correlation is the same between any two elements of a cluster. Other available correlation structures include independent, autoregressive, and m-dependent. The user can also specify a fixed correlation matrix.

The information PROC GENMOD needs to model the correlation in the data is supplied through the REPEATED statement. Options in the REPEATED statement specify the form of the correlation structure as well as the convergence criteria. The CORR option (equivalently, TYPE) is probably the most important option in the REPEATED statement. It specifies the working correlation matrix described above, and the SUBJECT option identifies the cluster.

### Example

Paul (1982) reported an experiment in which pregnant rabbits were dosed with an unspecified toxic substance. The foetuses were observed for skeletal and visceral abnormalities. The cluster here is the litter. Typically, in these types of experiments, it is assumed that the correlation within a litter is exchangeable, that is, the correlation is the same between any two littermates. The data could be analyzed with the following syntax:

    proc genmod data=rabbit_toxicity;
      class litter dose;
      model malformation = dose / dist=binomial link=logit;
      repeated subject=litter / sorted type=exch;
    run;

(The SORTED option tells SAS that the data are properly sorted by subject.)

## 5. Example of the Use of PROC GENMOD to Analyze Various Types of Endpoints from a Toxicity Study

In analyzing data from toxicity studies, my preference has been to use PROC GENMOD over other procedures. In this section, I give a brief description of the data from these toxicology experiments. I also discuss why we prefer PROC GENMOD over other procedures and how we use it.

A toxicology study can best be seen as a number of independent experiments on the effect of the same explanatory variable. For example, to study the health effect of a given chemical, one experiment might investigate the effect of this chemical on male reproductive organs and another experiment with the same chemical might investigate its effect on female reproductive organs. In most of these experiments, such as the example from the previous section, the data will be correlated; in some others, the data will be uncorrelated. For example, in some experiments of the reproductive toxicity studies conducted by the National Institute of Environmental Health Sciences (NIEHS), rodents selected from different litters are given the toxic substance and then sacrificed. In these types of experiments, we would consider the data to be uncorrelated, and traditional ANOVA methods could be used. The endpoints collected can be binary (such as malformation), continuous (such as organ weights), or counts (such as sperm count). Even within the same experiment, there may be different types of endpoints.

In analyzing these data, we have preferred PROC GENMOD over other procedures. The main reason is that the versatility of PROC GENMOD lets us handle correlated and uncorrelated data regardless of the type of endpoint. This makes it easier to analyze all of the endpoints from an experiment using a single SAS macro. For some of these endpoints, the use of PROC MIXED might have been preferable.
We opted against that option since doing so would have required other procedures for endpoints that do not follow the normal distribution (binary and count data). Another reason for not using PROC MIXED is that for some endpoints convergence can be difficult to achieve and would require some trial and error. Given that the number of endpoints to be analyzed can exceed 300, it is not practical to fit the endpoints one at a time.

To handle the large number of endpoints coming from these studies, we arranged the data so that it could be sorted by endpoint, cluster, and dose group. With the data sorted in this manner, the different types of endpoints (binary, continuous, and count) were grouped together and each type was analyzed in the same call to PROC GENMOD. For example, the continuous and count endpoints from an experiment where the data were correlated would be handled by the following SAS code:

    /* Correlated Data */
    /* Continuous Endpoints */
    proc genmod data=toxdata(where=(count=0));
      class dose litter;
      model ... / link=identity covb;
      repeated subject=litter / type=exch maxiter=25000 covb corrb;
    run;

    /* Count Endpoints */
    proc genmod data=toxdata(where=(count=1));
      class dose litter;
      model ... / d=poisson covb;
      repeated subject=litter / type=exch maxiter=25000 covb corrb;
    run;
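Since the two calls for correlated data differ only in the subset of endpoints, the distribution, and the link, they lend themselves to a single parameterized macro. The following is a minimal sketch under stated assumptions, not the macro used in the study: the macro name `fit_endpoints`, its parameters, and the response variable `value` are hypothetical, and the effect `dose` in the MODEL statement is inferred from the CLASS statement rather than taken from the original code.

```sas
/* Hypothetical sketch: fit_endpoints, its parameters, and the      */
/* response variable "value" are illustrative assumptions.          */
%macro fit_endpoints(data=toxdata, count=0, dist=normal, link=identity);
  proc genmod data=&data(where=(count=&count));
    class dose litter;
    /* Response and effect names are assumed for illustration */
    model value = dose / dist=&dist link=&link covb;
    /* Exchangeable working correlation within litter */
    repeated subject=litter / type=exch maxiter=25000 covb corrb;
  run;
%mend fit_endpoints;

/* Continuous endpoints: normal distribution, identity link */
%fit_endpoints(count=0, dist=normal, link=identity);

/* Count endpoints: Poisson distribution, log link */
%fit_endpoints(count=1, dist=poisson, link=log);
```

Passing the distribution and link as parameters keeps one copy of the CLASS and REPEATED statements, so a change to the correlation structure or the iteration limit is made in one place.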

In the SAS macro, a macro variable identifies whether the observations from the experiment are correlated or not. For an experiment where the data were uncorrelated, the SAS code below would be executed by the macro:

    /* Uncorrelated Data */
    /* Continuous Endpoints */
    proc genmod data=toxdata(where=(count=0));
      class dose;
      model ... / link=identity covb;
    run;

    /* Count Endpoints */
    proc genmod data=toxdata(where=(count=1));
      class dose litter;
      model ... / d=poisson covb;
    run;

## Conclusion

Data that can be analyzed in PROC ANOVA, PROC GLM, PROC REG, or PROC LOGISTIC can also be handled in PROC GENMOD. There are instances where the analyst is familiar enough with the data and the only output of interest is the parameter estimates, standard errors, p-values, and confidence intervals. In these instances, the use of PROC GENMOD might be preferred. We have given an example from toxicology data where using PROC GENMOD is more efficient because of the different types of endpoints that have to be analyzed from these experiments.

## References

Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.

Neter J., Wasserman J. and Kutner M. (1990). Applied Linear Statistical Models. Boston: Irwin.

Orelien et al. (2000). Multiple Comparison with a Control in GEE Models Using the SAS System. Presented at the SAS User Group International Conference in Indianapolis.

Stokes M.E., Davis C.S. and Koch G.G. (1995). Categorical Data Analysis Using the SAS System. Cary: SAS Institute, Inc.

## Contact Information

Your questions and comments are welcome. Please contact: Jean G. Orelien, Analytical Sciences, Inc., Meridian Pkwy., Durham, NC. Work Phone: (919) (ext. 125). Fax: (919).


### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer
