BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016
|
|
- Eustacia Kelly
- 7 years ago
- Views:
Transcription
1 BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016 Required Libraries If there is an error in loading the libraries, you must first install the packaged library (install.packages("package name")), with package name in quotes, and then run the library() command, where the package name is not in quotes. library(foreign) # install.packages("readstata13") library(readstata13) # install.packages("proc", "dplyr", "lmtest") library(proc) # use roc{proc} and plot.roc{proc} library(dplyr) # use glimpse{dplyr} library(lmtest) # use lrtest{lmtest} # install.packages("resourceselection") library(resourceselection) # use hoslem.test{resourceselection} Upload Data We will be downloading the following data set from the 640 course website: link <- " dat <- read.dta13(link) Warning in read.dta13(link): agegp: Factor codes of type double or float detected - no labels assigned. Set option nonint.factors to TRUE to assign labels anyway. Warning in read.dta13(link): tobgp: Factor codes of type double or float detected - no labels assigned. Set option nonint.factors to TRUE to assign labels anyway. Warning in read.dta13(link): alcgp: Factor codes of type double or float detected - no labels assigned. Set option nonint.factors to TRUE to assign labels anyway. glimpse(dat) Observations: 975 Variables: 7 $ case (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... $ age (dbl) 42, 45, 35, 78, 45, 64, 76, 42, 48, 42, 37, 45, 60, 54,... 1
2 $ agegp (dbl) 2, 3, 2, 6, 3, 4, 6, 2, 3, 2, 2, 3, 4, 3, 5, 4, 3, 4, 3,... $ tob (dbl) 0.0, 7.5, 0.0, 0.0, 7.5, 17.5, 2.5, 0.0, 12.5, 25.0, $ tobgp (dbl) 1, 1, 1, 1, 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 1, 1, 3, 1, 1,... $ alc (dbl) 139, 66, 24, 39, 64, 49, 1, 33, 55, 62, 77, 84, 4, 32, 2... $ alcgp (dbl) 4, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 1, 1, 1, 2, 1,... # str(dat) 1. Global Chi Square test VS. Likelihood Ratio Test of reduced model versus current model Firstly, fit a Logistic Regression Model The command glm(y ~ x_1 + x_ x_n, family = binomial) will fit logistic regression models. And here is our first logistic model, with our predictor/dependent variable case, and our independent variables Tobacco Use/tobgp and Age/agegp. model1 <- glm(case ~ tobgp + agegp, data = dat, family = binomial) summary(model1) Call: glm(formula = case ~ tobgp + agegp, family = binomial, data = dat) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) < 2e-16 *** tobgp e-09 *** agegp < 2e-16 *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: on 974 degrees of freedom Residual deviance: on 972 degrees of freedom AIC: Number of Fisher Scoring iterations: 5 # Odds Ratios and 95% CI: exp(cbind(or = coef(model1), confint(model1))) Waiting for profiling to be done... 2
3 OR 2.5 % 97.5 % (Intercept) tobgp agegp The Likelihood Ratio test and the Global Chi-Square test (also called Score test) usually give compatible results. If the significance results differ, use the p-value from the Likelihood Ratio Test. # fit into intercept only model model2 <- glm(case ~ 1, data = dat, family = binomial) summary(model2) Call: glm(formula = case ~ 1, family = binomial, data = dat) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) <2e-16 *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: on 974 degrees of freedom Residual deviance: on 974 degrees of freedom AIC: Number of Fisher Scoring iterations: 4 # Likelihood Ratio Test of reduced model versus current model # lrtest(model2, model1) # L.R.Chisq is Summary It is of interest to know whether the inclusion of extra predictors to a model is statistically significant. The smaller model ( reduced ) contains the control variables. The larger model ( full ) contains the control variables plus the extra variables in question. First, the easy way. From the package lmtest will be a function called lrtest(...), hence the Likelihood Ratio test. lrtest(model2, model1) 3
4 Likelihood ratio test Model 1: case ~ 1 Model 2: case ~ tobgp + agegp #Df LogLik Df Chisq Pr(>Chisq) < 2.2e-16 *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 We see that the order is simple, the reduced model followed by the full model. The output displays the degrees of freedom for each model (#DF), the value of the log-likelihoods for each model, the difference in degrees of freedom (DF), followed by the LR statistic or in this case the Chi-square statistic, and lastyl, the p-value which determines if the extra variable in the full model is significant. That is a lot to absord but it s extremely valuable to understand. Now, for the long way. First, let s find the LR statistic. There are two ways, which follows the aforementioned formula. We must first be aware of the objects that are contained in model2. If you perform the command names(model2), you ll see that deviance is an option. You pull out the value of that object as specified in the code below. Additionally, R contains the command loglik(...) that you may also use to obtain the LR object. LR.statistic.1 <- model2$deviance - model1$deviance LR.statistic.1 [1] LR.statistic.2 <- -2*logLik(model2)[1] - (-2*logLik(model1)[1]) LR.statistic.2 [1] After finding the LR test statistic two different ways, we find the p-value under a chi-square distribution with degrees of freedom = k, where k represents the number of extra variables the full model has over the reduced model. In this case, k=2. pchisq(lr.statistic.1, 1, lower.tail = FALSE) [1] e Illustration of Hosmer-Lemeshow Goodness of Fit Test (NULL: model is a good fit) library(resourceselection) hoslem.test(model1$y, fitted(model1), g = 9) 4
5 Hosmer and Lemeshow goodness of fit (GOF) test data: model1$y, fitted(model1) X-squared = , df = 7, p-value = # g: number of bins to use to calculate quantiles. pchisq(9.0807, df = 7, lower.tail = FALSE) [1] WHAT TO LOOK FOR: Evidence of a OVERALL GOODNESS OF FIT is reflected in a NON-SIGNIFICANT p-value. Here, the Hosmer-Lemeshow test p-value is non-significant, which suggest a good overall fit. 3. Illustration of Link Test (same as linktest in Stata) Here is a great description from UCLA s statistical computation website: The Stata command linktest can be used to detect a specification error, and it is issued after the logit or logistic command. The idea behind linktest is that if the model is properly specified, one should not be able to find any additional predictors that are statistically significant except by chance. After the regression command (in our case, logit or logistic), linktest uses the linear predicted value (_hat) and linear predicted value squared (_hatsq) as the predictors to rebuild the model. The variable _hat should be a statistically significant predictor, since it is the predicted value from the model. This will be the case unless the model is completely misspecified. On the other hand, if our model is properly specified, variable _hatsq shouldn t have much predictive power except by chance. Therefore, if _hatsq is significant, then the linktest is significant. This usually means that either we have omitted relevant variable(s) or our link function is not correctly specified." model3 <- glm(case ~ agegp, data = dat, family = binomial) summary(model3) Call: glm(formula = case ~ agegp, family = binomial, data = dat) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) <2e-16 *** agegp <2e-16 *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) 5
6 Null deviance: on 974 degrees of freedom Residual deviance: on 973 degrees of freedom AIC: Number of Fisher Scoring iterations: 4 hat <- predict(model3) # hat hatsq <- hat^2 # hatsq linktest <- summary(glm(case ~ hat + hatsq, data = dat, family = binomial)) linktest Call: glm(formula = case ~ hat + hatsq, family = binomial, data = dat) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) *** hat ** hatsq e-07 *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: on 974 degrees of freedom Residual deviance: on 972 degrees of freedom AIC: Number of Fisher Scoring iterations: 6 # try another model: model1 hat <- predict(model1) # hat hatsq <- hat^2 # hatsq linktest <- summary(glm(case ~ hat + hatsq, data = dat, family = binomial)) linktest Call: glm(formula = case ~ hat + hatsq, family = binomial, data = dat) Deviance Residuals: Min 1Q Median 3Q Max
7 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) hat hatsq ** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: on 974 degrees of freedom Residual deviance: on 972 degrees of freedom AIC: Number of Fisher Scoring iterations: 6 4. Illustration of Classification Table of current model # observed vs. predicted # Classified as case if predicted Pr(D) >=.5 pred <- predict(model1) # pred # if pred>0.5, then set it as 1, otherwise set it as 0; pred_cut <- ifelse(pred>0.5, 1, 0) # pred_cut table(pred_cut) pred_cut # dat$case mydata <- as.data.frame(cbind(case=dat$case, pred_cut)) # is.data.frame(mydata) # head(mydata) # 2-Way Cross Tabulation library(gmodels) Attaching package: 'gmodels' The following object is masked from 'package:proc': ci 7
8 # The CrossTable( ) function in the gmodels package produces crosstabulations, similar to PROC FREQ in S CrossTable(mydata$case, mydata$pred_cut) Cell Contents N Chi-square contribution N / Row Total N / Col Total N / Table Total Total Observations in Table: 975 mydata$pred_cut mydata$case 0 1 Row Total Column Total mydat <- as.table(matrix(c(4, 196, 7, 768), nrow=2)) mydat A B A 4 7 B colnames(mydat) <- c("case","control") rownames(mydat) <- c("pred+","pred-") mydat Case Control Pred+ 4 7 Pred
9 library(epir) Loading required package: survival Package epir is loaded Type help(epi.about) for summary information # epi.tests(): Sensitivity, specificity and predictive value (PPV, NPV) of a diagnostic test epi.tests(mydat) Outcome + Outcome - Total Test Test Total Point estimates and 95 % CIs: Apparent prevalence 0.01 (0.01, 0.02) True prevalence 0.21 (0.18, 0.23) Sensitivity 0.02 (0.01, 0.05) Specificity 0.99 (0.98, 1.00) Positive predictive value 0.36 (0.11, 0.69) Negative predictive value 0.80 (0.77, 0.82) Positive likelihood ratio 2.21 (0.65, 7.49) Negative likelihood ratio 0.99 (0.97, 1.01) # is.table(mydat) # TRUE [1] TRUE is.data.frame(mydat) # FALSE [1] FALSE # because epi.tests(dat), the data needs to be a TABLE, NOT A DATA.FRAME! 5. Illustration of ROC Curve An ROC curve ( Receiver-Operating Characteristic) is a visual display of the overall performance of a fitted logistic model and its associated equation for predicted probabilities. 9
10 library(proc) roc1 <- roc(dat$case, fitted(model1)) plot.roc(roc1, print.auc = TRUE, legacy.axes = TRUE) Sensitivity AUC: Specificity Call: roc.default(response = dat$case, predictor = fitted(model1)) Data: fitted(model1) in 775 controls (dat$case 0) < 200 cases (dat$case 1). Area under the curve: WHAT TO LOOK FOR: Classification that is no better than a coin toss is referenced in the 45 degree line. Evidence of a GOOD FIT is reflected in an ROC curve that lies above the 45 degree line reference. Area under the ROC curve = , which says that 74.6% of the observations are correctly classified. 10
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationLab 13: Logistic Regression
Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account
More informationRégression logistique : introduction
Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationLogistic regression (with R)
Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationExamining a Fitted Logistic Model
STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic
More informationUsing R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationMSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.
MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
More informationCool Tools for PROC LOGISTIC
Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT
More informationANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.
ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More informationLecture 18: Logistic Regression Continued
Lecture 18: Logistic Regression Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
More informationSUGI 29 Statistics and Data Analysis
Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,
More informationOutline. Dispersion Bush lupine survival Quasi-Binomial family
Outline 1 Three-way interactions 2 Overdispersion in logistic regression Dispersion Bush lupine survival Quasi-Binomial family 3 Simulation for inference Why simulations Testing model fit: simulating the
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationdata visualization and regression
data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationRegression 3: Logistic Regression
Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing
More informationMIXED MODEL ANALYSIS USING R
Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationSimple example of collinearity in logistic regression
1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially
More informationE(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More informationFailure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.
Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is
More informationPsychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
More informationExchange Rate Regime Analysis for the Chinese Yuan
Exchange Rate Regime Analysis for the Chinese Yuan Achim Zeileis Ajay Shah Ila Patnaik Abstract We investigate the Chinese exchange rate regime after China gave up on a fixed exchange rate to the US dollar
More informationStatistics, Data Analysis & Econometrics
Using the LOGISTIC Procedure to Model Responses to Financial Services Direct Marketing David Marsh, Senior Credit Risk Modeler, Canadian Tire Financial Services, Welland, Ontario ABSTRACT It is more important
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
More informationThis can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.
One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationLOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
More informationLogistic Regression 1. y log( ) logit( y) 1 y = = +
Written by: Robin Beaumont e-mail: robin@organplayers.co.uk Date Thursday, 05 July 2012 Version: 3 Logistic Regression model Can be converted to a probability dependent variable = outcome/event log odds
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
More informationWe extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
More informationLecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
More informationChapter 23. Two Categorical Variables: The Chi-Square Test
Chapter 23. Two Categorical Variables: The Chi-Square Test 1 Chapter 23. Two Categorical Variables: The Chi-Square Test Two-Way Tables Note. We quickly review two-way tables with an example. Example. Exercise
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationFamily economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.
Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More information13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationBivariate Statistics Session 2: Measuring Associations Chi-Square Test
Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution
More informationGoodness of Fit Tests for Categorical Data: Comparing Stata, R and SAS
Goodness of Fit Tests for Categorical Data: Comparing Stata, R and SAS Rino Bellocco 1,2, Sc.D. Sara Algeri 1, MS 1 University of Milano-Bicocca, Milan, Italy & 2 Karolinska Institutet, Stockholm, Sweden
More informationA LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa
A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa ABSTRACT Predictive modeling is the technique of using historical information on a certain
More informationChapter 13. Chi-Square. Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running two separate
1 Chapter 13 Chi-Square This section covers the steps for running and interpreting chi-square analyses using the SPSS Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running
More informationAdatelemzés II. [SST35]
Adatelemzés II. [SST35] Idősorok 0. Lőw András low.andras@gmail.com 2011. október 26. Lőw A (low.andras@gmail.com Adatelemzés II. [SST35] 2011. október 26. 1 / 24 Vázlat 1 Honnan tudja a summary, hogy
More informationBinary Diagnostic Tests Two Independent Samples
Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary
More informationABSTRACT INTRODUCTION
Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel
More informationBinary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
More informationUsing An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationUsing Stata for Categorical Data Analysis
Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More informationComparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationLecture 19: Conditional Logistic Regression
Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
More informationStepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationEDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day
More informationPrinciples of Hypothesis Testing for Public Health
Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions
More informationBeginning Tutorials. PROC FREQ: It s More Than Counts Richard Severino, The Queen s Medical Center, Honolulu, HI OVERVIEW.
Paper 69-25 PROC FREQ: It s More Than Counts Richard Severino, The Queen s Medical Center, Honolulu, HI ABSTRACT The FREQ procedure can be used for more than just obtaining a simple frequency distribution
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationLets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is
In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing
More informationModule 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724
More informationThe Chi-Square Test. STAT E-50 Introduction to Statistics
STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed
More informationEARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08
EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08 PURPOSE Matthew Wetstein, Alyssa Nguyen & Brianna Hays The purpose of the present study was to identify specific
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationA Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
More informationFrom this it is not clear what sort of variable that insure is so list the first 10 observations.
MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More informationWeight of Evidence Module
Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,
More informationLecture 8: Gamma regression
Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing
More informationInteraction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015
Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,
More informationTest Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table
ANALYSIS OF DISCRT VARIABLS / 5 CHAPTR FIV ANALYSIS OF DISCRT VARIABLS Discrete variables are those which can only assume certain fixed values. xamples include outcome variables with results such as live
More informationLecture 6: Poisson regression
Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationAP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationTests for Two Proportions
Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics
More informationGuido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE
More information