Statistics Chapter 2

Statistics 9055 Chapter 2

Example: Children and Malaria

A random sample of 100 children aged 3 to 15 years was taken from a village in Ghana. The children were followed for a period of eight months. At the beginning of the study, values of a particular antibody were assessed. Based on observations during the study period, the children were categorized into two groups: individuals with and without symptoms of malaria.

Variables in the Dataset

subject   subject code
age       age in years
ab        antibody level
mal       1 if the subject has malaria, 0 if not

Note: the response variable mal is Bernoulli.

Reading the Data into R

> library(ISwR)
> data(malaria)
> attach(malaria)
> head(malaria)
  subject age  ab mal
1       1  15 546   0
2       2  14 268   0
3       3  12 284   0
4       4  15  38   0
5       5  14 827   0
6       6  12 252   0

Treat age as a Factor

> malglm_full<-glm(mal~factor(age)+ab,family=binomial)
> summary(malglm_full)

Output

Call:
glm(formula = mal ~ factor(age) + ab, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3984  -0.8654  -0.4969   0.9825   2.9660

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.725e-01  7.346e-01  -0.507   0.6121
factor(age)4   5.269e-01  1.013e+00   0.520   0.6029
factor(age)5   9.354e-01  1.063e+00   0.880   0.3788
factor(age)6  -1.746e+01  2.557e+03  -0.007   0.9946
factor(age)7  -3.462e-01  1.109e+00  -0.312   0.7549
factor(age)8  -2.571e-01  1.119e+00  -0.230   0.8184
factor(age)9   3.042e-01  9.845e-01   0.309   0.7574
factor(age)10 -1.938e-01  1.126e+00  -0.172   0.8633
factor(age)11  6.152e-02  1.155e+00   0.053   0.9575
factor(age)12 -4.302e-01  1.367e+00  -0.315   0.7530
factor(age)13 -1.732e+01  2.276e+03  -0.008   0.9939
factor(age)14 -6.800e-01  1.349e+00  -0.504   0.6141
factor(age)15 -1.132e-01  1.131e+00  -0.100   0.9203
ab            -2.369e-03  1.222e-03  -1.940   0.0524 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 116.652  on 99  degrees of freedom
Residual deviance:  96.904  on 86  degrees of freedom
AIC: 124.9

Number of Fisher Scoring iterations: 17

Run the Analysis without age

> malglm_ab<-glm(mal~ab,family=binomial)
> summary(malglm_ab)

Call:
glm(formula = mal ~ ab, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9960  -0.8893  -0.6472   1.3766   2.8993

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.437616   0.292493  -1.496   0.1346
ab          -0.002665   0.001214  -2.196   0.0281 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 116.65  on 99  degrees of freedom
Residual deviance: 107.28  on 98  degrees of freedom
AIC: 111.28

Number of Fisher Scoring iterations: 6
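The ab coefficient is on the log-odds scale, so it is easier to read after exponentiating. The sketch below is not part of the original slides: it copies the estimate and standard error from the summary above and converts them to an odds ratio, with a Wald 95% interval, for a 100-unit increase in antibody level. The choice of a 100-unit step and the variable names are assumptions made for illustration.

```r
# Estimate and standard error for ab, copied from summary(malglm_ab) above.
beta_ab <- -0.002665
se_ab   <- 0.001214

# Odds ratio for a one-unit and for a 100-unit increase in antibody level.
or_per_unit <- exp(beta_ab)
or_per_100  <- exp(100 * beta_ab)

# Wald 95% interval for the 100-unit odds ratio (normal approximation).
ci_per_100 <- exp(100 * (beta_ab + c(-1, 1) * qnorm(0.975) * se_ab))

print(c(per_unit = or_per_unit, per_100 = or_per_100))
print(ci_per_100)
```

The 100-unit odds ratio comes out at roughly 0.77, that is, higher antibody levels go with lower odds of malaria symptoms, in line with the negative sign of the coefficient.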

Likelihood Ratio Test for age

Recall
Residual deviance:  96.904  on 86  degrees of freedom   (model with factor(age) and ab)
Residual deviance: 107.28   on 98  degrees of freedom   (model with ab only)

Likelihood ratio test calculation

> as.numeric(-2*logLik(malglm_full))
[1] 96.90391
> as.numeric(-2*logLik(malglm_ab))
[1] 107.2765
> lrt<-as.numeric(-2*(logLik(malglm_ab)-logLik(malglm_full)))
> lrt
[1] 10.37261
> pchisq(lrt,12,lower.tail=FALSE)
[1] 0.5833076
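For reference, the calculation on this slide can be written out as a formula. The notation is introduced here and is not taken from the slides: ℓ denotes a maximized log-likelihood and D a residual deviance. Because mal is a 0/1 response, -2ℓ coincides with the residual deviance of each model, which is why the printed values match the summary output.

```latex
\[
\Lambda
  = -2\left[\ell(\text{ab only}) - \ell(\text{age} + \text{ab})\right]
  = D_{\text{ab}} - D_{\text{full}}
  = 107.2765 - 96.9039 \approx 10.373,
\qquad
\text{df} = 98 - 86 = 12
\]
\[
p = P\left(\chi^2_{12} \ge 10.373\right) \approx 0.583
\]
```

A p-value this large gives no evidence that age, treated as a factor, improves on the model containing ab alone.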

Example: Animal Testing

Data File

dead alive dose spleen
   0     5    3   0
   1     4    4   0
   0     5    5   0
   0     5    6   0
   4     2    7   0
   5     1    8   0
   0     5    3   0.25
   0     5    4   0.25
   2     3    5   0.25
   4     2    6   0.25
   5     1    7   0.25
   5     0    8   0.25
   0     5    3   0.5
   1     4    4   0.5
   5     1    5   0.5
   6     0    6   0.5
   4     1    7   0.5
   5     0    8   0.5
   0     6    3   0.75
   2     4    4   0.75
   5     0    5   0.75
   5     0    6   0.75
   5     0    7   0.75
   5     0    8   0.75
   4     2    3   1
   5     1    4   1
   4     1    5   1
   5     0    6   1
   5     0    7   1
   5     0    8   1

Initial Manipulations

> animals<-read.table("animaltesting.txt",header=TRUE)
> attach(animals)
> head(animals)
  dead alive dose spleen
1    0     5    3      0
2    1     4    4      0
3    0     5    5      0
4    0     5    6      0
5    4     2    7      0
6    5     1    8      0
> y<-cbind(dead,alive)
> p<-dead/(dead+alive)
> stripchart(p~dose)
> stripchart(p~spleen)

[Figures: strip charts of p by dose (levels 3 to 8) and by spleen (levels 0, 0.25, 0.5, 0.75, 1); the p axis runs from 0.0 to 1.0]
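The file animaltesting.txt is not distributed with these notes, so the sketch below rebuilds the data frame by hand from the Data File listing and then pools the counts to get one observed death proportion per dose and per spleen level. It is an illustrative supplement, not the original code; the object names p_dose and p_spleen are assumptions, and the counts are only as reliable as the transcribed listing.

```r
# Counts typed in from the "Data File" listing: 6 doses within each of 5 spleen levels.
animals <- data.frame(
  dead   = c(0, 1, 0, 0, 4, 5,  0, 0, 2, 4, 5, 5,  0, 1, 5, 6, 4, 5,
             0, 2, 5, 5, 5, 5,  4, 5, 4, 5, 5, 5),
  alive  = c(5, 4, 5, 5, 2, 1,  5, 5, 3, 2, 1, 0,  5, 4, 1, 0, 1, 0,
             6, 4, 0, 0, 0, 0,  2, 1, 1, 0, 0, 0),
  dose   = rep(3:8, times = 5),
  spleen = rep(c(0, 0.25, 0.5, 0.75, 1), each = 6)
)

# Pooled proportion dead at each dose (summing over spleen levels) ...
p_dose <- with(animals, tapply(dead, dose, sum) / tapply(dead + alive, dose, sum))
# ... and at each spleen level (summing over doses).
p_spleen <- with(animals, tapply(dead, spleen, sum) / tapply(dead + alive, spleen, sum))

print(round(p_dose, 3))
print(round(p_spleen, 3))
```

Both sets of pooled proportions rise with the level of the factor, which is the pattern the strip charts are meant to show.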

Analysis I

> glm_animal1<-glm(y~dose+spleen,family=binomial(link=logit))
> summary(glm_animal1)

Call:
glm(formula = y ~ dose + spleen, family = binomial(link = logit))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.73921  -0.66524   0.09684   0.49472   1.86060

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.3840     1.7507  -5.931 3.00e-09 ***
dose          1.5572     0.2588   6.018 1.77e-09 ***
spleen        5.7412     1.0935   5.251 1.52e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.601  on 29  degrees of freedom
Residual deviance:  24.057  on 27  degrees of freedom
AIC: 55.504

Number of Fisher Scoring iterations: 6
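To see what the fitted coefficients imply on the probability scale, the estimates can be plugged into the inverse logit. The sketch below is an illustration added to the notes: it copies the three estimates from the summary above, and the function name predict_death and the chosen dose and spleen values are hypothetical.

```r
# Estimates copied from summary(glm_animal1) above.
b0       <- -10.3840
b_dose   <-   1.5572
b_spleen <-   5.7412

# Fitted probability of death for a given dose and fraction of spleen removed.
# plogis() is the inverse logit: plogis(x) = 1 / (1 + exp(-x)).
predict_death <- function(dose, spleen) {
  plogis(b0 + b_dose * dose + b_spleen * spleen)
}

p_mid <- predict_death(dose = 5, spleen = 0.5)
print(p_mid)

# Probabilities across the doses in the experiment, with no spleen removed.
print(round(predict_death(dose = 3:8, spleen = 0), 3))
```

At dose 5 with half the spleen removed the fitted probability is about 0.57. With the fitted objects available, predict(glm_animal1, type = "response") returns the same quantity for the observed design points.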

Analysis II

> glm_animal2<-glm(y~factor(dose)+factor(spleen),family=binomial(link=logit))
> summary(glm_animal2)

Call:
glm(formula = y ~ factor(dose) + factor(spleen), family = binomial(link = logit))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.77358  -0.33470   0.09222   0.49415   2.02356

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         -6.2057     1.1985  -5.178 2.25e-07 ***
factor(dose)4        1.7119     0.9175   1.866 0.062056 .
factor(dose)5        3.9596     1.0510   3.768 0.000165 ***
factor(dose)6        5.0161     1.1258   4.455 8.37e-06 ***
factor(dose)7        6.2780     1.2375   5.073 3.92e-07 ***
factor(dose)8        8.0508     1.5458   5.208 1.91e-07 ***
factor(spleen)0.25   1.7227     0.7978   2.159 0.030817 *
factor(spleen)0.5    3.2909     0.9255   3.556 0.000377 ***
factor(spleen)0.75   4.0851     1.0191   4.008 6.11e-05 ***
factor(spleen)1      6.2286     1.2068   5.161 2.46e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.601  on 29  degrees of freedom
Residual deviance:  21.347  on 20  degrees of freedom
AIC: 66.794

Number of Fisher Scoring iterations: 6
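Analysis I is nested within Analysis II, since a linear effect of dose and spleen is a special case of a separate effect for each level. The two fits can therefore be compared with a deviance difference. This comparison does not appear in the original slides; the sketch below uses only the deviances, degrees of freedom and AIC values printed in the two summaries, and the variable names are assumptions.

```r
# Values copied from the two summary() outputs above.
dev_linear <- 24.057; df_linear <- 27; aic_linear <- 55.504   # Analysis I
dev_factor <- 21.347; df_factor <- 20; aic_factor <- 66.794   # Analysis II

# Deviance difference and its degrees of freedom (extra parameters in Analysis II).
lrt     <- dev_linear - dev_factor
df_diff <- df_linear - df_factor

# Upper-tail chi-square probability for the deviance difference.
p_value <- pchisq(lrt, df_diff, lower.tail = FALSE)

print(c(lrt = lrt, df = df_diff, p = p_value))
print(c(AIC_I = aic_linear, AIC_II = aic_factor))
```

The difference of about 2.71 on 7 degrees of freedom gives a p-value near 0.9, and Analysis I also has the smaller AIC, so on this evidence the simpler linear-effects model appears adequate. With the fitted objects available, anova(glm_animal1, glm_animal2, test = "Chisq") should give the same comparison.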

Significance Tests: Analysis II

> glm_animal2_spleen<-glm(y~factor(spleen),family=binomial)
> glm_animal2_dose<-glm(y~factor(dose),family=binomial)
> glm_animal2$deviance
[1] 21.34725
> glm_animal2_dose$deviance
[1] 74.777
> glm_animal2_spleen$deviance
[1] 110.2314
> devfull<-glm_animal2$deviance
> devdose<-glm_animal2_dose$deviance
> devspleen<-glm_animal2_spleen$deviance

Testing for the significance of the dosages (the reduced model contains spleen only)

> pchisq(devspleen-devfull,5,lower.tail=FALSE)
[1] 1.152612e-17

Testing for the significance of the amount of spleen that is removed (the reduced model contains dose only)

> pchisq(devdose-devfull,4,lower.tail=FALSE)
[1] 6.927712e-11
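The two deviance-difference tests above can also be obtained in a single call with drop1(), which removes each term in turn and reports a likelihood ratio test. The sketch below is an alternative to the manual calculation, not the code used in the slides. It rebuilds the data from the Data File listing so that it runs on its own, and the object names fit_factor and lr_tests are assumptions.

```r
# Counts typed in from the "Data File" listing: 6 doses within each of 5 spleen levels.
animals <- data.frame(
  dead   = c(0, 1, 0, 0, 4, 5,  0, 0, 2, 4, 5, 5,  0, 1, 5, 6, 4, 5,
             0, 2, 5, 5, 5, 5,  4, 5, 4, 5, 5, 5),
  alive  = c(5, 4, 5, 5, 2, 1,  5, 5, 3, 2, 1, 0,  5, 4, 1, 0, 1, 0,
             6, 4, 0, 0, 0, 0,  2, 1, 1, 0, 0, 0),
  dose   = rep(3:8, times = 5),
  spleen = rep(c(0, 0.25, 0.5, 0.75, 1), each = 6)
)

# Same model as Analysis II, fitted with a data argument instead of attach().
fit_factor <- glm(cbind(dead, alive) ~ factor(dose) + factor(spleen),
                  family = binomial, data = animals)

# Drop each term in turn; test = "Chisq" gives the likelihood ratio test.
lr_tests <- drop1(fit_factor, test = "Chisq")
print(lr_tests)
```

If the listing was transcribed correctly, the LRT column should reproduce devspleen - devfull (5 degrees of freedom) for factor(dose) and devdose - devfull (4 degrees of freedom) for factor(spleen).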