Section 12 Part 2. Chi-square test


Section 12 Part 2: Chi-square test, McNemar's Test

Section 12 Part 2 Overview
Section 12, Part 1 covered two inference methods for categorical data from 2 groups:
  - Confidence intervals for the difference of two proportions
  - Two-sample z-test of equality of two proportions
Section 12, Part 2 covers inference methods for categorical data:
  - Chi-square test for comparisons between 2 categorical variables (Fisher's exact test)
  - McNemar's Chi-square test (Binomial test) for paired categorical data
PubH 6414, Section 12 Part 2

Chi-square test
The Chi-square test can be used to test for independence between two variables.
  - The null hypothesis for this test is that the variables are independent (i.e. that there is no statistical association).
  - The alternative hypothesis is that there is a statistical relationship or association between the two variables.
The Chi-square test can also be used to test for equality of proportions between two or more groups.
  - The null hypothesis for this test is that the proportions are equal.
  - The alternative hypothesis is that the proportions are not equal (a test for a difference in either direction).

Contingency Tables
Setting: Let $X_1$ and $X_2$ denote categorical variables, $X_1$ having $I$ levels and $X_2$ having $J$ levels. There are $IJ$ possible combinations of classifications.

                          X_2
X_1          Level=1   Level=2   ...   Level=J
Level=1
Level=2
...
Level=I

When the cells contain frequencies of outcomes, the table is called a contingency table.

Chi-square Test: Testing for Independence
Step 1: Hypothesis (always two-sided):
  $H_0$: Independent
  $H_A$: Not independent
Step 2: Calculate the test statistic:
  $X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2$ with df $= (I-1)(J-1)$
  where $x_{ij}$ is the observed count and $e_{ij}$ the expected count in cell $(i, j)$
Step 3: Calculate the p-value:
  p-value $= P(\chi^2 > X^2)$ (this p-value is two-sided)
Step 4: Draw a conclusion:
  p-value < α: reject independence
  p-value > α: do not reject independence

Racial Differences and Cardiac Arrest
In a large mid-western city, the incidence of cardiac arrest and subsequent survival were studied in 6117 cases of non-traumatic, out-of-hospital cardiac arrest.
During a 12-month period, fewer than 1% of African-Americans survived from arrest to hospital discharge, compared to 2.6% of Caucasians.

Racial Differences and Cardiac Arrest

                       Survival to Discharge
Race                  YES        NO     Total
Caucasian              84      3123      3207
African-American       24      2886      2910
Total                 108      6009      6117

Racial Differences and Cardiac Arrest
Scientific Hypothesis: An association exists between race (African-American/Caucasian) and survival to hospital discharge (Yes/No) in cases of non-traumatic out-of-hospital cardiac arrest.
Statistical Hypothesis:
  $H_0$: Race and survival to hospital discharge are independent in cases of non-traumatic out-of-hospital cardiac arrest.
  $H_A$: Race and survival to hospital discharge are not independent in cases of non-traumatic out-of-hospital cardiac arrest.

Chi-square Test: Testing for Independence
1. Obtain a random sample of n independent observations (the selection of one observation does not influence the selection of any other).
2. Observations are then classified according to the cells formed by the intersection of rows and columns in a contingency table.
   Rows (r) consist of mutually exclusive categories of one variable.
   Columns (c) consist of mutually exclusive categories of the other variable.
3. The frequency of observations in each cell is determined along with marginal totals.

Chi-square Test: Testing for Independence
4. Expected frequencies are calculated under the null hypothesis of independence (no association) and compared to observed frequencies.
   Recall: A and B are independent if P(A and B) = P(A) * P(B)
5. Use the Chi-square ($X^2$) test statistic to measure the difference between the observed and expected frequencies.
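The independence rule in step 4 can be turned into expected counts for every cell at once. The following is a minimal R sketch using the observed counts from the cardiac arrest table; the object names (observed, row_prob, col_prob, expected) are illustrative choices, not part of the course material.

```r
# Observed counts from the cardiac arrest example
# (rows: race, columns: survival to discharge)
observed <- matrix(c(84, 3123,
                     24, 2886),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Race = c("Caucasian", "African-American"),
                                   Survived = c("Yes", "No")))

n <- sum(observed)
row_prob <- rowSums(observed) / n   # P(row category)
col_prob <- colSums(observed) / n   # P(column category)

# Under independence, P(row and column) = P(row) * P(column),
# so the expected count in each cell is n * P(row) * P(column)
expected <- n * outer(row_prob, col_prob)
print(round(expected, 2))
```

The expected counts add up to the same row and column totals as the observed counts, which is a quick check on the arithmetic.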

χ² Distribution
The probabilities associated with the chi-square distribution are in appendix D. The table is set up in the same way as the t-distribution table.
The chi-square distribution with 1 df is the distribution of the square of a standard normal (Z) variable.
Since the chi-square statistic only takes on positive values, and large values indicate departure from the null hypothesis, all of the rejection probability is in the right tail.
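The link between the chi-square distribution with 1 df and the Z distribution can be checked numerically. This short R sketch compares the two-sided 5% critical value of Z, squared, with the upper 5% critical value of the chi-square distribution; the variable names are illustrative.

```r
z_crit   <- qnorm(0.975)      # two-sided 5% critical value of Z
chi_crit <- qchisq(0.95, 1)   # upper 5% critical value of chi-square with 1 df
print(c(z_squared = z_crit^2, chi_square = chi_crit))

# Tail probabilities agree as well: P(chi-square > x) = P(|Z| > sqrt(x))
x <- 3.84
print(c(chisq_tail   = 1 - pchisq(x, 1),
        normal_tails = 2 * (1 - pnorm(sqrt(x)))))
```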

Chi-square distributions and critical values for 1 df, 4 df and 20 df
[Figure: chi-square density curves for 1 df, 4 df and 20 df with critical values marked]
  - Critical value for α = 0.05 and Chi-square with 1 df is 3.84
  - Critical value for α = 0.05 and Chi-square with 4 df is 9.49
  - Critical value for α = 0.05 and Chi-square with 20 df is 31.4
Since the Chi-square distribution is always positive, the rejection region is only in the right tail.

How to Identify the critical value
The rejection region of the Chi-square test is the upper tail, so there is only one critical value.
First calculate the df to identify the correct Chi-square distribution.
For a 2 X 2 table, there are (2-1)*(2-1) = 1 df.
R commander:
> qchisq(0.95,1)
[1] 3.841459

State the conclusion
The p-value $P(\chi^2 > X^2)$ for a 2 X 2 table (1 df) can be computed in R, where X2 holds the calculated test statistic:
> 1-pchisq(X2,1)
Reject the null hypothesis by either the rejection region method or the p-value method:
  $X^2$ > critical value, or p-value < α
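The two decision rules always lead to the same conclusion, which a small helper makes easy to see. This is a sketch only: chisq_decision is a hypothetical function name, and the statistic 5.0 is an arbitrary illustrative value rather than a result from the course examples.

```r
# Hypothetical helper: applies both decision rules to a chi-square statistic
chisq_decision <- function(x2, df, alpha = 0.05) {
  critical <- qchisq(1 - alpha, df)                  # upper-tail critical value
  p_value  <- pchisq(x2, df, lower.tail = FALSE)     # same as 1 - pchisq(x2, df)
  list(critical          = critical,
       p_value           = p_value,
       reject_by_region  = x2 > critical,
       reject_by_p_value = p_value < alpha)
}

# Arbitrary illustrative statistic with 1 df
str(chisq_decision(5.0, df = 1))
```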

Racial Differences and Cardiac Arrest

                       Survival to Discharge
Race                  YES        NO     Total
Caucasian              84      3123      3207
                   (56.62)
African-American       24      2886      2910
Total                 108      6009      6117

(expected count in parentheses)
Under the assumption of independence:
  P(YES and Caucasian) = P(YES) * P(Caucasian) = 108/6117 * 3207/6117 = 0.009256
  Expected cell count = $e_{ij}$ = 0.009256 * 6117 = 56.62

Racial Differences and Cardiac Arrest

                       Survival to Discharge
Race                  YES          NO     Total
Caucasian              84        3123      3207
                   (56.62)   (3150.38)
African-American       24        2886      2910
                   (51.38)   (2858.62)
Total                 108        6009      6117

Expected cell counts (in parentheses) = (marginal row total * marginal column total) / n
Rule of thumb:
  - Check that all expected frequencies are at least 2
  - No more than 20% of cells should have expected frequencies < 5

Racial Differences and Cardiac Arrest
Step 1: Hypothesis (always two-sided):
  $H_0$: Independent (Race/Survival)
  $H_A$: Not independent
Step 2: Calculate the test statistic:
  $X^2 = \sum_{i,j} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(84 - 56.62)^2}{56.62} + \frac{(3123 - 3150.38)^2}{3150.38} + \frac{(24 - 51.38)^2}{51.38} + \frac{(2886 - 2858.62)^2}{2858.62}$
  $= 13.24 + 0.24 + 14.59 + 0.26 = 28.33$

Racial Differences and Cardiac Arrest
Step 3: Calculate the p-value:
  p-value $= P(\chi^2 > 28.33) \approx 1 \times 10^{-7}$
  > 1-pchisq(28.33,1)
Step 4: Draw a conclusion:
  A significant association exists between race and survival to hospital discharge in cases of non-traumatic out-of-hospital cardiac arrest.
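The whole calculation for this example can also be run in one step with R's built-in chisq.test function. The sketch below assumes the uncorrected Pearson statistic is wanted, so it sets correct = FALSE to turn off the continuity correction that R applies to 2 X 2 tables by default; the object names are illustrative.

```r
# Observed counts from the cardiac arrest example
cardiac <- matrix(c(84, 3123,
                    24, 2886),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Race = c("Caucasian", "African-American"),
                                  Survived = c("Yes", "No")))

# correct = FALSE requests the Pearson statistic without continuity correction
result <- chisq.test(cardiac, correct = FALSE)

print(round(result$expected, 2))   # expected counts under independence
print(result$statistic)            # X^2 computed from unrounded expected counts
print(result$parameter)            # degrees of freedom
print(result$p.value)
```

Because R carries the expected counts at full precision, the statistic can differ in the second decimal place from a hand calculation based on rounded expected counts.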

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Testing for equality or homogeneity of proportions examines differences between two or more independent proportions.
In the chi-square test for independence, we examine the cross-classification of a single sample of observations on two qualitative variables.
The chi-square test can also be used for problems involving two or more independent populations.

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Patients with evolving myocardial infarction were assigned independently and randomly to one of four thrombolytic treatments, and then followed to determine 30-day mortality.

30-day      Streptokinase    Streptokinase    Accelerated t-PA   Accelerated t-PA +
outcome     and SC Heparin   and IV Heparin   and IV Heparin     Streptokinase        Total
                                                                 with IV Heparin
Survived         9091             9609             9692               9605            37997
Died              705              768              652                723             2848
Total            9796            10377            10344              10328            40845

Are these four treatment populations equal with respect to 30-day mortality?

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Example

30-day      Streptokinase    Streptokinase    Accelerated t-PA   Accelerated t-PA +
outcome     and SC Heparin   and IV Heparin   and IV Heparin     Streptokinase        Total
                                                                 with IV Heparin
Survived         9091             9609             9692               9605            37997
              (9112.95)
Died              705              768              652                723             2848
Total            9796            10377            10344              10328            40845

(expected count in parentheses)
Under the assumption of independence:
  P(Streptokinase and SC Heparin and Survival) = P(Streptokinase and SC Heparin) * P(Survival) = 9796/40845 * 37997/40845 = 0.2231
  Expected cell count = $e_{ij}$ = (9796 * 37997) / 40845 = 9112.95

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Example

30-day      Streptokinase    Streptokinase    Accelerated t-PA   Accelerated t-PA +
outcome     and SC Heparin   and IV Heparin   and IV Heparin     Streptokinase        Total
                                                                 with IV Heparin
Survived         9091             9609             9692               9605            37997
              (9112.95)        (9653.44)        (9622.74)          (9607.86)
Died              705              768              652                723             2848
               (683.05)         (723.56)         (721.26)           (720.14)
Total            9796            10377            10344              10328            40845

Under the assumption of independence:
  Expected cell counts (in parentheses) = (marginal row total * marginal column total) / n

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Step 1: Hypothesis (always two-sided):
  $H_0$: The four treatment options are homogeneous with respect to 30-day survival.
  $H_A$: The four treatment options are not homogeneous with respect to 30-day survival.
Step 2: Calculate the test statistic:
  $X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2$ with df $= (I-1)(J-1)$
Step 3: Calculate the p-value:
  p-value $= P(\chi^2 > X^2)$
Step 4: Draw a conclusion:
  p-value < α: reject $H_0$ (homogeneity)
  p-value > α: do not reject $H_0$

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Step 1: Hypothesis (always two-sided):
  $H_0$: The four treatment options are homogeneous with respect to 30-day survival.
  $H_A$: The four treatment options are not homogeneous with respect to 30-day survival.
Step 2: Calculate the test statistic:
  $X^2 = \sum_{i,j} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = 10.85$ with df $= (2-1)(4-1) = 3$

Chi-square Test: Testing for Equality or Homogeneity of Proportions
Step 3: Calculate the p-value:
  p-value $= P(\chi^2 > 10.85) = 0.0126$
  > 1-pchisq(10.85,3)
  [1] 0.01256526
Step 4: Draw a conclusion:
  p-value < α = 0.05: reject the null hypothesis.
  The four treatment groups are not equal with respect to 30-day mortality. The largest relative departure from expected was noted in patients receiving accelerated t-PA and IV heparin, with fewer patients than expected dying.
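The homogeneity test above can be reproduced with chisq.test, and the Pearson residuals give one way to see which cell departs most from its expected count. This is a sketch under stated assumptions: the shortened column labels (SK for streptokinase) and the object names are illustrative, and the use of Pearson residuals to locate the largest departure is one reasonable choice rather than the method used in the slides.

```r
# 30-day outcome by treatment arm (column labels abbreviated for this sketch)
outcome <- matrix(c(9091, 9609, 9692, 9605,
                     705,  768,  652,  723),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Outcome = c("Survived", "Died"),
                                  Treatment = c("SK + SC heparin",
                                                "SK + IV heparin",
                                                "t-PA + IV heparin",
                                                "t-PA + SK + IV heparin")))

# The continuity correction only applies to 2 x 2 tables, so none is used here
result <- chisq.test(outcome)
print(result$statistic)            # Pearson X^2
print(result$parameter)            # df = (2 - 1) * (4 - 1)
print(result$p.value)
print(round(result$expected, 2))   # expected counts under homogeneity

# Pearson residuals: (observed - expected) / sqrt(expected)
print(round(result$residuals, 2))
largest <- which(abs(result$residuals) == max(abs(result$residuals)),
                 arr.ind = TRUE)
print(largest)                     # row and column index of the largest departure
```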

Chi-square online calculator
This website will calculate the Chi-square statistic and p-value for data in a 2 X 2 table. Enter the cell counts in the table.
Choose the Chi-square test without Yates' correction to obtain the same results as in the example.
www.graphpad.com/quickcalcs/contingency1.cfm

Chi-Square Testing: Rules of Thumb
  - All expected frequencies should be equal to or greater than 2 (observed frequencies can be less than 2).
  - No more than 20% of the cells should have expected frequencies of less than 5.
What if these rules of thumb are violated?
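Both rules of thumb are simple to check once the expected counts are available. The sketch below uses a hypothetical helper, check_expected, applied to the expected counts from the cardiac arrest example; the helper name and structure are assumptions made for illustration.

```r
# Hypothetical helper: checks the two rules of thumb on a table of expected counts
check_expected <- function(expected) {
  c(all_at_least_2        = all(expected >= 2),
    at_most_20pct_below_5 = mean(expected < 5) <= 0.20)
}

# Expected counts for the cardiac arrest example
cardiac <- matrix(c(84, 3123,
                    24, 2886), nrow = 2, byrow = TRUE)
expected <- chisq.test(cardiac, correct = FALSE)$expected
print(check_expected(expected))
```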

Small Expected Frequencies
The chi-square test is an approximate method. The chi-square distribution is an idealized mathematical model. In reality, the statistics used in the chi-square test are based on counts, so they take discrete rather than continuous values.
For 2 X 2 tables, use Fisher's Exact Test if your expected frequencies are less than 2. Fisher's test computes exact probabilities for the cell counts given the table margins (hypergeometric distribution) instead of relying on the chi-square approximation. (Section 6.6)
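R provides Fisher's exact test as fisher.test. The sketch below uses a small made-up 2 X 2 table, chosen only so that one expected count falls below 2; the data and labels are hypothetical and do not come from the course examples.

```r
# Hypothetical small 2 x 2 table (made-up counts for illustration only)
small_table <- matrix(c(3, 1,
                        1, 5),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Group = c("A", "B"),
                                      Outcome = c("Yes", "No")))

# Expected counts from the margins: the smallest is below 2,
# so the chi-square approximation is not recommended here
expected <- outer(rowSums(small_table), colSums(small_table)) / sum(small_table)
print(expected)

# Fisher's exact test (two-sided by default)
exact <- fisher.test(small_table)
print(exact$p.value)
```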

Tests for Categorical Data
  - To compare proportions between two groups or to test for independence between two categorical variables, use the Chi-square test.
  - If more than 20% of the expected cell frequencies are < 5, use Fisher's exact test.
  - When categorical data are paired, the McNemar test is the appropriate test.

Comparing Proportions with Paired data
When data are paired and the outcome of interest is a proportion, the McNemar Test is used to evaluate hypotheses about the data.
  - Developed by Quinn McNemar in 1947
  - Sometimes called the McNemar Chi-square test because the test statistic has a Chi-square distribution

Examples of Paired Data for Proportions
Pair-matched data can come from:
  - Case-control studies, where each case has a matching control (matched on age, gender, race, etc.)
  - Twin studies: the matched pairs are twins.
  - Before-After data: the outcome is presence (+) or absence (-) of some characteristic measured on the same individual at two time points.

Summarizing the Data
Like the Chi-square test, data need to be arranged in a contingency table before calculating the McNemar statistic.
The table will always be 2 X 2, but the cell frequencies are numbers of pairs, not numbers of individuals.
Examples for setting up the tables are in the following slides for:
  - Case-Control paired data
  - Twins paired data: one exposed and one unexposed
  - Before-After paired data

Pair-Matched Data for Case-Control Study: outcome is exposure to some risk factor

                      Control
Case            Exposed    Unexposed
Exposed            a           b
Unexposed          c           d

The counts in the table for a case-control study are numbers of pairs, not numbers of individuals.

Paired Data for Before-After Counts

                       After treatment
Before treatment         +         -
+                        a         b
-                        c         d

The counts in the table for a before-after study are numbers of pairs of measurements; since each pair comes from one individual, they are also numbers of individuals.

Null hypotheses for Paired Data
  - The null hypothesis for case-control pair-matched data is that the proportion of subjects exposed to the risk factor is equal for cases and controls.
  - The null hypothesis for twin paired data is that the proportions with the event are equal for exposed and unexposed twins.
  - The null hypothesis for before-after data is that the proportion of subjects with the characteristic (or event) is the same before and after treatment.

McNemar's test
For any of the paired-data null hypotheses, the following hold if the null hypothesis is true:
  $H_0$: b = c
  $H_0$: b/(b+c) = 0.5
Cells b and c are called the discordant cells because they represent pairs with a difference.
Cells a and d are the concordant cells. These cells do not contribute any information about a difference between pairs or over time, so they aren't used to calculate the test statistic.

McNemar Statistic
The McNemar Chi-square statistic is calculated using the counts in the b and c cells of the table:
  $\chi^2 = \frac{(b - c)^2}{b + c}$
Rule of thumb: $b + c \geq 20$
If the observed counts agree exactly with the null hypothesis (b = c), the McNemar Chi-square statistic = 0.

McNemar statistic distribution
The sampling distribution of the McNemar statistic is a Chi-square distribution with 1 df.
For a test with alpha = 0.05, the critical value for the McNemar statistic = 3.84.
  - The null hypothesis is not rejected if the McNemar statistic < 3.84.
  - The null hypothesis is rejected if the McNemar statistic > 3.84.

P-value for McNemar statistic
You can find the p-value for the McNemar statistic using R, where X2 holds the calculated statistic:
> 1-pchisq(X2,1)
If the test statistic is > 3.84, the p-value will be < 0.05 and the null hypothesis of equal proportions between pairs or over time will be rejected.
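The statistic and its p-value depend only on the two discordant counts, so they can be wrapped in a few lines of R. This is a sketch: mcnemar_stat is a hypothetical helper name, and the counts 15 and 10 are made-up values used only to show the call.

```r
# Hypothetical helper: McNemar chi-square statistic from the discordant
# counts n_b and n_c, without continuity correction
mcnemar_stat <- function(n_b, n_c) {
  x2 <- (n_b - n_c)^2 / (n_b + n_c)
  list(statistic        = x2,
       p_value          = 1 - pchisq(x2, 1),     # chi-square with 1 df
       rule_of_thumb_ok = (n_b + n_c) >= 20)     # b + c should be at least 20
}

# Made-up discordant counts for illustration
str(mcnemar_stat(15, 10))
```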

McNemar test Example
Breast cancer patients receiving mastectomy followed by chemotherapy were matched to each other on age and cancer stage.
By random assignment, one patient in each matched pair received chemo perioperatively and for an additional 6 months, while the other patient in each matched pair received chemo perioperatively only.

Chemo Study

                                    Periop. only
Periop. + 6 Months        Survived 5 years   Died within 5 years
Survived 5 years                510                  17
Died within 5 years               5                  90

McNemar test hypotheses
Scientific Question: Does survival to 5 years differ by treatment group?
  $H_0$: b = c
  $H_A$: b ≠ c

McNemar test
Check: $b + c \geq 20$ (here 17 + 5 = 22)
Critical value for Chi-square distribution with 1 df = 3.84
Calculate the test statistic:
  $\chi^2 = \frac{(b - c)^2}{b + c} = \frac{(17 - 5)^2}{17 + 5} = 6.54$
P-value $= P(\chi^2 > 6.54) = 0.01$
> 1-pchisq(6.54,1)
[1] 0.01054753

Decision and Conclusion
Decision: Reject $H_0$
  - By the rejection region method: 6.54 > 3.84
  - By the p-value method: 0.01 < 0.05
Conclusion: The data provide evidence that an extra 6 months of chemotherapy results in a different survival rate compared to treatment with perioperative chemo alone (p = 0.01).
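The chemo example can be checked with R's built-in mcnemar.test, and the binomial form of the null hypothesis, b/(b+c) = 0.5, can be tested exactly with binom.test. The sketch below assumes rows are the "periop. + 6 months" patient and columns the "periop. only" patient in each pair; correct = FALSE turns off R's default continuity correction so the statistic matches the formula above. Object names are illustrative.

```r
# Pair counts from the chemo study (assumed layout: rows = periop. + 6 months,
# columns = periop. only)
chemo_pairs <- matrix(c(510, 17,
                          5, 90),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Periop_plus_6mo = c("Survived", "Died"),
                                      Periop_only     = c("Survived", "Died")))

# McNemar chi-square test without continuity correction: (b - c)^2 / (b + c)
mc <- mcnemar.test(chemo_pairs, correct = FALSE)
print(mc$statistic)
print(mc$p.value)

# Exact binomial version: under H0 each discordant pair is equally likely
# to fall in cell b or cell c
exact <- binom.test(17, 17 + 5, p = 0.5)
print(exact$p.value)
```

Only the discordant cells (17 and 5) enter either test, so the orientation of rows and columns does not change the result.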

McNemar test online calculator
This website will calculate the McNemar test statistic and p-value:
http://www.graphpad.com/quickcalcs/mcnemar1.cfm