Chi Square for Contingency Tables

2 x 2 Case

A test for $p_1 = p_2$

We have learned a confidence interval for $p_1 - p_2$, the difference in the population proportions. We now want a hypothesis testing procedure for this difference.

Definitions

A contingency table is a tabular arrangement of count data showing how the frequencies of the row factor relate to the column factor. A contingency table with r rows and c columns is called an r x c contingency table. Each category in a contingency table is called a cell.

Example

Consider a 2 x 2 contingency table with the row factor denoting success versus failure and the column factor denoting Group 1 or Group 2, where the samples for Group 1 and Group 2 are independent of each other. Then the contingency table looks like this:

             Group 1        Group 2
Success      $Y_1$          $Y_2$
Failure      $n_1 - Y_1$    $n_2 - Y_2$

Here $n_1$ and $n_2$ are the sample sizes of the two groups.

Recall Example 10.37 regarding the effectiveness of Timolol on angina status. The contingency table would be as follows:

                    Timolol    Placebo
Angina free            44         19
Not angina free       116        128

We have already used these data to construct a 95% confidence interval for the difference in the proportion of angina-free patients under the Timolol versus the placebo condition.

Let $p_1$ denote the probability (or population proportion) of success for Group 1.
Let $p_2$ denote the probability (or population proportion) of success for Group 2.

To test $H_0: p_1 = p_2$, we'll introduce Pearson's $\chi^2$ (chi-square) statistic.

Definition

Pearson's $\chi^2$ statistic is

$$X_s^2 = \sum \frac{(O - E)^2}{E}$$

where the sum is over all the cells in the table, O denotes the observed value in each cell, and E denotes the value we'd expect to see (if $H_0$ were true).

Now, we have the observed values (the data we collected). What are the E's? Remember, we conduct hypothesis tests under the assumption that the null hypothesis is true. If the null hypothesis were true, then $p_1 = p_2$. So $\hat{p}_1$ and $\hat{p}_2$ would be estimating a common p (i.e. the probability of a success would be the same under Group 1 or Group 2 in our example). Then we could estimate this common p by using a weighted ("pooled") estimator, where $n_1$ and $n_2$ are the sample sizes and $Y_1$ and $Y_2$ the numbers of successes in the two groups:

$$\hat{p}_{pool} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{n_1 \frac{Y_1}{n_1} + n_2 \frac{Y_2}{n_2}}{n_1 + n_2} = \frac{Y_1 + Y_2}{n_1 + n_2}$$

Little Sidebar

Suppose you are flipping an unfair coin, where the probability of heads is 0.3 and the probability of tails is 0.7. How many heads would you expect to see if you were to flip this unfair coin ten times?

Now, apply this thought process to get the expected successes for Group 1. And compute the expected successes for Group 2.
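The pooled estimator and the coin-flip reasoning above can be sketched in a few lines of Python. This is only an illustration using the Timolol counts from the example; the variable names are my own, not part of the notes.

```python
# Observed counts from the Timolol example (columns: Timolol, placebo).
successes = [44, 19]     # angina free
failures = [116, 128]    # not angina free

# Group sizes n_1 and n_2 are the column totals.
n = [s + f for s, f in zip(successes, failures)]

# Pooled estimate of the common success probability under H_0:
# (Y_1 + Y_2) / (n_1 + n_2).
p_pool = sum(successes) / sum(n)

# Expected counts follow the coin-flip logic: (number of trials) x (probability).
expected_success = [n_i * p_pool for n_i in n]
expected_failure = [n_i * (1 - p_pool) for n_i in n]

print("n =", n, " p_pool =", round(p_pool, 4))
print("expected successes:", [round(e, 2) for e in expected_success])
print("expected failures: ", [round(e, 2) for e in expected_failure])
```

Note that the expected counts are left unrounded, in keeping with the usual practice of not rounding E's.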

Fill out the Expected Table for the Group 1/Group 2 success/failure contingency table.

             Group 1    Group 2
Success
Failure

Things to remember

• The E's (expected counts) need not be integers, and we do not round them.
• The row and column totals are the same for the observed and expected tables (this is a good way to check your calculations!).
• For the chi-square test (which we'll begin implementing in just a moment) to be valid, we need each E ≥ 1 and the average E ≥ 5.
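The checks in the list above can be automated. The sketch below is a hypothetical helper, not something from the notes; it writes each expected count as (row total x column total) / grand total, which equals $n_i \hat{p}_{pool}$ in the 2 x 2 success/failure setting.

```python
def expected_counts(observed):
    """Expected counts under H_0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def check_expected(observed):
    """Report the three 'things to remember' for a table of observed counts."""
    expected = expected_counts(observed)
    cells = [e for row in expected for e in row]
    rows_match = all(
        abs(sum(o) - sum(e)) < 1e-9 for o, e in zip(observed, expected)
    )
    cols_match = all(
        abs(sum(o) - sum(e)) < 1e-9
        for o, e in zip(zip(*observed), zip(*expected))
    )
    return {
        "totals_match": rows_match and cols_match,
        "every_E_at_least_1": min(cells) >= 1,
        "average_E_at_least_5": sum(cells) / len(cells) >= 5,
    }


timolol = [[44, 19], [116, 128]]
print(check_expected(timolol))
```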

Calculating P-values under the $\chi^2$ distribution

• The $\chi^2$ distribution is a right-skewed distribution.
• The values of a $\chi^2$ random variable are greater than or equal to 0.
• The $\chi^2$ distribution has degrees of freedom. The degrees of freedom for a $\chi^2$ test with a contingency table are df = (# of rows - 1)(# of columns - 1).

For a non-directional alternative,

$$P = P\{\chi^2_{df} \ge X_s^2\}$$

If df = 1, we have the option of testing against a directional alternative. In this case,

$$P = \tfrac{1}{2} P\{\chi^2_{df} \ge X_s^2\} \quad \text{if the data deviate in the direction specified by } H_A$$
$$P > 0.5 \quad \text{otherwise}$$

TI-83/84

MATRIX (2nd, x-inverse key) > scroll over to EDIT > ENTER > enter your matrix
STAT > scroll over to TESTS > scroll down to χ²-Test > ENTER > make sure your observed values are in the matrix specified; the expected matrix will be calculated for you and stored in the matrix specified > Calculate > ENTER
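For readers without a TI calculator, the upper-tail probability $P\{\chi^2_{df} \ge x\}$ can be computed with the Python standard library alone. The sketch below relies on the standard closed-form expressions for integer degrees of freedom; the function names are my own and it is meant as an illustration, not a replacement for a statistics package.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (# of rows - 1)(# of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail(x, df):
    """P{chi-square with df degrees of freedom >= x}, for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided standard normal tail plus a finite correction sum.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total


# Familiar 5% critical values should give tail areas close to 0.05.
for crit, df in [(3.841, 1), (7.815, 3), (12.592, 6)]:
    print(df, round(chi2_upper_tail(crit, df), 4))
```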

Example

Using the table below, conduct a test of hypothesis at the α = 0.01 significance level to determine whether there is a significant difference in the probability of being angina free under Timolol or placebo.

                    Timolol    Placebo
Angina free            44         19
Not angina free       116        128
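One way to work this example numerically is sketched below. It is an illustration rather than the handout's own solution: it computes the expected counts from the row and column totals, sums $(O - E)^2 / E$ over the four cells, and uses the fact that for df = 1 the upper-tail area equals `erfc(sqrt(x / 2))`.

```python
import math

# Rows: angina free / not angina free; columns: Timolol / placebo.
observed = [[44, 19], [116, 128]]
alpha = 0.01

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Pearson's statistic: sum of (O - E)^2 / E over all cells.
x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        x2 += (o - e) ** 2 / e

# df = (2 - 1)(2 - 1) = 1, where P{chi2_1 >= x} = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(x2 / 2.0))

print(f"X2_s = {x2:.3f}, P = {p_value:.4f}, reject H0 at alpha: {p_value < alpha}")
```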

What if the researchers wanted to know whether the probability of being angina free is greater under Timolol than under placebo?

What if the researchers wanted to detect whether the probability of being angina free under Timolol is less than under placebo?
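The directional rule for 2 x 2 tables can be sketched as follows. The helper is hypothetical: it halves the non-directional P-value when the sample proportions deviate in the direction of $H_A$ and returns `None` otherwise, since the notes only state that P > 0.5 in that case.

```python
import math


def directional_p_value(observed, direction):
    """observed = [[Y1, Y2], [n1 - Y1, n2 - Y2]].

    direction is "greater" for H_A: p1 > p2 or "less" for H_A: p1 < p2.
    Returns half the non-directional P-value if the data agree with H_A,
    otherwise None (the notes only say P > 0.5, so no number is reported).
    """
    (y1, y2), (f1, f2) = observed
    n1, n2 = y1 + f1, y2 + f2
    p_pool = (y1 + y2) / (n1 + n2)
    expected = [
        [n1 * p_pool, n2 * p_pool],
        [n1 * (1 - p_pool), n2 * (1 - p_pool)],
    ]
    x2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    nondirectional = math.erfc(math.sqrt(x2 / 2.0))  # valid for df = 1
    diff = y1 / n1 - y2 / n2
    agrees = diff > 0 if direction == "greater" else diff < 0
    return nondirectional / 2.0 if agrees else None


timolol = [[44, 19], [116, 128]]
print(directional_p_value(timolol, "greater"))  # data agree with H_A
print(directional_p_value(timolol, "less"))     # data disagree: None
```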

A Test for Association

The work-up of all the previous examples assumed we had two independent samples and we were observing those two samples for the outcome of one variable. Many times, we are in the situation where we observe one sample for two explanatory factors.

                          Factor 1
                     Level 1    Level 2
Factor 2   Level 1   $Y_1$      $Y_2$
           Level 2

In the case where we have one sample and we're observing it for two explanatory factors, we'll test the hypothesis of association. The test for $H_0$: there is no association is numerically equivalent to that of $H_0: p_1 = p_2$, but the hypotheses and interpretations are different.

Example 10.21

To study the association of hair color and eye color in a German population, an anthropologist observed a sample of 6,800 men.

                        Hair Color
                     Dark      Light
Eye Color   Dark      726        131
            Light   3,129      2,814

Test, at the α = 0.05 significance level, whether hair color is associated with eye color in this population of German men.

General r x c Case

The ideas presented in the 2 x 2 cases can be easily extended to general r x c contingency tables.

For the case where we have c different samples (your columns), and we're checking each sample for different levels of the row factor, the hypothesis will change slightly. Here, we'll test whether the distributions are the same for each sample. (Think about it: if we have more than a success and a failure, then for each column we'll have P(level 1), P(level 2), ..., P(level r). The null hypothesis would then be testing whether $p_{11} = p_{12} = \dots = p_{1c}$ and $p_{21} = p_{22} = \dots = p_{2c}$, etc. This is called a compound hypothesis.)

For the case where we have one sample and we're checking that one sample for different levels of two different factors, we'll still be testing association.

Example 10.31

The following table shows the observed distribution of A, B, AB, and O blood types in three samples of African Americans living in different locations.

        I (Florida)    II (Iowa)    III (Missouri)
A           122           1781           353
B           117           1351           269
AB           19            289            60
O           244           3301           713

Test, at the α = 0.05 level of significance, whether the distribution of blood type for African Americans is different across the three regions.
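The same recipe carries over to this 4 x 3 table. The sketch below is illustrative only, with function names of my own choosing: it computes Pearson's statistic and the degrees of freedom for any r x c table and, because df = (4 - 1)(3 - 1) = 6 is even, uses the closed-form upper tail $e^{-x/2} \sum_{k=0}^{df/2 - 1} (x/2)^k / k!$ for the P-value.

```python
import math


def pearson_chi_square(observed):
    """Return (X2_s, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    x2 = sum(
        (o - r * c / grand) ** 2 / (r * c / grand)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df


def upper_tail_even_df(x, df):
    """P{chi2_df >= x} for even df only (closed-form finite sum)."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Rows: A, B, AB, O; columns: Florida, Iowa, Missouri.
blood = [
    [122, 1781, 353],
    [117, 1351, 269],
    [19, 289, 60],
    [244, 3301, 713],
]
x2, df = pearson_chi_square(blood)
p_value = upper_tail_even_df(x2, df)
print(f"X2_s = {x2:.2f}, df = {df}, P = {p_value:.3f}")
```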

Example 10.33

To study the association of hair color and eye color in a German population, an anthropologist observed a sample of 6,800 men (this is the same study as that of Example 10.21).

                                    Hair Color
                         Brown    Black    Fair    Red
Eye Color  Brown           438      288     115     16
           Grey or Green  1387      746     946     53
           Blue            807      189    1768     47

Test, at the α = 0.05 significance level, whether hair color is associated with eye color in this population of German men.
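Since this is the same study as Example 10.21, the 3 x 4 table can be collapsed into the earlier 2 x 2 table. The grouping used below (brown and black hair as "dark", fair and red as "light"; brown eyes as "dark", the rest as "light") is my assumption, not something the notes state, but the collapsed counts do reproduce the Example 10.21 table.

```python
# Counts from Example 10.33: eye color (rows) by hair color (columns).
eye_by_hair = {
    "Brown": {"Brown": 438, "Black": 288, "Fair": 115, "Red": 16},
    "Grey or Green": {"Brown": 1387, "Black": 746, "Fair": 946, "Red": 53},
    "Blue": {"Brown": 807, "Black": 189, "Fair": 1768, "Red": 47},
}

# Assumed grouping into dark/light categories (not given in the notes).
dark_eyes = {"Brown"}
dark_hair = {"Brown", "Black"}

# collapsed[i][j]: i = 0 dark eyes, 1 light eyes; j = 0 dark hair, 1 light hair.
collapsed = [[0, 0], [0, 0]]
for eye, row in eye_by_hair.items():
    i = 0 if eye in dark_eyes else 1
    for hair, count in row.items():
        j = 0 if hair in dark_hair else 1
        collapsed[i][j] += count

print(collapsed)
print("sample size:", sum(sum(row) for row in collapsed))
```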

Final Notes on Chi Square for Contingency Tables

• Remember that your calculator gives P-values for a non-directional alternative.
• We can have a directional alternative when we're in the 2 x 2 table, and when $H_A$ is directional, one must check that the data deviate in the direction specified by $H_A$.
  o If yes, cut the P-value in half.
  o If no, P > 0.5 and we fail to reject $H_0$.
• Degrees of freedom for an r x c table are (# rows - 1)(# columns - 1).
• Pearson's $X^2$ statistic for contingency tables uses the approximation $X_s^2 \sim \chi^2_{df}$. For this approximation to be valid, a standard rule of thumb is to require E ≥ 1 for each cell and the average E ≥ 5 (and observations independent of one another).
• If expected counts are small and the data form a 2 x 2 table, Fisher's exact test may be appropriate.
• By contrast, Example 10.21 illustrates that $X_s^2$ is very sensitive with large sample sizes.
• For r x c tables, we have the following two hypotheses:
  o With c samples, each checked for r levels of a row factor, we're testing whether the distributions are the same (for the groups, i.e. your columns).
  o With one sample, checked for r levels of a row factor and c levels of a column factor, we're testing for an association of the row and column factors.
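The remark about sensitivity to sample size can be illustrated with the Example 10.21 counts. In the sketch below, the scaled-down tables are hypothetical (they are not real data and have non-integer counts); they keep the same cell proportions so that only the sample size changes, and $X_s^2$ then shrinks in direct proportion to the sample size.

```python
def pearson_x2(observed):
    """Pearson's X2_s for an r x c table of (possibly non-integer) counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return sum(
        (o - r * c / grand) ** 2 / (r * c / grand)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


# Example 10.21: eye color (rows) by hair color (columns), 6,800 men.
hair_eye = [[726, 131], [3129, 2814]]
full = pearson_x2(hair_eye)

# Hypothetical tables with identical proportions but smaller sample sizes.
for scale in (1.0, 0.1, 0.01):
    scaled = [[scale * o for o in row] for row in hair_eye]
    print(f"n = {scale * 6800:7.0f}   X2_s = {pearson_x2(scaled):8.2f}")
```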