Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)




Spring 2014

Big Picture: More than Two Samples

In Chapter 7: We looked at quantitative variables and compared the means of two samples.
In Chapter 8: We looked at categorical variables and compared the proportions in two samples.

What if we have more than two samples? For example, suppose we want to compare proportions of men, women, and children? Or we want to compare the mean effect of a drug on Whites, African-Americans, Hispanics, and Asians? The tests we know so far don't work. The two tests we will do now work with more than two samples:

Chi-Square Test: For categorical variables, more than two samples (Chapter 9)
ANOVA (Analysis of Variance): For quantitative variables, more than two samples (Chapter 12)

First we'll do categorical variables, and we'll start with a two-sample example that can be done both by the $z$-test and by the new method, chi-square.

Example: ABC Poll of 1500 adults (750 men, 750 women) on Presidential approval rating

Ex: On March 10, 2002, 623 men and 607 women approved of Bush's handling of his job. Was there a gender gap in the approval ratings significant at the 5% level?

Step 1: Hypotheses: $H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$, where $p_1$ is the proportion of men who approve and $p_2$ is the proportion of women who approve.

Step 2: Test statistic: We find the sample proportions:
Men: $\hat{p}_1 = 623/750 = 0.83067$, Women: $\hat{p}_2 = 607/750 = 0.80933$, Pooled: $\hat{p} = 1230/1500 = 0.82$.
Then, using the pooled proportion for the standard error:
$$z = \frac{\dfrac{623}{750} - \dfrac{607}{750}}{\sqrt{0.82\,(1 - 0.82)\left(\dfrac{1}{750} + \dfrac{1}{750}\right)}} = \frac{0.83067 - 0.80933}{0.019839} = 1.075.$$

Step 3: $P$-value $= 2P(z > 1.075) = 2(0.141) = 28\%$.

Step 4: We fail to reject the null hypothesis. There was no significant gender gap.

Ex: On Sept 9, 2001, 443 men out of 750 and 390 women out of 750 approved of Bush's handling of his job. Is there a gender gap in the approval ratings significant at the 5% level?

Step 1: Hypotheses: $H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$.

Step 2: Test statistic: We find the sample proportions:
Men: $\hat{p}_1 = 443/750 = 0.5907$, Women: $\hat{p}_2 = 390/750 = 0.5200$, Pooled: $\hat{p} = 833/1500 = 0.5553$, and
$$z = \frac{\dfrac{443}{750} - \dfrac{390}{750}}{\sqrt{0.5553\,(1 - 0.5553)\left(\dfrac{1}{750} + \dfrac{1}{750}\right)}} = \frac{0.5907 - 0.5200}{0.02566} = 2.754.$$

Step 3: $P$-value $= 2P(z > 2.754) = 2(0.00294) = 0.0059 = 0.59\%$.

Step 4: We reject the null hypothesis. There was a significant gender gap.

Aside: What might have caused the difference between the two results? 9/11. This dramatically changed the way people regarded the president, and in particular the way women reacted.
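The two $z$-tests above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `two_proportion_z_test` is chosen here for illustration and does not come from the notes. It relies on the fact that the two-sided normal tail area $2P(Z > |z|)$ equals `erfc(|z| / sqrt(2))`.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 = p2 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error uses the pooled proportion, as in Step 2 of the notes.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts of approvals among 750 men and 750 women in each poll.
z_2002, p_2002 = two_proportion_z_test(623, 750, 607, 750)
z_2001, p_2001 = two_proportion_z_test(443, 750, 390, 750)

print(f"2002: z = {z_2002:.3f}, P-value = {p_2002:.4f}")
print(f"2001: z = {z_2001:.3f}, P-value = {p_2001:.4f}")
```

The printed values should agree with the hand calculations up to rounding.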

RELATIONSHIPS BETWEEN CATEGORICAL VARIABLES: TWO WAY TABLES

In Chapter 2, we looked at relationships between quantitative variables (using regression and correlation). Now we look at relationships between categorical variables, using counts. In this example of presidential approval ratings, gender is the explanatory variable; approval is the response variable. We give the data in a two-way table, which records the counts for all possible combinations, together with the totals (including, for example, the total number of approvals, which we did not have before).

Observed Counts: March 10, 2002
            Men     Women   Total
Approve     623     607     1230
Don't       127     143      270
Total       750     750     1500

Observed Counts: Sept 9, 2001
            Men     Women   Total
Approve     443     390      833
Don't       307     360      667
Total       750     750     1500

Look at proportions of the whole population: this is called the joint distribution. Each proportion is out of the whole 1500 (so, for example, 0.415 = 623/1500 and 0.82 = 1230/1500). Here all the proportions in the interior four cells of the table should add to 1.

Joint Distribution: March 10, 2002
            Men     Women   Total
Approve     0.415   0.405   0.820
Don't       0.085   0.095   0.180
Total       0.500   0.500   1.000

Joint Distribution: Sept 9, 2001
            Men     Women   Total
Approve     0.295   0.260   0.555
Don't       0.205   0.240   0.445
Total       0.500   0.500   1.000

To compare men and women, we need a conditional distribution, conditioned on gender.

Conditional Distribution: March 10, 2002
            Men     Women
Approve     0.83    0.81
Don't       0.17    0.19
Total       1.00    1.00

Conditional Distribution: Sept 9, 2001
            Men     Women
Approve     0.59    0.52
Don't       0.41    0.48
Total       1.00    1.00

Bar graphs illustrate the difference between the conditional distributions. How can we tell if the difference between men and women is large enough to be significant? We need a hypothesis test.

[Figure: two bar graphs, "2002 Approval Rating" and "2001 Approval Rating", each showing the proportion approving for Men and Women.]
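As a rough illustration of the joint and conditional distributions described above, the following sketch recomputes them from the March 2002 counts. The dictionary layout and the helper names `joint_distribution` and `conditional_on_column` are assumptions made for this example, not part of the course materials.

```python
# Observed counts for the March 2002 poll: rows are responses, columns are genders.
counts_2002 = {
    "Approve": {"Men": 623, "Women": 607},
    "Don't": {"Men": 127, "Women": 143},
}

def joint_distribution(table):
    """Divide every cell count by the grand total."""
    n = sum(sum(row.values()) for row in table.values())
    return {r: {c: v / n for c, v in row.items()} for r, row in table.items()}

def conditional_on_column(table):
    """Divide every cell count by its column total (here: condition on gender)."""
    col_totals = {}
    for row in table.values():
        for col, count in row.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {r: {c: v / col_totals[c] for c, v in row.items()}
            for r, row in table.items()}

joint = joint_distribution(counts_2002)
conditional = conditional_on_column(counts_2002)

for response in counts_2002:
    for gender in ("Men", "Women"):
        print(f"{response:8s} {gender:6s} "
              f"joint = {joint[response][gender]:.3f}  "
              f"conditional = {conditional[response][gender]:.2f}")
```

The joint proportions sum to 1 over all four cells, while the conditional proportions sum to 1 within each gender column.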

HYPOTHESIS TEST: CHI-SQUARED

To decide if the difference between the genders is significant, we can use a two-sample difference in proportions (as earlier in the class) or a chi-squared test. For chi-squared, we compare the observed two-way table of counts with the table of expected counts, which is obtained by keeping the same row and column totals and assuming no interaction.

Expected Counts: 2002
            Men               Women             Total
Approve     615 = (0.82)750   615 = (0.82)750   1230 (= 0.82 of 1500)
Don't       135 = (0.18)750   135 = (0.18)750    270 (= 0.18 of 1500)
Total       750               750               1500

Expected Counts: 2001
            Men                  Women                Total
Approve     416.5 = (0.555)750   416.5 = (0.555)750   833 (= 0.555 of 1500)
Don't       333.5 = (0.445)750   333.5 = (0.445)750   667 (= 0.445 of 1500)
Total       750                  750                  1500

We use expected counts to compare whether each column has the same percent breakdown, that is, whether men and women approve in the same proportions. [Aside: Alternatively, which turns out to be the same thing, we can look at whether those who approve/disapprove have the same sex breakdown.]

For reference, here is the original table of observed counts:

Observed Counts: 2002
            Men     Women   Total
Approve     623     607     1230
Don't       127     143      270
Total       750     750     1500

Observed Counts: 2001
            Men     Women   Total
Approve     443     390      833
Don't       307     360      667
Total       750     750     1500

To calculate expected counts, for example for men who approve in 2002:
$$\text{Expected count (men, approve)} = \frac{\text{total men} \times \text{total approve}}{\text{total overall}} = \frac{750 \times 1230}{1500} = 0.5 \times 1230 = 615$$
In general,
$$\text{Expected cell count} = \frac{\text{Row total} \times \text{Column total}}{n}$$

Now we make a table of differences, Observed − Expected:

Difference: 2002
            Men               Women
Approve      8 = 623 − 615    −8 = 607 − 615
Don't       −8 = 127 − 135     8 = 143 − 135

Difference: 2001
            Men      Women
Approve      26.5    −26.5
Don't       −26.5     26.5

Looking ahead:

Ex: What differences do you see between the tables of differences that may indicate significance?
Recall: 2002 is not significant; 2001 is significant. You can see this in the size of the differences.

Ex: What do you notice about the sums of the differences in one table? How many values are needed to determine all the others?
Sums in any one row or any one column are 0, so knowing one value gives us all the others. We say the table has one degree of freedom.
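The expected-count rule (row total times column total, divided by $n$) and the table of differences can be checked with a few lines of code. This is only a sketch: the list-of-lists layout (rows = Approve/Don't, columns = Men/Women) and the function name `expected_counts` are assumptions made for illustration.

```python
def expected_counts(observed):
    """Expected cell count = (row total * column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# 2001 poll: rows are Approve / Don't, columns are Men / Women.
observed_2001 = [[443, 390],
                 [307, 360]]

expected_2001 = expected_counts(observed_2001)

# Observed minus expected, cell by cell.
differences_2001 = [[o - e for o, e in zip(obs_row, exp_row)]
                    for obs_row, exp_row in zip(observed_2001, expected_2001)]

print("Expected:   ", expected_2001)
print("Differences:", differences_2001)

# Every row and every column of the difference table sums to zero,
# which is why a 2 x 2 table has only one degree of freedom.
print("Row sums:   ", [sum(row) for row in differences_2001])
print("Column sums:", [sum(col) for col in zip(*differences_2001)])
```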

INFERENCE FOR TWO-WAY TABLES: CHI SQUARE TEST FOR INDEPENDENCE

Null hypothesis: There is no association between the variables. This means the columns have the same % breakdown. In addition, the rows have the same % breakdown.
Alternative hypothesis: There is an association between the variables, but we do not specify what.

Chi-Square Test Statistic
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
with $df = (\#\text{ rows} - 1)(\#\text{ cols} - 1)$.

We see that $\chi^2 \geq 0$. For various degrees of freedom, the chi-squared distribution is shown below:

[Figure: chi-squared density curves for various degrees of freedom. Source: http://en.wikipedia.org/wiki/chi-square_distribution]

Note that when there is an association between variables, the observed and expected counts are further apart, so we are going to get large values of $\chi^2$. Thus large values of $\chi^2$ give significance.
Conversely, if there is no association, the expected and observed counts are similar (they may not be identical because of random variation), so we get a small value of $\chi^2$. Thus small values of $\chi^2$ give no significance.

Ex: Perform the Chi-Square hypothesis test for 2002.

Step 1: $H_0$: There is no association between gender and approval rating.
$H_a$: There is an association between gender and approval rating.

Step 2: The test statistic is
$$\chi^2 = \frac{(623 - 615)^2}{615} + \frac{(607 - 615)^2}{615} + \frac{(127 - 135)^2}{135} + \frac{(143 - 135)^2}{135} = 1.156 \quad \text{with } df = 1.$$

Step 3: From the table, $P$-value $= P(\chi^2 > 1.156) > 0.25 = 25\%$.
From the calculator, $P(\chi^2 > 1.156) = \chi^2\text{cdf}(1.156, 20, 1) = 0.282 = 28.2\%$.

Step 4: We do not reject the null hypothesis. There is no evidence for association, that is, there is no evidence for a gender gap.
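The 2002 chi-square calculation can be sketched in Python as follows. The helper names are hypothetical, and the $P$-value shortcut is valid only for $df = 1$, where $P(\chi^2 > x) = \mathrm{erfc}(\sqrt{x/2})$; it is used here as a stand-in for the table or calculator lookup in the notes, and tables with more degrees of freedom would need a general chi-square tail function instead.

```python
import math

def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count
            stat += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def upper_tail_df1(stat):
    """P(chi-square > stat) for df = 1 ONLY, via the normal distribution."""
    return math.erfc(math.sqrt(stat / 2))

# 2002 poll: rows are Approve / Don't, columns are Men / Women.
observed_2002 = [[623, 607],
                 [127, 143]]

chi2_2002, df_2002 = chi_square_statistic(observed_2002)
p_2002 = upper_tail_df1(chi2_2002)

print(f"chi-square = {chi2_2002:.3f}, df = {df_2002}, P-value = {p_2002:.3f}")
```

Because only the upper tail counts as evidence against the null hypothesis, the tail area is used as is and is not doubled.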

Ex: Perform the Chi-Square hypothesis test for 2001.

Step 1: $H_0$: There is no association between gender and approval rating.
$H_a$: There is an association between gender and approval rating.

Step 2: The test statistic is
$$\chi^2 = \frac{(443 - 416.5)^2}{416.5} + \frac{(390 - 416.5)^2}{416.5} + \frac{(307 - 333.5)^2}{333.5} + \frac{(360 - 333.5)^2}{333.5} = 7.584 \quad \text{with } df = 1.$$

Step 3: From the table, $P(\chi^2 > 7.584)$ is between 0.01 and 0.005, so 0.5% to 1%.
From the calculator, $P(\chi^2 > 7.584) = \chi^2\text{cdf}(7.58, 20, 1) = 0.0059 = 0.59\%$.

Step 4: We reject the null hypothesis. There is evidence for association, that is, for a gender gap.

Ex: Interpret the $P$-value in the previous example.
Assuming there is no interaction between gender and approval, there is a 0.59% chance of seeing observed values as far, or further, from the expected values.

Comparison between the Chi-Square and the previous $z$-test

        Chi-square                           z-test                          Relation
2002    χ² = 1.156, P-value = 28%            z = 1.075, P-value = 28%        1.156 ≈ (1.075)²
2001    χ² = 7.584, P-value = 0.59%          z = 2.754, P-value = 0.59%      7.584 ≈ (2.754)²

Ex: What do you notice about the $P$-values of the two ways of doing each test? What does it tell us?
Provided the $z$-test is two-sided, the $P$-values of the two tests are the same, so we always come to the same conclusion.

Notes on Using the Chi-Square Test
Rejecting the null hypothesis does not tell us which way the interaction happens.
Since $\chi^2$ is never negative, there is only an upper tail. We never multiply the probabilities by 2.
Conditions: We can use the Chi-Square test when
o For tables larger than $2 \times 2$: Average of expected counts $\geq 5$; all expected counts $\geq 1$
o For $2 \times 2$ tables: All expected counts $\geq 5$
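The comparison and the conditions above can be checked numerically. The sketch below, with hypothetical helper names, computes $\chi^2$ and the pooled $z$ statistic for both polls, confirms that $\chi^2 = z^2$ for these $2 \times 2$ tables, and applies the expected-count conditions as stated in the notes.

```python
import math

def expected_counts(observed):
    """Expected cell count = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def pooled_z(observed):
    """Pooled two-proportion z for a 2 x 2 table whose columns are the samples."""
    (x1, x2), (f1, f2) = observed
    n1, n2 = x1 + f1, x2 + f2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def conditions_ok(expected):
    """Expected-count conditions for the chi-square test, as listed in the notes."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)          # 2 x 2 table
    return sum(cells) / len(cells) >= 5 and all(e >= 1 for e in cells)

tables = {
    "2002": [[623, 607], [127, 143]],
    "2001": [[443, 390], [307, 360]],
}

results = {}
for year, observed in tables.items():
    chi2, z = chi_square(observed), pooled_z(observed)
    ok = conditions_ok(expected_counts(observed))
    results[year] = (chi2, z ** 2, ok)
    print(f"{year}: chi-square = {chi2:.3f}, z^2 = {z ** 2:.3f}, conditions met: {ok}")
```

The agreement between $\chi^2$ and $z^2$ is why the two-sided $z$-test and the chi-square test with $df = 1$ give the same $P$-value.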