11.3 Contingency Tables

Objectives:
1. Perform a test of homogeneity.
2. Perform a test of independence.

Overview: In this section we consider contingency tables (or two-way frequency tables), which contain frequency counts for categorical data arranged in a table with at least two rows and at least two columns. We present a method for testing the claim that the row and column variables are independent of each other. We use the same method for a test of homogeneity, in which we test the claim that different populations have the same proportions of some characteristic.

Contingency Tables: A contingency table (or two-way frequency table) is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.) Contingency tables have at least two rows and at least two columns.

Test of Independence: A test of independence tests the null hypothesis that, in a contingency table, the row and column variables are independent.

Notation:
O represents the observed frequency in a cell of a contingency table.
E represents the expected frequency in a cell, found by assuming that the row and column variables are independent.
r represents the number of rows in a contingency table (not including labels).
c represents the number of columns in a contingency table (not including labels).

Requirements:
1. The sample data are randomly selected.
2. The sample data are represented as frequency counts in a two-way table.
3. For every cell in the contingency table, the expected frequency E is at least 5. (There is no requirement that every observed frequency must be at least 5. Also, there is no requirement that the population must have a normal distribution or any other specific distribution.)

Null and Alternative Hypotheses:
H0: The row and column variables are independent.
H1: The row and column variables are dependent.

Test Statistic (chi-square):
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed frequency in a cell and E is the expected frequency, found by evaluating
$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

Critical Values:
1. Found in Table A-4 using (r - 1)(c - 1) degrees of freedom, where r is the number of rows and c is the number of columns.
2. Tests of independence are always right-tailed. Any disagreement between observed and expected frequencies makes χ² larger, so only large values count as evidence against independence.

P-Values: P-values are typically provided by computer software, or a range of P-values can be found from Table A-4.

Warning:
1. This procedure cannot be used to establish a direct cause-and-effect link between the variables in question.
2. Dependence means only that there is a relationship between the two variables.

[Figure: Relationships Among Key Components in Test of Independence]
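The formulas above translate almost line for line into code. The following Python sketch computes E for each cell from the row, column, and grand totals. It then sums (O - E)²/E and returns the (r - 1)(c - 1) degrees of freedom together with the smallest expected frequency, so that requirement 3 can be checked. The helper names `expected_frequencies` and `chi_square_independence` are hypothetical and are not from any library. The 2x2 counts in the usage line are made up purely to show the call.

```python
def expected_frequencies(observed):
    """E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square_independence(observed):
    """Return (test statistic, degrees of freedom, smallest expected frequency).

    `observed` is a list of rows of frequency counts O, without labels.
    Hypothetical helper for illustration only.
    """
    expected = expected_frequencies(observed)
    # Sum of (O - E)^2 / E over every cell of the table.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    r = len(observed)        # number of rows
    c = len(observed[0])     # number of columns
    df = (r - 1) * (c - 1)
    # Requirement 3 holds when this value is at least 5.
    min_expected = min(min(row) for row in expected)
    return statistic, df, min_expected


# Made-up 2x2 table, used only to show the call.
stat, df, min_e = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df, min_e >= 5)   # 6.667 1 True
```

The sketch only produces the test statistic. The decision still comes from comparing it with the Table A-4 critical value for df degrees of freedom, or from a P-value supplied by software.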

Example: Responses to a survey question are broken down according to gender, and the sample results are given below. At the 0.05 significance level, test the claim that response and gender are independent.

Solution: Requirements are satisfied: the data are randomly selected, they are frequency counts, and the expected frequencies are all at least 5.
Step 1: We are testing the claim that response and gender are independent.
Step 2: The opposite of the claim is that response and gender are dependent.
Step 3: The null hypothesis contains equality, therefore,
H0: The response is independent of the gender.
H1: The response and gender are dependent.
Step 4: The significance level is 0.05.
Step 5: We are testing for independence, so we use $\chi^2 = \sum \frac{(O - E)^2}{E}$.
Step 6: Use a table to calculate the values of E and the test statistic.

             Yes                      No                       Undecided
Gender       O    E      (O-E)^2/E    O    E      (O-E)^2/E    O    E      (O-E)^2/E    Totals
Male         25   27.00  0.15         50   48.00  0.08         15   15.00  0.00         90
Female       20   18.00  0.22         30   32.00  0.13         10   10.00  0.00         60
Totals       45          0.37         80          0.21         25          0.00         150

$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.579$

The critical value χ² = 5.991 is found from Table A-4 with α = 0.05 in the right tail. The number of degrees of freedom is (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.
Test Statistic < Critical Value: 0.579 < 5.991

Step 7: Because the test statistic does not fall in the critical region, there is not sufficient evidence to reject the null hypothesis.
Step 8: Conclusion: There is not sufficient evidence to reject the claim that the responses are independent of gender. (The data do not show that responses depend on gender.)

Example: The table below shows the age and favorite type of music of 668 randomly selected people. Use a 0.025 significance level (a 97.5% confidence level) to test the null hypothesis that age and preferred music type are independent.

Solution: Requirements are satisfied: the data are randomly selected, they are frequency counts, and the expected frequencies are all at least 5.
Step 1: We are testing the claim that age and preferred music type are independent.
Step 2: The opposite of the claim is that age and preferred music type are dependent.
Step 3: The null hypothesis contains equality, therefore,
H0: Music type is independent of age.
H1: Music type and age are dependent.
Step 4: The significance level is 0.025. This follows from 97.5% = 0.975 and 1 - 0.975 = 0.025.
Step 5: We are testing for independence, so we use $\chi^2 = \sum \frac{(O - E)^2}{E}$.

Step 6: Use a table to calculate the values of E and the test statistic.

             Rock                     Pop                      Classical
Age          O    E      (O-E)^2/E    O    E      (O-E)^2/E    O    E      (O-E)^2/E    Totals
15-25        50   64.77  3.37         85   77.84  0.66         73   65.39  0.89         208
25-35        68   68.19  0.00         91   81.96  1.00         60   68.85  1.14         219
35-45        90   75.04  2.98         74   90.19  2.91         77   75.76  0.02         241
Totals       208         6.35         250         4.56         210         2.04         668

$\chi^2 = \sum \frac{(O - E)^2}{E} = 12.954$

The critical value χ² = 11.143 is found from Table A-4 with α = 0.025 in the right tail. The number of degrees of freedom is (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4.
Test Statistic > Critical Value: 12.954 > 11.143
Step 7: Because the test statistic falls in the critical region, there is sufficient evidence to reject the null hypothesis.
Step 8: Conclusion: There is sufficient evidence to reject the claim that music type is independent of age.
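The two independence examples above can be checked with a short Python sketch. It recomputes χ² from the observed counts and adds a P-value, which the examples themselves do not report. The P-value relies on a standard closed form that holds only for an even number of degrees of freedom k: P(χ² > x) = e^(-x/2) Σ (x/2)^i / i!, summed over i = 0 to k/2 - 1. Both examples have even df (2 and 4), so no statistics library is needed. The helper names `chi_square_statistic` and `right_tail_even_df` are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E, with E = (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(observed, row_totals):
        for o, ct in zip(row, col_totals):
            e = rt * ct / n              # expected frequency E for this cell
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def right_tail_even_df(x, k):
    """P(chi^2 > x) for an EVEN number of degrees of freedom k (closed form)."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))


# First example: rows = Male, Female; columns = Yes, No, Undecided.
gender = [[25, 50, 15], [20, 30, 10]]
# Second example: rows = ages 15-25, 25-35, 35-45; columns = Rock, Pop, Classical.
music = [[50, 85, 73], [68, 91, 60], [90, 74, 77]]

for name, table, alpha in [("gender", gender, 0.05), ("music", music, 0.025)]:
    stat, df = chi_square_statistic(table)
    p = right_tail_even_df(stat, df)
    print(name, round(stat, 3), df, "reject H0" if p < alpha else "fail to reject H0")
```

This prints `gender 0.579 2 fail to reject H0` and `music 12.954 4 reject H0`, which matches Step 7 of each example. The same helper also reproduces the Table A-4 values used above: `right_tail_even_df(5.991, 2)` and `right_tail_even_df(11.143, 4)` come out very close to 0.05 and 0.025.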

Example: 160 students who were majoring in either math or English were asked a test question, and the researcher recorded whether they answered the question correctly. The sample results are given below. At the 0.10 significance level, test the claim that response and major are independent.

Solution: Requirements are satisfied: the data are randomly selected, they are frequency counts, and the expected frequencies are all at least 5.

Example: Use the sample data below to test whether car color affects the likelihood of being in an accident. Use a significance level of 0.01.

Solution: Requirements are satisfied: the data are randomly selected, they are frequency counts, and the expected frequencies are all at least 5.

Test of Homogeneity: In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristic.

How to Distinguish Between a Test of Homogeneity and a Test for Independence: Were predetermined sample sizes used for the different populations (test of homogeneity), or was one big sample drawn so that both row and column totals were determined randomly (test of independence)?

Procedure: A test of homogeneity uses the same notation, requirements, test statistic, critical values, and procedures as a test of independence. However, instead of testing for independence, we test whether the different populations have the same proportions.

Example: On sensitive issues, people tend to give acceptable rather than honest responses; their answers may depend on the gender or race of the interviewer. To support that claim, men were asked if they agreed with this statement: "Abortion is a private matter that should be left to the woman to decide without government intervention." Using a 0.05 significance level, test the claim that the proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.

Solution: Requirements are satisfied: the data are random, they are frequency counts in a two-way table, and the expected frequencies are all at least 5.
Step 1: We are testing the claim that the proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
Step 2: The opposite of the claim is that the proportions are different.
Step 3: The null hypothesis contains equality, therefore,
H0: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
H1: The proportions are different.
Step 4: The significance level is 0.05.
Step 5: A test of homogeneity uses the same test statistic as a test of independence: $\chi^2 = \sum \frac{(O - E)^2}{E}$.

Step 6: Use a table to calculate the values of E and the test statistic.

             Man (interviewer)         Woman (interviewer)
Response     O    E       (O-E)^2/E    O    E       (O-E)^2/E    Totals
Agree        560  578.67  0.60         308  289.33  1.20         868
Disagree     240  221.33  1.57         92   110.67  3.15         332
Totals       800          2.18         400          4.35         1200

$\chi^2 = \sum \frac{(O - E)^2}{E} = 6.529$

The critical value χ² = 3.841 is found from Table A-4 with α = 0.05 in the right tail. The number of degrees of freedom is (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.
Test Statistic > Critical Value: 6.529 > 3.841
Step 7: Because the test statistic falls in the critical region, there is sufficient evidence to reject the null hypothesis.
Step 8: Conclusion: There is sufficient evidence to warrant rejection of the claim that the proportions are the same.
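Because a test of homogeneity uses the same computation as a test of independence, the interviewer example can be checked the same way. This Python sketch rebuilds E and χ² from the observed counts. It also adds a P-value, which the solution above does not give, using a standard identity for 1 degree of freedom: χ² is then the square of a standard normal Z, so P(χ² > x) = erfc(√(x/2)). Here `math.erfc` is the complementary error function from Python's standard `math` module. The variable names are illustrative only.

```python
import math

# Observed counts: rows = Agree, Disagree; columns = interviewed by a man, by a woman.
observed = [
    [560, 308],
    [240, 92],
]

row_totals = [sum(row) for row in observed]        # [868, 332]
col_totals = [sum(col) for col in zip(*observed)]  # [800, 400]
n = sum(row_totals)                                # 1200

chi_sq = 0.0
for row, rt in zip(observed, row_totals):
    for o, ct in zip(row, col_totals):
        e = rt * ct / n                            # expected frequency E
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(2-1) = 1
# With 1 degree of freedom, chi^2 = Z^2 for a standard normal Z, so
# P(chi^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(round(chi_sq, 3), df, round(p_value, 4))
```

The statistic comes out at about 6.529 with 1 degree of freedom, and the P-value is roughly 0.011. That is below α = 0.05, which agrees with the critical-value decision in Step 7. The erfc shortcut applies only when df = 1. Larger tables need a general chi-square distribution routine, such as the one in a statistics library.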

Example: At a high school debate tournament, half of the teams were asked to wear suits and ties and the rest were asked to wear jeans and t-shirts. The results are given in the table below. Test the hypothesis at the 0.05 level that the proportion of wins is the same for teams wearing suits as for teams wearing jeans.

Solution: Requirements are satisfied: the data are random, they are frequency counts in a two-way table, and the expected frequencies are all at least 5.

Example: A researcher wishes to test the effectiveness of a flu vaccination. 150 people are vaccinated, 180 people are given a placebo, and 100 people are not vaccinated. The number in each group who later caught the flu was recorded. The results are shown below. Use a 0.05 significance level to test the claim that the proportion of people catching the flu is the same in all three groups.

Solution: Requirements are satisfied: the data are random, they are frequency counts in a two-way table, and the expected frequencies are all at least 5.