MATH 10: Elementary Statistics and Probability
Chapter 11: The Chi-Square Distribution
Tony Pourmohamad
Department of Mathematics, De Anza College
Spring 2015
Objectives
By the end of this set of slides, you should be able to:
1. Learn about the chi-square distribution
2. Conduct and interpret a goodness-of-fit test
3. Conduct and interpret tests of independence
4. Conduct and interpret tests for homogeneity
The Chi-Square Distribution
The major characteristics of the chi-square distribution are:
- It is positively (right) skewed, so the distribution is not symmetric.
- It takes only non-negative values.
- Its shape depends on the degrees of freedom: when the degrees of freedom change, a different distribution is created.
[Figure: An example of the chi-square distribution (density against x, for x from 0 to 10)]
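We can check these properties with a simulation. The sketch below relies on a standard fact that the slide does not state: a chi-square variable with df degrees of freedom can be generated by adding up df squared, independent standard normal values. The function name `simulate_chi_square`, the seed and the sample size are illustrative choices. The code assumes Python 3.9 or later.

```python
import random
import statistics

def simulate_chi_square(df: int, n: int, seed: int = 1) -> list[float]:
    """Draw n chi-square(df) values as sums of df squared standard normal draws.

    This is a standard construction, assumed here; the slides do not give it.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n)]

draws = simulate_chi_square(df=3, n=100_000)
print("smallest draw:", min(draws))         # never below 0: only non-negative values
print("mean:", statistics.mean(draws))      # close to df = 3
print("median:", statistics.median(draws))  # below the mean, a sign of right skew
```

The mean exceeds the median because the long right tail pulls the mean upward. This is the positive skew described above.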
The Chi-Square Distribution
Some more chi-square distributions:
[Figure: Chi-square distributions with df = 1, 2, 3, 4 (density against x, for x from 0 to 10)]
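We can tabulate the curves in the plot from the chi-square density formula, which is standard but does not appear on the slides:
f(x) = x^(df/2 - 1) e^(-x/2) / (2^(df/2) Γ(df/2)) for x > 0.
This is a minimal sketch that uses only Python's standard library. The function name `chi_square_pdf` is a hypothetical helper.

```python
import math

def chi_square_pdf(x: float, df: int) -> float:
    """Density of the chi-square distribution with df degrees of freedom."""
    if x < 0:
        return 0.0  # no probability on negative values
    if x == 0:
        # limit at x = 0: infinite for df = 1, 1/2 for df = 2, 0 for df > 2
        return math.inf if df == 1 else (0.5 if df == 2 else 0.0)
    k = df / 2.0
    # work on the log scale to avoid overflow in the gamma function
    log_density = (k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k)
    return math.exp(log_density)

# Tabulate the four curves from the plot at a few x values.
for df in (1, 2, 3, 4):
    row = ", ".join(f"{chi_square_pdf(x, df):.3f}" for x in (0.5, 1, 2, 4, 8))
    print(f"df = {df}: {row}")
```

Each row of the output gives a different curve. This matches the point that changing the degrees of freedom creates a new distribution.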
The Chi-Square Distribution
The notation for a chi-square random variable is $X \sim \chi^2_{df}$, where df is the degrees of freedom.
The chi-square table can be found on the course webpage under handouts. Let's take a look at the table to understand it.
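A table lookup can also be reproduced numerically. This is a minimal sketch that assumes only Python's standard library. `chi2_cdf` evaluates the chi-square cumulative distribution using the series for the regularized lower incomplete gamma function. `chi2_critical_value` then uses bisection to find the value that leaves a right-tail area of alpha. Both names are hypothetical helpers. A library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` does the same job.

```python
import math

def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total:  # add terms until they no longer matter
        term *= z / (s + n)
        total += term
        n += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi2_critical_value(alpha: float, df: int) -> float:
    """Right-tail critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:  # widen the bracket until hi is past c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Should match the usual table values 3.841, 5.991, 7.815, 9.488 at alpha = 0.05.
for df in (1, 2, 3, 4):
    print(df, round(chi2_critical_value(0.05, df), 3))
```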
Goodness-of-Fit Test
In this type of hypothesis test, you determine whether the data "fit" a particular distribution. You will use a chi-square test to decide whether there is a fit.
$H_0$: The observed frequency distribution is the same as the hypothesized frequency distribution.
$H_a$: The observed and hypothesized frequency distributions are different.
Example:

Number of Absences   Expected Frequency   Actual Frequency
0-2                  50                   35
3-5                  30                   40
6-8                  12                   20
9-12                 6                    1
12+                  2                    4
Goodness-of-Fit Test
The test statistic for a goodness-of-fit test is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency, $E_i$ is the expected frequency, and k is the number of different categories or outcomes.
The critical value is a chi-square value with $k - 1$ degrees of freedom.
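The formula can be applied directly to the absences example, using the Actual column as $O_i$ and the Expected column as $E_i$. The sketch below is illustrative. The function name `goodness_of_fit_statistic` is a hypothetical helper.

```python
def goodness_of_fit_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic and its degrees of freedom (k - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Absences example: categories 0-2, 3-5, 6-8, 9-12, 12+.
observed = [35, 40, 20, 1, 4]
expected = [50, 30, 12, 6, 2]
stat, df = goodness_of_fit_statistic(observed, expected)
print(f"chi-square = {stat:.4f} with df = {df}")  # chi-square = 19.3333 with df = 4
```

Here is the term-by-term working:
4.5 + 3.333 + 5.333 + 4.167 + 2 = 19.333.
There are k = 5 categories, so df = 4.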
Goodness-of-Fit Test
We can calculate the test statistic and then compare it to the critical value. Large values of $\chi^2$ signal a poor fit, so the critical region is the right tail: all values above the critical value.
- If the test statistic is inside the critical region, we reject $H_0$.
- If the test statistic is outside the critical region, we fail to reject $H_0$.
Let's take a look at handout 9 for examples.
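The decision rule is a single comparison. This is a minimal sketch. The slides do not fix a significance level, so it assumes α = 0.05. At that level the table value for df = 4 is 9.488. The statistic 19.33 comes from the absences example. The name `chi_square_decision` is a hypothetical helper.

```python
def chi_square_decision(test_statistic: float, critical_value: float) -> str:
    """Right-tailed decision: the critical region is every value above the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"

# Absences example: chi-square = 19.33 with df = 4.
# 9.488 is the table value for df = 4 at an ASSUMED significance level of 0.05.
print(chi_square_decision(19.33, 9.488))  # reject H0
```

Under that assumed level, the absences data do not fit the hypothesized distribution.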
Test of Independence
Tests of independence use a contingency table of observed data values. A test of independence tests the null hypothesis that there is no association between the row variable and the column variable of the contingency table.
$H_0$: The row and column variables are independent.
$H_a$: The row and column variables are not independent.
Recall what a contingency table looks like:

              Lung Cancer   No Lung Cancer   Row Total
Smoker        70            20               90
Non-Smoker    5             5                10
Column Total  75            25               100
Test of Independence
The test statistic for a test of independence is

$$\chi^2 = \sum_{i=1}^{r \cdot c} \frac{(O_i - E_i)^2}{E_i}$$

where the sum runs over all $r \cdot c$ cells of the table, $O_i$ is the observed frequency, $E_i$ is the expected frequency, r is the number of rows of the table, and c is the number of columns of the table.
The critical value is a chi-square value with $(r - 1)(c - 1)$ degrees of freedom.
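The same sum can be written cell by cell over a two-way table. This is a minimal sketch. The function name `contingency_statistic` is a hypothetical helper. The expected counts are supplied by hand; each one equals (row total)(column total) / 100 for the smoking table, for example 90 · 75 / 100 = 67.5.

```python
def contingency_statistic(observed, expected):
    """Sum (O - E)^2 / E over all r*c cells; df = (r - 1)(c - 1)."""
    r, c = len(observed), len(observed[0])
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    return stat, (r - 1) * (c - 1)

# Smoking / lung cancer table (rows: smoker, non-smoker).
observed = [[70, 20], [5, 5]]
expected = [[67.5, 22.5], [7.5, 2.5]]
stat, df = contingency_statistic(observed, expected)
print(f"chi-square = {stat:.4f} with df = {df}")  # chi-square = 3.7037 with df = 1
```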
Test of Independence
We can calculate the test statistic and then compare it to the critical value.
- If the test statistic is inside the critical region, we reject $H_0$.
- If the test statistic is outside the critical region, we fail to reject $H_0$.
Important: for a contingency table, the expected frequency of each cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Let's take a look at handout 9 for examples.
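The expected-frequency rule can be computed from the observed counts alone. This is a minimal sketch. The name `expected_counts` is a hypothetical helper.

```python
def expected_counts(observed):
    """E for each cell = (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

observed = [[70, 20],  # Smoker: lung cancer, no lung cancer
            [5, 5]]    # Non-smoker
print(expected_counts(observed))  # [[67.5, 22.5], [7.5, 2.5]]
```

For example, the Smoker/Lung Cancer cell gets 90 · 75 / 100 = 67.5.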
Test for Homogeneity
Tests of homogeneity use a contingency table of observed data values. They are used to test whether two populations have the same distribution of some characteristic. In a test of homogeneity, we test the claim that the different populations have the same proportions in each category of the characteristic.
$H_0$: The distributions of the two populations are the same.
$H_a$: The distributions of the two populations are different.

              Brown Eyes   Blue Eyes   Green Eyes   Row Total
Smoker        70           20          10           100
Non-Smoker    5            5           5            15
Column Total  75           25          15           115
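Test for Homogeneity
The test statistic for a test of homogeneity is

$$\chi^2 = \sum_{i=1}^{r \cdot c} \frac{(O_i - E_i)^2}{E_i}$$

where the sum runs over all $r \cdot c$ cells of the table, $O_i$ is the observed frequency, $E_i$ is the expected frequency, r is the number of rows of the table, and c is the number of columns of the table.
The critical value is a chi-square value with $(r - 1)(c - 1)$ degrees of freedom. When two populations are compared (r = 2 rows), this equals $c - 1$.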
Test for Homogeneity
We can calculate the test statistic and then compare it to the critical value.
- If the test statistic is inside the critical region, we reject $H_0$.
- If the test statistic is outside the critical region, we fail to reject $H_0$.
Important: for a contingency table, the expected frequency of each cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Let's take a look at handout 9 for examples.
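The whole homogeneity procedure can be sketched in one function: expected counts, statistic, degrees of freedom and decision. It is applied here to the eye-colour table. The slides give no significance level, so the sketch assumes α = 0.05. At that level the table value for df = 2 is 5.991. The caller must pass a critical value that matches the table's df. The name `homogeneity_test` is a hypothetical helper.

```python
def homogeneity_test(observed, critical_value):
    """Chi-square test of homogeneity on an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            e = rt * ct / n  # expected count for cell (i, j)
            stat += (observed[i][j] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    decision = "reject H0" if stat > critical_value else "fail to reject H0"
    return stat, df, decision

# Eye colour table: rows are smokers / non-smokers, columns brown / blue / green.
observed = [[70, 20, 10],
            [5, 5, 5]]
# 5.991 is the table value for df = 2 at an ASSUMED significance level of 0.05.
stat, df, decision = homogeneity_test(observed, critical_value=5.991)
print(f"chi-square = {stat:.2f}, df = {df}: {decision}")  # chi-square = 9.20, df = 2: reject H0
```

With two populations and three eye colours, df = (2 - 1)(3 - 1) = 2 = c - 1.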
Summary of Tests
Goodness-of-Fit: Use to decide whether a population with an unknown distribution "fits" a known distribution.
$H_0$: The population fits the given distribution.
$H_a$: The population does not fit the given distribution.

Independence: Use to decide whether two variables are independent or dependent.
$H_0$: The two variables are independent.
$H_a$: The two variables are dependent.

Homogeneity: Use to decide whether two populations with unknown distributions have the same distribution as each other.
$H_0$: The two populations follow the same distribution.
$H_a$: The two populations have different distributions.