Hypothesis Test Notes: Chi-Squared Test Statistic & Goodness of Fit Test

Remember, when comparing a sample percentage to a claimed population percentage, we use a 1-proportion hypothesis test and a Z-test statistic. When comparing a sample percentage for 1 group to a sample percentage from a second group, we use a 2-proportion hypothesis test and a Z-test statistic. In both cases the Z-test statistic counts the number of standard errors one thing is from another. But what if we have more than 2 groups we are comparing? Or what if we are comparing multiple variables in multiple groups (two-way table)? The answer to both of these is the Chi-Squared test statistic.

The basic idea of any test statistic is to compare the sample data to the null hypothesis. In Chi-Squared, we will calculate the Expected Values if the null hypothesis is true.

Example 1

Let's suppose that the percentage of high school students that graduate is the same at five different high schools. This multiple-proportion hypothesis test is often called a Goodness of Fit Test.

H0: p1 = p2 = p3 = p4 = p5
HA: at least one is ≠

Suppose we have a total of 105 students that graduate. How many would we expect from each school? These are the Expected Values. Notice if H0 is true, then all the schools would have the same number of graduates from the 105 total. In other words, the expected values should all be 21 (105 divided by 5).
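The expected-count arithmetic can be checked with a couple of lines of Python. This is just an illustrative sketch (the course tools are StatKey and StatCrunch); the variable names are mine:

```python
# Expected counts for each school if H0 (equal graduation proportions) is true.
total_graduates = 105
num_schools = 5

expected = [total_graduates / num_schools] * num_schools
print(expected)  # each school: 105 / 5 = 21.0
```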
Now we need to compare what really happened to those expected values. Here is the observed sample data (Observed Values):

School:    1    2    3    4    5
Observed: 17   24   13   25   26

So when doing Chi-Squared hypothesis tests, think: Expected means H0, but Observed means sample data.

The formula for the chi-squared test statistic is pretty formidable. Remember, the computer will be doing the heavy lifting. We just need to understand the formula and be able to explain it.

χ² = Σ (Observed − Expected)² / Expected

Notice we are finding the difference between the observed values (sample data) and the expected values (null hypothesis). Since we will sometimes get negative numbers, we square the differences. This is why the test statistic is called Chi-Squared. We divide by the expected value so we are looking at an average of the squares. Then adding all of these together gives us the total Chi-Squared. Remember, this is a way to compare complex categorical data to the null hypothesis.

Chi-Squared Sentence: The sum of the averages of the squares of the differences between the observed sample data and the expected values if the null hypothesis is true.
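The formula translates directly into code. Here is a minimal Python sketch (the function name `chi_squared` is my own, not something from StatCrunch or StatKey):

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 1 data: observed counts for the five schools vs. an expected 21 each.
observed = [17, 24, 13, 25, 26]
expected = [21] * 5
print(round(chi_squared(observed, expected), 2))  # 130/21, about 6.19
```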
Let's calculate the chi-squared test statistic for example 1. Remember, all of the expected values are 21 and the observed values are given below.

School:    1    2    3    4    5
Observed: 17   24   13   25   26

χ² = (17−21)²/21 + (24−21)²/21 + (13−21)²/21 + (25−21)²/21 + (26−21)²/21
   = (−4)²/21 + (3)²/21 + (−8)²/21 + (4)²/21 + (5)²/21
   = 16/21 + 9/21 + 64/21 + 16/21 + 25/21
   = 130/21 ≈ 6.19

Note: While 6.19 is a lot for a Z-score or T-score, 6.19 may not be significant for a Chi-Squared. Remember, Chi-Squared comes from adding up squared numbers and can be rather large. We would need to see a simulation or a P-value to see if 6.19 is significant or not. Let's simulate what chi-squared test statistics we would expect if the null hypothesis were true. Here is a simulation created with StatKey.
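A StatKey-style randomization can be sketched in plain Python: repeatedly assign the 105 graduates to the five schools at random (each school equally likely under H0), compute the chi-squared statistic for each simulated sample, and count how often it reaches 6.19 or more. The variable names and the fixed seed are my choices for this sketch:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def chi_squared(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total, groups, sims = 105, 5, 10_000
expected = [total / groups] * groups
observed_stat = 130 / 21  # about 6.19, from the worked example

hits = 0
for _ in range(sims):
    counts = [0] * groups
    for _ in range(total):                 # each graduate picks a school at random
        counts[random.randrange(groups)] += 1
    if chi_squared(counts, expected) >= observed_stat:
        hits += 1

print(hits / sims)  # estimated P-value, roughly 0.19
```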
First of all, what is the shape of the chi-squared distribution? Notice the Chi-Squared distribution is not bell shaped (normal). It is always Skewed Right, and Chi-Squared hypothesis tests are always right tailed. Remember, squared numbers are always positive, and adding up squared numbers gives you a positive sum. So it is impossible for Chi-Squared to be negative. Chi-Squared hypothesis tests are never left tailed or two tailed. Chi-Squared takes complicated categorical data and condenses it into 1 right-tail test.

Now what about the Chi-Squared test statistic of 6.19 that we computed? Is it significant (in the tail)? Could it happen by random chance? What is the estimated P-value? Remember, like all hypothesis tests, there are two reasons for the sample data being different than the null hypothesis. Either the null may be true and the sample data is different because all samples are different (random chance), or the null hypothesis is wrong. Which is it in this case?

Notice the data is not significant (not in the tail) and could have happened by random chance (18.9%). So there is not a significant difference between the observed sample data and the expected values from the null hypothesis. Since we have not ruled out random chance, we cannot be sure if the null hypothesis is indeed wrong. So we would fail to reject the null hypothesis.
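For the curious, the simulated P-value can be cross-checked without simulation. With K = 5 groups the chi-squared distribution has K − 1 = 4 degrees of freedom, and for an even number of degrees of freedom its right-tail probability has a simple closed form (for 4 df it is e^(−x/2)·(1 + x/2)). This is a math fact used here only as a sanity check, not something the course asks you to compute by hand:

```python
import math

def chi2_right_tail_4df(x):
    # P(X >= x) for a chi-squared variable with 4 degrees of freedom.
    # For df = 2k the tail is exp(-x/2) * sum_{j<k} (x/2)^j / j!;
    # with k = 2 that reduces to exp(-x/2) * (1 + x/2).
    return math.exp(-x / 2) * (1 + x / 2)

p = chi2_right_tail_4df(130 / 21)
print(round(p, 3))  # about 0.185, agreeing with the ~18.9% simulation estimate
```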
Example 2

Let us suppose that someone had a different claim they wanted to test with the school graduation data. They claim that 15% of the graduates come from school 2, 15% from school 4, 15% from school 5, 25% from school 1, and 30% from school 3. This is also a multiple-proportion test (Goodness of Fit Test), though the null hypothesis looks a little different. Notice the groups are checking the percentage for the same success variable (graduating). We are only checking one percentage in each group. This is the trademark of a Goodness of Fit test.

H0: p1 = 25%, p2 = 15%, p3 = 30%, p4 = 15%, p5 = 15%
HA: at least one is ≠

Let's calculate the Chi-Squared test statistic again. Let's start by calculating the expected values from the null hypothesis. This null hypothesis suggests that each group has a different percentage and therefore a different expected value. Remember, our total number of graduates was 105. The null hypothesis suggests that 25% of those will come from school 1; 15% of them will come from school 2, school 4, and school 5; and 30% will come from school 3. Remember, to calculate a percentage of a total, simply convert the percentage into a decimal and multiply by the total. Here are our expected values.

School 1: 0.25 × 105 = 26.25
School 2: 0.15 × 105 = 15.75
School 3: 0.30 × 105 = 31.5
School 4: 0.15 × 105 = 15.75
School 5: 0.15 × 105 = 15.75

Remember, these are what we expect to get if the null hypothesis is true. We can compare these with the Observed sample data values.

School:    1    2    3    4    5
Observed: 17   24   13   25   26
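The "percentage of a total" step, sketched in Python (variable names are mine):

```python
total_graduates = 105
# H0 percentages for schools 1 through 5, written as decimals.
claimed = [0.25, 0.15, 0.30, 0.15, 0.15]

expected = [p * total_graduates for p in claimed]
print(expected)  # [26.25, 15.75, 31.5, 15.75, 15.75] (up to float rounding)
```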
Now let's calculate the Chi-Squared test statistic.

χ² = (17−26.25)²/26.25 + (24−15.75)²/15.75 + (13−31.5)²/31.5 + (25−15.75)²/15.75 + (26−15.75)²/15.75
   = (−9.25)²/26.25 + (8.25)²/15.75 + (−18.5)²/31.5 + (9.25)²/15.75 + (10.25)²/15.75
   = 85.5625/26.25 + 68.0625/15.75 + 342.25/31.5 + 85.5625/15.75 + 105.0625/15.75
   = 3.2595 + 4.3214 + 10.8651 + 5.4325 + 6.6706
   ≈ 30.55

Is a Chi-Squared test statistic of 30.55 significant? Remember, Chi-Squared test statistics are squared numbers added up, so they can be very large. Let's calculate a P-value with StatCrunch this time to determine if it is significant.

To calculate a P-value for a Goodness of Fit test, we will need to do the following. First, type the observed sample values in a column of StatCrunch. If the null hypothesis has specific percentages instead of all equal, then type these percentages (written as decimals) in another column. Remember, each percentage has to coincide with the observed value from the same variable.

Stat → Goodness of Fit → Chi-Squared Test

Tell StatCrunch what column your observed sample data is in. If the null hypothesis is all groups equal, then click the button that says "all cells in equal proportion" under the Expected menu. In this case each school had a different percentage in the null hypothesis, so under the Expected menu, click the column where the percentages are. Now click Compute.

Notice the P-value says <0.0001. This is what StatCrunch writes when the P-value is very close to zero. P-value ≈ 0.
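Example 2 can be cross-checked end to end in a few lines of Python. This is a sketch, not the StatCrunch computation itself, and the tail-probability shortcut below is valid specifically for 4 degrees of freedom (K − 1 = 4 here):

```python
import math

observed = [17, 24, 13, 25, 26]
expected = [26.25, 15.75, 31.5, 15.75, 15.75]  # from H0: 25%, 15%, 30%, 15%, 15% of 105

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Right-tail probability for chi-squared with 4 df: exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(round(stat, 2))    # 30.55
print(p_value < 0.0001)  # True -- matches StatCrunch's "<0.0001"
```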
Remember, like all hypothesis tests, there are two reasons for the sample data being different than the null hypothesis. Either the null may be true and the sample data is different because all samples are different (random chance), or the null hypothesis is wrong. Which is it in this case?

A P-value of approximately 0 is very significant, and since the P-value is the probability of the sample data happening by random chance, this data was very unlikely to happen by random chance. So there is a significant difference between the observed sample data and the expected values from the null hypothesis. We have ruled out random chance and can Reject the null hypothesis.

Key Points about the Goodness of Fit Test

A Goodness of Fit test checks the same success variable in multiple groups. The sample data will be a single row or column of observed values (not a two-way table).

Sample Null and Alternative Hypotheses (two types):

H0: p1 = p2 = p3 = p4 = p5
HA: at least one is ≠

H0: p1 = 25%, p2 = 15%, p3 = 30%, p4 = 15%, p5 = 15%
HA: at least one is ≠

The Chi-Squared test statistic and P-value can be calculated with simulation (StatKey) or with StatCrunch. (Do not calculate these by hand.)

The Chi-Squared distribution is always skewed right. Any hypothesis test using the Chi-Squared test statistic will always be a right-tailed test.
Chi-Squared Sentence: The sum of the averages of the squares of the differences between the observed sample data and the expected values if the null hypothesis is true.

What are the assumptions? All Chi-Squared hypothesis tests have the same assumptions:
1. Random.
2. All expected values must be at least 5 (observed sample data is large enough).

A large Chi-Squared test statistic (in the tail of the simulation) and a small P-value both tell us that the data probably did not happen by random chance and is significant. The observed sample data significantly disagrees with the expected values from the null hypothesis. We can therefore Reject the null hypothesis.

A small Chi-Squared test statistic (not in the tail of the simulation) and a large P-value both tell us that the data could have happened by random chance and is not significant. The observed sample data does not significantly disagree with the expected values from the null hypothesis. Since we cannot rule out random chance, we don't know if the null is right or wrong, so we Fail to Reject the null hypothesis.

Conclusions may be written in the same way as in all hypothesis tests. If the claim is the null hypothesis, then you will either have evidence to reject the claim (small P-value) or not have evidence to reject the claim (large P-value). If the claim is the alternative hypothesis, then you will either have evidence to support the claim (small P-value) or not have evidence to support the claim (large P-value).

Degrees of Freedom = K − 1 (K is the # of groups)

Expected Values (automatically calculated with StatCrunch):
n / K (n = total sample size, K = # of groups; use this when all groups are assumed to be equal in the null hypothesis)
n × p (n = total sample size, p = percentage for each group; use this when each group has a different percentage in the null hypothesis)
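The two expected-value recipes and the degrees-of-freedom rule fit in one small helper. This is a sketch with names of my own choosing, not part of any course tool:

```python
def expected_counts(n, k=None, percents=None):
    """Expected cell counts under H0: n*p for given percentages, else n/k equal cells."""
    if percents is not None:
        return [n * p for p in percents]
    return [n / k] * k

def degrees_of_freedom(k):
    return k - 1  # K groups give K - 1 degrees of freedom

print(expected_counts(105, k=5))                                  # Example 1: 21.0 each
print(expected_counts(105, percents=[0.25, 0.15, 0.30, 0.15, 0.15]))  # Example 2
print(degrees_of_freedom(5))                                      # 4
```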