About Chi Squares
TABLE OF CONTENTS

About Chi Squares
    What is a CHI SQUARE?
    Chi Squares
        Goodness of fit test (One-way χ²)
        Test of Independence (Two-way χ²)
    Hypothesis Testing with Chi Squares
        STEP 1: Establish the hypotheses
        STEP 2: Calculate the Chi-square
        STEP 3: Assess the significance level
        STEP 4: Make a decision
    Post hoc Testing
        METHOD 1
        METHOD 2
Glossary
References
What is a CHI SQUARE?

A chi square is a nonparametric test used to explore the relationships between categorical variables. Parametric statistics are concerned with interval or ratio data. Sometimes, however, the researcher wishes to use nominal or ordinal data to examine the frequency of certain categories. Analyses of this kind are called nonparametric statistics.

Chi Squares

Sometimes researchers are interested in determining differences in the frequency of events. However, we are limited in what we can do statistically with categorical data because we are unable to calculate the mean, which is a foundation of many statistical tests. The Chi square statistic allows us to test hypotheses about categorical variables by examining whether there is a relationship between 2 or more categorical variables.

Chi squares are most often used to test the relationship between 2 variables. They can also be used to make a multivariate assessment of the relationship between 3 or more variables.

Goodness of fit test (One-way χ²)

In this case, only 1 variable is being examined. The test asks whether the observed frequencies of its categories match the frequencies we would expect. Goodness of fit tests are often used for examining fairness in games of chance, such as cards, roulette and dice games.
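As a rough illustration of the goodness of fit setup, the sketch below tallies the faces of a die and compares each tally with the count a fair die would produce. The 60 rolls are invented for the example and are not data from this handout. The variable names are also just illustrative.

```python
from collections import Counter

# Hypothetical data: 60 rolls of a six-sided die (invented for illustration).
rolls = [1] * 8 + [2] * 12 + [3] * 9 + [4] * 11 + [5] * 6 + [6] * 14

observed = Counter(rolls)        # observed frequency of each face
k = len(observed)                # number of levels (faces) of the one variable
expected = len(rolls) / k        # a fair die gives every face the same count

# Show each face with its observed count, expected count and difference.
for face in sorted(observed):
    print(face, observed[face], expected, observed[face] - expected)
```

The differences between observed and expected counts are the raw material for the test; how they are combined into a single statistic is covered under Hypothesis Testing with Chi Squares.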
Test of Independence (Two-way χ²)

Two variables with at least two levels each are being examined. A level is a subcategory of the variable. The goal of this test is to learn whether the variables are independent of each other.

Hypothesis Testing with Chi Squares

STEP 1: Establish the hypotheses

The hypotheses state the purpose of the study by indicating what we are trying to conclude from it.

The null hypothesis (H0) is that the two variables are independent.

The alternative hypothesis (H1) is that the two variables are related.

STEP 2: Calculate the Chi-square

Three quantities are involved in calculating the Chi square:

The first is the observed count (O). This is the observed frequency in each cell of the table.

The second is the expected count (E). This is the frequency we would expect to find in each cell if the null hypothesis were true, that is, if the two variables were independent. It is calculated with:

$$E = \frac{(\text{row total}) \times (\text{column total})}{\text{table total}}$$
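The expected count formula can be sketched in a few lines of Python. The 2 x 3 table of observed counts below is hypothetical, and the function name `expected_counts` is an illustrative choice rather than anything defined in this handout.

```python
def expected_counts(observed):
    """Return the expected count for every cell of a two-way table.

    `observed` is a list of rows, each row a list of observed counts.
    Each expected count is (row total) x (column total) / (table total).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: 2 rows (levels of one variable)
# by 3 columns (levels of the other variable).
observed = [[10, 30, 60],
            [30, 30, 40]]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Both rows have the same total here, so the expected counts are identical across rows: 20, 30 and 50.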
Finally, we calculate the Chi square (χ²) itself. It is calculated with:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

It is easier to conceptualize these 3 steps in a chart with one row per cell. The cell in the bottom right is the one that gives us the value of χ².

    O      E      (O − E)²      (O − E)² / E
    ...    ...    ...           ...
                                Σ (O − E)² / E = χ²

STEP 3: Assess the significance level

In order to conclude whether the relationship we find between the two variables is statistically significant, we have to set the alpha level. Usually the alpha level is set at α = .05, which means that we accept a 5% chance of concluding that there is a relationship when there actually is not one (a Type I error). The alpha level and the degrees of freedom allow us to find the critical value from a statistical table, and we compare this critical value to the χ² value we calculated in order to determine statistical significance. If the calculated χ² value is higher than the critical value, we can conclude that the relationship is statistically significant.

Degrees of freedom for goodness of fit tests are calculated with:

$$df = k - 1, \quad \text{where } k = \text{number of levels}$$

Degrees of freedom for tests of independence are calculated with:

$$df = (\text{number of rows} - 1)(\text{number of columns} - 1)$$
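The calculation and the comparison against a critical value can be sketched as follows. This is a minimal illustration under stated assumptions: the observed counts are hypothetical, the function names are invented for the example, the goodness of fit version assumes equal expected counts (as for a fair die), and `CRITICAL_05` holds only the standard table values for α = .05 and df = 1 to 6.

```python
# Critical chi-square values at alpha = .05 for df = 1 to 6 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square(observed, expected):
    """Sum (O - E)^2 / E over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def independence_test(table):
    """Return (chi-square, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    observed = [o for row in table for o in row]
    # Expected counts in the same row-by-row order as `observed`.
    expected = [r * c / total for r in row_totals for c in col_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square(observed, expected), df


def goodness_of_fit_test(counts):
    """Return (chi-square, df) for one variable, assuming equal expected counts."""
    k = len(counts)
    expected = [sum(counts) / k] * k
    return chi_square(counts, expected), k - 1


# Hypothetical two-way table: 2 rows by 3 columns.
table = [[10, 30, 60],
         [30, 30, 40]]
chi2, df = independence_test(table)
significant = chi2 > CRITICAL_05[df]
print(round(chi2, 3), df, significant)

# Hypothetical counts for the six faces of a die rolled 60 times.
die_counts = [8, 12, 9, 11, 6, 14]
die_chi2, die_df = goodness_of_fit_test(die_counts)
die_significant = die_chi2 > CRITICAL_05[die_df]
print(round(die_chi2, 3), die_df, die_significant)
```

With these invented numbers the two-way table gives χ² = 14 on 2 degrees of freedom, which exceeds the critical value of 5.991. The die gives χ² of about 4.2 on 5 degrees of freedom, which does not exceed 11.070.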
STEP 4: Make a decision

If the obtained χ² value is larger than the critical χ² value, then we reject the null hypothesis and conclude that there is a significant relationship between the variables. If the obtained χ² value is smaller than the critical χ² value, then we fail to reject the null hypothesis: the data do not provide evidence of a significant relationship between the variables.

Post hoc Testing

Post hoc means "after the fact". A post hoc test is performed after a significant result is found. When you conduct a Chi-square test of independence with variables that have more than 2 levels and find a significant result, post hoc tests need to be performed in order to determine where the significance lies. The post hoc test compares each of the levels and allows you to determine which pairs of cells (levels) are significantly different.

METHOD 1

This involves simply looking at the table. By examining the residuals (the observed count minus the expected count), you can make a judgment on which cells differ most from what was expected. The percentage of contribution can be calculated and compared to see which cells contribute the most to the χ² value. The percentage is calculated with:

$$\% = \frac{(O - E)^2 / E}{\chi^2} \times 100$$
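The residuals and percentages of contribution can be computed for every cell at once. The sketch below assumes a hypothetical 2 x 3 table that has already produced a significant χ² (so χ² is greater than zero); the function name and the dictionary keys are illustrative.

```python
def cell_contributions(table):
    """Return (chi-square, cells) for a two-way table given as a list of rows.

    Each entry of `cells` records a cell's position, its residual (O - E)
    and its percentage contribution to the overall chi-square value.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    cells = []
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            cells.append({"row": i, "col": j,
                          "residual": o - e,
                          "part": (o - e) ** 2 / e})

    chi2 = sum(cell["part"] for cell in cells)
    for cell in cells:
        # Share of the chi-square value that comes from this cell.
        cell["percent"] = 100 * cell["part"] / chi2
    return chi2, cells


# Hypothetical observed counts (invented for illustration).
table = [[10, 30, 60],
         [30, 30, 40]]
chi2, cells = cell_contributions(table)

# List the cells from largest to smallest contribution.
for cell in sorted(cells, key=lambda c: c["percent"], reverse=True):
    print(cell["row"], cell["col"], cell["residual"], round(cell["percent"], 2))
```

In this invented table the two cells in the first column account for most of the χ² value, and the middle column contributes nothing because its observed and expected counts are equal.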
This is not the most accurate method of post hoc testing because it is based mostly on visual interpretation.

METHOD 2

This involves extracting a 2x2 table of interest from the original table. It is easy to collapse ordinal variables into a dichotomy because the data are rank ordered. This is much more difficult for nominal variables because they are distinct and unordered categories.

NOTE: you are not removing data, but rather combining the cells in a meaningful way.

This will produce a new χ² value that might be a significant result. We need to recognize that this value is part of the larger table and that it does not relate directly to the original analysis. This inflates Type I error.

In order to counter the inflated Type I error we need to adopt a conservative alpha level by performing a Bonferroni adjustment (k):

$$k = \frac{r!}{2!\,(r-2)!} \times \frac{c!}{2!\,(c-2)!}$$

Where r is the number of rows, c is the number of columns and ! refers to a factorial, which is a series of multiplications. For example, 2! = (2)(1), 3! = (3)(2)(1), 4! = (4)(3)(2)(1), etc. In other words, k is the number of distinct 2x2 tables that can be extracted from the original table.

Next, we divide our original alpha level by k to determine the conservative alpha level. We compare our new χ² value to the critical value at this alpha level to determine whether the result is significant.
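The adjustment can be sketched in Python as below. It reads each factor of k as the number of ways to choose 2 of the r rows (or 2 of the c columns), which is an assumption about the intended formula. The function name `bonferroni_alpha` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def bonferroni_alpha(rows, cols, alpha=0.05):
    """Return (k, adjusted alpha) for an r x c table.

    k is taken to be the number of distinct 2x2 tables that can be
    extracted: (rows choose 2) x (cols choose 2). This reading of the
    formula is an assumption of the sketch.
    """
    k = comb(rows, 2) * comb(cols, 2)
    return k, alpha / k


# Hypothetical 3 x 3 table tested at the usual alpha of .05.
k, adjusted = bonferroni_alpha(3, 3)
print(k, round(adjusted, 4))
```

For a 3 x 3 table this gives k = 9, so each extracted 2x2 table would be tested at an alpha of about .0056 instead of .05.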
Glossary

Bonferroni adjustment: a statistical procedure that corrects for Type I error inflation when the cells in a chi-square table have been collapsed.

Categorical variable: one that we can place into categories, but these categories may not have any logical ordering (nominal or ordinal).

Expected count: the expected frequency.

Nonparametric test: a type of test that does not assume that the variables follow a specific distribution, like a normal distribution.

Observed count: the recorded frequency.

Post hoc test: a test that is performed after a significant result is found. When there are more than two levels, this test shows which level is significant.

Type I error: occurs when you reject the null hypothesis even though it is true. In other words, it is concluding that there is a significant relationship when there actually isn't one.
References

Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurements (6th ed.). NY: McGraw-Hill.

Gardner, R. C. (2001). Psychological statistics using SPSS for Windows. New Jersey: Prentice Hall.

Hamilton, L. C. (1996). Data analysis for social scientists. Scarborough: Duxbury Press.

Quantitative Methods in Social Sciences e-lessons. (n.d.). The chi square. Retrieved on May 30, 2007.

Utts, J. (2005). Seeing through the statistics (3rd ed.). Toronto: Thomson Brooks/Cole.

Wilson, J. H. (2005). Essential Statistics. New Jersey: Pearson Prentice Hall.
