CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V Chapters 13 and 14 introduced and explained the use of a set of statistical tools that researchers use to measure and evaluate the degree of association that exists between interval variables and ordinal variables. This chapter concludes the discussion of correlational statistics by providing three new measures which can be applied in research situations where an individual wishes to determine the degree of association that exists between nominal variables. Three statistics are introduced in this chapter: Phi, the Contingency Coefficient, and Cramer's V. All three statistics are simple and easy to calculate. Each begins with the calculation of the Chi-Square statistic using the methods outlined in Chapter 11 of this text. Given the nature of the nominal data used to calculate each of these statistics, the obtained values for each statistic will always fall along a range from a low of 0 to a high of 1. Negative correlations with each of these statistics are mathematically impossible. The choice of which statistic to employ in a given research situation is determined by the size of the data matrix and whether or not the two nominal variables under consideration have the same number of possible values. The Phi statistic is used when both of the nominal variables under consideration have exactly two possible values. When this is true, the data matrix will always have a simple 2x2 design. The Contingency Coefficient is used when there are 3 or more values for each nominal variable, as long as there are an equal number of possible values leading to the construction of a data matrix that has an equal number of rows and columns

(3x3, 4x4, etc). Cramer's V is used when the number of possible values for the two variables is unequal, yielding a different number of rows and columns in the data matrix (2x3, 3x5, etc). Taking an example from Chapter 11, a Chi-Square statistic is calculated as follows (using Yate's Correction because expected values for two of the cells were below 10). Figure 15:1 Chi-Square Statistic: Gender and Income Men Women Total High Income 15 (19.66) Low Income 14 (9.34) 25 (20.34) 5 (9.66) 40 19 Total 29 30 59 Row Column 1 1 15 19.66-4.66 4.16 17.31.88 1 2 25 20.34 4.66 4.16 17.31.85 2 1 14 9.34 4.66 4.16 17.31 1.85 2 2 5 9.66-4.66 4.16 17.31 1.79 The result of the calculations yielded a value of 5.37 for Chi-Square. A consultation of the table in Appendix H indicates that there is a significant difference between the groups (at.05) that suggests women are more likely to be found in the high income classification than men. Once this initial set of calculations is complete, Phi can be calculated using the following formula:

Using the obtained value of 5.37 for Chi-Square, and the value for n of 59 obtained from the total in the data matrix, Phi is calculated: The obtained value for Phi suggests the presence of a moderate correlation between the two variables. The next measure to be discussed in this chapter is the Contingency Coefficient. This statistic is calculated using the fomula:. In the way of an example, assume that a significant chi-square value of 9.68 was obtained from a comparison of two variables that each had three possible values. The data matrix would be 3x3 in this case, indicating that the Contingency Coefficient would be the most appropriate measure of association. Assuming an n of 60 for this research scenario, the calculation of the Contingency Coefficient proceeds as

follows: As in the first example, the calculated value for this statistic suggests the presence of a moderate correlation between the two variables. The final statistic commonly employed by those measuring association between nominal variables is Cramer's V. It is calculated using the formula:. To determine the value of k in the formula, look at the number of possible values of each variable (the number of rows and columns in the data matrix). The smaller of the two numbers is used to represent the variable k. Assuming once again that a researcher has conducted a Chi-Square test on a sample with an n of 60 and obtained a significant value of 9.68 for Chi-Square using a data set where variable X had 5 possible values and variable Y had 7 possible values (5x7 data matrix), calculation of Cramer's V proceeds as follows:

The obtained value of.2 in this case indicates the presence of a weak correlation between the two variables under consideration. In conclusion, remember that the appropriate measure of correlation when working with nominal data is based on the characteristics of the data and can be determined by the structure of the data matrix used to calculate the chi-square statistic. When the data matrix is 2x2, the Phi statistic is used. When the number of rows and columns in the data matrix is the same (3x3, 4x4, 22x22), the Contingency Coefficient is employed. Cramer's V is used when the number of rows and columns is unequal (2x3, 3x5, 5x7).

Exercises Chapter 15 1. Compute a chi-square statistic and the appropriate nominal correlation statistic using the following data. Draw statistical and research conclusions. Show all work Under $10,000 $10,000 - $20,000 $20,001- $55,000 Over $55,000 Total Whites 30 40 10 70 150 Blacks 50 60 10 20 140 Total 80 100 20 90 290 2. A pollster for a candidate wishes to determine whether there is a relationship between an individual's voting patterns and their television watching habits. A random sample of 350 voters was taken to address this issue. The results of the sampling yielded the data below. Calculate value for Chi-Square and determine if it is significant. Determine the appropriate nominal measure of correlation and apply it to this situation. Draw statistical and research conclusions. Television Viewing Time Democrats Republicans Independents Total Light 45 5 70 120 Moderate 50 10 68 128 Heavy 60 10 32 102 Total 155 25 170 350