CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V




Chapters 13 and 14 introduced a set of statistical tools that researchers use to measure and evaluate the degree of association between interval variables and between ordinal variables. This chapter concludes the discussion of correlational statistics with three measures that apply when a researcher wishes to determine the degree of association between nominal variables: Phi, the Contingency Coefficient, and Cramer's V. All three statistics are easy to calculate. Each begins with the calculation of the Chi-Square statistic using the methods outlined in Chapter 11 of this text. Because each statistic is derived from Chi-Square, which cannot be negative, the obtained values always fall between 0 and 1. Negative correlations are mathematically impossible with these statistics.

The choice of statistic in a given research situation is determined by the size of the data matrix and by whether the two nominal variables under consideration have the same number of possible values. The Phi statistic is used when both nominal variables have exactly two possible values, so that the data matrix has a simple 2x2 design. The Contingency Coefficient is used when each nominal variable has three or more possible values and the number of values is equal, so that the data matrix has an equal number of rows and columns (3x3, 4x4, etc.). Cramer's V is used when the number of possible values for the two variables is unequal, yielding a different number of rows and columns in the data matrix (2x3, 3x5, etc.).

Taking an example from Chapter 11, a Chi-Square statistic is calculated as follows (using Yates' Correction because expected values for two of the cells were below 10).

Figure 15:1 Chi-Square Statistic: Gender and Income (expected values in parentheses)

              Men          Women        Total
High Income   15 (19.66)   25 (20.34)   40
Low Income    14 (9.34)    5 (9.66)     19
Total         29           30           59

Row   Column   O    E       O - E   |O - E| - .5   (|O - E| - .5)^2   (|O - E| - .5)^2 / E
1     1        15   19.66   -4.66   4.16           17.31              .88
1     2        25   20.34   4.66    4.16           17.31              .85
2     1        14   9.34    4.66    4.16           17.31              1.85
2     2        5    9.66    -4.66   4.16           17.31              1.79

Summing the final column yields a value of 5.37 for Chi-Square. A consultation of the table in Appendix H indicates that there is a significant difference between the groups (at the .05 level), which suggests that women are more likely than men to be found in the high income classification. Once this initial set of calculations is complete, Phi can be calculated using the following formula:

\[ \phi = \sqrt{\frac{\chi^2}{n}} \]
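The arithmetic behind Figure 15:1 can be checked with a short Python sketch. It is only an illustration: the function names `yates_chi_square` and `phi_coefficient` are hypothetical, and the code assumes the observed counts are supplied as a list of rows. Because it does not round the intermediate cell values, it returns a Chi-Square of about 5.38 rather than the 5.37 obtained by summing the rounded column by hand; Phi rounds to .30 either way.

```python
import math


def yates_chi_square(table):
    """Return (chi_square, n) for a table of observed counts,
    applying Yates' Correction (subtract .5 from each |O - E|)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (abs(observed - expected) - 0.5) ** 2 / expected
    return chi_sq, n


def phi_coefficient(chi_sq, n):
    """Phi = square root of (Chi-Square / n); intended for 2x2 tables."""
    return math.sqrt(chi_sq / n)


# Observed counts from Figure 15:1 (rows: High Income, Low Income;
# columns: Men, Women).
observed = [[15, 25], [14, 5]]
chi_sq, n = yates_chi_square(observed)
print(round(chi_sq, 2), n, round(phi_coefficient(chi_sq, n), 2))
```

Keeping the Chi-Square step separate from the Phi step mirrors the order used in the chapter: Chi-Square is computed first, and the measure of association is derived from it.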

Using the obtained value of 5.37 for Chi-Square and the value of 59 for n, taken from the total in the data matrix, Phi is calculated:

\[ \phi = \sqrt{\frac{5.37}{59}} = \sqrt{.091} = .30 \]

The obtained value for Phi suggests the presence of a moderate correlation between the two variables.

The next measure to be discussed in this chapter is the Contingency Coefficient. This statistic is calculated using the formula:

\[ C = \sqrt{\frac{\chi^2}{n + \chi^2}} \]

As an example, assume that a significant Chi-Square value of 9.68 was obtained from a comparison of two variables that each had three possible values. The data matrix would be 3x3 in this case, indicating that the Contingency Coefficient would be the most appropriate measure of association. Assuming an n of 60 for this research scenario, the calculation of the Contingency Coefficient proceeds as follows:

\[ C = \sqrt{\frac{9.68}{60 + 9.68}} = \sqrt{.139} = .37 \]

As in the first example, the calculated value for this statistic suggests the presence of a moderate correlation between the two variables.

The final statistic commonly employed by those measuring association between nominal variables is Cramer's V. It is calculated using the formula:

\[ V = \sqrt{\frac{\chi^2}{n(k - 1)}} \]

To determine the value of k in the formula, look at the number of possible values of each variable (the number of rows and columns in the data matrix). The smaller of the two numbers is used as k. Assuming once again that a researcher has conducted a Chi-Square test on a sample with an n of 60 and obtained a significant value of 9.68 for Chi-Square using a data set where variable X had 5 possible values and variable Y had 7 possible values (5x7 data matrix), calculation of Cramer's V proceeds as follows:

\[ V = \sqrt{\frac{9.68}{60(5 - 1)}} = \sqrt{.040} = .20 \]
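The Contingency Coefficient and Cramer's V calculations for the two scenarios above (Chi-Square of 9.68, n of 60) can be sketched in Python as follows. The function names are hypothetical and chosen only for this illustration; k is taken as the smaller of the number of rows and columns, as described in the text.

```python
import math


def contingency_coefficient(chi_sq, n):
    """C = square root of Chi-Square / (n + Chi-Square)."""
    return math.sqrt(chi_sq / (n + chi_sq))


def cramers_v(chi_sq, n, rows, cols):
    """V = square root of Chi-Square / (n * (k - 1)),
    where k is the smaller of the row and column counts."""
    k = min(rows, cols)
    return math.sqrt(chi_sq / (n * (k - 1)))


# 3x3 scenario from the text: Chi-Square = 9.68, n = 60
c_value = contingency_coefficient(9.68, 60)

# 5x7 scenario from the text: same Chi-Square and n, so k = 5
v_value = cramers_v(9.68, 60, rows=5, cols=7)

print(round(c_value, 2), round(v_value, 2))
```

Both functions take the Chi-Square value as an input rather than recomputing it, since the chapter treats Chi-Square as already obtained by the methods of Chapter 11.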

The obtained value of .20 in this case indicates the presence of a weak correlation between the two variables under consideration.

In conclusion, remember that the appropriate measure of correlation when working with nominal data depends on the structure of the data matrix used to calculate the Chi-Square statistic. When the data matrix is 2x2, the Phi statistic is used. When the number of rows and columns in the data matrix is the same and greater than two (3x3, 4x4, 22x22), the Contingency Coefficient is employed. Cramer's V is used when the number of rows and columns is unequal (2x3, 3x5, 5x7).
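The selection rule summarized in this chapter can be written as a small Python function. This is a sketch of the chapter's rule only; the function name `choose_measure` is hypothetical, and the check that each variable has at least two possible values is an added assumption.

```python
def choose_measure(rows, cols):
    """Pick the nominal measure of correlation from the shape
    of the data matrix, following the chapter's rule."""
    if rows < 2 or cols < 2:
        # Assumption: a variable with fewer than two values cannot be tested.
        raise ValueError("each variable needs at least two possible values")
    if rows == 2 and cols == 2:
        return "Phi"
    if rows == cols:
        return "Contingency Coefficient"
    return "Cramer's V"


for shape in [(2, 2), (3, 3), (5, 7)]:
    print(shape, choose_measure(*shape))
```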

Exercises Chapter 15

1. Compute a Chi-Square statistic and the appropriate nominal correlation statistic using the following data. Draw statistical and research conclusions. Show all work.

         Under $10,000   $10,000 - $20,000   $20,001 - $55,000   Over $55,000   Total
Whites   30              40                  10                  70             150
Blacks   50              60                  10                  20             140
Total    80              100                 20                  90             290

2. A pollster for a candidate wishes to determine whether there is a relationship between an individual's voting patterns and their television watching habits. A random sample of 350 voters was taken to address this issue. The results of the sampling yielded the data below. Calculate the value of Chi-Square and determine whether it is significant. Determine the appropriate nominal measure of correlation and apply it to this situation. Draw statistical and research conclusions.

Television Viewing Time   Democrats   Republicans   Independents   Total
Light                     45          5             70             120
Moderate                  50          10            68             128
Heavy                     60          10            32             102
Total                     155         25            170            350