Bivariate Statistics Session 2: Measuring Associations Chi-Square Test




Features Of The Chi-Square Statistic

The chi-square test is non-parametric: it makes no assumptions about the distribution of the variables. For this reason, it is typically used with data measured at the nominal or ordinal level. Pearson's chi-square (χ²) is the most popular of the non-parametric statistics.

The chi-square (χ²) test is used to assess the relationship between two nominal or ordinal variables. It is a very general statistical test that can be used whenever we wish to evaluate whether empirically obtained frequencies differ significantly from those that would be expected on the basis of chance or theoretical expectations; in other words, whenever the researcher wishes to explore how the categories of the row variable are distributed across the categories of the column variable.

A statistically significant chi-square test indicates that the rows and columns of the contingency table are dependent: the differences between the cell frequencies (cells: the fields of the table) are substantial enough not to be attributed to chance or randomness. A non-significant chi-square test implies that the differences in cell frequencies may be random.

The basic idea of the chi-square statistic is to compare the observed distribution of frequencies with the expected distribution of frequencies. The test thus shows whether the observed association between the variables could be due to chance. It rests on the basic assumption that there is no association between the variables in the contingency table (recall the null hypothesis: no association between the two variables).

Assumptions of the Chi-Square Test

Required level of measurement: the chi-square statistic requires two nominal (or ordinal) variables.
Postulates of the chi-square test:
1) Random sample
2) Mutually exclusive categories
3) All expected frequencies must be > 1
4) No more than 20% of the cells in the contingency table should have an expected frequency < 5.
If these conditions are not satisfied, the chi-square test may be biased.

Contingency Tables

The basis of any chi-square test is a table with frequency counts in its cells. Depending on the number of rows and columns, the table is usually referred to as an N (number of rows) x M (number of columns) table. The simplest version is a 2x2 contingency table.
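Postulates 3 and 4 can be checked directly from the table of observed counts. The sketch below is illustrative Python (the handout itself works in R); the function names are my own, not part of any library:

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_postulates(observed):
    """True when all expected counts exceed 1 and at most 20% fall below 5."""
    cells = [e for row in expected_frequencies(observed) for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_above_one and share_below_five <= 0.20

# The 2x2 gender table from the fear-of-crime survey below: every expected
# count is far above 5, so the chi-square test is safe to use here.
print(check_postulates([[290, 364], [795, 200]]))  # True
```

When the check fails, one of the alternative tests listed at the end of this handout (such as Fisher's exact test) is the usual fallback.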

Contingency table: the frequencies of two variables presented in one table. All categories of the first variable appear in the rows, and all categories of the second variable in the columns. Each cell shows a joint frequency, and totals are given for both rows and columns. Cells: the fields of the table.

Example 1: 2x2 Table

In the fear of crime survey we found that women are more likely than men to say that they go to certain areas only if accompanied by others.

"In certain areas, I only go in the company of others"

          Yes     No      Total
Male      290     364     654
Female    795     200     995
Total     1085    564     1649

Example 2: 2x3 Table

Suppose a survey asked whether people are in favour of introducing the Euro, and found the following answers by political preference. We might want to know: does preference for the Euro vary by political preference? How strong is the relationship?

Pro Euro introduction    Labour    Tory    Liberal    Row total
yes                      120       40      30         190
no                       60        60      20         140
Column total             180       100     50         330

Calculating A Chi-Square Test

How can we know that the differences above are systematic, and not due to chance? We compare our observed values to the values we would expect to see by chance alone if the null hypothesis were true.

Observed frequencies: the distribution of the variables in the sample.
Expected frequencies: the theoretical frequencies that would be obtained if there were no association between the variables (that is, if the null hypothesis were true).

Expected frequencies are computed as follows:

Expected cell frequency = (row total * column total) / total number of observations
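The expected-frequency formula can be applied cell by cell. A minimal illustrative sketch in Python (the function name is my own), using the "yes, Labour" cell of the 2x3 Euro table above:

```python
def expected_cell(row_total, col_total, n):
    # Expected frequency under independence: row total * column total / n
    return row_total * col_total / n

# "Yes, Labour" cell of the 2x3 Euro table: row total 190, column total 180,
# 330 observations in all.
print(round(expected_cell(190, 180, 330), 2))  # 103.64
```

Since 120 Labour supporters actually said yes, the observed count exceeds the expected count in that cell, which is the kind of discrepancy the chi-square statistic accumulates.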

The general formula for computing the chi-square statistic is:

χ² = Σ [ (observed frequency - expected frequency)² / expected frequency ]

Degrees of Freedom (df)

Statisticians use the term degrees of freedom to describe the number of values in the final calculation of a statistic that are free to vary. Degrees of freedom is computed by multiplying the number of rows minus one by the number of columns minus one. The formula is:

df = (number of rows - 1) * (number of columns - 1)

Basic steps underlying the computation of a chi-square statistic:

Step 1: Observe the distribution of frequencies in the cells of the table (the observed frequencies) and compute the sum of each row and column.
Step 2: Compute the frequencies one would expect in each cell by chance (the expected frequencies: row total * column total / total number of observations).
Step 3: Compare the observed to the expected frequencies (observed frequency - expected frequency).
Step 4: Compute the chi-square value (see formula above).

A Step-By-Step Guide To Computing A 2x2 Chi-Square Statistic

Starting point: the observed distribution.

Gender     Yes     No      Sum
Males      290     364     654
Females    795     200     995
Sum        1085    564     1649

Step 1: Compute the expected cell frequencies.
Formula (for each cell): row total * column total / n

Gender     Yes       No        Sum
Males      430.32    223.68    654
Females    654.68    340.32    995
Sum        1085      564       1649

Computing the expected cell frequencies:
The expected frequency in cell 1,1 (males, yes) is (654 * 1085) / 1649 = 430.32
The expected frequency in cell 2,1 (females, yes) is (995 * 1085) / 1649 = 654.68
The expected frequency in cell 1,2 (males, no) is (654 * 564) / 1649 = 223.68

The expected frequency in cell 2,2 (females, no) is (995 * 564) / 1649 = 340.32

Step 2: Compute the differences Observed - Expected.

Gender     Yes        No
Males      -140.32    140.32
Females    140.32     -140.32

Males, yes: 290 - 430.32 = -140.32
Females, yes: 795 - 654.68 = 140.32
Males, no: 364 - 223.68 = 140.32
Females, no: 200 - 340.32 = -140.32

Step 3: Compute (difference squared) / expected.

Gender     Yes      No
Males      45.76    88.02
Females    30.07    57.85

Males, yes: (-140.32)² / 430.32 = 45.76
Females, yes: (140.32)² / 654.68 = 30.07
Males, no: (140.32)² / 223.68 = 88.02
Females, no: (-140.32)² / 340.32 = 57.85

Step 4: Sum all the cell chi-squares.

45.76 + 30.07 + 88.02 + 57.85 = 221.7

χ² = 221.7

Testing for Significance

We check in a standard χ² distribution table whether, for the obtained value of χ² and the given number of degrees of freedom, the association between the variables is statistically significant, i.e. whether the differences between observed and expected frequencies are substantial enough not to have been caused by chance. The standard level of significance (alpha) is .05.

In the example above, with one degree of freedom ((rows - 1) * (columns - 1)), the critical value of χ² for α = .05 is 3.84. Since our χ² value (221.7) exceeds the critical value at α = .05, we reject the null hypothesis and conclude that there is a significant association between these two variables. If we look at the critical values for α = .01 and α = .001 (6.63 and 10.83, respectively), we see that our χ² = 221.7 exceeds these too. The association between the two variables is therefore highly significant, and we report it as p < .001.
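The four computation steps and the table lookup can be sketched together. This is an illustrative Python sketch (the handout's own tooling is R); the function name is my own, and the critical values for df = 1 are the ones quoted above from a standard chi-square table:

```python
def chi_square(observed):
    """Pearson's chi-square: sum of (O - E)^2 / E over all cells, plus df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r_tot in enumerate(row_totals):
        for j, c_tot in enumerate(col_totals):
            expected = r_tot * c_tot / n              # Step 1: expected counts
            diff = observed[i][j] - expected          # Step 2: O - E
            stat += diff ** 2 / expected              # Steps 3-4: accumulate
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Critical values for df = 1, from a standard chi-square table
CRITICAL_DF1 = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83}

stat, df = chi_square([[290, 364], [795, 200]])        # the gender example
print(round(stat, 1), df)                              # 221.7 1
print(all(stat > cv for cv in CRITICAL_DF1.values()))  # True: p < .001
```

Note that the statistic is computed from the exact expected counts, not the rounded values printed in the tables above, so the hand-computed 221.7 matches to one decimal place.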

Just like other statistical tests, the chi-square test is sensitive to sample size: the larger the sample, the more likely you are to reject the null hypothesis. In other words, the chi-square test is more likely to be statistically significant with larger samples, even if the association between the two variables is weak.

Measuring Strength Of Association

The chi-square test is not a measure of the strength of the association between two variables. Other statistics measure the strength of the association between nominal (or ordinal) variables, such as phi, Cramer's V, the contingency coefficient, and gamma.

1) Phi Coefficient (φ)
- Phi is a coefficient based on the value of χ².
- It measures the strength of the relationship between two dichotomous variables (i.e., a 2x2 table).
- Phi ranges between 0 and 1. The higher the value of phi, the stronger the association between the two variables.
- Phi is a symmetrical measure: it does not distinguish between the IV and the DV. In other words, it does not indicate which variable is the cause of the other.
- Phi is computed as: φ = SQRT (χ² / N)

2) Cramer's V
- V is a coefficient based on the value of χ².
- It measures the strength of the relationship between two nominal variables, regardless of the size of the contingency table (e.g. 3x2, 4x3, 5x2, etc.).
- Same basic idea as phi, but not limited to 2x2 tables.
- V ranges between 0 and 1. The higher the value of V, the stronger the association between the two variables.
- Like phi, V is a symmetrical measure; it does not distinguish between the IV and the DV.
- In a 2x2 table, V and φ are identical.
- V is computed as: V = SQRT (χ² / (n * (k - 1)))
  where χ² = the value of the chi-square statistic, n = sample size, and k = the smaller of the number of rows and the number of columns in the table (e.g. if the table has 2 rows and 3 columns, then k = 2).
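Both coefficients follow directly from the chi-square value. An illustrative Python sketch (function names are my own) using the worked example, where χ² = 221.7, n = 1649, and the table is 2x2 so k = 2:

```python
import math

def phi(chi_square, n):
    # Phi for a 2x2 table: sqrt(chi-square / n)
    return math.sqrt(chi_square / n)

def cramers_v(chi_square, n, n_rows, n_cols):
    # Cramer's V: sqrt(chi-square / (n * (k - 1))), k = min(rows, columns)
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))

print(round(phi(221.7, 1649), 3))              # 0.367
print(round(cramers_v(221.7, 1649, 2, 2), 3))  # 0.367: identical for 2x2
```

A value around 0.37 would usually be read as a moderate association: the test is highly significant, but significance alone says nothing about strength.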

Chi-Square in R

1. Make the contingency table, e.g. with CrossTable() from library(gmodels).
2. Calculate the chi-square statistic and p-value, e.g. as part of CrossTable(), or with chisq.test().
3. If significant, interpret the strength with phi or Cramer's V: assocstats() from library(vcd).

Alternative tests:
- Fisher's exact test (when expected frequencies are < 5)
- Yates's correction (2x2 tables)
- Likelihood ratio

IN SUMMARY, WHEN YOU WANT TO ASSESS THE RELATIONSHIP BETWEEN 2 NOMINAL (OR ORDINAL) VARIABLES:
1) Compute the value of χ².
2) If the χ² is statistically significant, measure the strength of the association. If the χ² is not statistically significant, the variables are independent (i.e., there is no association between the variables, and it is irrelevant to measure the strength of the association).
3) Offer an interpretation of the results.