Pearson's 2x2 Chi-Square Test of Independence -- Analysis of the Relationship between Two Qualitative (or Binary) Variables


Analysis of 2-Between-Group Data with a Qualitative (Binary) Response Variable

Application: This statistic has two applications that can appear very different, but are really just two variations of the same question. The first application is to compare the distributions of scores on a qualitative response variable obtained from 2 or more groups that represent different populations. Thus, it is applied in the same data situation as an ANOVA for independent samples, except that it is used when the response variable is qualitative. The other application is to test for a pattern of relationship between two qualitative variables. In this respect, it is something like a correlation (though it looks for a pattern of relationship between qualitative variables, rather than a linear relationship between quantitative variables).

There are two versions of the H0:, depending upon how one characterizes the analysis. It can be treated as a test of whether the populations represented by one of the variables differ in their patterns of response on the other (response) variable, which corresponds to the first application described above. Or it can be treated as a test of whether there is a relationship between the two variables in the single population represented by the sample as a whole, which corresponds to the second application described above. The H0: for each is given below.

H0: The populations represented by the two conditions have the same pattern of responses across the categories of the response variable.
To reject H0: is to say that the two populations differ in their response patterns across the categories of the response variable.

H0: There is no pattern of relationship between the variables within the population represented by the sample.
To reject H0: is to say that there is a pattern of relationship between the variables in the population.
The data: The analysis involves the variables reptdept (1 = not separate reptile department, 2 = separate reptile department) and fishdept (1 = only freshwater fish available, 2 = fresh- and saltwater fish available). Each pair below gives one store's reptdept and fishdept values.

1,1   1,1   1,2   1,1   2,1   2,2   2,2   2,2   2,2   1,1   1,1   2,2

Researcher Hypothesis: The researcher hypothesized that stores without a separate reptile department would be more likely to display only freshwater fish, whereas those stores with a separate reptile department would be more likely to display both freshwater and saltwater fish.

H0: for this analysis: There is no pattern of relationship between whether or not pet stores have separate reptile departments and whether they display only freshwater fish or both freshwater and saltwater fish.

Organize the scores into a contingency table. Since both of these categorical variables have two categories, the contingency table will be a 2x2, for a total of 4 cells, as shown below. Each store's data will be tallied into one of the four cells. For example, a store that did not have a separate reptile department and that displayed only freshwater fish would be tallied into the cell in the upper left; a store that had a separate reptile department and displayed both fresh- and saltwater fish would be tallied into the cell in the lower right.

Below is the contingency table filled with the responses from the 12 stores. These values are the obtained frequency (of) for each of the cells.

reptdept          freshwater    fresh- & saltwater
Not separate          5                 1
Separate              1                 5
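The tallying step described above can be sketched in a few lines of Python. This is only an illustration: the names `stores`, `tally`, and `observed` are made up for the sketch, and it assumes each pair is ordered as (reptdept, fishdept).

```python
from collections import Counter

# Raw data from the handout, one pair per store.
# Assumed order within each pair: (reptdept, fishdept).
# reptdept: 1 = not separate, 2 = separate
# fishdept: 1 = freshwater only, 2 = fresh- and saltwater
stores = [(1, 1), (1, 1), (1, 2), (1, 1), (2, 1), (2, 2),
          (2, 2), (2, 2), (2, 2), (1, 1), (1, 1), (2, 2)]


def tally(pairs):
    """Return a 2x2 table of obtained frequencies (rows = reptdept, columns = fishdept)."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in (1, 2)] for r in (1, 2)]


observed = tally(stores)
for label, row in zip(("Not separate", "Separate"), observed):
    print(f"{label:<14}{row[0]:>4}{row[1]:>4}")
```

Because a `Counter` returns 0 for a pair that never occurs, an empty cell would still appear in the table rather than raising an error.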

Compute the row totals (sum across the values on the same row) and column totals (sum across the values on the same column) and the overall total of the observed frequencies. As a computational check, be sure that the row totals and the column totals sum to the same value for the overall total.

reptdept          freshwater    fresh- & saltwater    row totals
Not separate          5                 1                  6
Separate              1                 5                  6
column totals         6                 6            overall total 12

Compute the expected frequency (ef) of each cell. This expected frequency is computed as the product of the row total and the column total for that cell, divided by the overall total. For example, the upper left-hand cell (stores not having a separate reptile department that display only freshwater fish) has an expected frequency of:

\[ ef = \frac{\text{row total} \times \text{column total}}{\text{overall total}} = \frac{6 \times 6}{12} = 3.00 \]

reptdept          freshwater      fresh- & saltwater    row totals
                  of     ef         of     ef
Not separate       5      3          1      3                6
Separate           1      3          5      3                6
column totals         6                 6              overall total 12

Compute Chi-Square

\[ X^2 = \sum \frac{(\mathit{of} - \mathit{ef})^2}{\mathit{ef}} = \frac{(5-3)^2}{3} + \frac{(1-3)^2}{3} + \frac{(5-3)^2}{3} + \frac{(1-3)^2}{3} = 5.33 \]

Compute the degrees of freedom (df)

\[ df = (\text{number of columns} - 1) \times (\text{number of rows} - 1) = (2-1) \times (2-1) = 1 \times 1 = 1 \]

Look up the X²-critical for α = .05 and the appropriate degrees of freedom using the X²-table.

X²(df=1, p=.05) = 3.84
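As a cross-check on the hand computation, the same steps (totals, expected frequencies, X², df) can be written as a short Python sketch. The function name `pearson_chi_square` and the constant `CRITICAL_05_DF1` are illustrative names chosen here; the critical value is simply the 3.84 taken from the X²-table, and no continuity correction is applied, matching the formula above.

```python
def pearson_chi_square(observed):
    """Return (expected frequencies, X², df) for a table of obtained frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)

    # ef for each cell = row total * column total / overall total
    expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

    # X² = sum over all cells of (of - ef)² / ef
    chi2 = sum((of - ef) ** 2 / ef
               for of_row, ef_row in zip(observed, expected)
               for of, ef in zip(of_row, ef_row))

    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return expected, chi2, df


observed = [[5, 1], [1, 5]]   # rows: not separate, separate
CRITICAL_05_DF1 = 3.84        # X²-critical for alpha = .05, df = 1 (from the table)

expected, chi2, df = pearson_chi_square(observed)
print("expected frequencies:", expected)
print(f"X2 = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > CRITICAL_05_DF1 else "retain H0")
```

Nothing in the function is specific to 2x2 tables, so the same sketch would handle larger tables, although the critical value would then have to be looked up for the new df.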

Determine whether to retain or reject H0:
-- If the obtained X² is less than the critical X², then retain the null hypothesis: conclude that there is no relationship between values on one categorical variable and values on the other categorical variable, in the population represented by the sample.
-- If the obtained X² is greater than the critical X², then reject the null hypothesis: conclude that there is a relationship between the values on one categorical variable and values on the other categorical variable, in the population represented by the sample.

For the example data, we would decide to reject the null hypothesis, because the obtained Chi-square value of 5.33 is larger than the critical Chi-square value of 3.84.

By the way: This test should only be applied when at least 80% of the cells have expected frequencies (ef) of five or larger. Applying the test when fewer cells have this minimum expected frequency can lead to inaccurate results.

Determine whether or not the results support the research hypothesis. Usually the researcher hypothesizes that there is a pattern of relationship between the variables. If so, then to support the research hypothesis will require:
-- Rejecting H0: that there is no pattern of relationship between the variables
-- The pattern of the relationship being the same as that specified in the research hypothesis

If you reject the null hypothesis, and if the pattern of data in the contingency table agrees exactly with the research hypothesis, then the research hypothesis is completely supported. If you reject the null hypothesis, and if part of the pattern of data in the contingency table agrees with the research hypothesis, but part of the pattern of data does not, then the research hypothesis is partially supported. If you retain the null hypothesis, or you reject the null but no part of the pattern of data in the contingency table agrees with the research hypothesis, then the research hypothesis is not at all supported.
Sometimes, however, the research hypothesis is that there is no pattern of relationship between the variables. If so, the research hypothesis and H0: are the same! When this is the case, retaining H0: provides support for the research hypothesis, whereas rejecting H0: provides evidence that the research hypothesis is incorrect.

Please note: When you have decided to retain H0: (because X² < X²-critical), then don't talk about there being a pattern of relationship between the variables -- retaining H0: is saying that there is no pattern of relationship between the variables in the population (and any apparent pattern of relationship is probably due to sampling variation, chance, etc.).

For the example data: Consistent with the research hypothesis, those stores without separate reptile departments tended to display only freshwater fish, whereas those stores that had separate reptile departments tended to display both freshwater and saltwater fish.

Describe the results of the X² analysis -- be sure to include the following:
-- Name each variable and tell the univariate statistics (the frequencies in the conditions of each qualitative variable)
-- The X²-value, df (in parentheses) and p-value (p < .05 or p > .05)
-- If you reject H0:, then describe the pattern of the relationship between the variables
-- If you retain H0:, then say that there is no significant pattern of relationship between the variables
-- Whether or not the results support the research hypothesis

Please note: Reporting X² results is not a form of "creative writing". The idea is to be succinct, clear, and follow the prescribed format -- it is really a lot like completing a fill-in-the-blanks sentence. After you write and read enough of these you'll develop some "style", but for now just follow the format.

Please note: Describing the results of an X² analysis -- the pattern of a relationship between two qualitative variables -- requires more precision and "more words" than describing the results of a correlation or a mean comparison. Take your time and give a complete description!
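The statistical part of the write-up (X²-value, df in parentheses, and p-value) can be assembled mechanically. The sketch below is one possible way to do that and is not part of the prescribed format itself. The helper names `p_value_df1` and `report` are hypothetical, and the p-value formula used, p = erfc(sqrt(X²/2)), holds only when df = 1, as in a 2x2 table.

```python
import math


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with df = 1 (valid for df = 1 only)."""
    return math.erfc(math.sqrt(chi2 / 2))


def report(chi2):
    """Format the statistic for a 2x2 table as 'X²(1) = value, p = value'."""
    p = p_value_df1(chi2)
    p_text = f"{p:.3f}".lstrip("0")   # drop the leading zero, e.g. 0.021 -> .021
    decision = "p < .05" if p < 0.05 else "p > .05"
    return f"X²(1) = {chi2:.2f}, p = {p_text} ({decision})"


# Obtained X² from the 12-store example: 4 cells, each contributing 4/3
print(report(16 / 3))
```

The verbal parts of the write-up (naming the variables, the univariate frequencies, and the pattern of the relationship) still have to be written by reading the contingency table.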

X² write-ups almost always include a table showing the cell and marginal counts. Also, notice how the write-up tells the univariates (the frequencies of the conditions for each qualitative variable) and then the significance test.

Table 1 shows the 2x2 table of these variables. The sample of stores was evenly divided between the two types of reptile departments and also evenly divided between the two types of fish departments. As hypothesized, those stores with separate reptile departments tended to have both fresh- and saltwater fish, whereas those stores without separate reptile departments tended to have only freshwater fish, X²(1) = 5.33, p = .021.

Table 1
Relationship between Type of Reptile Department and Type of Fish Available

                               Type of Reptile Department
Type of Fish Available         Not separate    Separate      total
                               Department      Department
Freshwater Fish Only                5               1           6
Fresh- and Saltwater Fish           1               5           6
total                               6               6          12

"X² Write-up Examples for Other Occasions"

Here are examples (not all using the same data as the computational example above) of what we would write for different combinations of RH:, contingency table results & significance test results. In the small tables, the rows are the reptile department types and the columns are read as freshwater only, then fresh- and saltwater.

For... Rejected H0:
Found a pattern of relationship between the variables opposite that of the RH:

              freshwater only    fresh- & saltwater    total
not sep              1                   5                6
separate             5                   1                6
total                6                   6

The sample of stores was evenly divided between the two types of reptile departments and also evenly divided between the two types of fish departments. Contrary to the research hypothesis, those stores with separate reptile departments tended to have only freshwater fish, whereas those stores without separate reptile departments tended to have both fresh- and saltwater fish, X²(1) = 5.33, p = .021.

For... Rejected H0:
Found a pattern of relationship between the variables partially supporting the RH:

              freshwater only    fresh- & saltwater    total
not sep              6                   6               12
separate             1                  11               12
total                7                  17

The sample of stores was evenly divided between the two types of reptile departments and 71% displayed both fresh- and saltwater fish. Consistent with the research hypothesis, those stores with separate reptile departments tended to have both fresh- and saltwater fish; however, contrary to the research hypothesis, those stores without separate reptile departments were evenly divided between having only freshwater fish and having both fresh- and saltwater fish, X²(1) = 5.04, p = .025.

For... Retained H0:

              freshwater only    fresh- & saltwater    total
not sep              3                   3                6
separate             4                   2                6
total                7                   5

The sample of stores was evenly divided between the two types of reptile departments and also about evenly divided between the two types of fish departments. Contrary to the research hypothesis, there was no significant pattern of relationship between these two variables, X²(1) = .343, p = .558.
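The statistics reported in these three write-ups can be recomputed from their small tables. The following Python sketch does that and also reports the share of cells whose expected frequency is five or larger, which is the guideline mentioned earlier. The function name `summarize` and the dictionary labels are illustrative, and the p-value formula is again the df = 1 shortcut p = erfc(sqrt(X²/2)).

```python
import math


def summarize(observed):
    """Return (X², p for df = 1, share of cells with ef >= 5) for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)
    expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

    chi2 = sum((of - ef) ** 2 / ef
               for of_row, ef_row in zip(observed, expected)
               for of, ef in zip(of_row, ef_row))
    p = math.erfc(math.sqrt(chi2 / 2))   # valid for df = 1 only

    cells = [ef for ef_row in expected for ef in ef_row]
    share_ok = sum(ef >= 5 for ef in cells) / len(cells)
    return chi2, p, share_ok


# Tables from the write-up examples (rows: not separate, separate)
examples = {
    "opposite of RH": [[1, 5], [5, 1]],
    "partial support": [[6, 6], [1, 11]],
    "retained H0": [[3, 3], [4, 2]],
}

for name, table in examples.items():
    chi2, p, share_ok = summarize(table)
    print(f"{name}: X2(1) = {chi2:.3f}, p = {p:.3f}, cells with ef >= 5: {share_ok:.0%}")
```

Run on these tables, the recomputed X² and p values should agree with the ones reported in the write-ups. The expected frequencies are 3.0 in the first table, 3.5 and 8.5 in the second, and 3.5 and 2.5 in the third, so none of these small illustrative tables reaches the 80% guideline; with real data that would be a reason for caution before relying on the test.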