: P(winter) = P(spring) = P(summer)= P(fall) H A. : Homicides are independent of seasons. : Homicides are evenly distributed across seasons

Similar documents
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Chapter 23. Two Categorical Variables: The Chi-Square Test

Association Between Variables

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Chi-square test Fisher s Exact test

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) (d) 20 (e) 25 (f) 80. Totals/Marginal

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Is it statistically significant? The chi-square test

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Section 12 Part 2. Chi-square test

Recall this chart that showed how most of our course would be organized:

Crosstabulation & Chi Square

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory

Testing Research and Statistical Hypotheses

Topic 8. Chi Square Tests

This chapter discusses some of the basic concepts in inferential statistics.

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

UNDERSTANDING THE TWO-WAY ANOVA

The Chi-Square Test. STAT E-50 Introduction to Statistics

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

First-year Statistics for Psychology Students Through Worked Examples

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

Section 13, Part 1 ANOVA. Analysis Of Variance

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

Two Correlated Proportions (McNemar Test)

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

TABLE OF CONTENTS. About Chi Squares What is a CHI SQUARE? Chi Squares Hypothesis Testing with Chi Squares... 2

13: Additional ANOVA Topics. Post hoc Comparisons

Variables Control Charts

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

Elementary Statistics

IB Math Research Problem

Kenken For Teachers. Tom Davis June 27, Abstract

Chapter 3 RANDOM VARIATE GENERATION

1.5 Oneway Analysis of Variance

Contingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

CHAPTER 12. Chi-Square Tests and Nonparametric Tests LEARNING OBJECTIVES. USING T.C. Resort Properties

Introduction to Fractions

CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

3.4 Statistical inference for 2 populations based on two samples

Descriptive Statistics

Study Guide for the Final Exam

Tests for Two Proportions

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Use of the Chi-Square Statistic. Marie Diener-West, PhD Johns Hopkins University

Statistical tests for SPSS

Nonparametric Tests. Chi-Square Test for Independence

1.6 The Order of Operations

Analysis of Variance ANOVA

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

Solutions to Homework 10 Statistics 302 Professor Larget

Chapter 13. Chi-Square. Crosstabs and Nonparametric Tests. Specifically, we demonstrate procedures for running two separate

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

CHAPTER 14 NONPARAMETRIC TESTS

Unit 26 Estimation with Confidence Intervals

5544 = = = Now we have to find a divisor of 693. We can try 3, and 693 = 3 231,and we keep dividing by 3 to get: 1

Simulating Chi-Square Test Using Excel

Ordinal Regression. Chapter

Introduction to Hypothesis Testing

Chapter 14: Repeated Measures Analysis of Variance (ANOVA)

Pre-Algebra Lecture 6

11. Analysis of Case-control Studies Logistic Regression

Tests for One Proportion

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

So, using the new notation, P X,Y (0,1) =.08 This is the value which the joint probability function for X and Y takes when X=0 and Y=1.

Chi Square Distribution

SPSS Guide: Regression Analysis

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

One-Way Analysis of Variance (ANOVA) Example Problem

Non-Inferiority Tests for Two Proportions

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Chapter 19 The Chi-Square Test

Hypothesis testing - Steps

Independent samples t-test. Dr. Tom Pierce Radford University

Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Crystal Reports JD Edwards EnterpriseOne and World

STATISTICS PROJECT: Hypothesis Testing

6.4 Normal Distribution

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

Binary Diagnostic Tests Two Independent Samples

Simple Regression Theory II 2010 Samuel L. Baker

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Statistics 2014 Scoring Guidelines

One-Way Analysis of Variance

People like to clump things into categories. Virtually every research

6th Grade Lesson Plan: Probably Probability

Transcription:

Chi-Square worked examples: 1. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. The article Is There a Season for Homocide? classified 1361 homicides according to season, resulting in the following table (Criminology, 1988: 87-96): Winter 38 Spring 334 Summer 37 Fall 37 Test to see if there is a relationship between the seasons and violent crime. ---- Solution ---- We have two possibilities: (a) homicides do not change from season to season (homicides are independent of seasons) (b) homicides do change from season to season (homicides are not evenly distributed across seasons) Looking at the table, it s obvious that the homicides are not evenly distributed across seasons. Remember, however, that our data is only a sample of 1361 homicides. The differences among seasons that we ve observed might simply be due to sampling error. We can state our null and alternate hypotheses now. Our null hypothesis always says nothing happens. In this case, our null hypothesis would say that homicides are independent of seasons. Our alternate hypothesis would say that homicides change across seasons. We can state these hypotheses in a variety of ways: 1. In terms of probabilities. In terms of independence 3. In terms of distributions : P(winter) = P(spring) = P(summer)= P(fall) : Homicides are independent of seasons : Homicides are evenly distributed across seasons We can now run our chi-squared badness-of-fit test on this data. To do this, we need to know both the observed and expected number of homicides for each season. The observed numbers of homicides are obvious they are stated in the table. The expected number of homicides in each season are unknown we have no way of knowing what to expect each season. If we assume our null hypothesis is true, however, we can calculate our expectations. Under our null hypothesis, the number of homicides are evenly distributed across seasons. Since we observed 1361 homicides across 4 seasons, we expect to see: 1361 4 We can display these expectations in our table: = 340.5 homicides each season Winter Spring Summer Fall Observed = 38 Observed = 334 Observed = 37 Observed = 37

When we have a table of categorical data and we want to compare observed numbers of observations to expected numbers of observations, we can use the chi-squared goodness-of-fit test. Chi-Squared Goodness-of-Fit Test Statistic:! c-1 = ( observed expected) " All Cells expected We ll calculate this step-by-step in the table: Observed = 38 Winter Spring Summer Fall Observed = 334 Observed = 37 Observed = 37 ( 38! 340.5) 340.5 = 0.441 ( 334! 340.5) 340.5 = 0.115 ( 37! 340.5) 340.5 =.963 ( 37! 340.5) 340.5 = 0.516 Notes: If we forget to square the numerator, our test statistic will always equal zero. Dividing by the expected number of observations takes sample size into account We now calculate our test statistic as: 0.441+ 0.115 +.963+ 0.516 = 4.035. When we have a table with only one row (or only one column), the chi-squared test statistic has c!1 degrees of freedom, where c = the number of columns (or rows) in our table. So, for this table, we have! 4"1 =! 3 = 4.035. To see if this value is significant, we would compare it to a critical value from our chi-squared table. If our calculated test statistic is bigger than the critical value in the table, we would reject our null hypothesis. Using a computer, I calculate the p-value for our chi-squared test to be p = 0.577. This means we should retain our null hypothesis, so we conclude that there is no significant relationship between the number of homicides and the seasons.

. Each individual in a random sample of high school and college students was cross-classified with respect to both political views and marijuana usage, resulting in the following data ( Attitudes about Marijuana and Political Views, Psychological Reports, 1973: 1051-1054): Political Views Marijuana Usage Never Rarely Frequently Liberal 479 173 119 Conservative 14 47 15 Other 17 45 85 Does the data support the hypothesis that political views and marijuana usage level are independent within the population? ---- Solution ---- We have a 3x3 contingency table (categorical data), so we can run our chi-squared test for independence. Chi-Squared Test for Independence:! (r-1)(c-1) = ( observed expected) " All Cells expected Notes: Yep, this looks almost identical to the Chi-squared test for independence. Only the degrees of freedom change to (rows 1)(columns 1) We need to state our hypotheses, calculate our expected number of observations in each cell, calculate our test statistic, and state our conclusion. We always state our hypotheses in terms of the two variables in our study (the row and column variables). Under a null hypothesis, we assume the variables are independent. We can, therefore, state our hypotheses as: Hypotheses: : Marijuana use and political views are independent To get the expected number of observations, we must recall what we know about independence. Last semester, we learned that if two events (A and B) are independent, then: 1. Knowledge of the outcome of one event does not influence the probability of the other event. P(A B) = P(A) and P(B A) = P(B) 3. P(A I B) = P(A)P(B) The last result is the most important. When two events are independent, we can multiply their probabilities to calculate the probability that they both happen simultaneously. We can use this to obtain our expected number of observations.

Let s take another look at our data: Political Views Marijuana Usage Never Rarely Frequently TOTAL Liberal 479 173 119 771 Conservative 14 47 15 76 Other 17 45 85 30 TOTAL 865 65 19 1349 Suppose we re interested in calculating our expected number of observations in the top-left cell (Liberals who never used marijuana). We can first calculate the probability that someone randomly sampled from our data is a Liberal. Since we have a total of 771 Liberals out of our sample of 1349 individuals, the probability is calculated as: P(Liberal) = 771 1349 = 0.57 We can then calculate the probability that someone from our study has never used marijuana: P(Never) = 865 1349 = 0.641 Using what we know about independence: P(Liberal I Never)=P(Liberal)P(Never) = (.5715)(.641) = 0.3664 This is the expected probability for our first cell. Since we want to know the expected number of observations, we multiply this probability by the total number of observations in our data. Expected number of Liberals who Never used marijuana = (.3664)(1349) = 494.3 We can speed-up these calculations using the following: Calculating Expectations: Expected = (row total)(column total) total sample size = rc N Using this formula, we can quickly calculate the other expected numbers of observations. For example, to calculate the expected number of Conservatives who Frequently use marijuana: Expected = (76)(19) 1349 = 44.81 The observed and expected numbers of observations are reported in the table on the next page. I also use the chi-square calculation in each cell.

Never Rarely Frequently O = 479 O = 173 O = 119 Liberal E = 494.38 E = 151.46 E = 15.17! = 0.48! = 3.06! = 0.30 O = 14 O = 47 O = 15 Conservative E = 176.98 E = 54. E = 44.81! = 7.74! = 0.96! = 19.83 O = 17 O = 45 O = 85 Other E = 193.65 E = 59.33 E = 49.03! =.4! = 3.46! = 6.39 Our overall test statistic is:! (3"1)(3"1) =! 4 = 0.48 + 7.74 +.4 + 3.06 + 0.96 + 0.30 +19.83+ 6.39 = 64.64 We compare this to a chi-square distribution with 4 degrees of freedom. A computer calculates the p-value of this test statistic to be p <.0001, so we would reject the null hypothesis. We can, therefore, conclude that there is a relationship between political views and marijuana usage. In order to find the nature of that relationship, we can look at the chi-square values we calculated in each cell. The biggest chi-square values show us the biggest discrepancies between what we observed and what we expected. For this data, our biggest chi-square values were 6.39 and 19.83. So the relationship between political views and marijuana usage is due to those two cells of the table. Now we must look at the data in those cells to see what is going on. In the bottom-right cell, we see that we had many more observations that we expected. So we can conclude that individuals with Other political views are more likely to Frequently use marijuana. In the cell right above that one, we see that we had fewer observations that we expected. Therefore, we can conclude that Conservatives are less likely to frequently use marijuana. This is a clumsy way of finding our conclusions. We can use relative probabilities and odds rates to help our clarify our conclusions.

Relative probabilities: I m most interested in the individuals who frequently use marijuana, so I m going to focus on that column of data. Never Rarely Frequently Total Liberal 479 173 119 771 Conservative 14 47 15 76 Other 17 45 85 30 Total 865 65 19 1349 First, I will calculate the probability that a Liberal frequently uses marijuana: 119 771 = 0.154 Now I will calculate the probability that a Conservative frequently uses marijuana: 15 76 = 0.054 We calculate the relative probability by taking the ratio of these probabilities:.154.054 =.85. This relative probability has a simple interpretation. Liberals are.85 times more likely to frequently use marijuana than Conservatives. We could also do the same thing with odds. Recall that odds are calculated: Odds = P(A) 1! P(A) The odds that a Liberal frequently uses marijuana are:.154 1!.154 = 0.18 The odds that a conservative frequently uses marijuana are:.054 1!.054 = 0.057 Therefore, the odds ratio is calculated as:.18.057 = 3.193. The odds of a Liberal frequently using marijuana are more than 3 times higher than the odds of a Conservative frequently using marijuana.