Chapter 12: Chi-Square Procedures


January 6, 2010

Chapter Outline
12.1 The Chi-Square Distribution
12.2 Chi-Square Goodness-of-Fit Test
12.3 Contingency Tables: Association
12.4 Chi-Square Test of Independence

General Objective: In Section 11.2 of Chapter 11 we learned to test hypotheses about a single population proportion. For example, we looked at the problem of Playing Hooky From Work. In that example each person either plays hooky or does not, so there are only two possible responses for each person. This is known as a binary response. In many other situations outcomes can be classified into more than two groups. This is known as a multinomial response. In Section 12.2 we will extend the hypothesis testing methods of Section 11.2 to accommodate multinomial responses. This is known as the Goodness-of-Fit Test. In previous chapters we performed statistical inference for only one variable. In Section 12.3 we will learn how to assess the association between two variables. Finally, in Section 12.4 we will learn to test for the independence of two variables. For example, suppose a violent crime has been committed. It can

12.1 The Chi-Square Distribution

So far we have used the standard normal distribution, N(0,1), and the t distribution for confidence intervals and hypothesis tests. In this chapter we need another probability distribution, known as the Chi-square distribution. A Chi-square random variable is never negative. Like the t distribution, the Chi-square distribution depends on its degrees of freedom. Selected critical values for various degrees of freedom are listed in Table V. Figure 12.1 below shows how the curve changes when the degrees of freedom change.

As the degrees of freedom increase, the curve looks more and more symmetric.

The graph shows the critical value for α = 0.025. Table V lists the critical values for selected values of α. StatCrunch can also be used to get these values. Follow: Stat > Calculator > Chi-Square.

Change the entries in the calculator window shown on the left:
Enter 9 in the DF box.
Select => in the left box on the second row.
Enter the observed value $t_o$ in the middle box, say 1.56.
Press Compute.
You should see the graph on the right. The area of the shaded region will appear in the last box.
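The same tail area can be checked without StatCrunch. The following is a minimal sketch, assuming integer degrees of freedom and using the standard closed-form series for the Chi-square upper tail; the function name `chi2_upper_tail` is an illustrative choice, not part of the notes or of StatCrunch.

```python
import math

def chi2_upper_tail(x, df):
    """Return P(X >= x) for a Chi-square variable with integer df."""
    if x <= 0:
        return 1.0  # the whole distribution lies above 0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Same inputs as the StatCrunch steps: DF = 9, value = 1.56
area = chi2_upper_tail(1.56, 9)
print(f"P(chi-square >= 1.56) with 9 df = {area:.4f}")
```

As a sanity check, the function should return roughly 0.05 at the tabulated critical value 12.592 with 6 degrees of freedom.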

12.2 Chi-Square Goodness-of-Fit Test

Controlling Road Rage: Road rage is defined as ... an incident in which an angry motorist tries to harm another motorist. Suppose we want to find out whether any particular day of the week is more susceptible to road rage than the other days of the week. This can be formulated as a hypothesis test as follows:
$H_0$: Road rage is equally likely to happen on any day of the week.
$H_a$: Road rage is NOT equally likely to happen on any day of the week.

Suppose we define
$p_m$ = chance that it will happen on Monday
$p_t$ = chance that it will happen on Tuesday
$p_w$ = chance that it will happen on Wednesday
$p_{th}$ = chance that it will happen on Thursday
$p_f$ = chance that it will happen on Friday
$p_{sa}$ = chance that it will happen on Saturday
$p_{su}$ = chance that it will happen on Sunday

We can state the null hypothesis as:
$H_0: p_m = p_t = p_w = p_{th} = p_f = p_{sa} = p_{su} = 1/7$.
Since $p_m + p_t + p_w + p_{th} + p_f + p_{sa} + p_{su} = 1$ and all seven are equal, each one must equal 1/7. This is similar to testing $H_0: p = p_0$ in the one-proportion case we saw in Chapter 11.

Road Rage Sample data: A random sample of 69 cases was taken from all police records. The results are summarized below.

Day     Observed Count   Null Hypothesis    Expected Count
M        5               $p_m = 1/7$        9.85714
Tu      11               $p_t = 1/7$        9.85714
W       12               $p_w = 1/7$        9.85714
Th      11               $p_{th} = 1/7$     9.85714
F       18               $p_f = 1/7$        9.85714
Sa       7               $p_{sa} = 1/7$     9.85714
Su       5               $p_{su} = 1/7$     9.85714
Total   69               1                  69

If the null hypothesis is true, we should expect an equal number of cases on each day, namely 69/7 = 9.85714.

To test the null hypothesis we need to compare the observed counts with the expected counts.

Day     Observed   Expected   Diff.      Square of Diff.   Chi-Square sub-total
        O          E          O - E      (O - E)^2         (O - E)^2 / E
M        5         9.85714    -4.85714   23.5918           2.39337
Tu      11         9.85714     1.14286    1.3061           0.13251
W       12         9.85714     2.14286    4.5918           0.46584
Th      11         9.85714     1.14286    1.3061           0.13251
F       18         9.85714     8.14286   66.3061           6.72671
Sa       7         9.85714    -2.85714    8.1633           0.82816
Su       5         9.85714    -4.85714   23.5918           2.39337
Total   69        69           0                          13.072

If the null hypothesis is true, we should expect $O \approx E$ in each category.

To measure the discrepancy between observed and expected counts, the difference $(O - E)$ is calculated. Since the differences can be positive or negative, they are squared. Finally, the squared difference $(O - E)^2$ is standardized by dividing by $E$. The sum of these terms gives an overall measure of discrepancy between observed and expected counts. This is denoted by

Chi-Square $= \chi^2 = \sum \dfrac{(O - E)^2}{E} = 13.072$

If the hypothesis is true, we expect $O \approx E$, so each term in the last column should be small and the total should be small. If the hypothesis is NOT true, at least one $(O - E)^2 / E$ will be large, so the total will be a large positive quantity. This suggests the rejection rule:

Reject $H_0$ if $\chi^2 = \sum \dfrac{(O - E)^2}{E} \ge \chi^2_{\alpha}$, where $\chi^2_{\alpha}$ is obtained from the Chi-Square distribution.
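The arithmetic in the road rage table can be reproduced in a few lines. This is a small sketch using the observed counts from the sample above; the variable names are illustrative only.

```python
# Observed road rage counts by day, from the sample of 69 cases
observed = {"M": 5, "Tu": 11, "W": 12, "Th": 11, "F": 18, "Sa": 7, "Su": 5}

n = sum(observed.values())       # total number of cases
expected = n / len(observed)     # under H0 every day has probability 1/7

# Chi-square statistic: sum of (O - E)^2 / E over the seven days
contributions = {day: (o - expected) ** 2 / expected for day, o in observed.items()}
chi_sq = sum(contributions.values())

for day, value in contributions.items():
    print(f"{day:>2}: {value:.5f}")
print(f"chi-square = {chi_sq:.3f}")
```

Friday contributes by far the largest term, which is what drives the statistic upward.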

The df in this case is (7 - 1) = 6 and $\chi^2_{0.05} = 12.592$. Since the observed value 13.072 > 12.592, the null hypothesis is rejected.

Conclusion: There is enough evidence to conclude that some days are more susceptible to road rage than others.

Use StatCrunch to compute the P value:
Open StatCrunch.
Follow: Stat > Calculator > Chi-Square.
Enter DF = 6, select =>, and enter 13.072 in the middle box.
Press Calculate.

In this case P value $= P(\chi^2 \ge 13.072) = 0.041906722$. Since the P value < 0.05, reject $H_0$. Also see the instructions in Section 12.1 for calculating P values.
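Both decision rules can be checked by hand. The sketch below assumes 6 degrees of freedom, for which the Chi-square upper tail has the closed form $e^{-x/2}\,(1 + x/2 + (x/2)^2/2)$; the helper name `upper_tail_df6` is hypothetical and only valid for df = 6.

```python
import math

def upper_tail_df6(x):
    """P(chi-square >= x) for exactly 6 degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * (1.0 + half + half ** 2 / 2.0)

chi_sq = 13.072          # observed statistic from the road rage data
critical_value = 12.592  # chi-square critical value for alpha = 0.05, df = 6
alpha = 0.05

p_value = upper_tail_df6(chi_sq)
reject_by_critical_value = chi_sq >= critical_value
reject_by_p_value = p_value < alpha

print(f"P value = {p_value:.6f}")
print("Reject H0 (critical value rule):", reject_by_critical_value)
print("Reject H0 (P value rule):", reject_by_p_value)
```

The two rules always agree, because the P value falls below α exactly when the statistic exceeds the critical value.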

Example 12.2: Violent Crimes: Table 12.1 below shows the relative frequency of four different crimes in the year 2000. Table 12.2 below shows the frequency distribution of 500 randomly selected violent crime cases in 2008. We want to test whether the year 2000 proportions are still valid in the year 2008.

Table 12.1 (Yr 2000)
Type of Violent Crime   Relative Frequency
Murder                  0.011
Forcible Rape           0.063
Robbery                 0.286
Agg. Assault            0.640
Total                   1

Table 12.2 (Yr 2008)
Type of Violent Crime   Frequency
Murder                    3
Forcible Rape            37
Robbery                 154
Agg. Assault            306
Total                   500

Set up the Null Hypothesis: Define
$p_M$ = probability that a violent crime is a Murder
$p_F$ = probability that a violent crime is a Forcible Rape
$p_R$ = probability that a violent crime is a Robbery
$p_A$ = probability that a violent crime is an Agg. Assault

$H_0$:            Expected Counts
$p_M = 0.011$     $E_1 = 500 \times 0.011 = 5.5$
$p_F = 0.063$     $E_2 = 500 \times 0.063 = 31.5$
$p_R = 0.286$     $E_3 = 500 \times 0.286 = 143.0$
$p_A = 0.640$     $E_4 = 500 \times 0.640 = 320.0$

The first column specifies the null hypothesis. The second shows the expected counts in each category out of 500 cases if the null hypothesis is true. The expected counts will be compared to the observed counts to test $H_0$.
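The expected counts follow directly from E = np. A brief sketch, using the year 2000 proportions from Table 12.1 and the 2008 sample size of 500 (dictionary keys are just labels chosen for illustration):

```python
# Null-hypothesis proportions (year 2000 relative frequencies)
null_proportions = {
    "Murder": 0.011,
    "Forcible Rape": 0.063,
    "Robbery": 0.286,
    "Agg. Assault": 0.640,
}
n = 500  # number of sampled violent crime cases in 2008

# Expected count in each category if H0 is true: E = n * p
expected = {crime: n * p for crime, p in null_proportions.items()}

for crime, e in expected.items():
    print(f"{crime:<14} E = {e:.1f}")
```

Because the proportions sum to 1, the expected counts sum to the sample size, just as the observed counts do.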

Type of    Observed   Expected   Diff.    Square of Diff.   Chi-Square sub-total
Crime      O          E = np     O - E    (O - E)^2         (O - E)^2 / E
Murder       3          5.5       -2.5      6.25            1.136
Rape        37         31.5        5.5     30.25            0.960
Robbery    154        143.0       11.0    121.00            0.846
Assault    306        320.0      -14.0    196.00            0.613
Total      500        500          0                        3.555

As in the road rage example, if the null hypothesis is true we should expect $O \approx E$ in each category. The differences $(O - E)$ are squared because they can be positive or negative, and each squared difference is standardized by dividing by $E$. The sum of these terms gives the overall measure of discrepancy:

Chi-Square $= \chi^2 = \sum \dfrac{(O - E)^2}{E} = 3.555$

If the hypothesis is true, each term in the last column should be small, so the total should be small. If the hypothesis is NOT true, at least one $(O - E)^2 / E$ will be large, so the total will be a large positive quantity. The rejection rule is again:

Reject $H_0$ if $\chi^2 = \sum \dfrac{(O - E)^2}{E} \ge \chi^2_{\alpha}$, where $\chi^2_{\alpha}$ is obtained from the Chi-Square distribution.

The degrees of freedom = (k - 1), where k = number of groups, so here df = 4 - 1 = 3. P value $= P(\chi^2 \ge 3.555) = 0.314$ (using StatCrunch). Since the P value > 0.05, do not reject $H_0$.

Conclusion: There is not enough evidence to conclude that the distribution of violent crimes in 2008 differs from the year 2000 distribution.
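The violent crime test can be reproduced end to end. This is a sketch under the assumption of 3 degrees of freedom, for which the upper tail can be written as $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$; the helper `upper_tail_df3` is a hypothetical name and works only for df = 3.

```python
import math

observed = [3, 37, 154, 306]          # Murder, Rape, Robbery, Assault (2008 sample)
expected = [5.5, 31.5, 143.0, 320.0]  # n * p under the year 2000 proportions

# Chi-square statistic and degrees of freedom (k - 1)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

def upper_tail_df3(x):
    """P(chi-square >= x) for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p_value = upper_tail_df3(chi_sq)
reject = p_value < 0.05

print(f"chi-square = {chi_sq:.3f}, df = {df}, P value = {p_value:.3f}")
print("Reject H0:", reject)
```

A P value this large means the observed counts are well within what random sampling would produce under the year 2000 proportions.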

Note: The $\chi^2$ procedures are approximate. For the approximation to be valid:
All expected counts should be at least 1.
At most 20% of the expected counts may be less than 5.
These two rules are ad hoc rules of thumb. For the Goodness-of-Fit test the expected frequencies are calculated as E = np for each category, and the degrees of freedom = (k - 1), where k = number of groups.
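The two rules of thumb are easy to check mechanically before running a test. A minimal sketch, with a hypothetical helper name `expected_counts_ok`:

```python
def expected_counts_ok(expected):
    """Check the two ad hoc validity rules for the chi-square approximation."""
    # Rule 1: every expected count is at least 1
    all_at_least_one = all(e >= 1 for e in expected)
    # Rule 2: at most 20% of the expected counts are below 5
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

road_rage_expected = [69 / 7] * 7
crime_expected = [5.5, 31.5, 143.0, 320.0]

print("Road rage example valid:", expected_counts_ok(road_rage_expected))
print("Violent crime example valid:", expected_counts_ok(crime_expected))
```

Both examples in this section pass the check, since their smallest expected counts are about 9.86 and 5.5.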

Steps in Computing $\chi^2$

Observed   Specified   Expected   Diff.    Square of Diff.   Chi-Square sub-total
Freq.      Prop.       Freq.
O          p           E = np     O - E    (O - E)^2         (O - E)^2 / E
...        ...         ...        ...      ...               ...
n          1           n          0                          $\chi^2$ =

1. State the null hypothesis with the specified proportions.
2. Complete the entries in the table above.
3. Determine the degrees of freedom = (k - 1) = (# of groups) - 1.
4. Compute $\chi^2_{\alpha}$ or the P value.
5. Make a decision.

12.3 Association in Contingency Table:

The previous page shows the information collected from a randomly selected sample of 40 students. For each student two pieces of information were collected: political party affiliation and class level. The goal is to find out whether these two characteristics are associated (dependent). The information is summarized in a two-way table, also called a contingency table.

The two-way frequency table shows the joint distribution of the two characteristics. To assess the association we can compare any two columns to see whether their frequency distributions are similar. However, because the column totals are unequal, a direct comparison of the frequencies in the columns is NOT appropriate. To remedy this, we divide each count by its column total so that every column sums to 1. This puts the column comparisons on an equal footing.
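Converting counts to column proportions can be sketched as follows. The counts below are purely hypothetical placeholders (the actual 40-student table is not reproduced in this transcription), and the party labels "Party A", "Party B", "Party C" are invented for illustration.

```python
# HYPOTHETICAL counts for illustration only; not the data from the notes.
# Outer keys are columns (class level), inner keys are rows (party).
counts = {
    "Freshman":  {"Party A": 4, "Party B": 3, "Party C": 1},
    "Sophomore": {"Party A": 5, "Party B": 4, "Party C": 1},
    "Junior":    {"Party A": 3, "Party B": 6, "Party C": 3},
    "Senior":    {"Party A": 2, "Party B": 6, "Party C": 2},
}

# Divide every count by its column total so each column sums to 1
column_proportions = {}
for level, column in counts.items():
    total = sum(column.values())
    column_proportions[level] = {party: c / total for party, c in column.items()}

for level, props in column_proportions.items():
    formatted = ", ".join(f"{party}: {p:.2f}" for party, p in props.items())
    print(f"{level:<10} {formatted}")
```

With proportions in hand, columns of different sizes can be compared directly, which is what a segmented bar graph displays.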

If class level is NOT associated with political party, then the proportions in the Freshman column should be similar to the proportions in the Sophomore column, and so on. If these proportions differ substantially from column to column, then the two characteristics are associated. This information can also be presented in graphical form. A segmented bar graph is used here, with one bar for each column. In this case the bars for Junior and Senior look different from those for Freshman and Sophomore.

Conclusion: Political Party and Class level are associated.

Output 12.2 Using StatCrunch
Open StatCrunch and load the data from Table 12.7.
Follow: Stat > Tables > Contingency > with data.
Select a column for the row variable and a column for the column variable, and press Next.
Select the appropriate items from the list. For this problem select column percentage.

12.4 Chi-Square Test of Independence

The graphical method presented in the last section does NOT provide a quantitative way of evaluating the association between two characteristics. In this section we describe how to calculate a $\chi^2$ statistic that is used to evaluate the dependence between any two characteristics. To test whether drinking habit is related to marital status, 1772 individuals were interviewed. The information is summarized in the next table.

Example 12.9: Marital Status and Drinking. The following two-way frequency table is based on a sample of 1772 individuals. The count in each cell is the observed frequency. We need to calculate the expected count in each cell when the two characteristics are independent. The mathematical derivation is beyond the scope of this class, so we simply present the formula for the expected frequency in each cell:

$E = \dfrac{R \times C}{n}$, where R = row total, C = column total, n = grand total.

The next table shows both observed and expected counts.

The observed count in the first cell is 67. To calculate the corresponding expected count:
Row total corresponding to 67 is R = 354.
Column total corresponding to 67 is C = 590.
Grand total is n = 1772.
Hence $E = \dfrac{R \times C}{n} = \dfrac{354 \times 590}{1772} = 117.9$.
This formula is applied to all cells to get the expected counts.

To compare the observed counts with the expected counts we follow the same method as in the Goodness-of-Fit Test. This means calculating $(O - E)^2 / E$ for each cell in the table and then adding these quantities:

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

The degrees of freedom in this case are df = (number of rows - 1)(number of columns - 1) = (r - 1)(c - 1). Reject $H_0$ if $\chi^2 \ge \chi^2_{\alpha}$. The critical value is obtained from the table with df = (r - 1)(c - 1).
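The whole calculation for a two-way table can be sketched in one short function. The function name `independence_statistic` and the 2 x 2 table below are hypothetical illustrations (the full marital status table is not reproduced here); only the formulas E = (R × C)/n and df = (r - 1)(c - 1) come from the text.

```python
def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = (R x C) / n
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)           # (r - 1)(c - 1)
    return chi_sq, df

# HYPOTHETICAL 2 x 2 table, for illustration only
example_table = [[10, 20],
                 [30, 40]]
stat, df = independence_statistic(example_table)
print(f"chi-square = {stat:.4f}, df = {df}")
```

For this made-up table the expected counts are 12, 18, 28 and 42, so each cell differs from its expected value by exactly 2.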

Example 12.10: Chi-square calculations are shown at the bottom of the table. For the first cell, O = 67 and E = 117.87. Hence $(O - E)^2 / E = (67 - 117.87)^2 / 117.87 = 21.952$. The other cells are handled similarly.
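The first-cell calculation can be verified directly from the totals quoted above. A small sketch, with illustrative variable names:

```python
# Totals for the first cell of the marital status / drinking table
row_total = 354
col_total = 590
grand_total = 1772
observed = 67

# Expected count under independence: E = (R x C) / n
expected = row_total * col_total / grand_total

# This cell's contribution to the chi-square statistic
contribution = (observed - expected) ** 2 / expected

print(f"E = {expected:.2f}")
print(f"(O - E)^2 / E = {contribution:.3f}")
```

Note that 21.952 is obtained when the unrounded expected count is carried through the calculation.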

The $\chi^2$ value is 94.269 and df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6. Hence P value $= P(\chi^2 \ge 94.269) = 0.000$ to three decimal places. The null hypothesis is $H_0$: Drinking habit and marital status are independent. Reject the null hypothesis since the P value < 0.05.
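The degrees of freedom and the critical value decision can be checked as below. This sketch reuses the tabulated critical value 12.592 for α = 0.05 and 6 degrees of freedom quoted in Section 12.2; the variable names are illustrative.

```python
n_rows, n_cols = 4, 3     # dimensions of the marital status / drinking table
chi_sq = 94.269           # chi-square statistic reported for the table

# Degrees of freedom for a test of independence: (r - 1)(c - 1)
df = (n_rows - 1) * (n_cols - 1)

critical_value = 12.592   # chi-square critical value for alpha = 0.05, df = 6
reject = chi_sq >= critical_value

print(f"df = {df}")
print("Reject H0 (independence):", reject)
```

The statistic exceeds the critical value by a wide margin, which is consistent with a P value that rounds to 0.000.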

Output 12.4 Using StatCrunch
Open StatCrunch and load the data from Table 12.13.
Follow: Stat > Tables > Contingency > with summary.
Select: the columns for the data in the table.
Select: the column for Row labels and press Next.
Select: expected counts from the list.
Press Calculate.
Note: StatCrunch will not give the individual $(O - E)^2 / E$ terms. It gives only the calculated value of $\chi^2 = \sum (O - E)^2 / E$.