CATEGORICAL DATA Chi-Square Tests for Univariate Data


Recall that a categorical variable is one whose possible values are categories or groupings. We've seen one such variable already: the binary variable, with only two possible outcomes, success or failure. In this topic we explore testing hypotheses about categorical variables with MORE than two outcomes.

EXAMPLE Consider an experiment in which two different tomato phenotypes are crossed and the resulting offspring observed. The parent types are tall cut-leaf tomatoes and dwarf potato-leaf tomatoes.

Variable: Offspring Phenotype
Possible Values: 1) tall cut-leaf, 2) tall potato-leaf, 3) dwarf cut-leaf, and 4) dwarf potato-leaf.

If Mendel's laws of inheritance hold, the population proportions in the offspring would be 1) 9/16, 2) 3/16, 3) 3/16, and 4) 1/16. One might hypothesize that Mendel's laws don't hold for these genes. In an experiment to test that, the researcher observed the proportions 1) .575, 2) .179, 3) .182, and 4) .065, based on a sample of 1611 offspring.

EXAMPLE Consider an observational study of the types of insects that feed on the nectar of a certain flower. The scientist randomly selects hours during the day over several days of the summer season, and selects several different plants. She counts the number of insects of each kind that feed at the plants during the study.

Variable: Insect Family
Possible Values: 1) bees, 2) wasps, or 3) flies

One might hypothesize that this flower attracts the different insect families in unequal proportions.

Important Point: Testing procedures for hypotheses of this form are called Goodness-of-Fit tests. These tests compare the sample proportions to the hypothesized proportions to see how good the fit is.

Important Point: The categories must be mutually exclusive and exhaustive.

Notation: k = number of possible categories that the variable of interest can have.

Category   True Population   Sample         Hypothesized Population
1          $\pi_1$           $\hat\pi_1$    $\pi_{10}$
2          $\pi_2$           $\hat\pi_2$    $\pi_{20}$
...        ...               ...            ...
k          $\pi_k$           $\hat\pi_k$    $\pi_{k0}$

Exhaustive means that $\sum_{i=1}^{k} \pi_i = 1$, $\sum_{i=1}^{k} \hat\pi_i = 1$, and $\sum_{i=1}^{k} \pi_{i0} = 1$.

EXAMPLE Tomatoes and Mendel's laws, k = 4.

Category            True Population   Sample                Hypothesized Population
tall cut-leaf       $\pi_1$           $\hat\pi_1 = .575$    $\pi_{10} = 9/16$
tall potato-leaf    $\pi_2$           $\hat\pi_2 = .179$    $\pi_{20} = 3/16$
dwarf cut-leaf      $\pi_3$           $\hat\pi_3 = .182$    $\pi_{30} = 3/16$
dwarf potato-leaf   $\pi_4$           $\hat\pi_4 = .065$    $\pi_{40} = 1/16$
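As a quick check of the notation, the sample proportions $\hat\pi_i$ can be recomputed from the observed cell counts. This is a minimal Python sketch. It assumes the tomato counts 926, 288, 293 and 104 (n = 1611), which reproduce the sample column above. The function name and dictionary labels are illustrative choices, not part of the notes.

```python
def sample_proportions(counts):
    """Return each category's sample proportion, count / n."""
    n = sum(counts.values())
    return {category: c / n for category, c in counts.items()}

# Observed offspring counts from the tomato cross (n = 1611).
counts = {
    "tall cut-leaf": 926,
    "tall potato-leaf": 288,
    "dwarf cut-leaf": 293,
    "dwarf potato-leaf": 104,
}

pi_hat = sample_proportions(counts)
for category, p in pi_hat.items():
    print(f"{category:<18} {p:.3f}")   # .575, .179, .182, .065

# The categories are exhaustive, so the sample proportions sum to 1
# (up to floating-point rounding).
print(abs(sum(pi_hat.values()) - 1.0) < 1e-12)   # True
```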

Now, for a sample of size n and a set of hypothesized proportions under the null hypothesis, I can calculate how many sample units should fall in each category (if there were no sampling variability, of course). These numbers are called the EXPECTED CELL COUNTS under the null hypothesis. They are calculated as n × (the hypothesized value $\pi_{i0}$ for that category, or cell). The OBSERVED CELL COUNTS are the actual counts seen in each category during the experiment.

Category   Expected Count   Observed Count
1          $n\pi_{10}$      $n\hat\pi_1$
2          $n\pi_{20}$      $n\hat\pi_2$
...        ...              ...
k          $n\pi_{k0}$      $n\hat\pi_k$

Important Point: This test procedure is valid only if the sample size and hypothesized proportions are such that virtually every cell has an expected count of 5 or more. If they aren't, you must use a different test procedure.

EXAMPLE Tomatoes and Mendel's laws, n = 1611.

Category            Expected Count                       Observed Count
tall cut-leaf       $n\pi_{10} = 1611(9/16) = 906.2$     $n\hat\pi_1 = 926$
tall potato-leaf    $n\pi_{20} = 1611(3/16) = 302.1$     $n\hat\pi_2 = 288$
dwarf cut-leaf      $n\pi_{30} = 1611(3/16) = 302.1$     $n\hat\pi_3 = 293$
dwarf potato-leaf   $n\pi_{40} = 1611(1/16) = 100.7$     $n\hat\pi_4 = 104$

Hypotheses:
$H_0$: $\pi_1 = 9/16$, $\pi_2 = 3/16$, $\pi_3 = 3/16$, and $\pi_4 = 1/16$
$H_A$: not $H_0$ ($H_0$ is not true)

Important Point: Note how uninformative the alternative hypothesis is in a goodness-of-fit test. These tests compare the sample data against a specific set of hypothesized proportions. If the null hypothesis is rejected, one cannot tell what the true proportions are, only that they are not the ones listed in the null hypothesis.

Significance Level: let's choose α = .04.
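The expected-count calculation and the "5 or more" validity check are easy to script. Below is a minimal Python sketch for the tomato example. The helper names `expected_counts` and `expected_counts_ok` are hypothetical. The check uses the strict "every cell at least 5" reading of the rule.

```python
def expected_counts(n, hypothesized):
    """Expected cell counts n * pi_i0 under the null hypothesis."""
    return [n * p for p in hypothesized]

def expected_counts_ok(expected, minimum=5.0):
    """Validity check from the notes: every expected count should be at least 5."""
    return all(e >= minimum for e in expected)

n = 1611
mendel = [9 / 16, 3 / 16, 3 / 16, 1 / 16]   # hypothesized 9:3:3:1 proportions

expected = expected_counts(n, mendel)
print([round(e, 1) for e in expected])   # [906.2, 302.1, 302.1, 100.7]
print(expected_counts_ok(expected))      # True
```

With a much smaller sample, say n = 40, the last cell would expect only 2.5 offspring, and the check would fail.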

Test Statistic: a summary of the comparison of the observed and expected cell counts. The actual form is

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

This is called the CHI-SQUARE or GOODNESS-OF-FIT STATISTIC.

Important Point: the closer the expected and observed counts are to each other, the smaller the value of $\chi^2$. Small values of $\chi^2$ support the null hypothesis and large values support $H_A$.

EXAMPLE Tomatoes and Mendel's laws.

Category            Expected Count   Observed Count   $(n\hat\pi_i - n\pi_{i0})^2 / n\pi_{i0}$
tall cut-leaf       906.2            926              .433
tall potato-leaf    302.1            288              .658
dwarf cut-leaf      302.1            293              .274
dwarf potato-leaf   100.7            104              .108

So, $\chi^2 = .433 + .658 + .274 + .108 = 1.473$.
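The statistic is a single sum over cells. The Python sketch below evaluates it twice: once with the exact expected counts $n\pi_{i0}$, and once with the expected counts rounded to one decimal as in the hand calculation. The difference in the third decimal (1.469 versus 1.473) comes only from that rounding. The function name is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per cell")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [926, 288, 293, 104]
n = sum(observed)                                             # 1611
expected_exact = [n * p for p in (9 / 16, 3 / 16, 3 / 16, 1 / 16)]
expected_rounded = [906.2, 302.1, 302.1, 100.7]               # values used by hand

stat_exact = chi_square_statistic(observed, expected_exact)
stat_rounded = chi_square_statistic(observed, expected_rounded)
print(round(stat_exact, 3))     # 1.469
print(round(stat_rounded, 3))   # 1.473
```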

P-value: under the null hypothesis, the test statistic $\chi^2$ has a sampling distribution known as the CHI-SQUARE DISTRIBUTION. Like the t-distribution, the shape of the chi-square distribution depends on the degrees of freedom. Here, $df = k - 1$.

Important Point: the degrees of freedom for the chi-square goodness-of-fit test are the number of categories (k) minus 1, NOT the sample size minus 1.

The P-value is the area under the chi-square distribution to the right of the test statistic value. To find the P-value, first calculate $\chi^2$ and the df. Then go to Table 8 (page 686 of the text).
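Instead of bracketing the P-value with a table, one can compute the right-tail area directly. This Python sketch is standard library only. It assumes the chi-square survival function is obtained from the power series for the regularized lower incomplete gamma function $P(df/2, \chi^2/2)$. That is one standard technique; the function name `chi_square_sf` is hypothetical, and a statistics library is the better choice for extreme tails.

```python
import math

def chi_square_sf(x, df, tol=1e-15, max_terms=10_000):
    """Right-tail area P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a                 # n = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)        # next term: multiply by z / (a + k)
        total += term
        if term < tol * total:
            break
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

p = chi_square_sf(1.473, 3)        # tomato example: df = 4 - 1 = 3
print(f"{p:.2f}")                  # 0.69
```

For df = 2 the chi-square tail is exactly $e^{-x/2}$, which gives a convenient sanity check of the series.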

Find the row labeled with the df you have for your test. Go across the values in the row until you find two values that bracket your $\chi^2$ value. Read the bounds on the P-value from the tops of the columns containing the two bracketing values.

EXAMPLE Tomatoes: df = 4 − 1 = 3 and $\chi^2 = 1.473$. So, on page 686, go to the row labeled df = 3 and find the values closest to 1.473. It is bracketed by the value .5844 to the left and 6.251 to the right. The column headers for these two values are .90 (left) and .10 (right). This says that the P-value falls between .10 and .90.

Conclusion: since the P-value > .10 > α = .04, do not reject $H_0$. There is insufficient evidence to suggest that something other than Mendel's law of inheritance is at work for the two tomato phenotypes that were crossed.
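The table lookup described above can be sketched in Python. The df = 3 row below contains standard chi-square critical values; the exact column layout of the course's Table 8 is an assumption. The helper names `bracket_p_value` and `decide` are hypothetical.

```python
# Upper-tail areas (column headings) and critical values for the df = 3 row
# of a standard chi-square table.
AREAS = [0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, 0.005]
ROW_DF3 = [0.072, 0.115, 0.216, 0.352, 0.584, 6.251, 7.815, 9.348, 11.345, 12.838]

def bracket_p_value(statistic, row, areas):
    """Return (low, high) bounds on the P-value from one table row.

    Critical values increase left to right while the tail areas decrease, so
    the P-value lies between the headings of the two bracketing columns.
    """
    if statistic < row[0]:
        return areas[0], 1.0
    for i in range(len(row) - 1):
        if row[i] <= statistic < row[i + 1]:
            return areas[i + 1], areas[i]
    return 0.0, areas[-1]          # beyond the last column

def decide(low, high, alpha):
    """Compare the P-value bracket with the significance level."""
    if high <= alpha:
        return "reject H0"
    if low >= alpha:
        return "do not reject H0"
    return "alpha falls inside the bracket; a finer table is needed"

low, high = bracket_p_value(1.473, ROW_DF3, AREAS)
print(low, high)                  # 0.1 0.9
print(decide(low, high, 0.04))    # do not reject H0
```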

GOODNESS-OF-FIT TEST PROCEDURE FOR UNIVARIATE CATEGORICAL DATA

Null Hypothesis: $H_0: \pi_1 = \pi_{10}, \pi_2 = \pi_{20}, \ldots, \pi_k = \pi_{k0}$, where $\pi_{i0}$ is the hypothesized population proportion of the $i$th category and $\sum_{i=1}^{k} \pi_{i0} = 1$.

Alternative Hypothesis: $H_A$: $H_0$ is not true.

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

where the expected count in the $i$th category is $n\pi_{i0}$.

P-value: area to the right of the observed $\chi^2$ value under the chi-square distribution with $k - 1$ degrees of freedom. Use Table 8 to get an approximate value for the P-value.

Assumptions: 1) the sample was random. 2) the sample size is sufficiently large and the hypothesized cell proportions are such that the expected cell counts are all 5 or more.
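The whole procedure fits in one Python function. This is a minimal sketch under these assumptions:
- The names `goodness_of_fit` and `GofResult` are hypothetical.
- The P-value comes from the incomplete-gamma series for the chi-square right tail, not from a table.
- The function raises an error when an expected count is below 5, rather than silently running an invalid test.

```python
import math
from dataclasses import dataclass

@dataclass
class GofResult:
    statistic: float
    df: int
    p_value: float
    expected: list

def _chi2_sf(x, df):
    """Right-tail chi-square area via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def goodness_of_fit(observed, hypothesized, min_expected=5.0):
    """Chi-square goodness-of-fit test following the procedure in the notes."""
    if len(observed) != len(hypothesized):
        raise ValueError("need one hypothesized proportion per category")
    if not math.isclose(sum(hypothesized), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in hypothesized]          # n * pi_i0
    if min(expected) < min_expected:
        raise ValueError("an expected count is below 5; use a different procedure")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                            # k - 1, not n - 1
    return GofResult(stat, df, _chi2_sf(stat, df), expected)

result = goodness_of_fit([926, 288, 293, 104], [9 / 16, 3 / 16, 3 / 16, 1 / 16])
print(result.df, round(result.statistic, 3))          # 3 1.469
```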

EXAMPLE It is hypothesized that when homing pigeons are disoriented in a particular manner, they exhibit no preference in direction of flight after takeoff. To test this, 120 pigeons were disoriented, then let loose, and their direction observed (as wedges representing an eighth of 360°). The results are given below. Use a significance level of .10 to test the hypothesis that direction of takeoff is equally likely.

Hypotheses:
$H_0: \pi_1 = \pi_2 = \pi_3 = \cdots = \pi_8 = 1/8$
$H_A$: not $H_0$

Direction (degrees)   Observed Frequency   Expected Frequency   $(n\hat\pi_i - n\pi_{i0})^2 / n\pi_{i0}$
0-45                  18                   120(.125) = 15       .600
46-90                 20                   15                   1.667
91-135                20                   15                   1.667
136-180               15                   15                   0
181-225               13                   15                   .267
226-270               8                    15                   3.267
271-315               7                    15                   4.267
316-360               9                    15                   2.400

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = .600 + 1.667 + \cdots + 2.400 = 14.135$$
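The per-cell contributions in the pigeon table can be reproduced exactly with rational arithmetic. This Python sketch takes the observed frequencies as listed and the expected frequency of 15 per wedge stated in the table (120 × 1/8). Exact arithmetic gives $\chi^2 = 212/15 \approx 14.133$; the 14.135 above comes from adding contributions rounded to three decimals.

```python
from fractions import Fraction

# Direction wedges (degrees) and observed frequencies as listed in the table.
directions = ["0-45", "46-90", "91-135", "136-180",
              "181-225", "226-270", "271-315", "316-360"]
observed = [18, 20, 20, 15, 13, 8, 7, 9]
expected = Fraction(120, 8)      # expected frequency per wedge under H0 (= 15)

# Contribution of each cell: (observed - expected)^2 / expected, kept exact.
contributions = [(Fraction(o) - expected) ** 2 / expected for o in observed]
for wedge, c in zip(directions, contributions):
    print(f"{wedge:>8}  {float(c):.3f}")

statistic = sum(contributions)
print(statistic, round(float(statistic), 3))   # 212/15 14.133
```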

P-value: df = k − 1 = 8 − 1 = 7. From the table, .025 < P-value < .05.

Conclusion: since the P-value < α = .10, reject $H_0$ and conclude that there is evidence that the pigeons show directional preferences when disoriented before being allowed to fly.

NOTE: This is an example of a test in which the null hypothesis is the claim, which goes against what we have learned this semester. Most goodness-of-fit tests are that way; that is, the null hypothesis is the distribution of interest. It is assumed to be true unless the data indicate otherwise. But we cannot show that the null hypothesis is true, only that there is no evidence against it.
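The P-value bracket and the decision for the pigeon example can be sketched in Python. The row below is the right-hand part of the df = 7 row of a standard chi-square table; the course's Table 8 may lay it out differently. The helper `upper_tail_bracket` is a hypothetical name.

```python
# Right-hand part of the df = 7 row of a standard chi-square table:
# upper-tail areas and the matching critical values.
AREAS = [0.10, 0.05, 0.025, 0.01, 0.005]
ROW_DF7 = [12.017, 14.067, 16.013, 18.475, 20.278]

def upper_tail_bracket(statistic, row, areas):
    """Bounds (low, high) on a right-tail P-value from one table row."""
    if statistic < row[0]:
        return areas[0], 1.0              # only known: P-value exceeds .10
    for value, next_value, area, next_area in zip(row, row[1:], areas, areas[1:]):
        if value <= statistic < next_value:
            return next_area, area
    return 0.0, areas[-1]                 # beyond the last column

k = 8
df = k - 1                                # 7 degrees of freedom
low, high = upper_tail_bracket(14.135, ROW_DF7, AREAS)
alpha = 0.10
if high <= alpha:
    decision = "reject H0"
elif low >= alpha:
    decision = "do not reject H0"
else:
    decision = "inconclusive from this table"
print(df, low, high, decision)            # 7 0.025 0.05 reject H0
```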