Chi-Square Tests and the F-Distribution. Goodness of Fit Multinomial Experiments. Chapter 10

Similar documents
The Chi-Square Test. STAT E-50 Introduction to Statistics

Chapter 23. Two Categorical Variables: The Chi-Square Test

Elementary Statistics

Is it statistically significant? The chi-square test

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Difference of Means and ANOVA Problems

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Crosstabulation & Chi Square

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

Chapter 19 The Chi-Square Test

Simulating Chi-Square Test Using Excel

Mind on Statistics. Chapter 15

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Section 12 Part 2. Chi-square test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) (d) 20 (e) 25 (f) 80. Totals/Marginal

Solutions to Homework 10 Statistics 302 Professor Larget

In the past, the increase in the price of gasoline could be attributed to major national or global

Lesson 3: Calculating Conditional Probabilities and Evaluating Independence Using Two-Way Tables

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chi-square test Fisher s Exact test

STAT 145 (Notes) Al Nosedal Department of Mathematics and Statistics University of New Mexico. Fall 2013

Association Between Variables

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Chapter 8: Hypothesis Testing for One Population Mean, Variance, and Proportion

8 6 X 2 Test for a Variance or Standard Deviation

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

UNDERSTANDING THE TWO-WAY ANOVA

Guide to Microsoft Excel for calculations, statistics, and plotting data

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory

Nonparametric Tests. Chi-Square Test for Independence

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

Using Microsoft Excel to Analyze Data from the Disk Diffusion Assay

Using Microsoft Excel to Analyze Data

Multivariate Analysis of Variance (MANOVA)

Statistical tests for SPSS

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Section 13, Part 1 ANOVA. Analysis Of Variance

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

Chapter 4. Probability and Probability Distributions

Chi Square Tests. Chapter Introduction

9 Testing the Difference

statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Study Guide for the Final Exam

3.4 Statistical inference for 2 populations based on two samples

Independent samples t-test. Dr. Tom Pierce Radford University

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

One-Way Analysis of Variance (ANOVA) Example Problem

Regression Analysis: A Complete Example

Chapter 5 Analysis of variance SPSS Analysis of variance

Statistical Functions in Excel

ELEMENTARY STATISTICS

Poisson Models for Count Data

Statistics 2014 Scoring Guidelines

Common Univariate and Bivariate Applications of the Chi-square Distribution

Additional sources Compilation of sources:

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Independent t- Test (Comparing Two Means)

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

5/31/2013. Chapter 8 Hypothesis Testing. Hypothesis Testing. Hypothesis Testing. Outline. Objectives. Objectives

Elementary Statistics Sample Exam #3

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

Goodness of Fit. Proportional Model. Probability Models & Frequency Data

Opgaven Onderzoeksmethoden, Onderdeel Statistiek

5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives

Appendix 2 Statistical Hypothesis Testing 1

How To Check For Differences In The One Way Anova

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

Recall this chart that showed how most of our course would be organized:

Linear Models in STATA and ANOVA

Hypothesis Testing --- One Mean

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Hypothesis Testing: Two Means, Paired Data, Two Proportions

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

1.5 Oneway Analysis of Variance

HYPOTHESIS TESTING: POWER OF THE TEST

Chapter 3 RANDOM VARIATE GENERATION

IBM SPSS Statistics for Beginners for Windows

The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

DRIVER ATTRIBUTES AND REAR-END CRASH INVOLVEMENT PROPENSITY

Data Analysis Tools. Tools for Summarizing Data

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Introduction to Hypothesis Testing

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Transcription:

Chapter 0 Chi-Square Tests and the F-Distribution 0 Goodness of Fit Multinomial xperiments A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are more than two possible outcomes for each independent trial (Unlike the binomial experiment in which there were only two possible outcomes) A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below ach outcome is classified into categories Topping Cheese Pepperoni Sausage Mushrooms Onions Frequency, f 4% 5% 5% 0% 9% The probability for each possible outcome is fixed Larson & Farber, lementary Statistics: Picturing the World, 3e 3

Chi-Square Goodness-of-Fit Test A Chi-Square Goodness-of-Fit Test is used to test whether a frequency distribution fits an expected distribution To calculate the test statistic for the chi-square goodness-of-fit test, the observed frequencies and the expected frequencies are used The observed frequency O of a category is the frequency for the category observed in the sample data The expected frequency of a category is the calculated frequency for the category xpected frequencies are obtained assuming the specified (or hypothesized) distribution The expected frequency for the ith category is i = np i where n is the number of trials (the sample size) and p i is the assumed probability of the ith category Larson & Farber, lementary Statistics: Picturing the World, 3e 4 Observed and xpected Frequencies 00 teenagers are randomly selected and asked what their favorite pizza topping is The results are shown below Find the observed frequencies and the expected frequencies Topping Cheese Pepperoni Sausage Mushrooms Onions Results (n = 00) 78 5 30 5 5 % of teenagers 4% 5% 5% 0% 9% Observed Frequency 78 5 30 5 5 xpected Frequency 00(04) = 8 00(05) = 50 00(05) = 30 00(00) = 0 00(009) = 8 Larson & Farber, lementary Statistics: Picturing the World, 3e 5 Chi-Square Goodness-of-Fit Test For the chi-square goodness-of-fit test to be used, the following must be true The observed frequencies must be obtained by using a random sample ach expected frequency must be greater than or equal to 5 The Chi-Square Goodness-of-Fit Test If the conditions listed above are satisfied, then the sampling distribution for the goodness-of-fit test is approximated by a chi-square distribution with k degrees of freedom, where k is the number of categories The test statistic for the chi-square goodness-of-fit test is ( O ) The test is always a righttailed test χ = where O represents the observed frequency of each category and represents the expected frequency of each category Larson & Farber, lementary Statistics: Picturing the World, 3e 6

Chi-Square Goodness-of-Fit Test Performing a Chi-Square Goodness-of-Fit Test In Words Identify the claim State the null and alternative hypotheses In Symbols State H 0 and H a Specify the level of significance 3 Identify the degrees of freedom 4 Determine the critical value 5 Determine the rejection region Identify α df = k Use Table 6 in Appendix B Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 7 Chi-Square Goodness-of-Fit Test Performing a Chi-Square Goodness-of-Fit Test In Words In Symbols 6 Calculate the test statistic χ = ( O ) 7 Make a decision to reject or fail to reject the null hypothesis 8 Interpret the decision in the context of the original claim If χ is in the rejection region, reject H 0 Otherwise, fail to reject H 0 Larson & Farber, lementary Statistics: Picturing the World, 3e 8 Chi-Square Goodness-of-Fit Test A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below 00 randomly selected teenagers are surveyed Topping Cheese Pepperoni Sausage Mushrooms Onions Frequency, f 39% 6% 5% 5% 75% Using α = 00, and the observed and expected values previously calculated, test the surveyor s claim using a chi-square goodnessof-fit test Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 9 3

Chi-Square Goodness-of-Fit Test xample continued: H 0 : The distribution of pizza toppings is 39% cheese, 6% pepperoni, 5% sausage, 5% mushrooms, and 75% onions (Claim) H a : The distribution of pizza toppings differs from the claimed or expected distribution Because there are 5 categories, the chi-square distribution has k = 5 = 4 degrees of freedom With df = 4 and α = 00, the critical value is χ 0 = 377 Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 0 Chi-Square Goodness-of-Fit Test xample continued: χ 0 = 377 Rejection region α = 00 X Topping Cheese Pepperoni Sausage Mushrooms Onions Observed Frequency 78 5 30 5 5 xpected Frequency 8 50 30 0 8 ( O ) (78 8) χ = = 8 (5 50) + + 50 (30 30) 30 + (5 0) 0 + (5 8) 8 05 Fail to reject H 0 There is not enough evidence at the % level to reject the surveyor s claim Larson & Farber, lementary Statistics: Picturing the World, 3e 0 Independence 4

Contingency Tables An r c contingency table shows the observed frequencies for two variables The observed frequencies are arranged in r rows and c columns The intersection of a row and a column is called a cell The following contingency table shows a random sample of 3 fatally injured passenger vehicle drivers by age and gender (Adapted from Insurance Institute for Highway Safety) Age Gender Male Female 6 0 3 3 30 5 3 40 5 33 4 50 43 5 60 8 0 6 and older 0 6 Larson & Farber, lementary Statistics: Picturing the World, 3e 3 xpected Frequency Assuming the two variables are independent, you can use the contingency table to find the expected frequency for each cell Finding the xpected Frequency for Contingency Table Cells The expected frequency for a cell r,c in a contingency table is (Sum of row r ) (Sum of column c ) xpected frequency r, c = Sample size Larson & Farber, lementary Statistics: Picturing the World, 3e 4 xpected Frequency Find the expected frequency for each Male cell in the contingency table for the sample of 3 fatally injured drivers Assume that the variables, age and gender, are independent Gender Male Female Total 6 0 3 3 45 30 5 73 3 40 5 33 85 Age 4 50 43 64 5 60 8 0 38 6 and older 0 6 6 Total 6 05 3 Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 5 5

xpected Frequency xample continued: Age Gender 6 0 30 3 40 4 50 5 60 6 and older Total Male 3 5 5 43 8 0 6 Female 3 33 0 6 05 Total 45 73 85 64 38 6 3 (Sum of row r ) (Sum of column c ) xpected frequency r, c = Sample size 6 45 6 73 6 85, = 308, = 49 3 3,3 = 570 3 6 64 6 38 6 6,4 = 4307,5 = 557 3 3,6 = 077 3 Larson & Farber, lementary Statistics: Picturing the World, 3e 6 Chi-Square Independence Test A chi-square independence test is used to test the independence of two variables Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable For the chi-square independence test to be used, the following must be true The observed frequencies must be obtained by using a random sample ach expected frequency must be greater than or equal to 5 Larson & Farber, lementary Statistics: Picturing the World, 3e 7 Chi-Square Independence Test The Chi-Square Independence Test If the conditions listed are satisfied, then the sampling distribution for the chi-square independence test is approximated by a chisquare distribution with (r )(c ) degrees of freedom, where r and c are the number of rows and columns, respectively, of a contingency table The test statistic for the chi-square independence test is ( O ) The test is always a righttailed test χ = where O represents the observed frequencies and represents the expected frequencies Larson & Farber, lementary Statistics: Picturing the World, 3e 8 6

Chi-Square Independence Test Performing a Chi-Square Independence Test In Words Identify the claim State the null and alternative hypotheses In Symbols State H 0 and H a Specify the level of significance 3 Identify the degrees of freedom 4 Determine the critical value 5 Determine the rejection region Identify α df = (r )(c ) Use Table 6 in Appendix B Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 9 Chi-Square Independence Test Performing a Chi-Square Independence Test In Words In Symbols 6 Calculate the test statistic χ = ( O ) 7 Make a decision to reject or fail to reject the null hypothesis 8 Interpret the decision in the context of the original claim If χ is in the rejection region, reject H 0 Otherwise, fail to reject H 0 Larson & Farber, lementary Statistics: Picturing the World, 3e 0 Chi-Square Independence Test The following contingency table shows a random sample of 3 fatally injured passenger vehicle drivers by age and gender The expected frequencies are displayed in parentheses At α = 005, can you conclude that the drivers ages are related to gender in such accidents? Gender Male Female 6 0 3 (308) 3 (47) 45 30 5 (49) (388) 73 3 40 5 (570) 33 (780) 85 Age 4 50 43 (4307) (093) 64 5 60 8 (557) 0 (43) 38 6 and older 0 (077) 6 (53) 6 Total 6 05 3 Larson & Farber, lementary Statistics: Picturing the World, 3e 7

Chi-Square Independence Test xample continued: Because each expected frequency is at least 5 and the drivers were randomly selected, the chi-square independence test can be used to test whether the variables are independent H 0 : The drivers ages are independent of gender H a : The drivers ages are dependent on gender (Claim) df = (r )(c ) = ( )(6 ) = ()(5) = 5 With df = 5 and α = 005, the critical value is χ 0 = 07 Continued Larson & Farber, lementary Statistics: Picturing the World, 3e Chi-Square Independence Test xample continued: χ 0 = 07 Rejection region α = 005 ( O χ = = 84 ) Fail to reject H 0 X O 3 5 5 43 8 0 3 33 0 6 308 49 570 4307 557 077 47 388 780 093 43 53 O 7 88 5 007 43 077 7 88 5 007 43 077 (O ) ( O ) 9584 00977 35344 007 704 0477 00049 0000 59049 0309 0599 0055 9584 00 35344 048 704 0977 00049 0000 59049 0475 0599 034 There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents Larson & Farber, lementary Statistics: Picturing the World, 3e 3 03 Comparing Two Variances 8

F-Distribution Let s and s represent the sample variances of two different populations If both populations are normal and the population variances are equal, σthen and the σ sampling distribution of s F = is called an F-distribution s There are several properties of this distribution The F-distribution is a family of curves each of which is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator, denoted df N, and the degrees of freedom corresponding to the variance in the denominator, denoted df D Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 5 F-Distribution Properties of the F-distribution continued: F-distributions are positively skewed 3 The total area under each curve of an F-distribution is equal to 4 F-values are always greater than or equal to 0 5 For all F-distributions, the mean value of F is approximately equal to df N = and df D = 8 df N = 8 and df D = 6 df N = 6 and df D = 7 df N = 3 and df D = 3 4 F Larson & Farber, lementary Statistics: Picturing the World, 3e 6 Critical Values for the F-Distribution Finding Critical Values for the F-Distribution Specify the level of significance α Determine the degrees of freedom for the numerator, df N 3 Determine the degrees of freedom for the denominator, df D 4 Use Table 7 in Appendix B to find the critical value If the hypothesis test is a one-tailed, use the α F-table b two-tailed, use the α F-table Larson & Farber, lementary Statistics: Picturing the World, 3e 7 9

Critical Values for the F-Distribution Find the critical F-value for a right-tailed test when α = 005, df N = 5 and df D = 8 df D : Degrees of freedom, denominator 64 85 Appendix B: Table 7: F-Distribution α = 005 df N : Degrees of freedom, numerator 3 4 5 6 995 57 46 30 340 900 96 95 930 933 7 8 9 4 40 48 335 334 333 96 95 93 73 7 70 57 56 55 46 45 43 The critical value is F 0 = 56 Larson & Farber, lementary Statistics: Picturing the World, 3e 8 Critical Values for the F-Distribution Find the critical F-value for a two-tailed test when α = 00, df N = 4 and df D = 6 df D : Degrees of freedom, denominator 3 4 5 6 7 64 85 03 77 66 599 559 The critical value is F 0 = 453 α = (00) = 005 Appendix B: Table 7: F-Distribution α = 005 df N : Degrees of freedom, numerator 3 4 5 6 995 57 46 30 340 900 96 95 930 933 955 98 9 90 894 694 659 639 66 66 579 54 59 505 495 54 476 453 439 48 474 435 4 397 387 Larson & Farber, lementary Statistics: Picturing the World, 3e 9 Two-Sample F-Test for Variances Two-Sample F-Test for Variances A two-sample F-test is used to compare two population variances when σ a sample is randomly selected from each population and σ The populations must be independent and normally distributed The test statistic is s F = s where s represent the sample variances with and s The s degrees s of freedom for the numerator is df N = n and the degrees of freedom for the denominator is df D = n, where n is the size of the sample having variance and n is the size of the sample having variance s s Larson & Farber, lementary Statistics: Picturing the World, 3e 30 0

Two-Sample F-Test for Variances Using a Two-Sample F-Test to Compare In Words Identify the claim State the null and alternative hypotheses Specify the level of significance 3 Identify the degrees of freedom 4 Determine the critical value In Symbols State H 0 and H a Identify α df N = n df D = n Use Table 7 in Appendix B Continued Larson & Farber, lementary Statistics: Picturing the World, 3e 3