Elementary Statistics




Elementary Statistics
M. Ghamsary, Ph.D.
Chapter 10: Chi-Square Test for Goodness of Fit and Contingency Tables

Chi-Square Test

Generally speaking, the chi-square test is a statistical test used to examine differences among categorical variables. It is used in two similar but distinct circumstances:

1. to assess how closely an observed distribution matches an expected distribution (we will refer to this as the goodness-of-fit test);
2. to assess whether two random variables are independent (contingency tables).

Goodness of Fit Test

One of the more interesting goodness-of-fit applications of the chi-square test is to examine fairness and cheating in games of chance, such as coins, cards, dice, and roulette. Since such games usually involve wagering, there is significant incentive for people to try to rig them, and allegations of missing cards, "loaded" dice, and "sticky" roulette wheels are all too common.

So how can the goodness-of-fit test be used to examine cheating in gambling? It is easier to describe the process through an example. Take the example of dice. Most dice used in wagering have six sides, with values one through six. If the die is fair, then the chance of any particular number coming up is the same: 1 in 6. If the die is loaded, certain numbers will have a greater likelihood of appearing, while others will have a lower likelihood.

We would therefore like to test whether a given data set matches the hypothesized distribution. The test statistic used for this purpose is

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where O is the observed count and E is the expected count in each category.
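The goodness-of-fit statistic, the sum over categories of (O - E)^2 / E, can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the four-category counts are made up for the example and do not come from the notes.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 observations in 4 categories, equal expected counts.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.2f}")
```

A small value means the observed counts sit close to the expected ones; the value is then compared with a critical value for the appropriate degrees of freedom.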

Clearly, if the data match the claimed distribution, the chi-square value will be small and we cannot reject the null hypothesis. Otherwise the value of χ² will be large and we must reject H0.

Example 1: The simplest example is to flip a coin 100 times and record the outcomes. Suppose we observed 40 heads. Test the claim that the coin is fair, which means the outcomes are equally likely. Use the 5% level of significance.

Solution: The expected number of heads is 0.50(100) = 50. Let us write the outcomes in the following table.

            Head   Tail
Observed     40     60
Expected     50     50

Step 1:
H0: The coin is fair
H1: The coin is not fair
df = 2 - 1 = 1, α = 0.05; by using Table III, CV = 3.84

Step 2: Calculate the test statistic as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(40 - 50)^2}{50} + \frac{(60 - 50)^2}{50} = 4$$

Step 3: Decision: Since 4 > 3.84, we reject H0.

Conclusion: This means the coin is biased.
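The arithmetic in Example 1 can be checked with a short Python sketch. The p-value line is an addition that is not in the notes: it relies on the fact that, for one degree of freedom only, the upper-tail chi-square probability equals erfc(sqrt(x / 2)). The critical value 3.84 is the one quoted from Table III.

```python
import math

observed = [40, 60]   # heads, tails
expected = [50, 50]   # 0.50 * 100 for each face under H0

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

critical_value = 3.84  # df = 1, alpha = 0.05, as quoted in the text
reject_h0 = stat > critical_value

print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value comes out a little under 0.05, which agrees with the decision reached by comparing the statistic with the critical value.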

Example 2: The next simplest example is to roll a die 120 times and record the outcomes. Suppose we have observed 18 ones, 23 twos, 15 threes, 22 fours, 17 fives, and 25 sixes. Test the claim that the die is fair, which again means the outcomes are equally likely. Use the 5% level of significance.

Solution: Under the assumption of equality, the expected counts are all equal: E = 120/6 = 20. Let us write the outcomes in the following table.

Face         1    2    3    4    5    6
Observed    18   23   15   22   17   25
Expected    20   20   20   20   20   20

Step 1:
H0: The die is fair
H1: The die is not fair
df = 6 - 1 = 5, α = 0.05; by using Table III, CV = 11.07

Step 2: Calculate the test statistic as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(18 - 20)^2}{20} + \frac{(23 - 20)^2}{20} + \frac{(15 - 20)^2}{20} + \frac{(22 - 20)^2}{20} + \frac{(17 - 20)^2}{20} + \frac{(25 - 20)^2}{20} = 3.8$$

Step 3: Decision: Since 3.8 < 11.07, we fail to reject H0.

Conclusion: There is not enough evidence to conclude that the die is biased.
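For the die in Example 2, the expected count is the same for every face, so it can be derived from the total number of rolls. The sketch below assumes the six observed counts 18, 23, 15, 22, 17, 25 (which sum to 120) and uses the critical value 11.07 quoted from Table III; the variable names are illustrative.

```python
observed = [18, 23, 15, 22, 17, 25]   # counts for faces 1..6

n_rolls = sum(observed)               # total number of rolls
n_faces = len(observed)
expected_per_face = n_rolls / n_faces # equal counts under a fair die

stat = sum((o - expected_per_face) ** 2 / expected_per_face for o in observed)

df = n_faces - 1
critical_value = 11.07                # df = 5, alpha = 0.05, as quoted in the text
reject_h0 = stat > critical_value

print(f"n = {n_rolls}, E = {expected_per_face}, df = {df}")
print(f"chi-square = {stat:.2f}, reject H0: {reject_h0}")
```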

Example 3: An ice cream shop would like to know which flavor is preferred by its customers. Past records show that 50% prefer vanilla, 20% prefer chocolate, 10% prefer vanilla fudge, 15% prefer strawberry, and 5% prefer other kinds. A random sample of 500 customers revealed the following results. Test the claim that the observed numbers match these percentages.

Flavor       Vanilla   Chocolate   Strawberry   Vanilla Fudge   Others
Customers      240        120          70             40           30

Solution: Let us calculate the expected values as follows:

Vanilla:        50% of 500 = 0.50 * 500 = 250
Chocolate:      20% of 500 = 0.20 * 500 = 100
Strawberry:     15% of 500 = 0.15 * 500 = 75
Vanilla Fudge:  10% of 500 = 0.10 * 500 = 50
Others:          5% of 500 = 0.05 * 500 = 25

Flavor       Vanilla   Chocolate   Strawberry   Vanilla Fudge   Others
Observed       240        120          70             40           30
Expected       250        100          75             50           25

Step 1:
H0: The observed and expected counts match
H1: The observed and expected counts do not match
df = 5 - 1 = 4, α = 0.05, CV = 9.49

Step 2: Calculate the test statistic as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(240 - 250)^2}{250} + \frac{(120 - 100)^2}{100} + \frac{(70 - 75)^2}{75} + \frac{(40 - 50)^2}{50} + \frac{(30 - 25)^2}{25} \approx 7.73$$

Step 3: Decision: Since 7.73 < 9.49, we fail to reject H0.

Conclusion: There is not enough evidence to conclude that customer preferences differ from the past record.
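When the hypothesized distribution is given as percentages, as in Example 3, each expected count is the proportion times the sample size. The sketch below is one way to organize that calculation; the dictionary layout and names are assumptions made for illustration, and 9.49 is the critical value quoted in the example.

```python
sample_size = 500

# Hypothesized proportions from the shop's past record.
proportions = {
    "Vanilla": 0.50,
    "Chocolate": 0.20,
    "Strawberry": 0.15,
    "Vanilla Fudge": 0.10,
    "Others": 0.05,
}

# Observed counts in the sample of 500 customers.
observed = {
    "Vanilla": 240,
    "Chocolate": 120,
    "Strawberry": 70,
    "Vanilla Fudge": 40,
    "Others": 30,
}

expected = {flavor: p * sample_size for flavor, p in proportions.items()}

stat = sum((observed[f] - expected[f]) ** 2 / expected[f] for f in proportions)

critical_value = 9.49   # df = 4, alpha = 0.05, as quoted in the text
reject_h0 = stat > critical_value

print(f"chi-square = {stat:.2f}, reject H0: {reject_h0}")
```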

Example 4: Affirmative Action Problem

A large organization in a city is accused of discriminating against one or more racial groups. The city's population is 45% White, 15% Black, 20% Hispanic, 5% Asian, and the rest (15%) are others. A random sample of 250 employees from the whole corporation is collected, with the following results. Test whether the observed frequencies match the percentages in the population.

Step 1:
H0: The observed frequencies match the percentages of the population
H1: The observed frequencies do not match the percentages of the population
α = 0.05, df = 5 - 1 = 4, CV = 9.49

Race        Observed     %      Expected
White         100       45%      112.5
Black          30       15%       37.5
Hispanic       40       20%       50
Asian          20        5%       12.5
Other          60       15%       37.5
Total         250

Step 2:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(100 - 112.5)^2}{112.5} + \frac{(30 - 37.5)^2}{37.5} + \frac{(40 - 50)^2}{50} + \frac{(20 - 12.5)^2}{12.5} + \frac{(60 - 37.5)^2}{37.5} \approx 22.9$$

Step 3: Decision: Since 22.9 > 9.49, we reject H0.

Conclusion: This means the observed frequencies and the percentages of the population do not match.
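A rejected goodness-of-fit test says only that some category deviates; looking at each category's term (O - E)^2 / E shows where the mismatch comes from. The following sketch does this for the Example 4 data. The per-category breakdown is an addition for illustration and is not part of the original solution.

```python
sample_size = 250

population_share = {"White": 0.45, "Black": 0.15, "Hispanic": 0.20,
                    "Asian": 0.05, "Other": 0.15}
observed = {"White": 100, "Black": 30, "Hispanic": 40,
            "Asian": 20, "Other": 60}

# Contribution of each group to the chi-square statistic.
contributions = {}
for race, share in population_share.items():
    expected = share * sample_size
    contributions[race] = (observed[race] - expected) ** 2 / expected

stat = sum(contributions.values())
largest = max(contributions, key=contributions.get)

for race, value in contributions.items():
    print(f"{race:<9} {value:6.2f}")
print(f"chi-square = {stat:.1f}; largest contribution: {largest}")
```

In this data set the "Other" group accounts for more than half of the statistic, so that is where the sample departs most from the city's population shares.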

Example 5: In a study in Alameda County, California, researchers compared the demographic characteristics of members of grand juries to determine how closely these juries reflected the population of the county. If the juries were selected randomly or impartially, then the characteristics of the jurors should closely match those of the larger county; however, if attorneys were tilting the jury selection process, then the jurors' characteristics would be quite different from the county's. (Figures taken from UCLA Law Review, vol 20, 1973, as shown at: http://www.stat.ucla.edu/cases/jury/)

Age       County-wide %   # of Jurors
21-40          42              5
41-50          23              9
51-60          16             19
>61            19             33

Questions

Based on the figures shown in the table above, use the chi-square test to evaluate whether there is evidence of jury fixing in terms of the age of jurors in Alameda County.

a. What is the null hypothesis? What is the alternative hypothesis?
b. What figures do you need to calculate for this test?
c. How many degrees of freedom are there?
d. What is the value of the chi-square statistic for this table? What is the p-value of this statistic?
e. From this value, what can you conclude about the age of jurors in Alameda County?

Test of Independence

The other primary use of the chi-square test is to examine whether two variables are independent. What does it mean to be independent in this sense? It means that the two factors are not related. Typically, in research fields such as epidemiology and social science, we are interested in finding factors that are related: education and income, occupation and prestige, age and voting behavior. In this case, the chi-square test can be used to assess whether two variables are independent.

More generally, we say that factor A is "not correlated with" or "independent of" factor B if more of one is not associated with more of the other. If two categorical variables are correlated, their values tend to move together, either in the same direction or in opposite directions.

In practice, much data comes in the form of a two-way frequency table, which some textbooks call a contingency table. The test of independence checks whether the row factor and the column factor are related.

Test statistic: the same as before, namely

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where O is the observed count in a cell and E is the expected count in that cell, which can be found from

$$E = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}.$$

The degrees of freedom are df = (r - 1)(c - 1), where r is the number of rows and c is the number of columns.
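The expected-count rule, (row total)(column total) / (grand total), and the degrees of freedom (r - 1)(c - 1) translate directly into code. The sketch below uses a made-up 2 x 3 table; the function names `expected_counts` and `degrees_of_freedom` are illustrative and not taken from the notes.

```python
def expected_counts(table):
    """Expected cell counts under independence for a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 3 table of observed counts.
table = [[10, 20, 30],
         [20, 20, 20]]

expected = expected_counts(table)
df = degrees_of_freedom(table)

for row in expected:
    print(row)
print("df =", df)
```

Each row of expected counts sums to the same row total as the observed table, which is a quick sanity check on the calculation.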

Example 6: Dr. Ghamsary and colleagues are testing whether smoking habits and gender are independent. They have collected a random sample of 250 people, summarized in the following table. Test their claim by using the 0.05 level of significance.

               Smoking
Sex         Yes     No    Total
Male         60     40     100
Female       80     70     150
Total       140    110     250

Solution:

Step 1:
H0: Sex and smoking are independent
H1: Sex and smoking are not independent
df = (2 - 1)(2 - 1) = 1, α = 0.05, CV = 3.84

The expected counts are

$$E_{11} = \frac{100 \times 140}{250} = 56, \qquad E_{12} = \frac{100 \times 110}{250} = 44,$$

$$E_{21} = \frac{150 \times 140}{250} = 84, \qquad E_{22} = \frac{150 \times 110}{250} = 66.$$

Observed counts, with expected counts in parentheses:

               Smoking
Sex         Yes         No         Total
Male        60 (56)     40 (44)     100
Female      80 (84)     70 (66)     150
Total      140         110          250

Step 2: Calculate the test statistic as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(60 - 56)^2}{56} + \frac{(40 - 44)^2}{44} + \frac{(80 - 84)^2}{84} + \frac{(70 - 66)^2}{66} \approx 1.08$$

Step 3: Decision: Since 1.08 < 3.84, we fail to reject H0.

Conclusion: There is not enough evidence to conclude that sex and smoking are related.
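The whole procedure in Example 6 (expected counts, statistic, degrees of freedom, decision) can be put together in one small function. This is a sketch under the assumption that the table is a list of rows of observed counts; the name `chi_square_independence` is hypothetical, and 3.84 is the critical value quoted in the example.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Male, Female. Columns: smokes Yes, No.
smoking = [[60, 40],
           [80, 70]]

stat, df = chi_square_independence(smoking)
critical_value = 3.84   # df = 1, alpha = 0.05, as quoted in the text
reject_h0 = stat > critical_value

print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```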

Cautionary Note

It is important to keep in mind that the chi-square test only tests whether two variables are independent. It cannot address questions of which is greater or less. Using the chi-square test, we cannot directly evaluate the hypothesis that men smoke more than women; strictly speaking, the test can only tell us whether the two variables are independent or not.

Example 7: Ghamsary and others have done some research (Kashan, Vol 1, 13. 1998) on income and level of education. They are interested in whether people with more education have higher income. They collected a random sample of 250 people from a large population and found the following results. Test the claim that education and income are independent factors. Use α = 0.01.

Income \ Education    None   High School   4-year college   Graduate School
Less than 30K          25        20              10                 6
30K-50K                10        28              40                12
Above 50K               5        12              50                32

Solution: Observed counts, with expected counts on the line below each row:

Income \ Education      N        HS     College   Graduate   Total
Less than 30K           25       20       10         6         61
  (expected)           9.76    14.64    24.40     12.20
30K-50K                 10       28       40        12         90
  (expected)          14.40    21.60    36.00     18.00
Above 50K                5       12       50        32         99
  (expected)          15.84    23.76    39.60     19.80
Total                   40       60      100        50        250

The expected counts are computed as follows:

$$E_{11} = \frac{61 \times 40}{250} = 9.76, \quad E_{12} = \frac{61 \times 60}{250} = 14.64, \quad E_{13} = \frac{61 \times 100}{250} = 24.40, \quad E_{14} = \frac{61 \times 50}{250} = 12.20$$

$$E_{21} = \frac{90 \times 40}{250} = 14.40, \quad E_{22} = \frac{90 \times 60}{250} = 21.60, \quad E_{23} = \frac{90 \times 100}{250} = 36.00, \quad E_{24} = \frac{90 \times 50}{250} = 18.00$$

$$E_{31} = \frac{99 \times 40}{250} = 15.84, \quad E_{32} = \frac{99 \times 60}{250} = 23.76, \quad E_{33} = \frac{99 \times 100}{250} = 39.60, \quad E_{34} = \frac{99 \times 50}{250} = 19.80$$

Step 1:
H0: Income and education are independent
H1: Income and education are not independent
df = (3 - 1)(4 - 1) = 6; at α = 0.01, CV = 16.81

Step 2: Calculate the test statistic as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(25 - 9.76)^2}{9.76} + \frac{(20 - 14.64)^2}{14.64} + \frac{(10 - 24.40)^2}{24.40} + \frac{(6 - 12.20)^2}{12.20}$$

$$+ \frac{(10 - 14.40)^2}{14.40} + \frac{(28 - 21.60)^2}{21.60} + \frac{(40 - 36.00)^2}{36.00} + \frac{(12 - 18.00)^2}{18.00}$$

$$+ \frac{(5 - 15.84)^2}{15.84} + \frac{(12 - 23.76)^2}{23.76} + \frac{(50 - 39.60)^2}{39.60} + \frac{(32 - 19.80)^2}{19.80} \approx 66.58$$

MINITAB: Chi-Square Test

    Expected counts are printed below observed counts

                C1       C2       C3       C4    Total
        1       25       20       10        6       61
              9.76    14.64    24.40    12.20

        2       10       28       40       12       90
             14.40    21.60    36.00    18.00

        3        5       12       50       32       99
             15.84    23.76    39.60    19.80

    Total       40       60      100       50      250

    Chi-Sq = 23.797 + 1.962 + 8.498 + 3.151 +
              1.344 + 1.896 + 0.444 + 2.000 +
              7.418 + 5.821 + 2.731 + 7.517 = 66.581
    DF = 6, P-Value = 0.000

Step 3: Decision: Since 66.58 > 16.81, we reject H0.

Conclusion: This means that income and education are not independent.
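The MINITAB result for Example 7 can be reproduced with plain Python. The p-value part is an addition not found in the notes: it uses the closed form for the upper tail of a chi-square distribution with an even number of degrees of freedom, P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, so the helper `chi2_upper_tail_even_df` (a hypothetical name) works only when df is even, as it is here (df = 6).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x); valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


# Rows: <30K, 30K-50K, >50K. Columns: None, High School, College, Graduate.
income_by_education = [[25, 20, 10, 6],
                       [10, 28, 40, 12],
                       [5, 12, 50, 32]]

row_totals = [sum(row) for row in income_by_education]
col_totals = [sum(col) for col in zip(*income_by_education)]
grand_total = sum(row_totals)

stat = 0.0
for i, row in enumerate(income_by_education):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        stat += (observed - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = chi2_upper_tail_even_df(stat, df)

print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3g}")
```

The p-value is far below 0.001, which is why MINITAB prints it as 0.000 after rounding to three decimals.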

Example 8: In a recent study, a random sample of 500 students was classified by two factors, as shown in the following table: how often they study on time for tests, and school area. Is there an association between the type of school area and how often students study on time for tests?

                                 School Area
Study on time for tests    Rural   Suburban   Urban   Total
Always                       80       90        60     230
Sometimes                    70       70        30     170
Never                        50       30        20     100
Total                       200      190       110     500

Example 9: A chocolate manufacturing company conducted a survey of 300 customers. The research question is: Is there a significant relationship between packaging preference (size of the bottle purchased) and economic status? There were four packaging sizes: small, medium, large, and jumbo. Economic status was: lower, middle, and upper. The following data were collected. Test the claim that the size of the packaging and economic status are independent by using the 0.10 level of significance.

               Economic Status
Size       Lower   Middle   Upper   Total
Small        30      22       18      70
Medium       23      28       19      70
Large        18      27       35      80
Jumbo        19      23       38      80
Total        90     100      110     300

Example 10: A random sample of 1800 persons is questioned regarding their political affiliation and their opinion on the war in Iraq. Test whether political affiliation and opinion on the war in Iraq are dependent, using the 5% level of significance. The observed data are given in the following table.

                          War in Iraq
Party Affiliation    Favor   Indifferent   Opposed   Total
Democrat              120        50          580      750
Republican            600       150          100      850
Independent            50        30          120      200
Total                 770       230          800     1800