# Unit 29 Chi-Square Goodness-of-Fit Test

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni method of multiple comparison when the null hypothesis in a chi-square goodness-of-fit is rejected The chi-square goodness-of-fit test is a generalization of the z test about a population proportion. If the manager at a particular manufacturing plant believes that 20% of the packages from an assembly line are underweight, the z test about a proportion could be used to see if there is any evidence against this belief. However, if the manager believes that 20% of the packages are underweight, 30% are of the packages are overweight, and 50% of the packages are an acceptable weight, the z test is not appropriate, since more than two categories are involved. When the null hypothesis concerns proportions corresponding to more than two categories of a qualitative variable, the chi-square goodness-of-fit test is appropriate. If we let k represent the number of categories, then k is the number of population proportions. The null hypothesis in a chi-square goodness-of-fit test states hypothesized values for these population proportions. The test statistic in the chi-square goodness-of-fit test is ( ) O χ 2 k 1 = E E 2, where O represents observed frequencies and E represents frequencies that are expected if H 0 were actually true. The symbol χ is the Greek letter chi. The superscript 2 on the χ is to emphasize that the test statistic involves squaring, hence χ 2 is read as chi-squared. The subscript k 1 refers to the degrees of freedom associated with the test statistic. In a chi-square goodness-of-fit test, we decide whether or not to reject the null hypothesis by comparing our χ 2 test statistic to an appropriate χ 2 distribution. In order to perform the chi-square goodness-of-fit test, we must first study the χ 2 distributions. Table A.5 provides us with information about the χ 2 distributions. The figure at the top of the first page of Table A.5 indicates that each χ 2 distribution, like each f distribution, is positively skewed; also, the values from a χ 2 distribution are nonnegative just as are the values from an f distribution are nonnegative. Table A.5 contains information about several different χ 2 distributions, each of which is identified by df (degrees of freedom). The information about χ 2 distributions in Table A.5 is organized similar to how the information about t distributions is organized in Table A.3. The rows are labeled with df, and the columns are labeled with various areas under a χ 2 density curve. The notation we shall use to represent the χ 2 -scores in Table A.4 is similar to the notation we use to represent f-scores in Table A.4 and t-scores in Table A.3. For instance, just as we use t 5;0.01 to represent the t-score above which lies 0.01 of the area under the density curve for the Student's t distribution with df = 5, and just as we use f 5, 13; 0.01 to represent the f-score above which lies 0.01 of the area under the density curve for the Fisher's f distribution with numerator df = 5 and denominator df = 13, we shall use χ 2 5; 0.01 to represent the χ 2 -score above which lies 0.01 of the area under the density curve for the χ 2 distribution with df = 5. Table A.5 extends over two 218

2 pages because unlike the t distributions, the χ 2 distributions are not symmetric. The t distributions have the property that the t-scores in the negative half and the t-scores in the positive half are mirror images of each other; consequently, we only need the information about the positive half. Since the values from a χ 2 distribution are only nonnegative, we must table the values at the left end of the distribution (nearer to zero) and the values at the right end of the distribution separately. For our purposes, we shall only need the values on the first page (the upper end). Now, let us suppose we collect data for a chi-square goodness-of-fit test, that is, that is, we have observed frequencies for each of k categories of a qualitative variable. If each observed frequency, denoted O, were exactly equal to the corresponding frequency that is expected if H 0 were actually true, denoted by E, then the χ 2 test statistic would be exactly equal to zero. However, even if H 0 were actually true, we still expect there to be some random variation between O and E. The figure at the top of the first page of Table A.5 illustrates the distribution of the χ 2 test statistic if the null hypothesis is true. Figure 29-1 displays this same typical density curve for a χ 2 distribution, and this looks very similar to Figure 27-1 which displays a typical density curve for an f distribution. Notice that, like the typical density curve for an f distribution, the typical χ 2 density curve has a shape which suggests that a high proportion of the χ 2 -scores are roughly speaking in the range from 0 to 2. In a chi-square goodness-of-fit test, a χ 2 test statistic which is unusually large would provide us with evidence that the hypothesized proportions are not all correct. We then define the rejection region in a chi-square goodness-of-fit test by the χ 2 -score above which lies α of the area under the corresponding density curve, where α is of course the significance level. The shaded area in the figure at the top of the first page of Table A.5 graphically illustrates the type of rejection region we use in a chi-square goodness-of-fit test, and we find this to be similar to the type of rejection region in a one-way ANOVA f test. The same four steps we have used in the past to perform hypothesis tests are used in a chi-square goodness-of-fit test. Let us use a chi-square goodness-of-fit test to see if there is any evidence against the belief that 20% of the packages from an assembly line are underweight, 30% are of the packages are overweight, and 50% of the packages are acceptable weight. We choose a 0.05 significance level to perform this hypothesis test, which is described in Table 29-1a. Letting λ U, λ O, and λ A respectively represent the population proportions of packages which are underweight, overweight, and an acceptable weight, we complete the first step in our hypothesis test as follows: H 0 : λ U = 0.2, λ O = 0.3, λ A = 0.5 vs. H 1 : Not all of the hypothesized proportions are correct (α = 0.05). Copy these hypotheses and the significance level in Step 1 of Table 29-1a. The second step is to collect data and calculate the value of the test statistic. In the simple random sample of 415 packages selected from the assembly line in Table 29-1a, 106 are found to be underweight, 111 are found to be overweight, and 198 are found to have an acceptable weight; these are the observed frequencies, represented by O. If H 0 were true, we would expect that out of 415 randomly selected packages, (0.2)(415) will be underweight, (0.3)(415) will be overweight, and (0.5)(415) will be an acceptable weight. These are the expected frequencies, represented by E. Calculate these expected frequencies, and enter the results in the appropriate places in Step 2 of Table 29-1a; notice that the observed frequencies have already been entered. (You should find that the expected frequencies are respectively 83.0, 124.5, and ) Now that we have the values for O and E, it is a simple matter to calculate the test statistic. Before calculating the test statistic, however, a comment about expected frequencies is in order. We do not interpret an expected frequency as a frequency that we must necessarily observe if H 0 is true, because we expect that there will be some random variation. For instance, if you were to flip a perfectly fair coin 10 times, the expected frequency of heads is 5; however, because of random variation, we certainly should not be surprised to observe 6 heads or 4 heads. The correct interpretation of an expected frequency is that it is an average of the frequencies that we would observe if simple random samples of the same size were repeatedly selected. Depending on the sample size, it may not even be possible to actually observe an expected frequency. If you were to flip the fair coin 9 times, the expected frequency of heads is 4.5, which is not possible to actually observe, but 4.5 would be the long term average for expected frequencies. Recall that the z test about a proportion is appropriate only if the random sample size is sufficiently large; this is also true for the chi-square goodness-of-fit test. In general, the random sample size will be sufficiently large if each expected frequency (E) is greater than or equal to 5. If one or more of the expected 219

3 frequencies is less than 5, categories can be combined in a way to insure that all E 5. You should notice that each of the expected frequencies calculated in Table 29-1a is greater than 5. We now return to the calculation of our test statistic. Since our hypothesis test involves k = 3 categories, there are k 1 = 3 1 = 2 degrees of freedom associated with the test statistic. Substituting the observed and expected frequencies into the test statistic formula, we have ( ) 2 ( ) ( ) χ 2 2 = Do this calculation, and enter the result in the appropriate place in Step 2 of Table 29-1a. (You should find that χ 2 2 = ) In general, calculating the chi-square goodness-of-fit test statistic is not too difficult, but the appropriate statistical software or programmable calculator will be readily available

4 The third step is to define the rejection region, decide whether or not to reject the null hypothesis, and obtain the p-value of the test. The shaded area in the figure at the top of the first page of Table A.5 graphically illustrates our rejection region, which is defined by the χ 2 -score above which lies 0.01 of the area under the density curve for the χ 2 distribution with df = 2. From Table A.5, find the appropriate χ 2 -score, and then define the rejection region algebraically in Step 3 of Table 29-1a. (You should find the rejection region to be defined by χ ) Since our test statistic, calculated in the second step, was found to be χ 2 2 = 8.272, which is in the rejection region, our decision is to reject H 0 : λ U = 0.2, λ O = 0.3, λ A = 0.5 ; in other words, our data provides sufficient evidence that not all of the hypothesized proportions are correct. The shaded area in the figure at the top of the first page of Table A.5 graphically illustrates the p-value, which is the area above our observed test statistic value χ 2 2 = under the density curve for the χ 2 distribution with df = 2. This shaded area would represent the probability of obtaining a test statistic value χ 2 2 which represents greater differences between the observed and expected frequencies than the observed test statistic value χ 2 2 = By looking at the entries in Table A.5 corresponding to df = 2, we find that the observed test statistic χ 2 2 = is between χ 2 2; = and χ 2 2; 0.01 = This tells us that the p-value is between 0.01 and 0.025, which we can designate by writing 0.01 < p-value < The fact that 0.01 < p-value < confirms to us that H 0 is rejected with α = However, this also tells us that H 0 would not be rejected with α = 0.01 but would of course be rejected with α = Now, complete Step 3 of Table 29-1a. To complete the fourth step of the hypothesis test, write a summary of the results of the hypothesis test in Step 4 of Table 29-1a. Your summary should be similar to the following: 221

5 Since χ 2 2 = and χ 2 2; 0.05 = 5.991, we have sufficient evidence to reject H 0. We conclude that not all of the hypothesized proportions, λ U = 0.2, λ O = 0.3, λ A = 0.5, are correct (0.01 < p-value < 0.025). Since H 0 is rejected, we need to use multiple comparison to identify which hypothesized proportions are not correct. When we do not reject the null hypothesis in a chi-square goodness-of-fit test, no further analysis is called for, since we are concluding that the hypothesized proportions are correct. However, rejecting the null hypothesis in a chi-square goodness-of-fit test prompts us to do some further analysis in order to identify which hypothesized proportions are not correct (as we indicated in the summary of results). Which hypothesized proportions are not correct can be identified with multiple comparison. We have previously introduced Scheffe's method of multiple comparison of means. We shall now utilize the Bonferroni method to identify which hypothesized proportions are not correct. Recall that when making multiple comparisons, we want to maintain an overall significance level. With the Bonferroni method, we accomplish this by first dividing the desired overall significance level by 2k; we then use the resulting significance level on individual comparisons. There are three steps to using the Bonferroni method for identifying which hypothesized proportions are not correct. The first step is to obtain a z-score for each sample proportion, based on the hypothesized proportions. The second step is to obtain a value against which the calculated z-scores are compared, and to identify the hypothesized proportion(s) significantly different from the corresponding hypothesized proportion(s). The third and last step is to state the results. Table 29-2 displays in detail these three steps for the Bonferroni method to identify which hypothesized proportions are not correct. To illustrate this Bonferroni method, let us identify which hypothesized proportions are not correct from the hypothesis test in Table The first step is to obtain a z-score for each sample proportion. Each z-score indicated by the formula in Step 1 of Table 29-2 is of the familiar form where a hypothesized proportion is subtracted from a sample proportion, and the result is divided by a standard error which is the square root of a ratio computed by multiplying the hypothesized proportion and one minus the hypothesized proportion, and dividing this result by the sample size. For the data in Table 29-1a, you should see that the sample proportion of underweight packages is 106/415, the sample proportion of overweight packages is 111/415, and the sample proportion of packages of acceptable weight is 198/415. Calculate each of these sample proportions, and enter the results in the appropriate places in Table 29-1b. (You should find that the respective sample proportions are , , and ) 222

6 Since the sample proportion of underweight packages was found to be , and the hypothesized proportion of underweight packages is 0.2, the z-score for the sample proportion of underweight packages is z U = ( 0.2)( 1 0.2) 415. Calculate this z-score, and enter the result in the appropriate place in Table 29-1b; then, repeat this for the overweight packages and for the packages of acceptable weight. (You should find that the respective z-scores are , 1.446, and ) The second step of Bonferroni's method is to obtain a value against which the calculated z-scores are compared. We shall use a 0.05 significance level, which is the same significance level chosen in the hypothesis test of Table 29-1a. If we were concerned with whether or not only one sample proportion was different from a hypothesized proportion with α = 0.05, we would in essence be concerned with a two-sided one sample z test. We would compare a single z-score against z (where α = 0.05 has been divided by 2, so that we can identify a significant difference in either direction.) However, since we are actually concerned with three proportions, Bonferroni s method has us compare a single z-score against z α / (2k) = z 0.05 / (2 3) = z 0.05 / 6 = z Step 2 of Table 29-2 tells us that the difference between a sample proportion and the corresponding hypothesized proportion is considered statistically significant when the corresponding z-score is found in the two-sided rejection region defined by z z α / (2k) or z z α / (2k) (i.e., z z α / (2k) ). In other words, with the Bonferroni method, we divide the desired overall significance level, which is α = 0.05 in the present illustration, by 2k, which is (2)(3) = 6 in the present illustration. We then use the resulting significance level 0.05/6 = to define the rejection region on each individual comparison. You will not find z at the bottom of Table A.2 nor at the bottom of Table A.3. To obtain z , you must search through the areas listed in the main body of Table A.2 for the closest area to , and obtain the corresponding z-score. Use Table A.2 to verify that z = 2.395, and enter this value in the appropriate place in Table 29-1b. Now, treating each calculated z-score just like a test statistic in a two-sided z test, look for the significant differences by looking for each calculated z-score which falls into the two-sided rejection region z or z (i.e., z 2.395). Circle each calculated z-score in Table 29-2b which represents a statistically significant difference. (You should find that the difference between the sample proportion and the hypothesized proportion is statistically significant only for the underweight packages.) The third and last step of Bonferroni's method is to state the results, including the significance level and the directions of differences. After checking the direction of the significant differences, write your statement of results in the appropriate place in Table 29-1b. Your summary the results of Bonferroni's method should be similar to the following: With α = 0.05, we conclude that the proportion of underweight packages is larger than the hypothesized 0.2. A graphical display of the data is always helpful in summarizing results. A bar chart or pie chart displaying the sample proportions for each category would be appropriate for the data of Table 29-1a, since the data consists of the one qualitative variable type of package with the three categories. Figure 29-1 is a pie 223

7 chart displaying the sample proportions for underweight packages, overweight packages, and packages of acceptable weight. It should be no surprise that applying a chi-square goodness-of-fit test when there are only k = 2 categories will produce exactly the same results as the one sample z test about the proportion for one of the two categories. In fact, each of the χ 2 -scores with 1 degree of freedom in Table A.5 is the square of a corresponding z-score. For instance, you can check that [z ] 2 = = 3.841, and that χ 2 1; 0.05 = Self-Test Problem A particular car manufacturer has believed for several years that among buyers of a particular type of sports car, 40% prefer red, 30% prefer green, 20% prefer yellow, and 10% prefer white. In a random sample of 252 buyers of the sports car, 114 prefer red, 57 prefer green, 42 prefer yellow, and 39 prefer white. A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence against this belief. (a) Explain how the data for this hypothesis test is appropriate for a chi-square goodness-of-fit test. (b) Complete the four steps of the hypothesis test by completing the table titled Hypothesis Test for Self-Test Problem You should find that χ 2 3 = (c) If multiple comparison is necessary, apply Bonferroni's method and state the results; if multiple comparison is not necessary, explain why not. (d) Verify that the sample size is sufficiently large for the chi-square goodness-of-fit test to be appropriate. (e) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error. (f) Decide whether H 0 would have been rejected or would not have been rejected with each of the following significance levels: (i) α = 0.01, (ii) α = (g) Construct an appropriate graphical display for the data used in this hypothesis test. 224

8 Answers to Self-Test Problems 29-1 (a) The data consists of a random sample of observations of the qualitative variable color, and the purpose of a chi-square goodness-of-fit test is to compare the sample category proportions of the qualitative variable to hypothesized proportions. (b) Step 1: H 0 : λ R = 0.4, λ G = 0.3, λ Y = 0.2, λ W = 0.1 (α = 0.05) H 1 : At least one hypothesized proportion is not correct Step 2: For the respective colors red, green, yellow, and white, the expected frequencies are 100.8, 75.6, 50.4, and χ 2 3 = Step 3: The rejection is χ H 0 is rejected; < p-value < Step 4: Since χ 2 3 = and χ 2 3; 0.05 = 7.815, we have sufficient evidence to reject H 0. We conclude that not all of the hypothesized proportions, λ R = 0.4, λ G = 0.3, λ Y = 0.2, λ W = 0.1 are correct (0.001 < p-value < 0.005). Since H 0 is rejected, we need to use multiple comparison to identify which hypothesized proportions are not correct. (c) Red Green Yellow White Sample Proportion 114/252 = /252 = /252 = /252 = z-score z = 2.50 With α = 0.05, we conclude that the proportion of buyers preferring green is smaller than the hypothesized 0.3, and that the proportion of buyers preferring white is larger than the hypothesized 0.1. (d) Since the expected frequencies are all greater than 5, the sample size is sufficiently large for chi-square goodness-of-fit test to be appropriate. (e) Since H 0 is rejected, the Type I error is possible, which is concluding that at least one hypothesized proportion is not correct when in reality the hypothesized proportions are all correct. (f) H 0 would have been rejected with both α = 0.01 and α = (g) Since color is a qualitative variable, a bar chart or pie chart is an appropriate graphical display. Figure 29-3 displays a bar chart. 225

9 Summary With a sufficiently large sample size, the chi-square goodness-of-fit hypothesis test concerning proportions corresponding to k > 2 categories can be performed by using the chi-square test statistic ( ) O χ 2 k 1 = E E 2, where O observed frequencies and E represents frequencies that are expected if H 0 were actually true. The null hypothesis, which gives hypothesized proportions, is rejected when the chi-square test statistic is larger than a value determined from the chi-square distribution with k 1 degrees of freedom. An easy way to check that the sample size is sufficiently large is to verify that all E 5. When we do not reject the null hypothesis in a chi-square goodness-of-fit test, no further analysis is called for, since we are concluding that the hypothesized proportions are correct. When we reject the null hypothesis in a chi-square goodness-of-fit test, multiple comparison is desirable to identify which hypothesized proportions are not correct. The Bonferroni method is one of many multiple comparison methods available. When applied to the sample proportions from a chi-square goodness-of-fit test, this method consists of the three steps displayed in Table

### Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### Unit 24 Hypothesis Tests about Means

Unit 24 Hypothesis Tests about Means Objectives: To recognize the difference between a paired t test and a two-sample t test To perform a paired t test To perform a two-sample t test A measure of the amount

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Unit 22 One-Sided and Two-Sided Hypotheses Tests

Unit 22 One-Sided and Two-Sided Hypotheses Tests Objectives: To differentiate between a one-sided hypothesis test and a two-sided hypothesis test about a population proportion or a population mean To understand

### Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

### MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution

MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### Unit 16 Normal Distributions

Unit 16 Normal Distributions Objectives: To obtain relative frequencies (probabilities) and percentiles with a population having a normal distribution While there are many different types of distributions

### Chapter Additional: Standard Deviation and Chi- Square

Chapter Additional: Standard Deviation and Chi- Square Chapter Outline: 6.4 Confidence Intervals for the Standard Deviation 7.5 Hypothesis testing for Standard Deviation Section 6.4 Objectives Interpret

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### The Goodness-of-Fit Test

on the Lecture 49 Section 14.3 Hampden-Sydney College Tue, Apr 21, 2009 Outline 1 on the 2 3 on the 4 5 Hypotheses on the (Steps 1 and 2) (1) H 0 : H 1 : H 0 is false. (2) α = 0.05. p 1 = 0.24 p 2 = 0.20

### CATEGORICAL DATA Chi-Square Tests for Univariate Data

CATEGORICAL DATA Chi-Square Tests For Univariate Data 1 CATEGORICAL DATA Chi-Square Tests for Univariate Data Recall that a categorical variable is one in which the possible values are categories or groupings.

### 12.5: CHI-SQUARE GOODNESS OF FIT TESTS

125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

### 11-2 Goodness of Fit Test

11-2 Goodness of Fit Test In This section we consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way frequency table). We will use a hypothesis

### The Chi Square Test. Diana Mindrila, Ph.D. Phoebe Balentyne, M.Ed. Based on Chapter 23 of The Basic Practice of Statistics (6 th ed.

The Chi Square Test Diana Mindrila, Ph.D. Phoebe Balentyne, M.Ed. Based on Chapter 23 of The Basic Practice of Statistics (6 th ed.) Concepts: Two-Way Tables The Problem of Multiple Comparisons Expected

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### 13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations.

13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations. Data is organized in a two way table Explanatory variable (Treatments)

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### The Chi-Square Test. STAT E-50 Introduction to Statistics

STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

### Statistics for Management II-STAT 362-Final Review

Statistics for Management II-STAT 362-Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Null Hypothesis H 0. The null hypothesis (denoted by H 0

Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Chi-Square Tests. In This Chapter BONUS CHAPTER

BONUS CHAPTER Chi-Square Tests In the previous chapters, we explored the wonderful world of hypothesis testing as we compared means and proportions of one, two, three, and more populations, making an educated

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

### Box plots & t-tests. Example

Box plots & t-tests Box Plots Box plots are a graphical representation of your sample (easy to visualize descriptive statistics); they are also known as box-and-whisker diagrams. Any data that you can

### Analysis of Environmental Data Problem Set Conceptual Foundations: Hy p o th e sis te stin g c o n c e p ts

Analysis of Environmental Data Problem Set Conceptual Foundations: Hy p o th e sis te stin g c o n c e p ts Answer any one of questions 1-4, AND answer question 5: 1. Consider a hypothetical study of wood

### Chapter 8. Hypothesis Testing

Chapter 8 Hypothesis Testing Hypothesis In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing

### Sampling Distribution of the Mean & Hypothesis Testing

Sampling Distribution of the Mean & Hypothesis Testing Let s first review what we know about sampling distributions of the mean (Central Limit Theorem): 1. The mean of the sampling distribution will be

### 3. Nonparametric methods

3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### We know from STAT.1030 that the relevant test statistic for equality of proportions is:

2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

### Two-Sample T-Test from Means and SD s

Chapter 07 Two-Sample T-Test from Means and SD s Introduction This procedure computes the two-sample t-test and several other two-sample tests directly from the mean, standard deviation, and sample size.

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### Hypothesis Testing for Two Variances

Hypothesis Testing for Two Variances The standard version of the two-sample t test is used when the variances of the underlying populations are either known or assumed to be equal In other situations,

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### Chapter 23. Two Categorical Variables: The Chi-Square Test

Chapter 23. Two Categorical Variables: The Chi-Square Test 1 Chapter 23. Two Categorical Variables: The Chi-Square Test Two-Way Tables Note. We quickly review two-way tables with an example. Example. Exercise

### November 08, 2010. 155S8.6_3 Testing a Claim About a Standard Deviation or Variance

Chapter 8 Hypothesis Testing 8 1 Review and Preview 8 2 Basics of Hypothesis Testing 8 3 Testing a Claim about a Proportion 8 4 Testing a Claim About a Mean: σ Known 8 5 Testing a Claim About a Mean: σ

### TImath.com. F Distributions. Statistics

F Distributions ID: 9780 Time required 30 minutes Activity Overview In this activity, students study the characteristics of the F distribution and discuss why the distribution is not symmetric (skewed

### AP Statistics 1998 Scoring Guidelines

AP Statistics 1998 Scoring Guidelines These materials are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use must be sought from the Advanced Placement

### AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

### 4. Sum the results of the calculation described in step 3 for all classes of progeny

F09 Biol 322 chi square notes 1. Before proceeding with the chi square calculation, clearly state the genetic hypothesis concerning the data. This hypothesis is an interpretation of the data that gives

### Chi Square for Contingency Tables

2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure

### Statistics 2014 Scoring Guidelines

AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

### ANOVA MULTIPLE CHOICE QUESTIONS. In the following multiple-choice questions, select the best answer.

ANOVA MULTIPLE CHOICE QUESTIONS In the following multiple-choice questions, select the best answer. 1. Analysis of variance is a statistical method of comparing the of several populations. a. standard

### Hypothesis Testing. Bluman Chapter 8

CHAPTER 8 Learning Objectives C H A P T E R E I G H T Hypothesis Testing 1 Outline 8-1 Steps in Traditional Method 8-2 z Test for a Mean 8-3 t Test for a Mean 8-4 z Test for a Proportion 8-5 2 Test for

### Chapter 8 Introduction to Hypothesis Testing

Chapter 8 Student Lecture Notes 8-1 Chapter 8 Introduction to Hypothesis Testing Fall 26 Fundamentals of Business Statistics 1 Chapter Goals After completing this chapter, you should be able to: Formulate

### ANOVA - Analysis of Variance

ANOVA - Analysis of Variance ANOVA - Analysis of Variance Extends independent-samples t test Compares the means of groups of independent observations Don t be fooled by the name. ANOVA does not compare

### Hypothesis testing: Examples. AMS7, Spring 2012

Hypothesis testing: Examples AMS7, Spring 2012 Example 1: Testing a Claim about a Proportion Sect. 7.3, # 2: Survey of Drinking: In a Gallup survey, 1087 randomly selected adults were asked whether they

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### Chi-square test Fisher s Exact test

Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### χ 2 = (O i E i ) 2 E i

Chapter 24 Two-Way Tables and the Chi-Square Test We look at two-way tables to determine association of paired qualitative data. We look at marginal distributions, conditional distributions and bar graphs.

### Chapter 11 Chi square Tests

11.1 Introduction 259 Chapter 11 Chi square Tests 11.1 Introduction In this chapter we will consider the use of chi square tests (χ 2 tests) to determine whether hypothesized models are consistent with

### 6 3 The Standard Normal Distribution

290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Hypothesis Testing or How to Decide to Decide Edpsy 580

Hypothesis Testing or How to Decide to Decide Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Hypothesis Testing or How to Decide to Decide

### Lecture 42 Section 14.3. Tue, Apr 8, 2008

the Lecture 42 Section 14.3 Hampden-Sydney College Tue, Apr 8, 2008 Outline the 1 2 the 3 4 5 the The will compute χ 2 areas, but not χ 2 percentiles. (That s ok.) After performing the χ 2 test by hand,

### Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

### CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS CHI-SQUARE TESTS OF INDEPENDENCE (SECTION 11.1 OF UNDERSTANDABLE STATISTICS) In chi-square tests of independence we use the hypotheses. H0: The variables are independent

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

### NPTEL STRUCTURAL RELIABILITY

NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:

### LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### 5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

### 1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error.

1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error. 8.5 Goodness of Fit Test Suppose we want to make an inference about

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss

### 13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

### Nonparametric Statistics

1 14.1 Using the Binomial Table Nonparametric Statistics In this chapter, we will survey several methods of inference from Nonparametric Statistics. These methods will introduce us to several new tables

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### When to use a Chi-Square test:

When to use a Chi-Square test: Usually in psychological research, we aim to obtain one or more scores from each participant. However, sometimes data consist merely of the frequencies with which certain

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought