# Unit 21 Student s t Distribution in Hypotheses Testing

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between a hypothesis test about a population proportion λ and a hypothesis test about a population mean µ Our introduction to hypothesis testing has focused solely on hypothesis tests about a population proportion λ. The z test statistic we use in the hypothesis test H : λ = λ vs. H 1 : λ λ is z = λ p λ ( 1 λ ) n Let us now investigate whether we can use the same sort of test statistic in a hypothesis test about a population mean µ. As an illustration, suppose the manager at a certain manufacturing plant would like to see if there is any evidence that the mean amount of cereal per box from an assembly line is different from 12 oz., which is the advertised amount. He chooses a.5 significance level to perform a hypothesis test. The following amounts (in ounces) are recorded for a simple random sample of 16 boxes of cereal selected from the assembly line: You can verify that the sample mean is x = oz., and that the sample standard deviation is s =.22 oz. We shall now attempt to perform the four steps of a hypothesis test. The first step is to state the null and alternative hypotheses, and to choose a significance level. Remember that the null hypothesis is what we assume to be true unless there is sufficient evidence against it, and the alternative hypothesis is the statement we are looking for evidence to support; also, recall that a null hypothesis is often a statement involving equality, while the alternative hypothesis is a statement involving inequality. In this hypothesis test, we shall assume that the mean amount of cereal per box is 12 oz. unless we find sufficient evidence otherwise; in other words, 12 oz. is our hypothesized value for µ. We complete the first step of the hypothesis test by writing following: H : µ = 12 vs. H 1 : µ 12 (α =.5). The second step is to collect data and calculate the value of the test statistic. Recall that our test statistic in a hypothesis test about λ is the z-score of p calculated under the assumption that the hypothesized value of λ is correct. To use the same sort of test statistic in a hypothesis test about a population mean µ, we would calculate the z-score of x under the assumption that the hypothesized value of µ was correct. Letting µ represent our hypothesized value for µ, this z-score for x is z = x σ µ x x x µ = σ n. Calculating this z-score for x presents a problem because we do not have the value of σ. We know the value of x (since we calculated it from our sample), we know the value of µ (since this represents our hypothesized 147

2 mean), and we know the value of n (since this is of course the sample size of our data); however, we do not know the population standard deviation σ, and consequently we are not able to obtain the value of σ x. When calculating the z-score of p, we do not have this problem, since the value of σ p depends only on λ and n. If it were possible for us to have access to the entire population in order to obtain the population standard deviation σ, then we could also obtain the population mean µ, and there would be no reason for us to be doing a hypothesis test! A very reasonable way to estimate the population standard deviation σ is to use the sample standard deviation s whose value we can easily obtain from our sample. It is then tempting to simply replace σ by s in the z-score which would be our test statistic, and in doing so the denominator of our test statistic would have σ / n in place of by s / n. For small sample sizes n, simply using s / n in place of σ x = σ / n in the z-score for x and still treating this statistic as a z-score may not give accurate results. This was demonstrated in a paper published in 198 by W. S. Gosset. Gosset showed that instead of associating this test statistic with the standard normal distribution, the test statistic should be associated with the Student's t distribution with n 1 degrees of freedom. (In order to prevent the competitors of Gosset's employers from realizing that they could benefit from more accurate statistical methods, Gosset's paper was published anonymously under the name of "Student," because Gosset was studying at the University of London at the time of his discovery.) In the hypothesis test H : µ = µ vs. H 1 : µ µ, our test statistic shall be x µ t n 1 = s n, which we refer to as a t statistic with n 1 degrees of freedom. Consequently, in order to perform hypothesis tests about µ, we must first study the t distributions. Recall that Table A.2 provides us with information about the standard normal distribution. We shall now be concerned with Table A.3 which provides us with information about the t distributions. The figures displaying density curves at the top of Tables A.2 and A.3 indicate that some similarities exist between the standard normal distribution and the t distributions; specifically, these distributions are symmetric and bell-shaped. The difference between the standard normal distribution and the t distributions is that the t distributions have more variation and a flatter bell-shape than the standard normal distribution. The figures at the top of Tables A.2 and A.3 look alike only because they are not drawn to scale. If a standard normal density curve and a Student's t density curve were to be graphed on the same scale, the Student's t density curve would look flatter and more "spread out." It is important to keep in mind that Table A.2 supplies information about the one and only standard normal density curve, but Table A.3 contains information about several different t distributions. Each different t distribution is identified by something we call degrees of freedom, which we shall abbreviate as df. We refer to our test statistic in a test about a population mean µ as t n 1 where n 1 is the corresponding df. It is difficult to explain in detail the concept of df in a general context without venturing beyond the mathematical scope of this text. While we could just think of degrees of freedom (df) simply as a way of distinguishing between the different density curves for the various t distributions, there is an intuitive rule of thumb possible to explain degrees of freedom. In any situation where a statistical procedure is applied to data to obtain some information about population parameters, we can identify the number of parameters being estimated and the size of the samples(s) in the data. The rule of thumb that often applies is that the relevant degrees of freedom is equal to the total size of the samples(s) minus the number of parameters being estimated. Suppose a random sample of size n is to be used in a hypothesis test about a population mean µ, then the rule of thumb leads us to believe that the degrees of freedom should be equal to the total sample size n minus the number of parameters being estimated which is one (i.e., the one and only parameter is µ). This justifies the n 1 degrees of freedom associated with the test statistic t n 1 defined previously. Each row of Table A.3 is labeled by a value for df. The information contained in each row is exactly the same type of information contained at the bottom of Table A.2. At the bottom of Table A.2, you find values 148

3 such as z.5, z.25, and z.5, where the amount of area above the z-score is indicated in the subscript. Each row of Table A.3 contains t-scores, where the column heading indicates the amount of area above each t-score under the corresponding t density curve, etc. The notation we shall use to represent the t-scores in each row of Table A.3 will be similar to the notation we use to represent the z-scores at the bottom of Table A.2. For instance, just as we use z.1 to represent the z-score above which lies.1 of the area under the standard normal density curve, we shall use t 5;.1 to represent the t-score above which lies.1 of the area under the density curve for the Student's t distribution with df = 5. When designating t-scores in Table A.3, we include in the subscript the degrees of freedom followed by the area above the t-score, separated by a semicolon. Pick any column of Table A.3, and look at how the t-scores change from the row for df = 1 down through the successive rows. You should notice that as df increase, the corresponding t-scores in two successive rows become closer in value. By the time df = 3, the corresponding t-scores in two successive rows are so much alike that Table A.3 jumps from df = 3 to df = 4 to df = 6 to df = 12. The t distributions for df > 12 are almost identical to the standard normal distribution. Consequently, the t-scores in the last row of Table A.3, labeled df = (where represents infinity), are exactly equal to the values at the bottom of Table A.2: z.1, z.5, etc. It should not be surprising that the t distributions become more like a standard normal distribution as df become larger. In the formula for the test statistic t n 1, we are using s to estimate σ, and as the sample size n increases, s tends to become a more accurate estimate of σ, as we should expect to happen. In fact, with sample sizes n > 12, s is such a good estimator of σ, that it is reasonable to treat the value of s as if it were the actual value of σ. In other words, for sample sizes n > 12, we may treat the test statistic t n 1 as if it were the test statistic z. This is why the t-scores in the last row of Table A.3 (df = ) are exactly equal to the values at the bottom of Table A.2. Now let us return to the second step of our hypothesis test concerning the mean amount of cereal per box (i.e., H : µ = 12 vs. H 1 : µ 12). Since n = 16, then df = n 1 = 16 1 = 15. Earlier, we found that x = oz. and s =.22 oz. in the simple random sample of n = 16 cereal boxes; since 12 oz. is our hypothesized mean, we can calculate the value of our test statistic t 15 as follows: x µ t 15 = = = s.22 n 16. The third step of our hypothesis test is to define the rejection region, decide whether or not to reject the null hypothesis, and obtain the p-value of the test. Figure 21-1 displays the rejection region graphically. The shaded area corresponds to values of the test statistic t 15 which are unlikely to occur if H : µ = 12 is true. In order that the total shaded area be equal to α =.5, the shaded area corresponding to positive values of t 15 is.25, and the shaded area corresponding to negative values of t 15 is.25. We see then that the rejection region is defined by the t-score above which.25 of the area lies and the t-score below which.25 of the area lies, when df = 15. The symmetry of the t distributions of course implies these two t-scores are negatives of each other. From the row corresponding to df = 15 in Table A.3, we find that the t-score above which.25 of the area lies is t 15;.25 = We can then define our rejection region algebraically as t or t (i.e., t ). 149

4 Since our test statistic, calculated in the second step, was found to be t 15 = which is in the rejection region, our decision is to reject H : µ = 12; in other words, our data provides sufficient evidence against H : µ = 12. Figure 21-2 graphically illustrates the p-value. The p-value of this hypothesis test is the probability of obtaining a test statistic value t 15 which represents a larger distance away from the hypothesized mean than the observed t 15 = In Figure 21-2, the observed test statistic value t 15 = has been located on the horizontal axis along with the value The two shaded areas together correspond to test statistic values which represent a larger distance away from the hypothesized mean than the observed t 15 = 2.182; in other words, the total shaded area in Figure 21-2 is the p-value of the hypothesis test. To obtain the shaded area exactly from a table, we would need a table of areas under a Student's t density curve as detailed as Table A.2 is for the standard normal distribution. However, we can use Table A.3 to obtain upper and lower bounds for the p-value, and such bounds are satisfactory in practice. In the row of Table A.3 corresponding to df = 15, we find that the observed test statistic t 15 = is between t 15;.25 = and t 15;.1 = This tells us that the area above the t-score is between.1 and.25, and the area below the t-score is between.1 and.25. Therefore, the total shaded area in Figure 21-2, which is the p-value, must be between.2 = and.5 = We indicate this by writing.2 < p-value <.5. Finding that.2 < p-value <.5 confirms to us that H is rejected with α =.5. However, it also provides the reader of these results with some additional information. Remember that the choice of α =.5 is made by the person(s) performing the hypothesis test. Those who read the results of a hypothesis test may wish to know whether or not the conclusion would have been the same with a different significance level. The p-value provides this information. For instance, suppose someone looking at the results of this present hypothesis test wants to know whether or not the H would have been rejected with α =.1 or with α =.1. Knowing that.2 < p-value <.5 tells us that H would be rejected with α =.1 but not with α =.1. In other words, the evidence against H is sufficiently strong with α =.1 or α =.5 but not with α =.1. To complete the fourth step of the hypothesis test, we can summarize the results of the hypothesis test as follows: Since t 15 = and t 15;.25 = 2.131, we have sufficient evidence to reject H. We conclude that the mean amount of cereal per box is different from 12 oz. (.2 < p-value <.5). The data suggest that the mean amount of cereal per box is less than 12 oz. Since we rejected the H, thus concluding that the population mean was not equal to the hypothesized 12 oz., we are left with the question of whether the mean is less than or greater than the hypothesized 12 oz. We can answer this question by noting that the observed test statistic t 15 = is in the half of the rejection region corresponding to negative values of t 15, because the sample mean x = oz. was less than the hypothesized 12 oz. It is for this reason that we included as part of our statement of results a sentence indicating that the mean amount of cereal per box appears to be less than 12 oz. Consequently, an adjustment appears to be necessary in the assembly line from which the boxes of cereal come. It is often useful to include a graphical display with the results of a hypothesis test. Figure 21-3 is a stem-and-leaf display for the amounts of cereal recorded for the sample of 16 boxes used in the hypothesis test just completed. From this stem-and-leaf display, we can easily find the five-number summary: 11.55, 11.74, 11.88, 12.65, This five-number summary was used to construct the box plot displayed as Figure

5 While Tables A.2 and A.3 provide us with detailed information about the standard normal distribution and the t distributions, this information is also often available from computer software such as spreadsheets and statistical packages, and also from some calculators such as programmable calculators. However, when it is necessary to consult Table A.3, the desired df is not always available. For instance, t 45;.5 and t 5;.5 are not directly available from Table A.3, but this really presents no problem. The df in the table closest to df = 45 is df = 4; since t 4;.5 = and t 6;.5 = are close in value, we may simply choose to say t 45;.5 t 4;.5 = Since df = 5 is exactly in between df = 4 and df = 6, we can approximate t 5;.5 as the average of t 4;.5 and t 6;.5 ; that is, we can say t 5;.5 (t 4;.5 + t 6;.5 ) / 2 = ( ) / 2 = The use of the test statistic t n 1 in a hypothesis test about µ is based on the assumption that the data is a simple random sample selected from a population having a normal distribution. When the population from which the simple random sample is selected does not have a normal distribution, the test statistic t n 1 will still be appropriate if the sample size n is sufficiently large. We have previously seen how we can easily verify that the use of the test statistic z is appropriate in a test about λ, but it is not as easy to verify that the use of the test statistic t n 1 is appropriate in a test about µ. One of the best ways to decide whether or not the population from which a simple random sample is selected has a normal distribution is to use the Shapiro-Wilk test. We will not discuss the Shapiro-Wilk test in detail here, since it is somewhat complex and generally requires the use of a statistical software package on a computer. We shall instead just say that the Shapiro-Wilk test is based on what is called a normal probability plot. A normal probability plot is a scatter plot constructed by labeling one axis with the observations from a simple random sample and labeling the other axis with the z-scores corresponding to the percentile ranks of the observations in the sample. If the points lie sufficiently close to a straight line, then we conclude that the simple random sample came from a normal distribution, in which case we can feel confident that the use of the test statistic t n 1 is appropriate; otherwise, we conclude that the sample was selected from a non-normal distribution. Generally, it is not easy to construct a normal probability plot without access to computer software or a graphing calculator. If one cannot easily perform the Shapiro-Wilk test or does not have easy access to a normal probability plot, a box plot or histogram can perhaps give one some information about whether or not there appears to be any skewness or outliers. Of course, examination of a normal probability plot or the results of the Shapiro-Wilk test are always preferable. If we should find that the population from which our sample comes does not have a normal distribution, the t n 1 statistic may still be appropriate with a sufficiently large sample size n. In cases where we do not think the sample size is sufficiently large, or where we think the non-normality of the 151

6 distribution is too great, there are alternative test statistics available in place of the t n 1 statistic. These alternative test statistics are often available with statistical software and on certain calculators, and we will not take the time to discuss these in detail here. The points in a normal probability plot of the data displayed in Figure 21-3 will all lie relatively close to a straight line, which you can verify if you were to construct this normal probability plot. This suggests that the use of the test statistic t n 1 seems appropriate. Without access to this normal probability plot, one could look at the box plot displayed as Figure Since there are no outliers and there appears to be only a slight skewness in the data, one might think that it is not unreasonable to assume that the sample size n = 16 is sufficiently large for the test statistic t 15 to be appropriate. We have now discussed the z test about a population proportion λ and the t n 1 test about a population mean µ. One assumption on which each of these two hypothesis tests is based is that a simple random sample has been selected. Back when we discussed the topic of sampling, we observed how a true simple random sample may often not be available in practice. However, there are situations in which we may reasonably treat a sample which is not truly a simple random sample as if it were a simple random sample. When conducting any hypothesis test, it is important to assess how the sampling procedure used may affect the results of the test. As we study many different types of hypothesis tests, we shall find that even though the purpose behind various hypothesis tests may differ, the general procedure can always be summarized in terms of the four steps we have been using. The primary difference is the test statistic which is used and the distribution on which the rejection region is based. Self-Test Problem For many years, the mean gas mileage on a long trip was known to be 26.5 miles per gallon for a certain type of automobile. When a newly designed engine is incorporated into the automobile, a.5 significance level is chosen for a hypothesis test to see if there is any evidence that the mean gas mileage with the new design is different from 26.5 miles per gallon. Each automobile in a simple random sample of those with the new design is driven 1 miles and the following gas mileages are recorded: (a) Complete the four steps of the hypothesis test by completing the table titled Hypothesis Test for Self-Test Problem (b) Why would a box plot be an appropriate graphical display for the data used in this hypothesis test? (c) Construct the box plot, and comment on whether there is any reason to think the t statistic is not appropriate. (d) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error. (e) Decide whether H would have been rejected or would not have been rejected with each of the following significance levels: (i) α =.1, (ii) α =.1. Self-Test Problem A manufacturer of light bulbs claims that the mean lifetime of the bulbs is 5 hours. A consumer group chooses a.1 significance level for a hypothesis test to see if there is any evidence that the mean lifetime is different from 5 hours. In a simple random sample of 4 light bulbs, the mean lifetime is hours and the standard deviation is hours. (a) Complete the four steps of the hypothesis test by completing the table titled Hypothesis Test for Self-Test Problem (b) If the data were available, decide which of the following would be best as a graphical display for the data and say why: (i) bar or pie chart, (ii) scatter plot, (iii) box plot. (c) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error. (d) Decide whether H would have been rejected or would not have been rejected with each of the following significance levels: (i) α =.1, (ii) α =.5. (e) What would the presence of one or more outliers in the data suggest about using the t statistic? 152

7 153

8 Self-Test Problem Figure 21-5 is the same stem-and-leaf display as Figure 5-1. The data are from Self-Test Problem 5-2 where the breaking strength in pounds is recorded for several pieces of a type of rope. Find the five-number summary for the data, determine whether or not there are any potential outliers, and give a reason why the test statistic t n 1 might not be considered appropriate for a hypothesis test. (Optional: Construct a normal probability plot.) Answers to Self-Test Problems 21-1 (a) Step 1: H : µ = 26.5, H 1 : µ 26.5, α =.5 (b) (c) (d) Step 2: n = 9, x = 26.9 mpg, s =.5916 mpg, and t 8 = Step 3: The rejection is t or t (which can be written as t ). H is not rejected;.5 < p-value <.1. Step 4: Since t 8 = and t 8;.25 = 2.36, we do not have sufficient evidence to reject H. We conclude that the mean gas mileage with the new design is not different from 26.5 miles per gallon (.5 < p-value <.1). Since gas mileage is a quantitative variable, a box plot is an appropriate graphical display. The five-number summary is 25.9, 26.35, 27.1, 27.4, 27.6, and Figure 21-6 is the box plot. Since there are no potential outliers, and the box plot does not look extremely skewed, there is no reason to think that the t statistic is not appropriate. Since H is not rejected, a Type II error is possible, which is concluding that µ = 26.5 when really µ (e) H would not have been rejected with α =.1 but would have been rejected with α = (a) Step 1: H : µ = 5, H 1 : µ 5, α =.1 (b) (c) Step 2: n = 4, x = hours, s = hours, and t 39 = 3.9 Step 3: The rejection is t or t (which can be written as t ). H is rejected; p-value <.1. Step 4: Since t 39 = 3.9 and t 39;.5 = 1.684, we have sufficient evidence to reject H. We conclude that the mean lifetime is different from 5 hours (p-value <.1). The data suggest that the mean lifetime is less than 5 hours. Since light bulb lifetime is a quantitative variable, a box plot is an appropriate graphical display. Since H is rejected, a Type I error is possible, which is concluding that µ 26.5 when really µ = (d) H would have been rejected with both α =.1 and α =.5. (e) The presence of one or more outliers would suggest that the t statistic may not be appropriate The five-number summary is 6, 93, 13.5, 113, 128; IQR = = 2 and each of 6 and 62 is a potential outlier. Since there are potential outliers (and the points on a normal probability plot do not appear to lie close to a straight line), the test statistic t n 1 may not be appropriate. 154

9 Summary The test statistic in a hypothesis test about λ is the z-score of p calculated under the assumption that the hypothesized value of λ is correct, that is, the test statistic is z = p λ = σ λ p λ ( 1 λ ) p n, where λ represents the hypothesized value for λ. In order to use the same sort of test statistic in a hypothesis test about a population mean µ, we must know the population standard deviation σ ; when σ is not known, which is generally the case, the test statistic in a hypothesis test a population mean µ is x µ t n 1 = s n, where µ represents the hypothesized value for µ. We refer to this test statistic as a t statistic with df = n 1, where df is degrees of freedom. The rule of thumb that often applies is that the relevant degrees of freedom is equal to the total size of the samples(s) minus the number of parameters being estimated. In hypothesis test about the mean, the test statistic t n 1 must be compared to the Student's t distribution with n 1 degrees of freedom. Some similarities exist between the standard normal distribution and the t distributions in that these distributions are all symmetric and bell-shaped; the differences are that the t distributions have more variation and a flatter bell-shape than the standard normal distribution. If a standard normal density curve and a Student's t density curve were to be graphed on the same scale, the Student's t density curve would look flatter and more "spread out." In the formula for the test statistic t n 1, σ is estimated by s. As the sample size n increases, s tends to become a more accurate estimate of σ. With sample sizes n > 12, s is such a good estimator of σ, that it is reasonable to treat the value of s as if it were the actual value of σ. Consequently, we can treat a Student's t distribution with df > 12 as if it were a standard normal distribution. The steps in a hypothesis test about a population proportion λ and a hypothesis test about a population mean µ are the same except for the different test statistics that are used. In virtually all hypothesis tests, the same basic steps are performed with differing test statistics and differing purposes behind various hypothesis tests. One of the best ways to decide whether or not the population from which a simple random sample is selected has a normal distribution is to use the Shapiro-Wilk test which is based on what is called a normal probability plot. A normal probability plot is a scatter plot constructed by labeling one axis with the observations from a simple random sample and labeling the other axis with the z-scores corresponding to the percentile ranks of the observations from the sample. If the points do not lie sufficiently close to a straight line, 155

10 then Shapiro-Wilk test will conclude that the sample was selected from a non-normal distribution. One might look at a box plot or histogram of the sample to see if there is any indication of skewness or if there are any outliers, but the Shapiro-Wilk test and/or examination of a normal probability plot is preferable. If we should find that the population from which our sample comes does not have a normal distribution, the t n 1 statistic may still be appropriate with a sufficiently large sample size n. In cases where we do not think the sample size is sufficiently large, or where we think the non-normality of the distribution is too great, there are alternative test statistics available in place of the t n 1 statistic. The z test about a population proportion λ and the t test are both based on the assumption that a simple random sample has been selected. While a true simple random sample may often not be available in practice, there are situations in which we may reasonably treat a sample which is not a simple random sample as if it were a simple random sample. When conducting any hypothesis test, it is important to assess how the sampling procedure used may affect the results of the test. 156

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### 6 3 The Standard Normal Distribution

290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### Complement: 0.4 x 0.8 = =.6

Homework Chapter 5 Name: 1. Use the graph below 1 a) Why is the total area under this curve equal to 1? Rectangle; A = LW A = 1(1) = 1 b) What percent of the observations lie above 0.8? 1 -.8 =.2; A =

### Chapter 7. Section Introduction to Hypothesis Testing

Section 7.1 - Introduction to Hypothesis Testing Chapter 7 Objectives: State a null hypothesis and an alternative hypothesis Identify type I and type II errors and interpret the level of significance Determine

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Statistical Inference

Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Graphical and Tabular. Summarization of Data OPRE 6301

Graphical and Tabular Summarization of Data OPRE 6301 Introduction and Re-cap... Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Week 4: Standard Error and Confidence Intervals

Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Descriptive Statistics

Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Chapter 8: Introduction to Hypothesis Testing

Chapter 8: Introduction to Hypothesis Testing We re now at the point where we can discuss the logic of hypothesis testing. This procedure will underlie the statistical analyses that we ll use for the remainder

### STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Introductory Statistics Notes

Introductory Statistics Notes Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August

### Characteristics of Binomial Distributions

Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

### Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Hypothesis Testing (unknown σ)

Hypothesis Testing (unknown σ) Business Statistics Recall: Plan for Today Null and Alternative Hypotheses Types of errors: type I, type II Types of correct decisions: type A, type B Level of Significance

### Statistical Inference and t-tests

1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value

### Basic Statistics. Probability and Confidence Intervals

Basic Statistics Probability and Confidence Intervals Probability and Confidence Intervals Learning Intentions Today we will understand: Interpreting the meaning of a confidence interval Calculating the

### 5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

### Hypothesis Testing. Bluman Chapter 8

CHAPTER 8 Learning Objectives C H A P T E R E I G H T Hypothesis Testing 1 Outline 8-1 Steps in Traditional Method 8-2 z Test for a Mean 8-3 t Test for a Mean 8-4 z Test for a Proportion 8-5 2 Test for

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Coins, Presidents, and Justices: Normal Distributions and z-scores

activity 17.1 Coins, Presidents, and Justices: Normal Distributions and z-scores In the first part of this activity, you will generate some data that should have an approximately normal (or bell-shaped)

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Statistiek I. t-tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen. John Nerbonne 1/35

Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://wwwletrugnl/nerbonne/teach/statistiek-i/ John Nerbonne 1/35 t-tests To test an average or pair of averages when σ is known, we

### Probability. Distribution. Outline

7 The Normal Probability Distribution Outline 7.1 Properties of the Normal Distribution 7.2 The Standard Normal Distribution 7.3 Applications of the Normal Distribution 7.4 Assessing Normality 7.5 The

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### Standard Deviation Calculator

CSS.com Chapter 35 Standard Deviation Calculator Introduction The is a tool to calculate the standard deviation from the data, the standard error, the range, percentiles, the COV, confidence limits, or

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Describing Populations Statistically: The Mean, Variance, and Standard Deviation

Describing Populations Statistically: The Mean, Variance, and Standard Deviation BIOLOGICAL VARIATION One aspect of biology that holds true for almost all species is that not every individual is exactly

### Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency ) plot /13/2004

Graphs, and measures of central tendency and spread 9.07 9/13/004 Histogram If discrete or categorical, bars don t touch. If continuous, can touch, should if there are lots of bins. Sum of bin heights

### HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### Data Analysis: Describing Data - Descriptive Statistics

WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

### Chapter 3 Normal Distribution

Chapter 3 Normal Distribution Density curve A density curve is an idealized histogram, a mathematical model; the curve tells you what values the quantity can take and how likely they are. Example Height

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### Lecture 05 Measures of Central Tendency

Lecture 05 Measures of Central Tendency There are three main measures of central tendency: the mean, median, and mode. The purpose of measures of central tendency is to identify the location of the center

### Mind on Statistics. Chapter 2

Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

### Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS

Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS 1. Introduction, and choosing a graph or chart Graphs and charts provide a powerful way of summarising data and presenting them in

### MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample

MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of

### StatTools Assignment #1, Winter 2007 This assignment has three parts.

StatTools Assignment #1, Winter 2007 This assignment has three parts. Before beginning this assignment, be sure to carefully read the General Instructions document that is located on the StatTools Assignments

### The Normal Distribution

Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

### Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

### Cents and the Central Limit Theorem Overview of Lesson GAISE Components Common Core State Standards for Mathematical Practice

Cents and the Central Limit Theorem Overview of Lesson In this lesson, students conduct a hands-on demonstration of the Central Limit Theorem. They construct a distribution of a population and then construct

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### Statistics 2014 Scoring Guidelines

AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### When σ Is Known: Recall the Mystery Mean Activity where x bar = 240.79 and we have an SRS of size 16

8.3 ESTIMATING A POPULATION MEAN When σ Is Known: Recall the Mystery Mean Activity where x bar = 240.79 and we have an SRS of size 16 Task was to estimate the mean when we know that the situation is Normal

### AP Statistics Chapter 1 Test - Multiple Choice

AP Statistics Chapter 1 Test - Multiple Choice Name: 1. The following bar graph gives the percent of owners of three brands of trucks who are satisfied with their truck. From this graph, we may conclude

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### There are some general common sense recommendations to follow when presenting

Presentation of Data The presentation of data in the form of tables, graphs and charts is an important part of the process of data analysis and report writing. Although results can be expressed within

### 1 Measures for location and dispersion of a sample

Statistical Geophysics WS 2008/09 7..2008 Christian Heumann und Helmut Küchenhoff Measures for location and dispersion of a sample Measures for location and dispersion of a sample In the following: Variable

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

### Newton s First Law of Migration: The Gravity Model

ch04.qxd 6/1/06 3:24 PM Page 101 Activity 1: Predicting Migration with the Gravity Model 101 Name: Newton s First Law of Migration: The Gravity Model Instructor: ACTIVITY 1: PREDICTING MIGRATION WITH THE

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### 32 Measures of Central Tendency and Dispersion

32 Measures of Central Tendency and Dispersion In this section we discuss two important aspects of data which are its center and its spread. The mean, median, and the mode are measures of central tendency

### 1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics)

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) As well as displaying data graphically we will often wish to summarise it numerically particularly if we wish to compare two or more data sets.

### Descriptive Statistics

Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. It includes the construction of graphs, charts and tables, as well various descriptive measures such

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

### Sampling Distribution of a Normal Variable

Ismor Fischer, 5/9/01 5.-1 5. Formal Statement and Examples Comments: Sampling Distribution of a Normal Variable Given a random variable. Suppose that the population distribution of is known to be normal,

### Power and Sample Size Determination

Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 Power 1 / 31 Experimental Design To this point in the semester,

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations