# Solutions to Homework 8 Statistics 302 Professor Larget

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 s to Homework 8 Statistics 302 Professor Larget Tetbook Eercises 6.12 Impact of the Population Proportion on SE Compute the standard error for sample proportions from a population with proportions p = 0.8, p = 0.5, p = 0.3, and p = 0.1 using a sample size of n = 100. Comment on what you see. For which proportion is the standard error the greatest? For which is it the smallest? We compute the standard errors using the formula: p(1 p) 0.8(0.2) p = 0.8 : SE = = = n 100 p(1 p) 0.5(0.5) p = 0.5 : SE = = = n 100 p(1 p) 0.3(0.7) p = 0.3 : SE = = = n 100 p(1 p) 0.1(0.9) p = 0.1 : SE = = = n 100 The largest standard error is at a population proportion of 0.5 (which represents a population split between being in the category we are interested in and not begin in). The farther we get from this proportion, the smaller the standard error is. Of the four we computed, the smallest standard error is at a population proportion of 0.1. Standard Error from a Formula and a Bootstrap Distribution In eercise 6.20, use Statkey or other technology to generate a bootstrap distribution of sample proportions and find the standard error for that distribution. Compare the result to the standard error given by the Central Limit Theorem, using the sample proportion as an estimate of the population proportion Proportion of home team wins in soccer, with n = 120 and ˆp = Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE = (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using ˆp = as an estimate for p, we have p(1 p) 0.583(1.583) SE = = n 120 We see that the bootstrap standard error and the formula match very closely Home Field Advantage in Baseball There were 2430 Major League Baseball (MLB) games played in 2009, and the home team won in 54.9% of the games. If we consider the games played in 2009 as a sample of all MLB games, find and interpret a 90% confidence interval for the proportion of games the home team wins in Major League Baseball. To find a 90% confidence interval for p, the proportion of MLB games won by the home team, we 1

2 use z = and ˆp = from the sample of n = 2430 games. The confidence interval is Sample statistic ± z SE ˆp(1 ˆp) ˆp ± z n 0.549(0.451) ± ± to We are 90% confident that the proportion of MLB games that are won by the home team is between and This statement assumes that the 2009 season is representative of all Major League Baseball games. If there is reason to assume that that season introduces bias, then we cannot be confident in our statement What Proportion Favor a Gun Control Law? A survey is planned to estimate the proportion of voters who support a proposed gun control law. The estimate should be within a margin of error of ±2% with 95% confidence, and we do not have any prior knowledge about the proportion who might support the law. How many people need to be included in the sample? The margin of error we desire is ME = 0.02, and for 95% confidence we use z = Since we have no prior knowledge about the proportion in support p, we use the conservative estimate of p = 0.5. We have: ( ) z 2 n = p(1 p) ME ( ) = 0.5(1 0.5) 0.02 = 2401 We need to include 2, 401 people in the survey in order to get the margin of error down to within ±2% Home Field Advantage in Baseball There were 2430 Major League Baseball (MLB) games played in 2009, and the home team won the game in 54.9% of the games. If we consider the games played in 2009 as a sample of all MLB games, test to see if there is evidence, at the 1% level, that the home team wins more than half the games. Show all details of the test. We are conducting a hypothesis test for a proportion p, where p is the proportion of all MLB games won by the home team. We are testing to see if there is evidence that p > 0.5, so we have H 0 : p = 0.5 H a : p > 0.5 This is a one-tail test since we are specifically testing to see if the proportion is greater than

3 The test statistic is: z = Sample Statistic Null parameter SE = ˆp p 0 p 0 (1 p 0 ) n = (0.5) 2430 = Using the normal distribution, we find a p-value of (to five decimal places) zero. This provides very strong evidence to reject H 0 and conclude that the home team wins more than half the games played. The home field advantage is real! 6.70 Percent of Smokers The data in Nutrition Study, introduced in Eercise 1.13 on page 13, include information on nutrition and health habits of a sample of 315 people. One of the variables is Smoke, indicating whether a person smokes or not (yes or no). Use technology to test whether the data provide evidence that the proportion of smokers is different from 20%. We use technology to determine that the number of smokers in the sample is 43, so the sample proportion of smokers is ˆp = 43/315 = The hypotheses are: H 0 : p = 0.20 The test statistic is: z = Sample Statistic Null Parameter SE H a : p 0.20 = ˆp p 0 p 0 (1 p 0 ) n = (0.8) 325 = 2.82 This is a two-tail test, so the p-value is twice the area below in a standard normal distribution. We see that the p-value is 2(0.0024) = This small p-value leads us to reject H 0. We find strong evidence that the proportion of smokers is not 20% How Old is the US Population? From the US Census, we learn that the average age of all US residents is years with a standard deviation of years. Find the mean and standard deviation of the distribution of sample means for age if we take random samples of US residents of size: (a) n = 10 (b) n = 100 (c) n = 1000 (a) The mean of the distribution is years old. The standard deviation of the distribution of sample means is the standard error: SE = σ n = = 7.14 (b) The mean of the distribution is years old. The standard deviation of the distribution of sample means is the standard error: SE = σ n = =

4 (c) The mean of the distribution is years old. The standard deviation of the distribution of sample means is the standard error: SE = σ n = = Notice that as the sample size goes up, the standard error of the sample means goes down. Standard Error from a Formula and a Bootstrap Distribution In Eercises 6.96 to 6.99, use StatKey or other technology to generate a bootstrap distribution of sample means and find the standard error for that distribution. Compare the result to the standard error given by the Central Limit Theorem, using the sample standard deviation as an estimate of the population standard deviation Mean commute time in Atlanta, in minutes, using the data in CommuteAtlanta with n = 500, = 29.11, and s = Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using s = as an estimate for σ, we have SE = s n = = We see that the bootstrap standard error and the formula match very closely Bright Light at Night Makes Even Fatter Mice Data A.1 on page 136 introduces a study in which mice that had a light on at night (rather than complete darkness) ate most of their calories when they should have been resting. These mice gained a significant amount of weight, despite eating the same number of calories as mice kept in total darkness. The time of eating seemed to have a significant effect. Eercise eamines the mice with dim light at night. A second group of mice had bright light on all the time (day and night). There were nine mice in the group with bright light at night and they gained an average of 11.0g with a standard deviation of 2.6. The data are shown in the figure in the book. Is it appropriate to use a t-distribution in this situation? Why or why not? If not, how else might we construct a confidence interval for mean weight gain of mice with a bright light on all the time? The sample size of n = 9 is quite small, so we require a condition of approimate normality for the underlying population in order to use the t-distribution. In the dotplot of the data, it appears that the data might be right skewed and there is quite a large outlier. It is probably more reasonable to use other methods, such as a bootstrap distribution, to compute a confidence interval using this data Find the sample size needed to give, with 95% confidence, a margin of error within ±10. Within ±5. Within ±1. Assume that we use σ = 30 as our estimate of the standard deviation in each case. Comment on the relationship between the sample size and the margin of error. 4

5 We use z = 1.96 for 95% confidence, and we use σ = 30. For a desired margin of error of ME = 10, we have: ( ) z σ 2 ( ) n = = = 34.6 ME 10 We round up to n = 35. For a desired margin of error of ME = 5, we have: ( ) z σ 2 ( ) n = = = ME 5 We round up to n = 139. For a desired margin of error of ME = 1, we have: ( ) z σ 2 ( ) n = = = ME 1 We round up to n = 3, 458. We see that the sample size goes up as we require more accuracy. Or, put another way, a larger sample size gives greater accuracy The Chips Ahoy! Challenge in the mid-1990s a Nabisco marketing campaign claimed that there were at least 1000 chips in every bag of Chips Ahoy! cookies. A group of Air Force cadets collected a sample of 42 bags of Chips Ahoy! cookies, bought from locations all across the country to verify this claim. The cookies were dissolved in water and the number of chips (any piece of chocolate) in each bag were hand counted by the cadets. The average number of chips per bag was , with standard deviation chips. (a) Why were the cookies bought from locations all over the country? (b) Test whether the average number of chips per bag is greater than Show all details. (c) Does part (b) confirm Nabisco s claim that every bag has at least 1000 chips? Why or why not? (a) The cookies were bought from locations all over the country to try to avoid sampling bias. (b) Let µ be the mean number of chips per bag. We are testing H 0 : µ = 1000 vs H a : µ > The test statistic is t = 117.6/ = We use a t-distribution with 41 degrees of freedom. The area to the left of 14.4 is negligible, and p-value 0. We conclude, with very strong evidence, that the average number of chips per bag of Chips Ahoy! cookies is greater than (c) No! The test in part (b) gives convincing evidence that the average number of chips per bag is greater than However, this does not necessarily imply that every individual bag has more than 1000 chips Are Florida Lakes Acidic or Alkaline? The ph of a liquid is a measure of its acidity or 5

6 alkalinity. Pure water has a ph of 7, which is neutral. s with a ph less than 7 are acidic while solutions with a ph greater than 7 are basic or alkaline. The dataset FloridaLakes gives information, including ph values, for a sample of lakes in Florida. Computer output of descriptive statistics for the ph variable is shown in the book. (a) How many lakes are included in the dataset? What is the mean ph value? What is the standard devotion? (b) Use the descriptive statistics above to conduct a hypothesis test to determine whether there is evidence that average ph in Florida lakes is different from the neutral value of 7. Show all details of the test and use a 5% significance level. If there is evidence that it is not neutral, does the mean appear to be more acidic or more alkaline? (c) Compare the test statistic and p-value found in part (b) to the computer output shown in the book for the same data. (a) We see that n = 53 with = and s = (b) The hypotheses are: H 0 : µ = 7 H a : µ 7?where µ represents the mean ph level of all Florida lakes. We calculate the test statistic t = µ 0 s/ n = / 53 = 2.31 We use a t-distribution with 52 degrees of freedom to see that the area below?2.31 is Since this is a two-tail test, the p-value is 2(0.0124) = We reject the null hypothesis at a 5% significance level, and conclude that average ph of Florida lakes is different from the neutral value of 7. Florida lakes are, in general, somewhat more acidic than neutral. (c) The test statistic matches the computer output eactly and the p-value is the same up to rounding off. Computer Eercises For each R problem, turn in answers to questions with the written portion of the homework. Send the R code for the problem to Katherine Goode. The answers to questions in the written part should be well written, clear, and organized. The R code should be commented and well formatted. R problem 1 Ideally, a 95% confidence interval will be as tightly clustered around the true value as possible, and will have a 95% coverage probability. When the possible data values are discrete, (such as in the case of sample proportions which can only be a count over the sample size), the true coverage or capture probability is not eactly 0.95 for every p. This problem eamines the true coverage probability for three different methods of making confidence intervals. To compute the coverage probability of a method, recognize that each possible value from 0 to n for a given method results in a confidence interval with a lower bound a() and an upper bound b(). The interval will capture p if a() < p < b(). To compute the capture probability of 6

7 a given p, we need to add up all of the binomial probabilities for the values that capture p in the interval. For a sample size n and true population proportion p, this coverage probability is ( ) n P (p in interval ) = p (1 p) n :a()<p<b() To compute this in R, you need to find the lower and upper bounds of the confidence interval for each possible outcome, and add the probabilities of the outcomes that capture p. Here is some code to get you started using the tetbook method for an eample where n = 10 and p = 0.3. = 0:10 p.hat = /10 # will be a vector se = sqrt(p.hat*(1-p.hat)/10) # also a vector z = qnorm(0.975) a = p.hat - z*se # also a vector b = p.hat + z*se #also a vector [ (a < 0.3) & (0.3 < b) ] # that capture p ## [1] For each of the following methods, find which outcomes result in confidence intervals that capture p and compute the coverage probability from a sample of size n = when p = Normal from maimum likelihood estimate, ˆp = X/n, SE = ˆp(1 ˆp)/n, with the interval ˆp ± 1.96SE Since n =, it is possible for to range from 0 to. Thus, we calculate all possible values of ˆp, which are { 0, 1 },...,. We then calculate the standard errors associated with each ˆp using the formula shown above and R. With the standard error, we are able to calculate both the lower and upper bounds for the confidence intervals using the formula ˆp ± 1.96SE. Several of the values we calculated are shown in the table below. X ˆp SE Lower Bound Upper Bound Now, we need to determine for which values of X, 0.4 is captured by X s confidence interval. Using R, we find that this is the case when X {18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}. 7

8 Thus, we can now compute the coverage probability as shown below. P (0.4 in interval ) = 31 =18 ( ) (0.4) (1 0.4) n = We find that the coverage probability in this case is in fact a bit less than 95%. The following R code was used to complete this problem..1 = 0: p.hat.1 =.1/ se.1 = sqrt(p.hat.1*(1-p.hat.1)/) z = qnorm(0.975) a.1 = p.hat.1 - z*se.1 b.1 = p.hat.1 + z*se.1 [ (a.1 < 0.4) & (0.4 < b.1) ] sum(dbinom(18:31,,0.4)) 2. Normal from adjusted maimum likelihood estimate, p = (X+2)/(n+4), SE = p(1 p)/(n + 4), with the interval p ± 1.96SE We perform the same process again, but this time we use the new equations presented for calculating p, SE, and the confidence intervals. The table below shows some of the values that we calculate using R. X p SE Lower Bound Upper Bound = = = = = = We use R to determine that 0.4 is captured by the confidence intervals when Hence, the coverage probability is X {17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}. P (0.4 in interval ) = 31 =17 ( ) (0.4) (1 0.4) n = We find that this method for calculating confidence intervals gives us a coverage probability that is much closer to 95% than the method in part 1. The following R code was used to complete this problem. 8

9 .2 = 0: p.tilde.2 = (.2+2)/(+4) se.2 = sqrt(p.tilde.2*(1-p.tilde.2)/(+4)) z = qnorm(0.975) a.2 = p.tilde.2 - z*se.2 b.2 = p.tilde.2 + z*se.2 [ (a.2 < 0.4) & (0.4 < b.2) ] sum(dbinom(17:31,,0.4)) 3. Within z 2 /2 of the maimum likelihood loglikelihood. For this method, the file logl.r has a function logl.ci.p() which returns the lower and upper bounds of a 95% confidence interval given n and. You can graph the loglikelihood using glogl.p() for n,, and z = 1.96 to see if the returned values make sense. Using R and the code presented by the professor, we calculate a confidence interval based on the maimum loglikelihood for each value of X between 0 and. Some of the confidence intervals are shown below. X Lower Bound Upper Bound We use R to determine that 0.4 is captured by the confidence intervals when X {17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}. Thus, once again, the capture probability is P (0.4 in interval ) = 31 =17 The following R code was used to complete this problem. CIs <- matri(,nrow=61,ncol=2) for(i in 1:61) { CIs[i,1]<-logl.ci.p(,i-1,conf=0.95)[1] CIs[i,2]<-logl.ci.p(,i-1,conf=0.95)[2] }.3 <- 0:.3[ (CIs[,1] < 0.4) & (0.4 < CIs[,2]) ] ( ) (0.4) (1 0.4) n =

10 R Problem 2 Repeat the previous problem, but for n = and p = 0.1. We go through the same process that we did for problem 1. For the normal maimum likelihood estimate, we determine that 0.1 is captured by a confidence interval when X {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Thus, the capture probability is P (0.1 in interval ) = 12 =3 The following R code was used to complete this problem..1 = 0: p.hat.1 =.1/ se.1 = sqrt(p.hat.1*(1-p.hat.1)/) z = qnorm(0.975) a.1 = p.hat.1 - z*se.1 b.1 = p.hat.1 + z*se.1.1[ (a.1 < 0.1) & (0.1 < b.1) ] sum(dbinom(3:12,,0.1)) ( ) (0.1) (1 0.1) n = For the normal from adjusted maimum likelihood estimate, we determine that 0.1 is captured by a confidence interval when X {2, 3, 4, 5, 6, 7, 8, 9, 10}. Thus, the capture probability is P (0.1 in interval ) = 10 =2 The following R code was used to complete this problem..2 = 0: p.tilde.2 = (.2+2)/(+4) se.2 = sqrt(p.tilde.2*(1-p.tilde.2)/(+4)) z = qnorm(0.975) a.2 = p.tilde.2 - z*se.2 b.2 = p.tilde.2 + z*se.2.2[ (a.2 < 0.1) & (0.1 < b.2) ] sum(dbinom(2:10,,0.1)) ( ) (0.1) (1 0.1) n = For the confidence intervals derived from the maimum likelihood loglikelihood, we determine that 0.1 is captured by a confidence interval when X {3, 4, 5, 6, 7, 8, 9, 10, 11}. 10

11 Thus, the capture probability is P (0.1 in interval ) = 11 =3 The following R code was used to complete this problem. CIs <- matri(,nrow=61,ncol=2) for(i in 1:61) { CIs[i,1]<-logl.ci.p(,i-1,conf=0.95)[1] CIs[i,2]<-logl.ci.p(,i-1,conf=0.95)[2] }.3 <- 0:.3[ (CIs[,1] < 0.1) & (0.1 < CIs[,2]) ] sum(dbinom(3:11,,0.1)) ( ) (0.1) (1 0.1) n = R Problem 3 This problem eamines a t distribution with 4 degrees of freedom. Here is some sample code to draw graphs of continuous distributions. = seq(-4,4,0.001) z = dnorm() y.10 = dt(, df=10) d = data.frame(,z,y.10) require(ggplot2) ## Loading required package: ggplot2 ggplot(d) + geom_line(aes(=,y=y.10),color="blue") + geom_line(aes(=,y=z),color="red") + ylab( density ) + ggtitle("t(10) distribution in blue, N(0,1) in red") 1. Draw a graph of a t distribution with 4 degrees of freedom and a standard normal curve from -4 to 4. Below is the graph that was drawn in R. This is the R code used to create the above graph. = seq(-4,4,0.001) z = dnorm(,0,1) y.10 = dt(, df=4) d = data.frame(,z,y.10) require(ggplot2) ggplot(d) + geom_line(aes(=,y=y.10),color="blue")+ geom_line(aes(=,y=z),color="red")+ ylab( density )+ ggtitle("t(4) distribution in blue, N(0,1) in red") 11

12 t(4) distribution in blue, N(0,1) in red density Find the area to the right of 2 under each curve. The area to the right of 2 under the t distribution curve is as follows. P (t > 2) = The area to the right of 2 under the standard normal distribution curve is as follows. P (z > 2) = The following is the R code used to achieve these answers. 1-pt(2,4) 1-pnorm(2,0,1) 3. Find the quantile of each curve. The quantile for the t distribution is The quantile for the standard normal distribution is The following is the R code used to achieve these answers. qt(0.975,4) qnorm(0.975,0,1) 12

13 R Problem 4 Repeat the previous problem, but for a t distribution with 20 degrees of freedom. 1. Draw a graph of a t distribution with 20 degrees of freedom and a standard normal curve from -4 to 4. Below is the graph that was drawn in R. 0.4 t(20) distribution in blue, N(0,1) in red 0.3 density

14 This is the R code used to create the above graph. = seq(-4,4,0.001) z = dnorm(,0,1) y.10 = dt(, df=20) d = data.frame(,z,y.10) require(ggplot2) ggplot(d) + geom_line(aes(=,y=y.10),color="blue")+ geom_line(aes(=,y=z),color="red")+ ylab( density )+ ggtitle("t(20) distribution in blue, N(0,1) in red") 2. Find the area to the right of 2 under each curve. The area to the right of 2 under the t distribution curve is as follows. P (t > 2) = The area to the right of 2 under the standard normal distribution curve is as follows. P (z > 2) = The following is the R code used to achieve these answers. 1-pt(2,20) 1-pnorm(2,0,1) 3. Find the quantile of each curve. The quantile for the t distribution is The quantile for the standard normal distribution is The following is the R code used to achieve these answers. qt(0.975,20) qnorm(0.975,0,1) R Problem 5 Repeat the previous problem, but for a t distribution with 100 degrees of freedom. 1. Draw a graph of a t distribution with 100 degrees of freedom and a standard normal curve from -4 to 4. Below is the graph that was drawn in R. This is the R code used to create the above graph. 14

15 t(100) distribution in blue, N(0,1) in red density = seq(-4,4,0.001) z = dnorm(,0,1) y.10 = dt(, df=100) d = data.frame(,z,y.10) require(ggplot2) ggplot(d) + geom_line(aes(=,y=y.10),color="blue")+ geom_line(aes(=,y=z),color="red")+ ylab( density )+ ggtitle("t(100) distribution in blue, N(0,1) in red") 2. Find the area to the right of 2 under each curve. The area to the right of 2 under the t distribution curve is as follows. P (t > 2) = The area to the right of 2 under the standard normal distribution curve is as follows. P (z > 2) = The following is the R code used to achieve these answers. 1-pt(2,100) 1-pnorm(2,0,1) 15

16 3. Find the quantile of each curve. The quantile for the t distribution is The quantile for the standard normal distribution is The following is the R code used to achieve these answers. qt(0.975,100) qnorm(0.975,0,1) 16

### Solutions to Homework 6 Statistics 302 Professor Larget

s to Homework 6 Statistics 302 Professor Larget Textbook Exercises 5.29 (Graded for Completeness) What Proportion Have College Degrees? According to the US Census Bureau, about 27.5% of US adults over

### Test of proportion = 0.5 N Sample prop 95% CI z- value p- value (0.400, 0.466)

STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES Recitation #10 Answer Key PROBABILITY, HYPOTHESIS TESTING, CONFIDENCE INTERVALS Hypothesis tests 2 When a recent GSS asked, would you be willing to pay

### Solutions to Homework 3 Statistics 302 Professor Larget

s to Homework 3 Statistics 302 Professor Larget Textbook Exercises 3.20 Customized Home Pages A random sample of n = 1675 Internet users in the US in January 2010 found that 469 of them have customized

### Power and Sample Size Determination

Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 Power 1 / 31 Experimental Design To this point in the semester,

### Math 140: Introductory Statistics Instructor: Julio C. Herrera Exam 3 January 30, 2015

Name: Exam Score: Instructions: This exam covers the material from chapter 7 through 9. Please read each question carefully before you attempt to solve it. Remember that you have to show all of your work

### Wording of Final Conclusion. Slide 1

Wording of Final Conclusion Slide 1 8.3: Assumptions for Testing Slide 2 Claims About Population Means 1) The sample is a simple random sample. 2) The value of the population standard deviation σ is known

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Unit 25: Tests of Significance

Unit 25: Tests of Significance Prerequisites In this unit, we use inference about the mean μ of a normal distribution to illustrate the reasoning of significance tests. Hence, students should be familiar

### 6.1 The Elements of a Test of Hypothesis

University of California, Davis Department of Statistics Summer Session II Statistics 13 August 22, 2012 Date of latest update: August 20 Lecture 6: Tests of Hypothesis Suppose you wanted to determine

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### 1 Confidence intervals

Math 143 Inference for Means 1 Statistical inference is inferring information about the distribution of a population from information about a sample. We re generally talking about one of two things: 1.

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

### I. Basics of Hypothesis Testing

Introduction to Hypothesis Testing This deals with an issue highly similar to what we did in the previous chapter. In that chapter we used sample information to make inferences about the range of possibilities

### Solutions to Homework 7 Statistics 302 Professor Larget

s to Homework 7 Statistics 30 Professor Larget Textbook Exercises.56 Housing Units in the US (Graded for Accurateness According to the 00 US Census, 65% of housing units in the US are owner-occupied while

### The calculations lead to the following values: d 2 = 46, n = 8, s d 2 = 4, s d = 2, SEof d = s d n s d n

EXAMPLE 1: Paired t-test and t-interval DBP Readings by Two Devices The diastolic blood pressures (DBP) of 8 patients were determined using two techniques: the standard method used by medical personnel

PROBLEM SET 1 For the first three answer true or false and explain your answer. A picture is often helpful. 1. Suppose the significance level of a hypothesis test is α=0.05. If the p-value of the test

### Lecture Topic 6: Chapter 9 Hypothesis Testing

Lecture Topic 6: Chapter 9 Hypothesis Testing 9.1 Developing Null and Alternative Hypotheses Hypothesis testing can be used to determine whether a statement about the value of a population parameter should

### E205 Final: Version B

Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random

### Hypothesis tests, confidence intervals, and bootstrapping

Hypothesis tests, confidence intervals, and bootstrapping Business Statistics 41000 Fall 2015 1 Topics 1. Hypothesis tests Testing a mean: H0 : µ = µ 0 Testing a proportion: H0 : p = p 0 Testing a difference

### Stats for Strategy Exam 1 In-Class Practice Questions DIRECTIONS

Stats for Strategy Exam 1 In-Class Practice Questions DIRECTIONS Choose the single best answer for each question. Discuss questions with classmates, TAs and Professor Whitten. Raise your hand to check

### find confidence interval for a population mean when the population standard deviation is KNOWN Understand the new distribution the t-distribution

Section 8.3 1 Estimating a Population Mean Topics find confidence interval for a population mean when the population standard deviation is KNOWN find confidence interval for a population mean when the

### Chapter 7 Part 2. Hypothesis testing Power

Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship

### CONFIDENCE INTERVALS I

CONFIDENCE INTERVALS I ESTIMATION: the sample mean Gx is an estimate of the population mean µ point of sampling is to obtain estimates of population values Example: for 55 students in Section 105, 45 of

### UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

### Unit 27: Comparing Two Means

Unit 27: Comparing Two Means Prerequisites Students should have experience with one-sample t-procedures before they begin this unit. That material is covered in Unit 26, Small Sample Inference for One

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley

Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics - Ensley Assignment: Online 15 - Sections 8.2-8.3 1. A survey asked the question "What do you think is the ideal number of children

### Section 7.2 Confidence Intervals for Population Proportions

Section 7.2 Confidence Intervals for Population Proportions 2012 Pearson Education, Inc. All rights reserved. 1 of 83 Section 7.2 Objectives Find a point estimate for the population proportion Construct

### Basic Elements of a Hypothesis Test. Hypothesis Testing of Proportions and Small Sample Means. Proportions. Proportions

Hypothesis Testing of Proportions and Small Sample Means Dr. Tom Ilvento FREC 408 Basic Elements of a Hypothesis Test H 0 : H a : : : Proportions The Pepsi Challenge asked soda drinkers to compare Diet

### STAT 3090 Test 3 - Version A Fall 2015

Multiple Choice: (Questions 1-16) Answer the following questions on the scantron provided using a #2 pencil. Bubble the response that best answers the question. Each multiple choice correct response is

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### The Basics of a Hypothesis Test

Overview The Basics of a Test Dr Tom Ilvento Department of Food and Resource Economics Alternative way to make inferences from a sample to the Population is via a Test A hypothesis test is based upon A

### Stat 20: Discussion at section

Stat 20: Discussion at section B. M. Bolstad, bolstad@stat.berkeley.edu Oct 20, 2003 Hypothesis tests To perform a hypothesis test we need to do the following things. (a.) State a null and alternative

### Hypothesis Testing with One Sample. Introduction to Hypothesis Testing 7.1. Hypothesis Tests. Chapter 7

Chapter 7 Hypothesis Testing with One Sample 71 Introduction to Hypothesis Testing Hypothesis Tests A hypothesis test is a process that uses sample statistics to test a claim about the value of a population

### An interval estimate (confidence interval) is an interval, or range of values, used to estimate a population parameter. For example 0.476<p<0.

Lecture #7 Chapter 7: Estimates and sample sizes In this chapter, we will learn an important technique of statistical inference to use sample statistics to estimate the value of an unknown population parameter.

### The alternative hypothesis,, is the statement that the parameter value somehow differs from that claimed by the null hypothesis. : 0.5 :>0.5 :<0.

Section 8.2-8.5 Null and Alternative Hypotheses... The null hypothesis,, is a statement that the value of a population parameter is equal to some claimed value. :=0.5 The alternative hypothesis,, is the

### STATISTICS 151 SECTION 1 FINAL EXAM MAY

STATISTICS 151 SECTION 1 FINAL EXAM MAY 2 2009 This is an open book exam. Course text, personal notes and calculator are permitted. You have 3 hours to complete the test. Personal computers and cellphones

### Elements of Hypothesis Testing (Summary from lecture notes)

Statistics-20090 MINITAB - Lab 1 Large Sample Tests of Hypothesis About a Population Mean We use hypothesis tests to make an inference about some population parameter of interest, for example the mean

### A) to D) to B) to E) to C) to Rate of hay fever per 1000 population for people under 25

Unit 0 Review #3 Name: Date:. Suppose a random sample of 380 married couples found that 54 had two or more personality preferences in common. In another random sample of 573 married couples, it was found

### Inferences Based on a Single Sample Tests of Hypothesis

Inferences Based on a Single Sample Tests of Hypothesis 11 Inferences Based on a Single Sample Tests of Hypothesis Chapter 8 8. used to decide whether or not to reject the null hypothesis in favor of the

### Example for testing one population mean:

Today: Sections 13.1 to 13.3 ANNOUNCEMENTS: We will finish hypothesis testing for the 5 situations today. See pages 586-587 (end of Chapter 13) for a summary table. Quiz for week 8 starts Wed, ends Monday

### Extending Hypothesis Testing. p-values & confidence intervals

Extending Hypothesis Testing p-values & confidence intervals So far: how to state a question in the form of two hypotheses (null and alternative), how to assess the data, how to answer the question by

### Water Quality Problem. Hypothesis Testing of Means. Water Quality Example. Water Quality Example. Water quality example. Water Quality Example

Water Quality Problem Hypothesis Testing of Means Dr. Tom Ilvento FREC 408 Suppose I am concerned about the quality of drinking water for people who use wells in a particular geographic area I will test

### Hypothesis Testing - II

-3σ -2σ +σ +2σ +3σ Hypothesis Testing - II Lecture 9 0909.400.01 / 0909.400.02 Dr. P. s Clinic Consultant Module in Probability & Statistics in Engineering Today in P&S -3σ -2σ +σ +2σ +3σ Review: Hypothesis

### Page 548, Exercise 13

Millersville University Name Answer Key Department of Mathematics MATH 130, Elements of Statistics I, Homework 4 April 29, 2008 Page 548, Exercise 13 Height versus Arm Span A statistics student believes

### Hypothesis Testing. Hypothesis Testing CS 700

Hypothesis Testing CS 700 1 Hypothesis Testing! Purpose: make inferences about a population parameter by analyzing differences between observed sample statistics and the results one expects to obtain if

### Chapter 9 Introduction to Hypothesis Testing

Chapter 9 Introduction to Hypothesis Testing 9.2 - Hypothesis Testing Hypothesis testing is an eample of inferential statistics We use sample information to draw conclusions about the population from which

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### 9Tests of Hypotheses. for a Single Sample CHAPTER OUTLINE

9Tests of Hypotheses for a Single Sample CHAPTER OUTLINE 9-1 HYPOTHESIS TESTING 9-1.1 Statistical Hypotheses 9-1.2 Tests of Statistical Hypotheses 9-1.3 One-Sided and Two-Sided Hypotheses 9-1.4 General

### Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### Solutions to Homework 4 Statistics 302 Professor Larget

s to Homework 4 Statistics 302 Professor Larget Textbook Exercises 3.102 How Important is Regular Exercise? In a recent poll of 1000 American adults, the number saying that exercise is an important part

### Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540

Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540 1. Assume you are tossing a coin 11 times. The following distribution gives the likelihoods of getting a particular number of

### Hypothesis Testing. Bluman Chapter 8

CHAPTER 8 Learning Objectives C H A P T E R E I G H T Hypothesis Testing 1 Outline 8-1 Steps in Traditional Method 8-2 z Test for a Mean 8-3 t Test for a Mean 8-4 z Test for a Proportion 8-5 2 Test for

### MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters

MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters Inferences about a population parameter can be made using sample statistics for

### Lesson 17: Margin of Error When Estimating a Population Proportion

Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

### Sampling Distribution of a Sample Proportion

Sampling Distribution of a Sample Proportion From earlier material remember that if X is the count of successes in a sample of n trials of a binomial random variable then the proportion of success is given

### Chi-Square Tests. In This Chapter BONUS CHAPTER

BONUS CHAPTER Chi-Square Tests In the previous chapters, we explored the wonderful world of hypothesis testing as we compared means and proportions of one, two, three, and more populations, making an educated

### Sample Size Determination

Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### Testing: is my coin fair?

Testing: is my coin fair? Formally: we want to make some inference about P(head) Try it: toss coin several times (say 7 times) Assume that it is fair ( P(head)= ), and see if this assumption is compatible

### CHAPTER 11 SECTION 2: INTRODUCTION TO HYPOTHESIS TESTING

CHAPTER 11 SECTION 2: INTRODUCTION TO HYPOTHESIS TESTING MULTIPLE CHOICE 56. In testing the hypotheses H 0 : µ = 50 vs. H 1 : µ 50, the following information is known: n = 64, = 53.5, and σ = 10. The standardized

### Problem Set 2 - Solutions

Ecn 102 - Analysis of Economic Data University of California - Davis January 13, 2010 John Parman Problem Set 2 - Solutions This problem set will be not be graded and does not need to be turned in. However,

### Let m denote the margin of error. Then

S:105 Statistical Methods and Computing Sample size for confidence intervals with σ known t Intervals Lecture 13 Mar. 6, 009 Kate Cowles 374 SH, 335-077 kcowles@stat.uiowa.edu 1 The margin of error The

### Learning Objective. Simple Hypothesis Testing

Exploring Confidence Intervals and Simple Hypothesis Testing 35 A confidence interval describes the amount of uncertainty associated with a sample estimate of a population parameter. 95% confidence level

### Estimating and Finding Confidence Intervals

. Activity 7 Estimating and Finding Confidence Intervals Topic 33 (40) Estimating A Normal Population Mean μ (σ Known) A random sample of size 10 from a population of heights that has a normal distribution

### Chapter 7. Estimates and Sample Size

Chapter 7. Estimates and Sample Size Chapter Problem: How do we interpret a poll about global warming? Pew Research Center Poll: From what you ve read and heard, is there a solid evidence that the average

### Third Midterm Exam (MATH1070 Spring 2012)

Third Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notesheet. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

STT315 Practice Ch 5-7 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Solve the problem. 1) The length of time a traffic signal stays green (nicknamed

### STA 2023H Solutions for Practice Test 4

1. Which statement is not true about confidence intervals? A. A confidence interval is an interval of values computed from sample data that is likely to include the true population value. B. An approximate

### An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 9 - FUNDAMENTALS OF HYPOTHESIS TESTING: ONE-SAMPLE TESTS

The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 302) Spring Semester 20 Chapter 9 - FUNDAMENTALS OF HYPOTHESIS

### AP Statistics 1998 Scoring Guidelines

AP Statistics 1998 Scoring Guidelines These materials are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use must be sought from the Advanced Placement

### Section: 101 (10am-11am) 102 (11am-12pm) 103 (1pm-2pm) 104 (1pm-2pm)

Stat 0 Midterm Exam Instructor: Tessa Childers-Day 1 May 014 Please write your name and student ID below, and circle your section. With your signature, you certify that you have not observed poor or dishonest

### 6. Duality between confidence intervals and statistical tests

6. Duality between confidence intervals and statistical tests Suppose we carry out the following test at a significance level of 100α%. H 0 :µ = µ 0 H A :µ µ 0 Then we reject H 0 if and only if µ 0 does

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Point and Interval Estimates

Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

### Confidence level. Most common choices are 90%, 95%, or 99%. (α = 10%), (α = 5%), (α = 1%)

Confidence Interval A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

### Math 1070 Exam 2B 22 March, 2013

Math 1070 Exam 2B 22 March, 2013 This exam will last 50 minutes and consists of 13 multiple choice and 6 free response problems. Write your answers in the space provided. All solutions must be sufficiently

### MAT 118 DEPARTMENTAL FINAL EXAMINATION (written part) REVIEW. Ch 1-3. One problem similar to the problems below will be included in the final

MAT 118 DEPARTMENTAL FINAL EXAMINATION (written part) REVIEW Ch 1-3 One problem similar to the problems below will be included in the final 1.This table presents the price distribution of shoe styles offered

### Multiple random variables

Multiple random variables Multiple random variables We essentially always consider multiple random variables at once. The key concepts: Joint, conditional and marginal distributions, and independence of

### Null Hypothesis H 0. The null hypothesis (denoted by H 0

Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

### AP Statistics: Syllabus 3

AP Statistics: Syllabus 3 Scoring Components SC1 The course provides instruction in exploring data. 4 SC2 The course provides instruction in sampling. 5 SC3 The course provides instruction in experimentation.

### CONFIDENCE INTERVALS ON µ WHEN σ IS UNKNOWN

CONFIDENCE INTERVALS ON µ WHEN σ IS UNKNOWN A. Introduction 1. this situation, where we do not know anything about the population but the sample characteristics, is far and away the most common circumstance

### 1. Parameter/Statistic In each of the following, state whether the quantity described is a parameter or a statistic and give the correct notation.

MATH 117 - Elements of Statistics (Revised-03/22/2016) Spring 2016 Final Exam Review for Chapters 1 7 Montgomery College, Maryland Selected problems from the textbook: Statistics: Unlocking The Power of

### Math 62 Statistics Sample Exam Questions

Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected

### Math 251: Practice Questions for Test III Hints and Answers.. Then. P (67 x 69) = P (.33 z.33) = =.2586.

Math 251: Practice Questions for Test III Hints and Answers III.A: The Central Limit Theorem and Sampling Distributions. III.A.1 The heights of 18-year-old men are approximately normally distributed with

### Chapter 8 Section 1. Homework A

Chapter 8 Section 1 Homework A 8.7 Can we use the large-sample confidence interval? In each of the following circumstances state whether you would use the large-sample confidence interval. The variable

### Hypothesis Testing with z Tests

CHAPTER SEVEN Hypothesis Testing with z Tests NOTE TO INSTRUCTOR This chapter is critical to an understanding of hypothesis testing, which students will use frequently in the coming chapters. Some of the

### Chi-Square Tests. Comments on Projects. Comments on Projects. The Big Picture. Comments on Projects. Population 12/23/2012. Sample and Population

STAT 101 Dr. Kari Lock Morgan 10/30/1 Chi-Square Tests SECTIONS 7.1, 7. Testing the distribution of a single categorical variable : goodness of fit (7.1) Testing for an association between two categorical

### Need for Sampling. Very large populations Destructive testing Continuous production process

Chapter 4 Sampling and Estimation Need for Sampling Very large populations Destructive testing Continuous production process The objective of sampling is to draw a valid inference about a population. 4-

### HypoTesting. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: HypoTesting Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A Type II error is committed if we make: a. a correct decision when the

### AP Statistics 2010 Scoring Guidelines

AP Statistics 2010 Scoring Guidelines The College Board The College Board is a not-for-profit membership association whose mission is to connect students to college success and opportunity. Founded in

### Nominal Scaling. Measures of Central Tendency, Spread, and Shape. Interval Scaling. Ordinal Scaling

Nominal Scaling Measures of, Spread, and Shape Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning The lowest level of

### Suppose we want to compare the average effectiveness of two treatments in a completely randomized experiment. In this case, the parameters µ 1

AP Statistics: 10.2: Comparing Two Means Name: Suppose we want to compare the average effectiveness of two treatments in a completely randomized experiment. In this case, the parameters µ 1 and µ 2 are

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives