Reminders
Please send all grading concerns and questions for HW 6 and Quiz 6 to Aaron and the TAs sometime today.
Aaron is out of town and can't make his office hours on Tuesday.
Homework 7 is in a PDF on the website.
Warm Up
Individual batting averages in the National League are approximately Normally distributed with a mean of .275 and a standard deviation of .025.
What batting average has a standard score of 2?
What batting average has a standard score of -2?
Approximately what percent of hitters in the National League have a batting average between these two values?
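One way to check the warm-up is to convert each standard score back to a batting average with observation = mean + z × (standard deviation). The sketch below does that in Python; the function name `value_from_standard_score` is made up for illustration, and the final step uses the standard library's `NormalDist` to compute the share of the curve between the two values.

```python
from statistics import NormalDist

def value_from_standard_score(mean, sd, z):
    """Return the observation that lies z standard deviations from the mean."""
    return mean + z * sd

mean_avg = 0.275  # mean batting average from the warm-up
sd_avg = 0.025    # standard deviation from the warm-up

upper = value_from_standard_score(mean_avg, sd_avg, 2)
lower = value_from_standard_score(mean_avg, sd_avg, -2)

# Area under the Normal curve between the two batting averages.
league = NormalDist(mu=mean_avg, sigma=sd_avg)
share = league.cdf(upper) - league.cdf(lower)

print(f"standard score +2: {upper:.3f}")
print(f"standard score -2: {lower:.3f}")
print(f"share between them: {share:.1%}")
```

The share comes out close to the 95% given by the 68-95-99.7 rule.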
Chapter 21, Part 1: Confidence Intervals for Proportions
Aaron Zimmerman
STAT 220 - Summer 2014
Department of Statistics, University of Washington - Seattle
Road Map
Unit 1: Collecting data
  Sample surveys
  Observational studies
  Experiments
Unit 2: Summarizing and organizing data
  Graphs and summary statistics
  Normal distributions
  Correlation and regression
Unit 3: Statistical Inference
  Confidence Intervals
  Hypothesis Testing
Statistical Inference
For the last three weeks, we have been answering questions about data from a sample.
  e.g., What is the mean of this variable in the sample?
But what we'd usually like to do is answer some question about the population.
  e.g., What is the mean of this variable in the population?
Statistical inference draws conclusions about a population on the basis of data from a sample.
Statistical Inference
For example, suppose I'm interested in using a random sample of 30 students from this class to estimate the proportion of all students in the class who are male.
Recall:
  The population is all students in the class
  The sample is the 30 randomly chosen students
  The parameter is the proportion of all students in the class who are male (p)
  The statistic is the proportion of students in the sample who are male (\(\hat{p}\))
We use the statistic to estimate the parameter.
In this case, the sample proportion is an estimate of the population proportion.
Sampling Distributions
When we try to answer questions about a population, we cannot conclude that the parameter is exactly the same as the statistic we calculate.
  e.g., if I calculate that 50% of the students in my sample are male, the true value of the parameter may be more or less than 50%
Remember: sampling error is always a potential source of error, even with simple random samples.
So, we always want to give some estimate of how confident we are in the statistic we calculate.
Key idea #1: If you want to describe your confidence in a statistic, think about what would happen if you took many samples of the same size from the same population and calculated the statistic for each sample.
Sampling Distributions
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
If I keep taking samples of size 30 from this class and calculate the proportion of students in each sample who are male, the sampling distribution describes the distribution of all these sample proportions.
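The idea of "many samples of the same size from the same population" can be sketched with a small simulation. This is only an illustration under stated assumptions: the true proportion of 0.5 is a made-up value, each student is drawn independently (which treats the population as large relative to the sample), and the function name `simulate_sample_proportions` is hypothetical.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion.

    Assumption: every draw is an independent success with probability p,
    which approximates an SRS from a population much larger than n.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical true proportion of 0.5, samples of size 30.
props = simulate_sample_proportions(p=0.5, n=30, num_samples=5000)

print(f"mean of sample proportions: {statistics.mean(props):.3f}")
print(f"std. dev. of sample proportions: {statistics.stdev(props):.3f}")
```

A histogram of `props` would approximate the sampling distribution: the values cluster around the assumed true proportion, and their spread shows how much a single sample proportion can vary.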
Sampling Distributions
Suppose we take an SRS of size n from a large population, and that we want to estimate a proportion p from the population. Let \(\hat{p}\) be the sample proportion:
\[ \hat{p} = \frac{\text{count of successes in sample}}{n} \]
Then, if the sample size is large enough:
  The sampling distribution of \(\hat{p}\) is approximately Normal
  The mean of the sampling distribution is p
  The standard deviation of the sampling distribution (called the standard error) is
\[ \sqrt{\frac{p(1-p)}{n}} \]
Sampling Distributions
Why is this not helpful to know?
Why is this helpful to know?
Sampling Distributions
Well, we don't know p, and that's a bummer because we can't know the exact distribution of \(\hat{p}\).
But, if we know the sampling distribution behaves this way, we can start to perform inference about p if we have an estimate \(\hat{p}\).
Sampling Distributions
Suppose we're trying to estimate the proportion of students at UW who are male, and the true proportion is 0.44.
Note: I changed the example so that the population is large.
If we take an SRS of 30 students and compute the sample proportion who are male, then we can describe the sampling distribution of the statistic as follows:
The sampling distribution is approximately Normal, with mean 0.44 and standard deviation:
\[ SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.44(0.56)}{30}} = 0.091 \]
In other words, the standard error of the sample proportion is .091
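The standard error arithmetic for the UW example can be checked with a few lines of Python. The helper name `standard_error` is an illustrative choice, not something from the textbook.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# UW example: true proportion 0.44, SRS of 30 students.
se_uw = standard_error(0.44, 30)
print(f"SE = {se_uw:.3f}")
```

Rounded to three decimal places, the result is 0.091.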
Practice
Suppose the true proportion of voting Seattleites who support the $15/hr wage increase is actually 65%. If we took a simple random sample of 805 likely voters in Seattle and calculated the sample proportion who plan to vote in support of the proposition, what would be the standard error of the statistic?
Practice (solution)
With a true proportion of p = .65 and a simple random sample of n = 805 likely voters, the standard error of the sample proportion is:
\[ SE = \sqrt{\frac{.65(.35)}{805}} = 0.0168 \]
Confidence Intervals
We don't actually know the population parameter, so we usually approximate the standard error as:
\[ \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
We can use this standard error to create an interval around our estimate that we are confident contains the true population proportion.
Not surprisingly, we call this a confidence interval.
Confidence Intervals
A level C confidence interval for a parameter has two parts:
  An interval calculated from the data
  A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples
The most common type of confidence interval is a 95% confidence interval.
A 95% confidence interval is an interval calculated from sample data by a process that is guaranteed to capture the true population proportion in 95% of all samples.
Confidence Intervals
The process to form a confidence interval relies on the sampling distribution being approximately Normal
Confidence Intervals
Since the sampling distribution is approximately Normal, we can calculate critical values of the sampling distribution.
For any confidence level C, the critical value z* is the standard score for which C% of the area under the Normal curve lies between -z* and +z*.
[Figure: Standard Normal Curve with the central C% of the area between -z* and +z*; horizontal axis: Standard Scores]
Confidence Intervals
Table 21.1 in your book contains the critical values for the most common confidence levels:

  Confidence level    Critical value z*
  50%                 0.67
  60%                 0.84
  70%                 1.04
  80%                 1.28
  90%                 1.64
  95%                 1.96
  99%                 2.58
  99.9%               3.29

[Figure: Standard Normal Curve; horizontal axis: Standard Scores]
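The critical values in the table can be reproduced from the standard Normal curve. The sketch below uses Python's standard library `statistics.NormalDist`; the function name `critical_value` is an illustrative choice. For a confidence level C, the area left over is split evenly between the two tails, so z* is the standard score with area (1 + C) / 2 to its left.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z* such that the given share of the standard Normal
    curve lies between -z* and +z*."""
    tail = (1 - confidence_level) / 2  # area in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}: z* = {critical_value(level):.3f}")
```

The printed values agree with the table once rounded to two decimal places.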
Confidence Intervals
The sampling distribution for the sample proportion of UW students who are male has mean = .44 and SE = .091.
What proportions contain the middle 95% of the sampling distribution?
The standard score from Table 21.1 is z* = 1.96
So, the middle 95% of the standard Normal curve is between -1.96 and 1.96
[Figure: Standard Normal Curve; horizontal axis: Standard Scores]
Confidence Intervals
The sampling distribution for the sample proportion of UW students who are male has mean = .44 and SE = .091.
  observation = (1.96)(.091) + 0.44 = .618
  observation = (-1.96)(.091) + 0.44 = .262
Conclusion: the middle 95% of the sampling distribution is between .262 and .618
[Figure: Normal curve of the sampling distribution; horizontal axis: Proportion]
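The two endpoints come from converting the standard scores -1.96 and +1.96 back to the proportion scale. A minimal Python check, using the mean and standard error from the slide (the variable names are illustrative):

```python
mean = 0.44    # mean of the sampling distribution
se = 0.091     # standard error, rounded as on the slide
z_star = 1.96  # critical value for 95% from Table 21.1

# observation = (standard score) * SE + mean
lower = -z_star * se + mean
upper = z_star * se + mean

print(f"middle 95%: {lower:.3f} to {upper:.3f}")
```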
Confidence Intervals
So, in repeated samples from the population, 95% of the sample proportions I calculate will be between .262 and .618.
But again, we don't know the true proportion.
Key intuition: every time I calculate a sample proportion, what if I make an interval around it that is as wide as the interval that holds the middle 95% of the sampling distribution?
Then, 95% of the intervals generated by this process will contain the true population proportion.
This is the definition of a 95% confidence interval!
[Figure: Standard Normal Curve; horizontal axis: Standard Scores]
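The key intuition can be tested with a simulation: draw many samples, center an interval of that fixed width on each sample proportion, and count how often the interval contains the true proportion. This is a rough sketch under stated assumptions: draws are independent (a large population), the function name `fixed_width_coverage` is hypothetical, and because sample proportions from 30 students can only take a limited set of values, the share will be close to 95% rather than exactly 95%.

```python
import math
import random

def fixed_width_coverage(p, n, num_samples, z_star=1.96, seed=1):
    """Share of simulated samples whose interval p_hat +/- margin contains p.

    The margin is z* times the true standard error, matching the width of
    the interval that holds the middle 95% of the sampling distribution.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z_star * math.sqrt(p * (1 - p) / n)
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        if abs(p_hat - p) <= margin:  # interval around p_hat contains p
            hits += 1
    return hits / num_samples

# UW example: true proportion 0.44, samples of 30 students.
rate = fixed_width_coverage(p=0.44, n=30, num_samples=4000)
print(f"share of intervals containing the true proportion: {rate:.3f}")
```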
Confidence Intervals
Important note: when we calculate confidence intervals in this class, we will use the equation on p. 466 of your textbook (NOT the equation on p. 460):
Choose an SRS of size n from a population of individuals with true proportion p. The sample proportion is \(\hat{p}\). When n is large, an approximate level C confidence interval for p is
\[ \hat{p} \pm \underbrace{z^* \overbrace{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}^{\text{Standard Error}}}_{\text{Margin of Error}} \]
Remember, z* is the critical value for C from Table 21.1
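The formula translates directly into a short function. The sketch below is one possible way to write it; the function name and the example inputs (a sample proportion of 0.5 from 100 individuals) are hypothetical and chosen only because the arithmetic is easy to follow by hand.

```python
import math

def proportion_confidence_interval(p_hat, n, z_star):
    """Approximate level C confidence interval for a proportion.

    p_hat  : sample proportion
    n      : sample size (assumed large)
    z_star : critical value for the confidence level, e.g. 1.96 for 95%
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical inputs: SE = sqrt(0.25 / 100) = 0.05, margin = 1.96 * 0.05.
low, high = proportion_confidence_interval(p_hat=0.5, n=100, z_star=1.96)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```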
Confidence Intervals
So, back to our example: suppose I take an SRS of 30 students at UW and calculate that the sample proportion of male students is 0.4. What is a 95% confidence interval for this estimate?
  The sample proportion is \(\hat{p} = 0.4\)
  The estimated standard error is
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.4(0.6)}{30}} = .0894 \]
  The critical value is z* = 1.96
So, the 95% confidence interval for the true proportion is:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4 \pm 1.96(.0894) = (.225, .575) \]
Conclusion: the interval generated by this procedure will capture the true proportion 95% of the time
Confidence Intervals
So, back to our example: suppose I take an SRS of 30 students at UW and calculate that the sample proportion of male students is 0.4. What is a 90% confidence interval for this estimate?
  The sample proportion is \(\hat{p} = 0.4\)
  The estimated standard error is
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.4(0.6)}{30}} = .0894 \]
  The critical value is z* = 1.64
So, the 90% confidence interval for the true proportion is:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4 \pm 1.64(.0894) = (.253, .547) \]
Note: lower confidence levels produce narrower confidence intervals. Why?
Confidence Intervals
So, back to our example: suppose I take an SRS of 100 students at UW and calculate that the sample proportion of male students is 0.4. What is a 95% confidence interval for this estimate?
  The sample proportion is \(\hat{p} = 0.4\)
  The estimated standard error is
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.4(0.6)}{100}} = .0490 \]
  The critical value is z* = 1.96
So, the 95% confidence interval for the true proportion is:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4 \pm 1.96(.0490) = (.304, .496) \]
Conclusion: larger sample sizes produce narrower confidence intervals
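The effect of confidence level and sample size on interval width can be compared side by side. The sketch below recomputes the three UW intervals with each one centered at the sample proportion 0.4; the helper names `proportion_ci` and `width` are illustrative.

```python
import math

def proportion_ci(p_hat, n, z_star):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def width(interval):
    """Length of an interval given as (low, high)."""
    low, high = interval
    return high - low

ci_95_n30 = proportion_ci(0.4, 30, 1.96)    # 95% level, 30 students
ci_90_n30 = proportion_ci(0.4, 30, 1.64)    # 90% level, 30 students
ci_95_n100 = proportion_ci(0.4, 100, 1.96)  # 95% level, 100 students

for label, ci in (("95%, n=30", ci_95_n30),
                  ("90%, n=30", ci_90_n30),
                  ("95%, n=100", ci_95_n100)):
    print(f"{label}: ({ci[0]:.3f}, {ci[1]:.3f}), width {width(ci):.3f}")
```

A smaller critical value or a larger n shrinks the margin of error, so the 90% interval and the n = 100 interval are both narrower than the 95% interval from 30 students.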
Practice
EMC Research conducted a poll between Jan. 14 and Jan. 22, 2014. They asked an SRS of 805 voting Seattleites if they supported the minimum wage hike to $15/hr. They found a sample proportion of 68%.
What is the estimated standard error?
Practice (solution)
For the EMC Research poll (SRS of 805 voters, sample proportion 68%), the estimated standard error is:
\[ \widehat{SE} = \sqrt{\frac{.68(.32)}{805}} = .016 \]
Practice
EMC Research conducted a poll between Jan. 14 and Jan. 22, 2014. They asked an SRS of 805 voting Seattleites if they supported the minimum wage hike to $15/hr. They found a sample proportion of 68%.
What is a 95% confidence interval for the true proportion?
Practice (solution)
For the EMC Research poll (SRS of 805 voters, sample proportion 68%), the 95% confidence interval for the true proportion is:
\[ .68 \pm 1.96 \sqrt{\frac{.68(.32)}{805}} = (.648, .712) \]
Practice
EMC Research conducted a poll between Jan. 14 and Jan. 22, 2014. They asked an SRS of 805 voting Seattleites if they supported the minimum wage hike to $15/hr. They found a sample proportion of 68%.
What is a 90% confidence interval for the true proportion?
Practice (solution)
For the EMC Research poll (SRS of 805 voters, sample proportion 68%), the 90% confidence interval for the true proportion uses z* = 1.64:
\[ .68 \pm 1.64 \sqrt{\frac{.68(.32)}{805}} = (.653, .707) \]
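The poll calculations can be checked in one short script. This sketch keeps the unrounded standard error throughout, so the third decimal place can differ slightly from a hand calculation that rounds the standard error to .016 first. Variable names are illustrative.

```python
import math

p_hat = 0.68  # sample proportion from the poll
n = 805       # sample size

# Estimated standard error of the sample proportion.
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)

# Critical values from Table 21.1.
margin_95 = 1.96 * se_hat
margin_90 = 1.64 * se_hat

ci_95 = (p_hat - margin_95, p_hat + margin_95)
ci_90 = (p_hat - margin_90, p_hat + margin_90)

print(f"estimated SE: {se_hat:.4f}")
print(f"95% interval: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
print(f"90% interval: ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
```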
Homework
All of the HW for the week is on the website in a PDF. For tomorrow's section, you should:
  Read Chapter 21, pp. 455-466
    To calculate confidence intervals, we will use the equation on p. 466, NOT the equation on p. 460!
  Do problems 21.2, 21.7 (your answers for parts b and c should be slightly different from the answers in the back of the book), 21.8, 21.10, 21.21, 21.22, 21.23