Chapter 7. Estimates and Sample Size


1 Chapter 7. Estimates and Sample Size
Chapter Problem: How do we interpret a poll about global warming?
Pew Research Center poll: "From what you've read and heard, is there solid evidence that the average temperature on earth has been increasing over the past few decades, or not?" Of 1501 randomly selected U.S. adults, 70% responded yes.
Important issues related to this poll:
o How can the poll be used to estimate the percentage of U.S. adults who believe that the earth is getting warmer?
o How accurate is the result of 70% likely to be?
o Is the sample size too small? (1501/225,139,000 ≈ 0.0007% of the population)
o Does the method of selecting the people to be polled have much of an effect on the results?

2 7.1 Review and Preview
Review: descriptive statistics.
Two major applications of inferential statistics:
o Estimate the value of a population parameter
o Test some claim (or hypothesis) about a population
This chapter focuses on:
o Estimating important population parameters: proportion, mean, and variance
o Methods for determining the sample sizes necessary to estimate those parameters

3 7.2 Estimating a Population Proportion
1. Introduction
2. Why Do We Need a Confidence Interval?
3. Interpreting a Confidence Interval
4. Critical Values
5. Margin of Error
6. Determining Sample Size
7. Better-Performing Confidence Intervals

4 7.2 Estimating a Population Proportion 1. Introduction
Definition: A point estimate is a single value (or point) used to approximate a population parameter.
As we will see, the sample proportion $\hat{p}$ is the best point estimate of the population proportion $p$.
Proportion, probability, and percent: we focus on proportions, but the same methods work with probabilities and percentages.

5 7.2 Estimating a Population Proportion 1. Introduction
e.g.1 Proportion of Adults Believing in Global Warming. In the chapter problem we noted that in a Pew Research Center poll, 70% of 1501 randomly selected adults in the U.S. believe in global warming, so the sample proportion is $\hat{p} = 0.70$. Find the best point estimate of the proportion of all adults in the U.S. who believe in global warming.
Solution. The best point estimate of $p$ is the sample proportion, $\hat{p} = 0.70$.

6 7.2 Estimating a Population Proportion 2. Why Do We Need a Confidence Interval?
Definition: A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI.
A confidence interval is associated with a confidence level:
o The confidence level is the probability $1 - \alpha$ (or area $1 - \alpha$).
o $\alpha$ is the complement of the confidence level.
o E.g., for a 95% confidence level, $\alpha = 0.05$.

7 7.2 Estimating a Population Proportion 2. Why Do We Need a Confidence Interval?
Definition: The confidence level is the probability $1 - \alpha$ that is the proportion of times that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times.
Example: Here is a CI found later (in e.g.3), based on the sample data of 1501 adults polled, with 70% of them saying that they believe in global warming. The 95% confidence interval estimate of the population proportion $p$ is $0.677 < p < 0.723$.

8 7.2 Estimating a Population Proportion 3. Interpreting a Confidence Interval
There are correct interpretations and incorrect (creative) interpretations.
Correct: We are 95% confident that the interval from 0.677 to 0.723 actually does contain the true value of $p$. This means that if we were to select many different samples of size 1501 and construct the corresponding CIs, 95% of them would actually contain the value of the population proportion $p$. (The 95% is the success rate of the process.)
Wrong: There is a 95% chance that the true value of $p$ will fall between 0.677 and 0.723.

9 7.2 Estimating a Population Proportion 3. Interpreting a Confidence Interval
At any specific point in time, the population has a fixed and constant value $p$, and a CI constructed from a sample either contains $p$ or does not. A confidence level of 95% tells us that the process we are using will, in the long run, result in confidence interval limits that contain the true population proportion 95% of the time.
[Figure 7-1: CIs from 20 Different Samples. One of the intervals shown does not contain $p$.]
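The long-run reading of the confidence level can be checked with a small simulation. The sketch below is illustrative only: it assumes a hypothetical true proportion p = 0.70, reuses the poll's sample size n = 1501, and builds each interval as $\hat{p} \pm 1.96\sqrt{\hat{p}\hat{q}/n}$ (the margin of error formula given later in this section). The function name `coverage_rate`, the number of repetitions, and the seed are arbitrary choices, not part of the original slides.

```python
import math
import random


def coverage_rate(p=0.70, n=1501, reps=1000, z=1.96, seed=1):
    """Fraction of simulated 95% CIs that contain the assumed true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        # Draw one simple random sample of n yes/no answers.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # Count the interval as a success if it captures the true p.
        if p_hat - margin < p < p_hat + margin:
            hits += 1
    return hits / reps


print(coverage_rate())  # should land near 0.95, not exactly on it
```

Each simulated interval either contains p or it does not; the 95% describes how often the procedure succeeds across many samples.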

10 7.2 Estimating a Population Proportion 4. Critical Values
Notation for the critical value: $z_{\alpha/2}$
[Figure 7-2: Critical Value $z_{\alpha/2}$ in the Standard Normal Distribution]
Definition: A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number $z_{\alpha/2}$ is a critical value that is a z score with the property that it separates an area of $\alpha/2$ in the right tail of the standard normal distribution.

11 7.2 Estimating a Population Proportion 4. Critical Values
e.g.2 Find the critical value $z_{\alpha/2}$ corresponding to a 95% confidence level.
[Figure 7-3: Finding $z_{\alpha/2}$ for a 95% Confidence Level]
For a 95% CL, $\alpha = 0.05$ and $\alpha/2 = 0.025$, so the total area to the left of the boundary $z_{\alpha/2}$ is $1 - 0.025 = 0.975$. By Table A-2, we find that $z_{\alpha/2} = 1.96$.

12 7.2 Estimating a Population Proportion 4. Critical Values
We have the following brief table:
Confidence Level    Critical Value, $z_{\alpha/2}$
90%                 1.645
95%                 1.96
99%                 2.575
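Critical values like these can also be computed instead of read from a table. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is made up for illustration. The computed 99% value is about 2.576, whereas printed tables sometimes list 2.575.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level
    # The area to the LEFT of the critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: z = {z_critical(cl):.3f}")
```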

13 7.2 Estimating a Population Proportion 5. Margin of Error
Definition: The margin of error, $E$, is the maximum likely (with probability $1 - \alpha$) difference between the observed sample proportion $\hat{p}$ and the true value of the population proportion $p$.
Formula 7-1 (margin of error for proportions): $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

14 7.2 Estimating a Population Proportion 5. Margin of Error
Requirements for this section:
1) The sample is a simple random sample.
2) The conditions for the binomial distribution are satisfied.
3) There are at least 5 successes and at least 5 failures. (With $p$ and $q$ unknown, we estimate their values using the sample proportion.)
Notation for proportions:
$p$ = population proportion
$\hat{p} = x/n$ = sample proportion of $x$ successes in a sample of size $n$
$\hat{q} = 1 - \hat{p}$ = sample proportion of failures in a sample of size $n$
$n$ = number of sample values

15 7.2 Estimating a Population Proportion 5. Margin of Error
Confidence Interval (or Interval Estimate) for the Population Proportion $p$:
$\hat{p} - E < p < \hat{p} + E$, where $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
The CI is often expressed in the following equivalent formats: $\hat{p} \pm E$ or $(\hat{p} - E,\ \hat{p} + E)$
Round-off rule for CI estimates of $p$: Round the CI limits for $p$ to three significant digits.

16 7.2 Estimating a Population Proportion 5. Margin of Error
Procedure for Constructing a Confidence Interval for $p$
1) Verify that the requirements are satisfied.
2) Find the critical value $z_{\alpha/2}$ that corresponds to the desired CL.
3) Evaluate the margin of error: $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$
4) Obtain the CI: $\hat{p} - E < p < \hat{p} + E$
5) Round the resulting CI limits to three significant digits.

17 7.2 Estimating a Population Proportion 5. Margin of Error
e.g.3 Constructing a Confidence Interval: Poll Results. In the chapter problem we noted that a Pew Research poll of 1501 randomly selected U.S. adults showed that 70% of the respondents believe in global warming, so $n = 1501$ and $\hat{p} = 0.70$.
a. Find the margin of error corresponding to a 95% CL.
b. Find the 95% CI estimate of the population proportion $p$.
c. Based on the results, can we conclude that the majority of adults believe in global warming?
d. Assuming that you are a newspaper ...
Sol. Requirements met.

18 7.2 Estimating a Population Proportion 5. Margin of Error
e.g.3 (Chapter Problem, TI-84 demo).
a. $z_{\alpha/2} = 1.96$, $\hat{p} = 0.70$, $\hat{q} = 0.30$, and $n = 1501$, so $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n} = 1.96\sqrt{(0.70)(0.30)/1501} = 0.023183$
b. $\hat{p} - E = 0.70 - 0.023183 = 0.677$ and $\hat{p} + E = 0.70 + 0.023183 = 0.723$, so the CI is $0.677 < p < 0.723$, or in interval notation, $(0.677, 0.723)$.
c. To interpret the results: based on the CI obtained in part b, it does appear that the proportion of adults who believe in global warming is greater than 0.5 (50%), so we can safely conclude that the majority of adults believe in global warming. Because the limits of 0.677 and 0.723 are likely to contain the true population proportion, it appears that the proportion is a value greater than 0.5.
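The arithmetic in parts a and b can be reproduced with a few lines of Python. This is a sketch under the stated inputs ($\hat{p} = 0.70$, $n = 1501$, $z_{\alpha/2} = 1.96$); the function name `proportion_ci` is hypothetical, not something from the slides or the TI-84.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Return (margin of error, lower limit, upper limit) for a proportion."""
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return margin, p_hat - margin, p_hat + margin


E, lower, upper = proportion_ci(0.70, 1501)
print(f"E = {E:.6f}")
# Three significant digits, per the round-off rule for proportions.
print(f"{lower:.3f} < p < {upper:.3f}")
```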

19 7.2 Estimating a Population Proportion 5. Margin of Error
Analyzing Polls (e.g.3 continued). When analyzing results from polls, we should consider the following:
1) The sample should be a simple random sample, not an inappropriate sample (such as a voluntary response sample).
2) The confidence level should be provided. (It is often 95%, but media reports often neglect to identify it.)
3) The sample size should be provided. (It is usually provided by the media, but not always.)
4) Except for relatively rare cases, the quality of the poll results depends on the sampling method and the size of the sample, but the size of the population is usually not a factor.

20 7.2 Estimating a Population Proportion 6. Determining Sample Size
Sample Size for Estimating Proportion $p$
Requirement: simple random sample.
When an estimate $\hat{p}$ is known (Formula 7-2): $n = \dfrac{[z_{\alpha/2}]^2\,\hat{p}\hat{q}}{E^2}$
When no estimate $\hat{p}$ is known (Formula 7-3): $n = \dfrac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$
Round-off rule for determining sample size: If the computed sample size $n$ is not a whole number, round it up to the next larger whole number.

21 7.2 Estimating a Population Proportion 6. Determining Sample Size
e.g.4 How many adults use the Internet? Assume that a manager for E-Bay wants to determine the current percentage of U.S. adults who now use the Internet. How many adults must be surveyed in order to be 95% confident that the sample percentage is in error by no more than 3 percentage points?
a. Use the result from a Pew Research Center poll: in 2006, 73% of U.S. adults used the Internet.
b. Assume that we have no prior information.
Sol.
a. $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, $\hat{p} = 0.73$, $\hat{q} = 0.27$, $E = 0.03$, so $n = (1.96)^2(0.73)(0.27)/(0.03)^2 = 841.31$, rounded up to $n = 842$.
b. $n = (1.96)^2(0.25)/(0.03)^2 = 1067.11$, rounded up to $n = 1068$.

22 7.2 Estimating a Population Proportion 6. Determining Sample Size
Interpretation. To be 95% confident that our sample percentage is within 3 percentage points of the true percentage for all adults, we should obtain a simple random sample of 1068 adults. Comparing this result with the sample size of 842 found in part (a), we see that with no knowledge of a prior study, a larger sample is required to achieve the same result as when the value of $\hat{p}$ can be estimated.
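Both sample-size formulas, together with the round-up rule, fit in one small function. A sketch only: the name `sample_size_proportion` and its keyword arguments are illustrative choices, and 0.25 is used for $\hat{p}\hat{q}$ when no prior estimate is supplied.

```python
import math


def sample_size_proportion(margin, z=1.96, p_hat=None):
    """Sample size needed to estimate a proportion within the given margin."""
    # With no prior estimate, use the worst case p_hat * q_hat = 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Always round UP to the next whole number.
    return math.ceil(z**2 * pq / margin**2)


print(sample_size_proportion(0.03, p_hat=0.73))  # prior estimate of 73%
print(sample_size_proportion(0.03))              # no prior information
```

The worst-case 0.25 is the largest possible value of $\hat{p}\hat{q}$, which is why the no-information sample size is never smaller than the one based on a prior estimate.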

23 7.2 Estimating a Population Proportion 6. Determining Sample Size
Finding the Point Estimate and $E$ from a Confidence Interval
Point estimate of $p$: $\hat{p} = \dfrac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of error: $E = \dfrac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$
e.g.5 Given the confidence interval (0.58, 0.81), find $\hat{p}$ and $E$.
$\hat{p} = (0.81 + 0.58)/2 = 0.695$ and $E = (0.81 - 0.58)/2 = 0.115$
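The same recovery of the point estimate and margin of error works for any interval of the form estimate ± E. A minimal sketch; `estimate_from_ci` is a made-up helper name.

```python
def estimate_from_ci(lower, upper):
    """Recover the point estimate (midpoint) and margin of error (half-width)."""
    point_estimate = (upper + lower) / 2
    margin = (upper - lower) / 2
    return point_estimate, margin


p_hat, E = estimate_from_ci(0.58, 0.81)
print(round(p_hat, 3), round(E, 3))
```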

24 7.2 Estimating a Population Proportion 7. Better-Performing Confidence Intervals
Skip.
Using Technology for CIs:
o Statdisk
o TI-84 (1-PropZInt, under STAT/TESTS/1-PropZInt)

25 7.3 Estimating a Population Mean: $\sigma$ Known
1. Introduction
2. Confidence Interval
3. Determining Sample Size Required to Estimate $\mu$
4. Using Technology

26 7.3 Estimating a Population Mean: $\sigma$ Known 1. Introduction
Goal: present methods for using sample data to find a point estimate and a CI estimate of the population mean.
Requirements:
o Simple random sample
o The population standard deviation $\sigma$ is known
o Normally distributed population or $n > 30$

27 7.3 Estimating a Population Mean: $\sigma$ Known 1. Introduction
The sample mean $\bar{x}$ is the best point estimate of the population mean $\mu$.
o $\bar{x}$ is an unbiased estimator of the population mean $\mu$.
o For many populations, the distribution of sample means $\bar{x}$ tends to be more consistent (with less variation) than the distributions of other sample statistics.

28 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Margin of error for estimating the mean ($\sigma$ known): $E = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Confidence interval estimate for the population mean $\mu$ (with $\sigma$ known): $\bar{x} - E < \mu < \bar{x} + E$, or $\bar{x} \pm E$, or $(\bar{x} - E,\ \bar{x} + E)$

29 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Procedure for Constructing a Confidence Interval for $\mu$ (with $\sigma$ known)
1) Verify that the requirements are satisfied.
2) Find the critical value $z_{\alpha/2}$ that corresponds to the desired CL.
3) Evaluate the margin of error: $E = z_{\alpha/2}\,\sigma/\sqrt{n}$
4) Obtain the CI: $\bar{x} - E < \mu < \bar{x} + E$
5) Rounding: next page.

30 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Round-Off Rule for CIs Used to Estimate $\mu$
1. When using the original set of data to construct a confidence interval, round the CI limits to one more decimal place than is used for the original set of data.
2. When using the summary statistics ($n$, $\bar{x}$, $s$) to construct a CI, round the CI limits to the same number of decimal places used for the sample mean.

31 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Interpreting a Confidence Interval: similar to Section 7.2 for proportions; be careful to interpret the CI correctly. Suppose we have obtained a CI estimate of the population mean $\mu$, such as a 95% CI of $164.49 < \mu < 180.61$.
Correct: We are 95% confident that the interval from 164.49 to 180.61 actually does contain the true value of $\mu$. This means that if we were to select many different samples of the same size and construct the corresponding CIs, 95% of them would actually contain the value of the population mean $\mu$.
Wrong: There is a 95% chance that the true value of $\mu$ will fall between 164.49 and 180.61.

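32 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
e.g.1 Weights of Men. $n = 40$, $\bar{x} = 172.55$ lb, and $\sigma = 26$ lb. Using a 95% CL, find the following (TI-84):
a. The margin of error $E$ and the CI for $\mu$.
b. What do the results suggest about the mean weight that was used to determine the safe passenger capacity in 1996?
Sol. a. $E = z_{\alpha/2}\,\sigma/\sqrt{n} = 1.96 \cdot 26/\sqrt{40} = 8.0574835$
The CI is $\bar{x} - E < \mu < \bar{x} + E$: $172.55 - 8.0574835 < \mu < 172.55 + 8.0574835$, i.e., $164.49 < \mu < 180.61$ (rounded to two decimals, as in $\bar{x}$).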

33 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Interpretation. The confidence interval from part (a) could also be expressed as $172.55 \pm 8.06$ or as $(164.49, 180.61)$. Based on the sample with $n = 40$ and $\bar{x} = 172.55$, and with $\sigma$ assumed to be 26, the confidence interval for the population mean $\mu$ is $164.49$ lb $< \mu < 180.61$ lb, and this interval has a 0.95 confidence level. This means that if we were to select many different simple random samples of 40 men and construct the confidence intervals as we did here, 95% of them would contain the value of the population mean $\mu$.
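The calculation for this example can be sketched in Python. The inputs are assumptions read off the example: $n = 40$, and $\bar{x} = 172.55$ lb with $\sigma = 26$ lb, the values implied by the margin ± 8.06 and the lower limit 164.49 quoted in the interpretation. The function name `mean_ci_sigma_known` is illustrative.

```python
import math


def mean_ci_sigma_known(x_bar, sigma, n, z=1.96):
    """Return (margin of error, lower limit, upper limit) when sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Assumed inputs: x_bar = 172.55 lb, sigma = 26 lb, n = 40, 95% CL.
E, lower, upper = mean_ci_sigma_known(172.55, 26, 40)
print(f"E = {E:.2f}")
# Summary statistics: round to the same number of decimals as x_bar.
print(f"{lower:.2f} < mu < {upper:.2f}")
```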

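34 7.3 Estimating a Population Mean: $\sigma$ Known 2. Confidence Interval
Rationale for the CI. By the Central Limit Theorem, sample means (of sample size $n$) are (approximately) normally distributed with mean $\mu$ and variance $\sigma^2/n$. In the equation $z = (\bar{x} - \mu_{\bar{x}})/\sigma_{\bar{x}}$, replace $\sigma_{\bar{x}}$ with $\sigma/\sqrt{n}$, replace $\mu_{\bar{x}}$ with $\mu$, then solve for $\mu$ to get $\mu = \bar{x} - z\,\dfrac{\sigma}{\sqrt{n}}$. Using $z = \pm z_{\alpha/2}$ gives the confidence interval limits $\bar{x} - E$ and $\bar{x} + E$.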

35 7.3 Estimating a Population Mean: $\sigma$ Known 3. Determining Sample Size Required to Estimate $\mu$
Sample Size for Estimating the Mean $\mu$ (Formula 7-4): $n = \left[\dfrac{z_{\alpha/2}\,\sigma}{E}\right]^2$
where
$z_{\alpha/2}$ = critical z score based on the desired CL
$E$ = desired margin of error
$\sigma$ = population standard deviation
Comments:
o The sample size does not depend on the population size $N$.
o Round up.

36 7.3 Estimating a Population Mean: $\sigma$ Known 3. Determining Sample Size Required to Estimate $\mu$
Dealing with Unknown $\sigma$ When Finding Sample Size
o Use the range rule of thumb: $\sigma \approx$ range / 4.
o Start the sample collection process without knowing $\sigma$ and, using the first several values, calculate the sample standard deviation $s$ and use it in place of $\sigma$. The estimated value of $\sigma$ can then be improved as more sample data are obtained, and the sample size can be refined accordingly.
o Estimate the value of $\sigma$ by using the results of some other study that was done earlier.
o Use other known results. E.g., IQ tests are typically designed so that the mean is 100 and the SD is 15. For a population of statistics professors, the mean IQ is more than 100 and the SD is less than 15, but we can assume the SD is 15 to play it safe.
My comment: see the next section.

37 7.3 Estimating a Population Mean: $\sigma$ Known 3. Determining Sample Size Required to Estimate $\mu$
e.g.2 IQ Scores of Statistics Students. We want to estimate the mean IQ score for the population of statistics students.
Q: How many statistics students must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 3 IQ points of the population mean?
Sol. $z_{\alpha/2} = 1.96$ (since $\alpha = 0.05$), $E = 3$, and $\sigma = 15$ (see previous page). Now $n = \left[\dfrac{z_{\alpha/2}\,\sigma}{E}\right]^2 = \left[\dfrac{1.96(15)}{3}\right]^2 = 96.04$, rounded up to $n = 97$.

38 7.3 Estimating a Population Mean: $\sigma$ Known 3. Determining Sample Size Required to Estimate $\mu$
Interpretation. Among the thousands of statistics students, we need to obtain a simple random sample of at least 97 of them and then get their IQ scores. With a simple random sample of only 97 statistics students, we will be 95% confident that the sample mean $\bar{x}$ is within 3 IQ points of the true population mean $\mu$.
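The sample-size formula for a mean, $n = [z_{\alpha/2}\,\sigma/E]^2$ rounded up, is easy to wrap in a function. A sketch with the IQ inputs ($\sigma = 15$, $E = 3$, 95% CL); the name `sample_size_mean` is hypothetical.

```python
import math


def sample_size_mean(sigma, margin, z=1.96):
    """Sample size needed to estimate a mean within the given margin of error."""
    # Round UP: rounding down would give a margin slightly larger than desired.
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_mean(15, 3))    # within 3 IQ points
print(sample_size_mean(15, 1.5))  # halving E roughly quadruples n
```

The second call illustrates the design trade-off: because $E$ is squared in the denominator, halving the margin of error requires about four times as many sample values.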

39 7.3 Estimating a Population Mean: $\sigma$ Known 4. Using Technology
Using Technology for CIs: see, e.g., the TI-84 demo and Statdisk (Analysis).

40 7.4 Estimating a Population Mean: $\sigma$ Not Known
1. Introduction
2. Confidence Interval
3. Choosing the Appropriate Distribution
4. Finding Point Estimate and $E$ from a CI
5. Using Technology

41 7.4 Estimating a Population Mean: $\sigma$ Not Known 1. Introduction
o $\sigma$ is not known.
o Use the Student t distribution (instead of the normal distribution).
o The method in this section is realistic, practical, and often used.
Requirements:
o Simple random sample
o The sample is from a normally distributed population or $n > 30$

42 7.4 Estimating a Population Mean: $\sigma$ Not Known 1. Introduction
Just as in Section 7.3, the sample mean $\bar{x}$ is the best point estimate of the population mean $\mu$.

43 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
Before finding the CI, we first introduce the Student t distribution.
Student t Distribution: If a population has a normal distribution, then the distribution of $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ is a Student t distribution for all samples of size $n$. A Student t distribution, often referred to as a t distribution, is used to find critical values denoted by $t_{\alpha/2}$.

44 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
Definition: The number of degrees of freedom for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed on all data values. For the applications in this section, degrees of freedom = $n - 1$.
Example 1. Finding a Critical Value. A sample of size $n = 7$ is selected from a normally distributed population. Find $t_{\alpha/2}$ for a 95% CL.
Sol. df = 7 - 1 = 6. In Table A-3, use the 6th row. For a 95% CL, $\alpha = 0.05$, so find the column listing values for an area of 0.05 in two tails: $t_{\alpha/2} = 2.447$ (can do TI-84 demo).

45 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
Margin of error $E$ for the estimate of $\mu$ (with $\sigma$ not known): $E = t_{\alpha/2}\dfrac{s}{\sqrt{n}}$
Confidence interval for the estimate of $\mu$ (with $\sigma$ not known): $\bar{x} - E < \mu < \bar{x} + E$

46 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
Procedure for Constructing a CI for $\mu$ (with $\sigma$ not known)
Step 1. Verify that the requirements are met (simple random sample; either the data are from a normal distribution or $n > 30$).
Step 2. Given the CL, let df = $n - 1$ and use Table A-3 to find the critical value $t_{\alpha/2}$ that corresponds to the desired CL.
Step 3. Evaluate the margin of error $E = t_{\alpha/2}\,s/\sqrt{n}$.
Step 4. Find $\bar{x}$. Then the CI is $\bar{x} - E < \mu < \bar{x} + E$.
Step 5. Rounding. Original data: add one decimal place; summary statistics: same number of decimal places as the sample mean.

47 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
e.g.2 Constructing a Confidence Interval: Garlic for Reducing Cholesterol. Use the summary data $n = 49$, $\bar{x} = 0.4$, $s = 21.0$, and $t_{\alpha/2} = 2.009$.
The margin of error (95% CL) is $E = t_{\alpha/2}\,s/\sqrt{n} = 2.009 \cdot 21.0/\sqrt{49} = 6.027$.
The CI is $0.4 - 6.027 < \mu < 0.4 + 6.027$, or $-5.6 < \mu < 6.4$.
Interpretation. Because the confidence interval contains the value of 0, it is possible that the mean of the changes in LDL cholesterol is equal to 0, suggesting that the garlic treatment did not affect the LDL cholesterol levels. It does not appear that the garlic treatment is effective in lowering LDL cholesterol.
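A sketch of the same t-interval calculation in Python. The critical value 2.009 is passed in by hand as a table look-up, because the Python standard library has no t-distribution quantile function. The value $s = 21.0$ is an assumption: it is the standard deviation consistent with the interval limits $-5.6$ and $6.4$. The function name `mean_ci_t` is illustrative.

```python
import math


def mean_ci_t(x_bar, s, n, t_crit):
    """Return (margin of error, lower limit, upper limit) when sigma is not known.

    t_crit is the critical value t_{alpha/2} for n - 1 degrees of freedom,
    looked up in a t table (or obtained from other software).
    """
    margin = t_crit * s / math.sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Assumed inputs: n = 49, x_bar = 0.4, s = 21.0, t_crit = 2.009.
E, lower, upper = mean_ci_t(0.4, 21.0, 49, 2.009)
print(f"E = {E:.3f}")
print(f"{lower:.1f} < mu < {upper:.1f}")
print("CI contains 0:", lower < 0 < upper)
```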

48 7.4 Estimating a Population Mean: $\sigma$ Not Known 2. Confidence Interval
Important Properties of the Student t Distribution
1. It is different for different sample sizes (unlike the standard normal distribution).
2. It has the same general bell shape as the standard normal distribution, but it reflects the greater variability (with wider distributions) that is expected with small samples (n = 3: wider; n = 12: narrower).
3. Mean = 0 (just like the standard normal distribution).
4. The SD varies with the sample size and is always greater than 1, unlike the standard normal distribution, where SD = 1.
5. As the sample size gets larger, the Student t distribution gets closer to the standard normal distribution.

49 7.4 Estimating a Population Mean: $\sigma$ Not Known 3. Choosing the Appropriate Distribution
[Figure 7-6: Choosing Between z and t (flowchart)]
Start: Is $\sigma$ known?
o Yes: Is the population normally distributed? If yes, use the normal (z) distribution. If no, is $n > 30$? If yes, use the normal (z) distribution; if no, use nonparametric or bootstrapping methods.
o No: Is the population normally distributed? If yes, use the t distribution. If no, is $n > 30$? If yes, use the t distribution; if no, use nonparametric or bootstrapping methods.

50 7.4 Estimating a Population Mean: $\sigma$ Not Known 3. Choosing the Appropriate Distribution
Method and conditions:
o Use the normal (z) distribution: $\sigma$ known and normally distributed population, or $\sigma$ known and $n > 30$
o Use the t distribution: $\sigma$ not known and normally distributed population, or $\sigma$ not known and $n > 30$
o Use a nonparametric method or bootstrapping: population is not normally distributed and $n \le 30$
Notes:
1. Criteria for deciding whether the population is normally distributed: The population need not be exactly normal, but it should appear to be somewhat symmetric with one mode and no outliers.
2. Sample size $n > 30$: This is a commonly used guideline, but sample sizes of 15 to 30 are adequate if the population appears to have a distribution that is not far from being normal and there are no outliers. For some population distributions that are extremely far from normal, the sample size might need to be much larger.

51 7.4 Estimating a Population Mean: $\sigma$ Not Known 3. Choosing the Appropriate Distribution
Example 3. Choosing Distributions. We want to construct a CI for $\mu$. Use the given data to determine whether the margin of error $E$ should be calculated using $z_{\alpha/2}$ (from the normal distribution), $t_{\alpha/2}$ (from the Student t distribution), or neither.
a. $n = 9$, $\bar{x} = 75$, $s = 15$, and the population has a normal distribution: $t_{\alpha/2}$
b. $n = 5$, $\bar{x} = 20$, $s = 2$, and the population has a very skewed distribution: neither
c. $n = 12$, $\bar{x} = 98.6$, $\sigma = 0.6$, and the population has a normal distribution: $z_{\alpha/2}$
d. $n = 75$, $\bar{x} = 98.6$, $\sigma = 0.6$, and the distribution is skewed: $z_{\alpha/2}$
e. $n = 75$, $\bar{x} = 98.6$, $s = 0.6$, and the distribution is extremely skewed: $t_{\alpha/2}$
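The decision rules for choosing between z and t can be written as a short function. This is a sketch of the simple guideline only (normal population or $n > 30$); it does not encode the softer judgment about samples of 15 to 30 from nearly normal populations. The function name, argument names, and the sample sizes in `cases` are illustrative assumptions.

```python
def choose_distribution(sigma_known, population_normal, n):
    """Return 'z', 't', or 'neither' following the z-versus-t decision rules."""
    if not population_normal and n <= 30:
        # Requirements fail: use nonparametric or bootstrap methods instead.
        return "neither"
    return "z" if sigma_known else "t"


# Hypothetical cases: (sigma known?, population normal?, sample size)
cases = {
    "a": (False, True, 9),    # s given, normal population
    "b": (False, False, 5),   # s given, very skewed, small sample
    "c": (True, True, 12),    # sigma given, normal population
    "d": (True, False, 75),   # sigma given, skewed, large sample
    "e": (False, False, 75),  # s given, extremely skewed, large sample
}
for label, args in cases.items():
    print(label, choose_distribution(*args))
```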

52 7.4 Estimating a Population Mean: $\sigma$ Not Known 3. Choosing the Appropriate Distribution
e.g.4 CI for Alcohol in Video Games. Twelve different video games showing substance use were observed. The duration times (in seconds) of alcohol use were recorded, with the times listed below. The design of the study justifies the assumption that the sample can be treated as a simple random sample.
84, 14, 583, 50, 0, 57, 207, 43, 178, 0, 2, 57
Use these data to construct a 95% CI estimate of $\mu$, the mean duration time that the video showed the use of alcohol.
Sol. Next page.

53 7.4 Estimating a Population Mean: $\sigma$ Not Known 3. Choosing the Appropriate Distribution
e.g.4 Continued. Check requirements: the data do not appear to be from a normal distribution, and $n > 30$ is not satisfied, so the requirements are not satisfied.
[Figure: histogram of the duration times (vertical axis: Frequency)]
Using the TI-84 (TInterval) anyway gives CI = (1.8, 210.7).
Interpretation. Because the requirements are not satisfied, we do not have 95% confidence that the interval (1.8 sec, 210.7 sec) contains the true population mean. We should use some other method.

54 7.4 Estimating a Population Mean: $\sigma$ Not Known 4. Finding Point Estimate and $E$ from a CI
Given the upper and lower limits of a CI, find the sample mean and the margin of error:
Point estimate of $\mu$: $\bar{x} = \dfrac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of error: $E = \dfrac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$
e.g.5 Weights of Garbage. Data Set in Appendix B. Given the 95% CI (4.8, ), find the mean and the margin of error.
Sol. $\bar{x}$ = lb, $E$ = lb

55 7.4 Estimating a Population Mean: $\sigma$ Not Known 5. Using Technology
o TI-84 demo: summary data, original data
o Statdisk

56 7.5 Estimating a Population Variance
1. Chi-Square Distribution
2. Estimator of $\sigma^2$

57 7.5 Estimating a Population Variance 1. Chi-Square Distribution
Given a normally distributed population with variance $\sigma^2$, randomly select a sample of size $n$ and compute the sample variance $s^2$. The sample statistic $\chi^2 = (n-1)s^2/\sigma^2$ has a sampling distribution called the chi-square distribution.
Chi-Square Distribution (Formula 7-5): $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$
where
$n$ = sample size
$s^2$ = sample variance
$\sigma^2$ = population variance

58 7.5 Estimating a Population Variance 1. Chi-Square Distribution
o Denote the chi-square statistic by $\chi^2$ (pronounced "kigh square").
o To find critical values, refer to Table A-4.
o The chi-square distribution depends on the number of degrees of freedom, df = $n - 1$.

59 7.5 Estimating a Population Variance 1. Chi-Square Distribution
Properties of the Distribution of the Chi-Square Statistic
o It is not symmetric; as df increases, it becomes more symmetric.
o Values of $\chi^2$ can be positive or zero, but never negative.
o The distribution is different for each df. As df increases, the distribution approaches a normal distribution (just like the t distribution).

60 7.5 Estimating a Population Variance 1. Chi-Square Distribution
e.g.1 Finding Critical Values of $\chi^2$. A simple random sample of ten voltage levels is obtained. Construction of a confidence interval for the population standard deviation $\sigma$ requires the left and right critical values of $\chi^2$ corresponding to a confidence level of 95% and a sample size of $n = 10$. Find the critical value of $\chi^2$ separating an area of 0.025 in the left tail, and find the critical value of $\chi^2$ separating an area of 0.025 in the right tail.
Sol. df = 10 - 1 = 9. (next page)

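61 7.5 Estimating a Population Variance 1. Chi-Square Distribution
e.g.1 Continued (can also use Statdisk, Excel, and Minitab).
Sol. With df = 9, Table A-4 gives $\chi^2_L = 2.700$ (area of 0.975 to its right) and $\chi^2_R = 19.023$ (area of 0.025 to its right).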

62 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
Point estimator:
o The sample variance $s^2$ is the best point estimate of the population variance $\sigma^2$.
o The sample standard deviation $s$ is commonly used as a point estimate of $\sigma$ (even though it is a biased estimate).

63 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
Interval estimator
Requirements:
1) The sample is a simple random sample.
2) The population must have normally distributed values.
CI for the population variance $\sigma^2$: $\dfrac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_L}$
CI for the population standard deviation $\sigma$: $\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$

64 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
Procedure for Constructing a Confidence Interval for $\sigma$ or $\sigma^2$
Step 1. Verify that the requirements are satisfied.
Step 2. Using $n - 1$ degrees of freedom, refer to Table A-4 to find the critical values $\chi^2_L$ and $\chi^2_R$ corresponding to the desired confidence level.
Step 3. Find the CI for $\sigma^2$: $\dfrac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_L}$
Step 4. Find the CI for $\sigma$: take the square root of all three parts of the inequality.
Step 5. Rounding. Original data: add one decimal place; summary statistics: same number of decimal places.

65 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
e.g.2 Confidence Interval for Home Voltage. Sample of 10: 123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8
Step 1. Requirements: the histogram appears approximately normal, and the sample is a simple random sample.

66 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
e.g.2 Confidence Interval for Home Voltage. Sample of 10: 123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8
Step 2. For a 95% CL (two-sided) with df = 9, we found that $\chi^2_L = 2.700$ and $\chi^2_R = 19.023$.
Step 3. With $s = 0.15$: $\dfrac{(10-1)(0.15)^2}{19.023} < \sigma^2 < \dfrac{(10-1)(0.15)^2}{2.700}$, or $0.010645 < \sigma^2 < 0.075000$
Step 4. Take the square root: $0.10$ volt $< \sigma < 0.27$ volt
Interpretation: Based on this result, the confidence interval for $\sigma$ is (0.10, 0.27); the limits of the CI are 0.10 and 0.27. The format $s \pm E$ cannot be used because the CI does not have $s$ at its center.
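Steps 3 and 4 can be sketched in Python. The two chi-square critical values (2.700 and 19.023 for df = 9 at a 95% CL) are entered by hand as table look-ups, since the Python standard library has no chi-square quantile function. The sample standard deviation $s = 0.15$ is the summary value used in the example, and the function name `sigma_ci` is made up for illustration.

```python
import math


def sigma_ci(s, n, chi2_left, chi2_right):
    """Return the CI for the variance and the CI for the standard deviation.

    chi2_left and chi2_right are the critical values for n - 1 degrees of
    freedom, taken from a chi-square table.
    """
    df = n - 1
    # Dividing by the LARGER critical value gives the LOWER limit.
    var_lower = df * s**2 / chi2_right
    var_upper = df * s**2 / chi2_left
    return (var_lower, var_upper), (math.sqrt(var_lower), math.sqrt(var_upper))


var_ci, sd_ci = sigma_ci(s=0.15, n=10, chi2_left=2.700, chi2_right=19.023)
print(f"{var_ci[0]:.6f} < sigma^2 < {var_ci[1]:.6f}")
print(f"{sd_ci[0]:.2f} < sigma < {sd_ci[1]:.2f}")
# s = 0.15 is not the midpoint of the sigma interval, so "s +/- E" does not apply.
print("midpoint:", round((sd_ci[0] + sd_ci[1]) / 2, 3))
```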

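67 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
Rationale for the CI. If we obtain simple random samples of size $n$ from a population with variance $\sigma^2$, there is a probability of $1 - \alpha$ that the statistic $(n-1)s^2/\sigma^2$ will fall between the critical values $\chi^2_L$ and $\chi^2_R$, i.e., there is a probability of $1 - \alpha$ that both of the following are true:
$\dfrac{(n-1)s^2}{\sigma^2} < \chi^2_R$ and $\dfrac{(n-1)s^2}{\sigma^2} > \chi^2_L$
Solving each inequality for $\sigma^2$: $\dfrac{(n-1)s^2}{\chi^2_R} < \sigma^2$ and $\sigma^2 < \dfrac{(n-1)s^2}{\chi^2_L}$
Combining both inequalities into one inequality, we have: $\dfrac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_L}$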

68 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
Determining the sample size: harder than in the cases of the mean and the proportion.
o Use Table 7-2.
o Statdisk: Analysis / Sample Size Determination / Estimate St Dev
o Excel, Minitab, and the TI-84 do not provide sample sizes.

69 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$

70 7.5 Estimating a Population Variance 2. Estimator of $\sigma^2$
e.g.3 Finding the sample size for estimating $\sigma$. We want to estimate the standard deviation $\sigma$ for all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of $\sigma$. How large should the sample size be? Assume that the population is normally distributed.
Ans. A sample of at least 48 values.