Sampling Distribution of a Sample Proportion

From earlier material, remember that if X is the count of successes in a sample of n trials of a binomial random variable, then the proportion of successes is given by $\hat{p} = X/n$. The sample statistic p-hat is used to estimate the population parameter p, assuming an SRS and a population at least 10 times larger than the sample. At this juncture we want to look at the sampling distribution of p-hat. How do we determine the center, spread, and shape of the sampling distribution?

In this situation it is understood that X (the count of successes) and p-hat are both random variables. X has a binomial distribution, but p-hat does not. Let's define the mean and standard deviation of the random variable p-hat:

$\mu_{\hat{p}} = p$

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

Because the mean of the sampling distribution is equal to the true population proportion, we say that the sample proportion p-hat is an unbiased estimator of p.

Notice the denominator in the formula for the standard deviation. As the sample size increases, the standard deviation decreases, so taking larger samples lowers the standard deviation. Don't take this concept too far, though. For example, it wouldn't make sense to take samples of size 75 from a population of size 150. If you have the resources to take samples of this size, you probably could just as well take a census.

Quick concept check: By what factor would you increase the sample size to cut the standard deviation in half? Since n is under the square root sign, you would have to increase the sample size by a factor of 4.

As an example, 324 college seniors were surveyed and asked if they were going to graduate with honors. 114 responded "Yes". We can safely assume that the population of college seniors is at least 10 times greater than 324. We will also assume that this was a well-designed survey, i.e., an SRS. What are the mean and standard deviation of the sampling distribution of p-hat for this survey? Since the assumptions are satisfied, we can proceed to the calculations.
The mean is estimated by the sample proportion, $\hat{p} = 114/324 \approx 0.352$, which is an unbiased estimator of the population proportion p.

The standard deviation is calculated as $\sigma_{\hat{p}} = \sqrt{p(1-p)/n} \approx \sqrt{(0.352)(0.648)/324} \approx 0.0265$, using p-hat in place of the unknown p.

We now know how to identify the center and the spread of the distribution of p-hat, but what about its shape? The shape of the sampling distribution of p-hat is approximately normal if a few conditions are met. These conditions are the same ones encountered earlier in the course, and they allow us to use the normal approximation when dealing with the sampling distribution of p-hat. We will use the normal approximation for values of n and p that satisfy $np \ge 10$ and $n(1-p) \ge 10$. There is an underlying assumption here. When p is close to 1 or 0, the approximation is not going to be as accurate. The most accurate approximation will be when p is 1/2.

A national polling company asks an SRS of 500 registered voters whether they will vote for a candidate representing the Democratic Party in the upcoming presidential election. 48% of all registered voters are registered Democrats, and we are assuming they will vote along party lines. What is the probability that the random sample of 500 registered voters will give a result within 3 percentage points of the true value?

Let's look at the assumptions first. We have an SRS of size n = 500. The sampling distribution of p-hat has a mean of $\mu_{\hat{p}} = p = 0.48$. We can safely assume that the number of registered voters is at least 10 times greater than 500, so we can calculate the standard deviation:

$\sigma_{\hat{p}} = \sqrt{\dfrac{(0.48)(0.52)}{500}} \approx 0.0223$

Next we want to use the normal distribution to approximate the sampling distribution of p-hat. Checking the assumptions, np = 0.48 * 500 = 240 and n(1-p) = 0.52 * 500 = 260. Both are clearly greater than 10, so we can use the normal approximation.
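The setup for the voter example can be checked with a few lines of Python. This is only a sketch: the helper names `phat_mean_sd` and `normal_ok` are made up for illustration and are not part of the lesson, and the threshold of 10 is the rule of thumb stated above.

```python
import math

def phat_mean_sd(p, n):
    """Return (mean, standard deviation) of the sampling distribution of p-hat."""
    return p, math.sqrt(p * (1 - p) / n)

def normal_ok(p, n, threshold=10):
    """Rule of thumb from the text: both expected counts must reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Voter example: p = 0.48, n = 500
mean_voter, sd_voter = phat_mean_sd(0.48, 500)
ok_voter = normal_ok(0.48, 500)

# Concept check: quadrupling the sample size should halve the standard deviation
_, sd_4n = phat_mean_sd(0.48, 4 * 500)
ratio = sd_4n / sd_voter

print(f"mean = {mean_voter}, sd = {sd_voter:.4f}, normal approximation ok: {ok_voter}")
print(f"sd with 4n divided by sd with n = {ratio:.3f}")
```

The last two lines also illustrate the earlier concept check: because n sits under a square root, multiplying n by 4 multiplies the standard deviation by 1/2.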

Our goal is to find the probability that p-hat is within 3% of 48%, i.e., between 45% and 51%. We can use the graphing calculator to show the area. We can also do it by calculating z-scores. You should always remember how to do z-scores and not rely only on the graphing calculator to calculate the area under the curve.

$z = \dfrac{0.45 - 0.48}{0.0223} \approx -1.34$ and $z = \dfrac{0.51 - 0.48}{0.0223} \approx 1.34$

Using the z-scores and the table of areas for a standard normal distribution yields an area of about 0.821. This means we have an 82.1% chance of getting a proportion between 45% and 51% when we draw samples of size 500 from a population with a known proportion of 48%.

Let's look at one more example that ties together concepts learned earlier in the course with the concepts in this lesson. Random samples of size n = 500 were selected from a binomial population with p = .1.

A. Is it appropriate in this case to use the normal distribution to approximate the distribution of p-hat? What are the mean, standard deviation, and shape of the distribution?

Answer A. The sampling distribution of p-hat can be approximated by a normal distribution. Since the sample size is large enough, we easily have $np \ge 10$ and $nq \ge 10$, where q = 1 - p. Thus np = 50 ≥ 10 and nq = 450 ≥ 10. The mean is $\mu_{\hat{p}} = p = .1$ and the standard deviation is $\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(.1)(.9)/500} \approx .0134$.
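Both calculations above, the voter probability and the part A summary, can be reproduced without a table or graphing calculator. The sketch below is one way to do it in Python, assuming a hand-written `normal_cdf` helper built on `math.erf` (a name chosen here for illustration, standing in for the calculator's normalcdf).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Voter example: P(0.45 < p-hat < 0.51) with p = 0.48, n = 500
p_voter, n_voter = 0.48, 500
sd_voter = math.sqrt(p_voter * (1 - p_voter) / n_voter)
z_low = (0.45 - p_voter) / sd_voter
z_high = (0.51 - p_voter) / sd_voter
prob_within = normal_cdf(z_high) - normal_cdf(z_low)

# Part A: p = 0.1, n = 500
p_a, n_a = 0.1, 500
np_a = n_a * p_a              # expected successes
nq_a = n_a * (1 - p_a)        # expected failures
sd_a = math.sqrt(p_a * (1 - p_a) / n_a)

print(f"z-scores: {z_low:.2f}, {z_high:.2f}; probability = {prob_within:.3f}")
print(f"part A: np = {np_a:.0f}, nq = {nq_a:.0f}, sd = {sd_a:.4f}")
```

Working with unrounded z-scores can shift the last digit slightly compared with a table lookup at z = ±1.34.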

B. Using the normal approximation without the continuity correction, find the probability that p-hat < .12.

Answer B. $z = (.12 - .1)/.0134 \approx 1.491$, so P(p-hat < .12) = P(z < 1.491) = normalcdf(-100, 1.491) = .9320

C. Using the normal approximation, find P(p-hat < .12). This time use the continuity correction. Hint: To use the continuity correction you'll need to use the binomial distribution of counts. Convert .12 to a count by multiplying it by 500. Use np and np(1-p) to find the mean and variance (the standard deviation is the square root of np(1-p)) for the normal approximation to the binomial for counts. You're looking for the probability of getting a count below a certain number. To use the continuity correction, add .5 to your true upper bound. (You don't need to adjust your lower bound because the lower bound is essentially negative infinity.)

Answer C. 0.12(500) = 60, so we want P(X < 60) = P(X ≤ 59), and the true upper bound is 59. μ = np = .1(500) = 50 and $\sigma = \sqrt{np(1-p)} = \sqrt{45} \approx 6.708$. For the continuity correction, use 59.5 as the upper bound. normalcdf(-1E99, 59.5, 50, 6.708) = .9216

D. Using the exact binomial calculation, find P(p-hat < .12). Compare your answer with the answers you got for parts B and C. Why are they different?

Answer D. x = .12(500) = 60, so P(x < 60) = P(x ≤ 59) = binomcdf(500, .1, 59) = .9190. The answer for part D is slightly lower than those for parts B and C because parts B and C use a normal approximation that includes some extra area.
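Parts B, C, and D can be compared side by side in Python. This is a minimal sketch, not the calculator routine itself: `normal_cdf` and `binom_cdf` are illustrative stand-ins for the calculator's normalcdf and binomcdf, written with only the standard library (`math.comb` requires Python 3.8 or later).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial count X with n trials and success probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 500, 0.1

# Part B: normal approximation for p-hat, no continuity correction
sd_phat = math.sqrt(p * (1 - p) / n)
prob_b = normal_cdf(0.12, p, sd_phat)

# Part C: normal approximation for the count, upper bound 59 + 0.5
mu_count = n * p
sd_count = math.sqrt(n * p * (1 - p))
prob_c = normal_cdf(59.5, mu_count, sd_count)

# Part D: exact binomial probability P(X <= 59)
prob_d = binom_cdf(59, n, p)

print(f"B (no correction):   {prob_b:.4f}")
print(f"C (with correction): {prob_c:.4f}")
print(f"D (exact binomial):  {prob_d:.4f}")
```

The continuity correction moves the normal approximation closer to the exact binomial value, which is the ordering the answers above describe.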

