Confidence Interval for a Proportion


Example

74% of a company's customers would like to see new product packaging. A random sample of 50 customers is taken. $X$ is the number of customers in the sample who would like to see the new packaging; then the sample proportion is $\hat{p} = X/n$. The mean and standard deviation for $\hat{p}$ are

Mean of $\hat{p}$: $p = 0.74$
Standard deviation of $\hat{p}$: $\sqrt{p(1-p)/n} = \sqrt{0.74(0.26)/50} = 0.0620$

Since $np = 50(0.74) = 37$ and $n(1-p) = 50(0.26) = 13$ are both at least 10, the distribution of $\hat{p}$ is approximately Normal. Using the Normal approximation we know that the probability is approximately 95% that $\hat{p}$ falls within 1.96 standard deviations of the mean.

95% of all samples have a sample proportion between $0.74 - 1.96(0.0620) = 0.618$ and $0.74 + 1.96(0.0620) = 0.862$.

As $1.96(0.0620) = 0.122$ we could equivalently state: $\hat{p}$ is within 0.122 of 0.740.

For this situation approximately 95% of all possible samples yield a proportion within 0.122 of 0.740 (within 12.2% of 74.0%).

In general, provided a Normal approximation can be used, 95% of all possible samples yield a proportion $\hat{p}$ within $1.96\sqrt{p(1-p)/n}$ of $p$.

An approximate 95% Confidence Interval

The section begins with a description of steps that lead to a usable result. (A rigorous treatment of the issue requires a good deal of mathematical statistics.) The explanations provided below are simplified.

We've observed the following: For approximately 95% of all samples, $\hat{p}$ is within $1.96\sqrt{p(1-p)/n}$ of $p$.

Flip-flopping $p$ and $\hat{p}$ yields the following statement (which is true): For approximately 95% of all samples, $p$ is within $1.96\sqrt{\hat{p}(1-\hat{p})/n}$ of $\hat{p}$.

In this second statement the interval is random. Different samples yield different values for $\hat{p}$, which result in different intervals.
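The arithmetic in the opening packaging example can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the variable names are chosen here for illustration.

```python
import math

p = 0.74   # population proportion from the packaging example
n = 50     # sample size

# Standard deviation of the sample proportion, sqrt(p(1-p)/n)
sd = math.sqrt(p * (1 - p) / n)

# About 95% of sample proportions fall within 1.96 SDs of p
margin = 1.96 * sd
low, high = p - margin, p + margin

print(f"SD = {sd:.4f}, 1.96 SD = {margin:.3f}")
print(f"about 95% of samples give a proportion between {low:.3f} and {high:.3f}")
```

The Normal approximation behind the 1.96 is only reasonable here because both np = 37 and n(1 - p) = 13 are at least 10.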

The Result: a 95% Confidence Interval for p

For one sample yielding a result $\hat{p}$, the interval

$\hat{p} \pm E$, where $E = 1.96\sqrt{\hat{p}(1-\hat{p})/n}$

forms an approximate 95% confidence interval for $p$. $\hat{p}$ is the point estimate of $p$; $E$ is the error margin associated with the estimate.

That is: (approximately) 95% of all random samples (of size $n$) produce an interval including $p$ within the bounds. When we obtain the random sample and compute the interval from it, we no longer have anything random. At this point we state that we are (approximately) 95% confident that $p$ is within the interval bounds.

A couple of restrictions apply to this formula.

1. It should be applied to situations where units are randomly selected.
2. If sampling is without replacement, check the 20 Times Rule: the population must be at least 20 times the sample size to use this result. (If not, and you know the population size, you can use the adjustment described a bit later in this section. However: small populations are uncommon, and the adjustment is rarely needed. You should recognize that when the population is not at least 20 times the sample size, our recipe above for error margin does not work. [1])
3. The actual counts of Successes $X$ and Failures $(n - X)$ must both be at least 10. [2] (If not, either you have too small a sample, or the Success probability is likely too close to 0 or 1, for the Normal to be a decent approximation.) If this is not the case, you must seek alternative strategies. (Minitab, and most other statistical software, can obtain the confidence interval by an exact method that doesn't require the Normal distribution. When the counts $X$ and $(n - X)$ are both at least 10, the exact method and the approximate interval given here will be quite similar.)

We state such an interval in one of four equivalent styles:

$\hat{p} - E < p < \hat{p} + E$
$(\hat{p} - E,\ \hat{p} + E)$
$\hat{p} - E$ to $\hat{p} + E$
$\hat{p} \pm E$

The first is preferred, as it indicates what is being estimated: $p$. In the first through third versions, the lower value is always stated first.

Every confidence interval should be accompanied by an interpretation that states the confidence level (here 95%).

[1] In fact our formula gives too large an error margin. So in essence you can be more than 95% confident in our result when it's misapplied to situations where the population is small. Probably not the worst error in the world.

[2] Some sources use 5 in place of 10. 10 is better. 5 is somewhat OK, but if either of these values is between 5 and 9, you'd be better off getting some help from a statistician, rather than using our methods.

Example

A simple random sample of 1000 adults finds that 343 approve of the President. Obtain a 95% confidence interval for the proportion of all adults who approve of the President.

This is a random sample, where $p$ = the proportion of all adults who approve of the President ($p$ is an unknown, but fixed and unvarying, quantity). The population size is huge: much larger than 20(1,000) = 20,000. The numbers of Successes and Failures are 343 and 657 respectively (both are well above 10). We can obtain and then interpret a confidence interval with the presented method.

$x = 343$ out of $n = 1000$ trials. So $\hat{p} = 343/1000 = 0.3430$. Then

$E = 1.96\sqrt{\hat{p}(1-\hat{p})/n} = 1.96\sqrt{0.3430(0.6570)/1000} = 0.0294$

This 95% confidence interval for $p$ (unknown) can be written any of four ways:

$0.3136 < p < 0.3724$
$(0.3136,\ 0.3724)$
$0.3136$ to $0.3724$
$0.3430 \pm 0.0294$

The interval is properly interpreted as follows. We are (approximately) 95% confident that the proportion of all adults who approve of the President is

within 0.0294 of 0.3430 (2.94% of 34.30%).
between 0.3136 and 0.3724 (31.36% and 37.24%).

These two are equivalent.

Approximately 95% of all possible samples yield a proportion $\hat{p}$ such that the confidence interval includes $p$. We have one such sample, chosen at random. We are (approximately) 95% confident that the interval obtained from our sample, 0.3136 to 0.3724, includes $p$.

Notice! $\hat{p}(1-\hat{p})$ is computed. This involves both the prevalence [3], in the sample, of Successes/approvers $= 0.343$ and the prevalence of Failures/disapprovers $= 0.657$.

[3] Define prevalence as pretty much the same thing as proportion. In disease and addiction, the word prevalence is often used: the proportion of individuals that has some disease. For instance: The prevalence of smoking in US adults is approximately 25%.
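The presidential approval example can be reproduced with a short function. This is a minimal sketch of the 95% interval described above; the function name `z_interval_95` is made up for this illustration and is not from any library.

```python
import math

def z_interval_95(x, n):
    """Approximate 95% confidence interval for a proportion.

    x: number of Successes in the sample, n: sample size.
    Returns (p_hat, error_margin, lower_bound, upper_bound).
    """
    p_hat = x / n
    error_margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, error_margin, p_hat - error_margin, p_hat + error_margin

p_hat, e, low, high = z_interval_95(343, 1000)
print(f"p_hat = {p_hat:.4f}, E = {e:.4f}")
print(f"{low:.4f} < p < {high:.4f}")
```

The function does not check the conditions (random sample, 20 Times Rule, at least 10 Successes and 10 Failures); those still have to be verified before the result is interpreted.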

Confidence interval for a (population) proportion p

This section summarizes our results. We extend the treatment to include a confidence interval for a population proportion $p$ with any confidence level.

The typical textbook treatment of the issue takes an overblown approach to notation. In addition to the confidence $C$ there is the error rate (lack of confidence?) for the procedure, which is generally denoted with $\alpha$, where almost always $\alpha$ is rather small. $C = 1 - \alpha$ (so 95% confidence goes with $\alpha = 0.05$). Then take $z_{\alpha/2}$ to be the Z score with $\alpha/2$ area in the right tail of the Normal distribution; its opposite $-z_{\alpha/2}$ is the Z score with $\alpha/2$ area in the left tail. Between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is area $(1 - \alpha) = C$, the confidence.

The 1 Sample Z Confidence Interval for a Proportion

An approximate C% confidence interval for $p$ is $\hat{p} \pm E$, where $E$ is the error margin

$E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$,

with $z_{\alpha/2}$ from the Standard Normal distribution, taking into consideration the confidence C%.

When should you use this formula for a confidence interval?

You have a random sample drawn from a population with unknown population proportion $p$.
If sampling is without replacement, the population must be at least 20 times the size of the sample (in most practical cases this is almost trivially true).
The sample result has at least 10 Successes and 10 Failures.

What if the population is too small or the sample is too large?

Then the "at least 20 times" rule is violated. There is a relatively simple fix, but to employ it you must know, at least reasonably accurately, the size $N$ of the population. The error margin becomes

$E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\sqrt{\dfrac{N-n}{N-1}}$

If you examine the adjustment (a multiplication) you can see the logic in the 20 times rule: When $N$ is more than 20 times $n$, this multiplier will be quite close to, and just below, 1. In ignoring the adjustment, we end up with a slightly bigger $E$ than is required, so if anything we will be understating the confidence.
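The critical value and the small-population adjustment can be sketched in Python as follows. The function names `critical_value` and `error_margin` are invented for this illustration; `NormalDist` is from Python's standard `statistics` module, and the population size of 20,000 in the usage lines is an assumed value chosen to sit exactly at the 20 Times Rule boundary.

```python
import math
from statistics import NormalDist

def critical_value(confidence):
    """z with area alpha/2 in the right tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def error_margin(x, n, confidence, population_size=None):
    """Error margin of the 1 Sample Z interval for a proportion.

    If population_size (N) is given, the multiplier sqrt((N - n)/(N - 1))
    for sampling without replacement from a small population is applied.
    """
    p_hat = x / n
    e = critical_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        e *= math.sqrt((population_size - n) / (population_size - 1))
    return e

print(f"z for 95%: {critical_value(0.95):.3f}")
print(f"z for 99%: {critical_value(0.99):.3f}")

# Hypothetical comparison: n = 1000 drawn from N = 20,000 (exactly 20 times n)
unadjusted = error_margin(343, 1000, 0.95)
adjusted = error_margin(343, 1000, 0.95, population_size=20_000)
print(f"unadjusted E = {unadjusted:.4f}, adjusted E = {adjusted:.4f}")
```

At the boundary of the rule the multiplier is already within a few percent of 1, which is why ignoring the adjustment for larger populations costs so little.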

What if the "both at least 10" rule isn't met?

There is a method that works whether or not the "both at least 10" rule is met. We won't cover it here. (If you like, learn the Wilson score interval described in textbooks. This is also an approximate method, but it works pretty well for counts below 10.)

What if the sampling isn't random?

No statistical method can guarantee results with any particular reliability (confidence) when sampling is not random. The situation may well be hopeless.

Example

A company's human resources department investigates the application materials submitted by 84 applicants for an entry level position over a six month period. One finding is that 15 of the applicants falsified information in the application materials. Assume that the 84 applicants are a random sample from a larger pool of similar applicants. Give a 99% confidence interval for the proportion of all applicants who falsify information in application materials.

Solution

First check conditions. We take it for granted that the complete pool of applications is at least 20 times larger than the 84 in the sample; we are told to assume the sample is random. (In reality, random sampling in such a circumstance might be hard to accomplish. Still, we might reasonably assume that the sample applicants are representative of the population of all applicants.) Both the number of falsifiers (15) and the number of nonfalsifiers (69) exceed 10. We may use the 1 Sample Z Confidence Interval.

For 99% confidence, $z_{\alpha/2} = z_{0.005} = 2.576$. The estimated/observed proportion of falsifiers is $\hat{p} = 15/84 = 0.179$. Then the error margin is

$E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 2.576\sqrt{0.179(0.821)/84} = 0.108$

Then $0.179 \pm 0.108$ is the 99% confidence interval: $0.071 < p < 0.287$. We are 99% confident that between 7.1% and 28.7% of all applicants falsify information.

Why should you be able to use this formula for a confidence interval?

Of course a computer can do this, so why should you know how to do it? Two reasons:

1) The formulas are relatively straightforward [4], and require mostly that you be able to identify and understand the relevant quantities. These same quantities are the inputs into statistical software.
2) The formula, along with trial and error and some examples, can assist with your understanding of properties of confidence intervals.

[4] Granted: The formula for error margin is not trivial. If you have trouble getting the proper error margin, you need to practice using the formula until doing so correctly becomes second nature.

Fact worth knowing

95% is the standard confidence level for scientific polls published in the media and online. If a poll does not publish an error margin, you may assume that sampling is not random: the poll is not a scientific one. Keep in mind also that many polls with stated error margins are not done properly. You should have less than 95% confidence in results from such polls.

Nomenclature

There's some terminology that goes with each of these quantities. (The terminology is useful because it is generalized to other situations.)

The confidence level (or just confidence) is $C$. Usually $C$ = 0.90, 0.95 or 0.99: generally we prefer to have a high amount of confidence in our statements. However: There is nothing illegal or necessarily wrong about a 50% CI. (It's just that 50% CIs miss the target quantity half of the time.) Don't confuse the confidence level (or just plain confidence) with the confidence interval. The confidence interval is the interval of values you obtain.

$\hat{p}$ is the (point) estimate of $p$. It's a single good estimate for $p$ from the sample of data. It is the prevalence of Successes in the sample.

$z_{\alpha/2}$ is the critical value (from the Standard Normal) that goes with C% confidence. (Some reference materials use simpler notation like $z^*$, and leave it to common sense that this $z$ is the one that goes with the confidence $C$.)

The two endpoints of the interval are the bounds: lower bound and upper bound.

The width $W$ of a confidence interval is the distance from the lower to the upper bound.

The part following the $\pm$ sign, in total, is referred to as the error margin $E$. $2E = W$.

$\sqrt{\hat{p}(1-\hat{p})/n}$ is often called the (estimated) standard error of $\hat{p}$. A standard error is essentially a standard deviation. [5] Recall: Standard deviation measures typical deviation from the mean. The deviation of $\hat{p}$ from its mean $p$ is a (sampling) error.
It's common to see the abbreviation SE for standard error. In this case: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$. The error margin $E$ for this interval can be expressed $E = z_{\alpha/2} \cdot SE$.

[5] This quantity really is an estimated standard deviation, as the standard deviation of $\hat{p}$ is $\sqrt{p(1-p)/n}$.

Mind your p's and q's

Many textbooks make the formulas look shorter by using a second letter $q$ to stand for the Failure rate. So $q = (1 - p)$ and $\hat{q} = 1 - \hat{p} = (n - x)/n$. When this is done,

$\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.

More on interpreting the interval

If you followed the development above, you can deduce the proper interpretation of a confidence interval. It's also possible to take the justification for granted, and come to an interpretive understanding.

Different samples give different results. Consider all possible samples. Obtain, for each sample, a 95% confidence interval. Some of these intervals include $p$, some do not. Most do. In fact: 95% of all of them do.

In a statistical study, a single sample is drawn randomly. The data are collected and summarized, and a 95% confidence interval is computed. We have one sample, selected at random from the collection of all samples. Because 95% of all samples lead to an interval that covers $p$, we are 95% confident that the particular interval we have covers $p$.

We use the word confidence, rather than probability. In statistical applications where parameters are estimated, those parameters are thought of as fixed values describing populations. They do not vary. The parameter $p$ is either in the interval or not. There is no probability involved.

Where did the probability go? There was probability before the sample was selected. This is similar to tossing a coin. Before it's tossed the probability of a Head is 1/2. But once the toss is completed, the probability, for that toss, is either 0 or 1, depending on the outcome. In this application, the probability is either 1 (the interval covers $p$) or 0 (the interval doesn't), depending on whether in fact the interval does or does not cover $p$. Not knowing $p$, we cannot tell. All we know is that 95% of all samples yield an interval covering $p$. So we are 95% confident that ours does. (Similarly, after the coin is tossed, if you're unable to see the result, you can be 50% confident it's a Head. The word probability doesn't apply here.)

In short: Use the word "probability" for random things that haven't yet taken place. Once they've taken place, even if there are unknowns, use the word confidence. The unknowns merely reflect human ignorance about events.
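The claim that about 95% of all samples yield an interval covering p can be explored by simulation. The sketch below assumes a "true" p of 0.343 and a sample size of 500 purely for illustration (in practice p is unknown); the function name `covers` and the seed are arbitrary choices.

```python
import math
import random

def covers(p, n, rng, z=1.96):
    """Draw one random sample of size n and report whether its
    approximate 95% interval includes the true proportion p."""
    x = sum(rng.random() < p for _ in range(n))   # count of Successes
    p_hat = x / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e <= p <= p_hat + e

rng = random.Random(1)           # fixed seed so the run is repeatable
p, n, trials = 0.343, 500, 2000  # assumed values for the illustration

coverage = sum(covers(p, n, rng) for _ in range(trials)) / trials
print(f"fraction of intervals covering p: {coverage:.3f}")
```

Each individual interval either covers p or it doesn't; the 95% describes the long-run fraction over many samples, which is what the printed value approximates.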

Properties of Confidence Intervals

Three values impact the error margin of a confidence interval.

1. the prevalence of Success ($\hat{p}$)
2. the sample size ($n$)
3. the confidence level ($C$)

Undertake an investigation: How do changes in each of these impact the error margin? These issues are addressed through the exercises. In some respects the properties you discover will convince you that statistics makes sense: The numbers work out in ways that common sense would anticipate in advance. (Common sense would never anticipate the precise results. [6] But certain procedural properties do make sense. That's what you want to discover.)

[6] In fact, given the randomness involved in selecting the samples, and the various other attributes that change from problem to problem ($n$, $p$, $x$, as well as the size of the population), it is remarkable that the formula we have is so simple.

What a Confidence Interval Cannot Do

Notice that the 95% in a 95% confidence interval refers to the percent of all samples that yield an interval that covers $p$. If we choose one such sample at random, we're 95% confident in that result.

The error margin in a confidence interval addresses errors due to random sampling. The error margin in a confidence interval does not include the effects of other errors. Poorly recorded data is one source of error. Or perhaps the study didn't really sample randomly. In these cases, quantifying sampling error is not enough. All these other factors will lead to additional estimation error: error that is not captured by our formula. So while you can still use the formula when other types of error are present, it doesn't give a 95% confidence interval. The actual confidence is unknown. For analyses involving nonrandom data, the actual confidence is likely to be considerably lower than 95%. That's a real issue in many studies.

Polling Refusals

Suppose (to oversimplify) that 88 million people approve of the President and 72 million disapprove. So the President's approval rating is $p = 88/160 = 0.55$. A telephone poll is taken. But: The people that approve of the President are crankier than those that do not. They are less likely to put up with an intruding phone call. In fact, 40% of the approvers will not respond (that's 35.2 million people). The disapprovers are more willing to take the call: only 10% of them will refuse (that's 7.2 million people). Here's the breakdown (in millions of people):

            Approve   Disapprove   Total
Respond       52.8       64.8      117.6
Refuse        35.2        7.2       42.4
Total         88.0       72.0      160.0

The problem here is that our sample is going to reflect the views of only the responders. Of the 117.6 million responders, 52.8 million approve, for a rating of 52.8/117.6 = 0.449, or about 0.45. While people will be randomly called, those who refuse to respond will not be included in the results. So: Our poll will be estimating 0.45, not 0.55.

With a sample size of 1000, the error margin will be around 0.03. While some samples will give results higher than 0.45, it is highly unlikely that we'll get a sample that produces a confidence interval including 0.55. After all: The interval is designed to include 0.45.

This is an example of biased estimation. A result is biased if it systematically (on average) produces the wrong result. Yes: We could get a random sample that has an unusually high number of Approvers, and luckily gives an interval including 0.55. But we are unlikely to do so, because on average our estimate is 0.45, not 0.55. That's what bias is: The average result from the sampling procedure is not equal to the intended result.

If we know that nonresponse occurred with 40% probability among Approvers, and 10% among Disapprovers, we could adjust the survey results accordingly, and produce an unbiased estimate. But generally nonresponse rates are unknown, and the rate changes from survey to survey, depending on what the issue is. It is difficult to adjust results to compensate for the nonresponse issue.

Poll results are even harder to interpret when sampling is not done randomly. Internet polls choose subjects by convenience and interest. Only people who care enough to vote will vote. These people may be significantly different in their views than the population of interest. No error margin can fix up such polls. (Hopefully results are stated without an error margin.)

Statistical Software

Statistical software will compute the confidence interval for $p$. All you need to do is input three values: $n$, $x$, and the confidence $C$, along with specifying the method the software should use.
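To make the "three inputs" idea concrete, here is a minimal sketch of what such a routine might look like for the approximate 1 sample Z method. It is not the interface of Minitab or of any real package; the function name `one_sample_z_interval` and its error message are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_interval(n, x, confidence):
    """Approximate confidence interval for a proportion from the three
    inputs named in the text: sample size n, Success count x, confidence C.

    Refuses to run when the 'both at least 10' condition fails, since the
    Normal approximation is then not trustworthy.
    """
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 Successes and 10 Failures")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)            # error margin
    return p_hat - e, p_hat + e

# The job-applicant example: 15 falsifiers among 84 applicants, 99% confidence
low, high = one_sample_z_interval(n=84, x=15, confidence=0.99)
print(f"{low:.4f} < p < {high:.4f}")
```

Because the code carries full precision throughout, its bounds can differ in the last digit from a hand calculation that rounds the sample proportion and error margin before combining them.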
The interval you have learned is the (approximate) 1 sample Z interval for a proportion. Good software has other choices that use a different "formula" than that above. The formula you have is approximate, and requires at least ten Successes and ten Failures to allow using the Normal. For cases where this condition is not met, you may have statistical software compute the interval using a different method/formula. [7] In fact, where there are at least ten Successes and ten Failures, you may use the alternative method in place of the 1 sample Z interval; you'll get slightly different results. [8] For really large samples, these differences will be quite small.

One quirk about one method software may use: The intervals may not be balanced: The estimate $\hat{p}$ is not exactly in the middle of the interval. This is particularly noticeable for results with small percents of either Successes or Failures. (If you have, say, only 2 Successes in the sample, then the value of $\hat{p}$ is quite small. So the distribution is clamped against the left edge of the range of values, and has right skew. Skewness is an expression of imbalance, so it's no surprise that the interval is not balanced. This is a case where you could not use the formula stated above: 2 is too few Successes. The "both at least 10" restriction prevents use of a Normal approximation when things aren't at all close to Normal.) If both $x$ and $n - x$ are large, the interval, no matter which method is used, is nearly balanced about $\hat{p}$, and in fact the exact interval and the interval from your formula will give very similar results.

[7] Usually called the Exact Binomial interval or method.

[8] For really large samples, these differences will be quite small. One quirk about the Exact Binomial method: The intervals may not be balanced: The estimate is generally not exactly in the middle of the interval. This is particularly noticeable for results with small prevalence of either Successes or Failures. (Any $p$ other than 0.5 implies some asymmetry, and these intervals reflect this.) If both $x$ and $n - x$ are large, the Exact Binomial interval is nearly symmetric about the sample proportion, and in fact the exact interval and the interval from your formula will give very similar results.

Sample Size Determination

The error margin for our confidence interval is

$E = z_{\alpha/2}\,SE(\hat{p}) = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

This is equivalent to

$n = \left(\dfrac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p})$.

Suppose, prior to the study, we desire an error margin of $E_d$. If we can produce a reasonable educated guess $\tilde{p}$ for the prevalence of Successes, then an appropriate minimum sample size for the study is

$n = \left(\dfrac{z_{\alpha/2}}{E_d}\right)^2 \tilde{p}(1-\tilde{p})$.

Example 1

Suppose you want to estimate the proportion of students at a large university who are nearsighted. The prevalence for the general population is around 0.45. Use this as a guess to determine how many students would need to be included in a random sample if you wanted the error margin for a 95% confidence interval to be less than or equal to 2%. Recall that the error margin quantifies the maximum reasonable difference between the observed value $\hat{p}$ and the population value $p$.

Solution: The desired error margin is $E_d = 0.02$. Our guess is $\tilde{p} = 0.45$. The required sample size is

$n = \left(\dfrac{1.96}{0.02}\right)^2 (0.45)(0.55) = 2376.99$.

Of course we cannot sample 0.99 of a student, so we move up to 2377.

The actual study was run and it turned out that 951 of 2377 randomly sampled students were nearsighted. This yields a 95% confidence interval of $0.4001 \pm 0.0197$.

Remark 1

If our guess is closer to 0.5 than the prevalence observed in the data, then the actual error margin will be less than the desired $E_d$ (as happened here: 0.0197 is below 0.02). If the guess is further from 0.5, then the actual error margin will be greater than desired. (If the two are equal then someone is a very lucky guesser.)

Example 2

In a study of a new drug, the researchers assume that the cure rate for the drug is the same, 0.60, as for the established drug. What sample size is required to obtain a 99% confidence interval with error margin no greater than 0.05?

Solution: The desired error margin is $E_d = 0.05$. Our guess is $\tilde{p} = 0.60$. The required sample size is

$n = \left(\dfrac{2.576}{0.05}\right)^2 (0.60)(0.40) = 637.03$.

Of course one cannot sample 0.03 of a patient. To ensure a large enough sample size, round up to 638.

When the data are collected, it's found that the drug is much more effective than the established drug: 573 of the 638 patients (that's 89.8%) are cured. The 99% confidence interval is $0.898 \pm 0.031$. Notice that the error margin is far below 0.05. But this came at a cost. If they had known that the prevalence would be around 0.90, the researchers could have used 0.90 for a guess, and determined a sample size of 239 (because $n = (2.576/0.05)^2 (0.90)(0.10) = 238.89$). Notice that $638 - 239 = 399$. They ended up sampling 399 more patients than necessary. This cost them time and money: if they'd had any clue the cure rate would be far higher, they could have taken advantage of it.

Remark 2

If the actual error margin is less than the desired value, then, while the error margin is bettered, the expense of conducting the study was larger than necessary. A smaller sample size would have sufficed to obtain the desired error margin. There is no way to choose exactly the right sample size.
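The sample size calculations in Examples 1 and 2 can be sketched as a small function. The name `required_sample_size` is hypothetical; the critical values 1.96 and 2.576 are passed in as rounded table values so that the results match hand calculation, and a tiny tolerance is subtracted before rounding up to guard against floating-point noise when the raw value is essentially a whole number.

```python
import math

def required_sample_size(guess, desired_margin, z):
    """Minimum n so that z * sqrt(guess * (1 - guess) / n) <= desired_margin.

    guess: educated guess at the prevalence of Successes
    desired_margin: desired error margin
    z: critical value for the chosen confidence (e.g. 1.96 for 95%)
    """
    raw = (z / desired_margin) ** 2 * guess * (1 - guess)
    return math.ceil(raw - 1e-9)   # always round up to a whole unit

# Example 1: nearsighted students, 95% confidence, margin 0.02, guess 0.45
print(required_sample_size(0.45, 0.02, 1.96))

# Example 2: new drug, 99% confidence, margin 0.05, guesses 0.60 and 0.90
print(required_sample_size(0.60, 0.05, 2.576))
print(required_sample_size(0.90, 0.05, 2.576))
```

Rounding is always upward: rounding to the nearest whole number could leave the sample slightly too small to achieve the desired error margin.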
In some cases, we may have no idea what the prevalence is in advance. If no guess is possible (we are completely in the dark as to the prevalence) we can use 0.5. This guarantees that the actual error margin will be no larger than what is desired. On the other hand, it also pretty much guarantees that we will take a larger sample than is necessary (only if the prevalence turns out to be 50% will the sample be "just large enough").

Remark 3

Whenever the range of plausible guesses includes 0.5, use 0.5 as the guess. This rule works when one has no idea what the prevalence is: the range of plausible guesses is from 0 to 1, which certainly includes 0.5. Most two-candidate political races are reasonably close. Pollsters [9] generally use 0.50 to determine the sample size. Using a guess of 0.50 tends not to lead to dramatic oversampling unless the result falls below 1/3 or above 2/3.

[9] People or organizations who are paid to conduct polls.

Example 3

Production line defects occur infrequently at an industrial plant. In the past the rate has generally been between 2% and 6% (this value would change over time as the production line, and the employees working on it, change). What sample size is required to estimate the current rate at 90% confidence with error margin no larger than 4%?

If we assume 2%, then the required size is 34 (33.15 rounded up); if 6% is assumed, the required size is 96. Here's a plot of the relationship. (The relationship is not exactly linear. However: Linear interpolation would work well here. In general, as long as the proportion is confined to a small range of values to one side of 0.50, interpolation does work fine.)

[Figure: required sample size n (vertical axis) versus proportion p (horizontal axis) for Example 3]

You can see that 6% requires the largest $n$ (it's closest to 0.5). To cover all historical possibilities, use $n = 96$. If the rate is actually less than 0.06, you will have oversampled. What if you sample fewer than 96? Perhaps a good idea, but if the rate is near 0.06 you won't get the desired error margin. And, of course, if production falls seriously out of control, you might see a result much higher than 6%, leading to an error margin considerably larger than 0.04.

Remark 4

A good idea is to produce a range of plausible guesses, and find the sample size for a number of values within that range. Graph this relationship. If the final decision isn't yours, you can place your graph in front of the decision maker.
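Remark 4 suggests tabulating the required sample size across a range of plausible guesses. A minimal sketch for the production line example (90% confidence, error margin 0.04) follows; the helper name `required_sample_size` is hypothetical and 1.645 is the usual rounded table value for 90% confidence.

```python
import math

Z_90 = 1.645     # rounded critical value for 90% confidence
MARGIN = 0.04    # desired error margin from Example 3

def required_sample_size(guess, desired_margin, z):
    """Smallest whole n giving an error margin no larger than desired_margin."""
    raw = (z / desired_margin) ** 2 * guess * (1 - guess)
    return math.ceil(raw - 1e-9)

# Required n for each plausible defect rate from 2% to 6%
sizes = {}
for pct in range(2, 7):
    sizes[pct] = required_sample_size(pct / 100, MARGIN, Z_90)
    print(f"guess {pct}% -> n = {sizes[pct]}")
```

At a 2% guess the raw value is about 33.15, so rounding up gives 34. The printed sizes grow steadily as the guess moves toward 0.5, which is the pattern a decision maker would see in the graph.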

This point is illustrated by Example 3. The decision maker on sample size needs to see that graph.

The plot below shows the required sample size for 95% confidence intervals having error margin 3%. The curve has the same shape for other confidence levels and error margins. You can see that a prevalence of 0.50 requires the largest sample size.

[Figure: required sample size n (vertical axis) versus proportion p (horizontal axis), 95% confidence, error margin 3%]

Appendix

A general format for confidence intervals

The confidence interval we just studied is

$\hat{p} \pm z_{\alpha/2}\,SE(\hat{p})$

where $z_{\alpha/2}$ is the critical value, found from the Normal, taking into consideration the desired level of confidence, and the standard error of the estimate is $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$, yielding the error margin

$E = z^* \cdot SE = z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

More generally, provided the conditions are right, a confidence interval is determined with

Estimate $\pm$ Error margin

where

Error margin = critical value $\times$ SE(estimate)

This formula is broadly applicable to all sorts of data analyses. In almost all circumstances SE(Estimate) has the square root of the sample size(s) in the denominator.

Exercises

1. There are 8640 students enrolled at SUNY Oswego this semester; 5146 live more than 50 miles from campus. A professor (unaware of these figures) samples 92 students and finds that 60 of them live more than 50 miles from campus.
a) Identify the following: i) The population proportion $p$; ii) The sample count $X$; iii) The sample proportion $\hat{p}$.
b) Which of $p$ and $\hat{p}$ is a parameter? Which is a statistic?
A student (also unaware of the whole-campus figures) is about to randomly select 142 students to estimate the proportion who live more than 50 miles away.
c) For the student: What are the mean and standard deviation for $\hat{p}$? Interpret this mean. (Be sure to include the phrase "all possible samples" in your statement.)

2. The saturation rate for a particular kind of marketing via a newspaper ad is 15%. That is: 15% of all newspaper buyers will read the ad. For a new ad, marketers randomly sample 30 buyers and determine that 2 have read the ad.
a) Identify values for $p$, $X$, and $\hat{p}$.
b) Which of $p$ and $\hat{p}$ is a parameter? Which is a statistic?

For the following exercises, when you interpret results, use the word "all" or "population."

3. A random sample of 212 adoptive parents finds that 85 of them stated No Preference for their child's gender. Use this sample data to construct a 95% confidence interval estimate for the proportion of adoptive parents who state No Preference. Explicitly identify the following:
a) The point estimate.
b) The critical value ($z_{\alpha/2}$).
c) The error margin.
d) Write the interval bounds in this format: $\hat{p} \pm E$.
e) Express the interval in this format: ___ < $p$ < ___.
f) What confidence do you have in this result?
g) Explain what $p$ represents in this situation. Is its value known?
h) $\hat{p}$: Parameter or Statistic? $p$: Parameter or Statistic?
i) Interpret your interval in words. We are 95% confident that...

4. The Genetics and IVF Institute conducted a clinical trial of the XSORT method designed to increase the probability of conceiving a girl. 325 babies were born to parents using XSORT, and 295 of them were girls. Use this data to construct a 99% confidence interval for the proportion of girls born to parents using XSORT. Interpret your result.

5. Do individuals have the ability to temporarily postpone death to survive a major holiday? (The hypothesis would be that these holidays are family affairs that give a dying person incentive to live a bit longer.) In one study, deaths, over the period from one week before to one week after Thanksgiving, were examined. Of these, 6062 occurred in the week before Thanksgiving. Give a 95% confidence interval for the proportion of deaths in this two week period that occur in the earlier week. Interpret your result. Does your data conclusively support the "postpone death" theory? (Hint: Check where 0.5 lands relative to your interval.)

6. Complete the small table indicating which critical value from the Standard Normal table goes with the given levels of confidence.

C               50%   75%   90%   95%   98%   99%   99.9%   99.99%
$z_{\alpha/2}$

7. Over a period of 11 years in Hidalgo County, Texas, 870 people were selected for grand jury duty, and 39% of them were Mexican-American. Notice that you are told the value of $\hat{p}$, so you don't have to compute it: $\hat{p} = 0.39$. From this you can deduce the number $X$ of Mexican-Americans in the sample. Since 0.39(870) = 339.3, the number must be 339 (it can't be 339.3: you can't select 3-10ths of a person). (The given value is rounded for convenience: 339/870 = 0.3897 to four significant digits, and 0.390 to three, which is sufficient for computing purposes.)
a) Assume these data represent a random sample of jury-duty-eligible county citizens. Obtain a 99% confidence interval for the percent of all county citizens that are Mexican-American. Interpret your result.
b) It was determined that 79.1% of all county citizens were Mexican-American. What does your confidence interval suggest about selection for jury duty?

8. Perform an investigation of the relationship between confidence and error margin. Here's how.
a) Take exercise 7, where $n = 870$ and $\hat{p} = 0.39$. You've already obtained a 99% confidence interval and its error margin. Now obtain a 95% confidence interval; determine the error margin.
b) Compute intervals for each of the confidence levels specified in the table. Fill in the table below with the error margin for the various levels of confidence.

C    50%    75%    90%    95%    99%    99.9%
E

c) Write a sentence describing how the error margin changes as the confidence is increased (decreased).

9. A recent survey of 4276 randomly selected households showed that 94.0% of them had telephones.
a) How many of the 4276 households have telephones? Answer with a whole number.
b) What is the value of p̂ to the nearest 0.0001?
c) Using these results, construct a 99% confidence interval for the proportion of households with telephones. Interpret your result. What is the error margin for this interval?
d) Give a 99% confidence interval for the proportion of households without telephones. How does the error margin compare to that in part c?

10. Gregor Mendel was responsible for famous genetics experiments with peas. In one experiment he crossed lines of peas, and the results included 428 green peas and 152 yellow peas.
a) Find a 95% confidence interval for the proportion of all peas that are green. Interpret your result.
b) Mendel's theory of genetic propagation of inherited traits predicted that 75% of all peas would be green. Is the theory refuted by his data?

11. Perform an investigation of the relationship between sample size and error margin. Here's how. Take exercise 10, where p̂ = 428/580 = 0.7379. You've already obtained a 95% confidence interval and its error margin (keep figures to the nearest 0.0001 for this).
a) Suppose hypothetically the study had investigated one quarter as many peas, but the percent that are green is the same: 107 green and 38 yellow. Determine a 95% confidence interval for this outcome. Place the error margin in the table below. The sample size is 4 times smaller: How many times larger is this error margin?
b) Suppose hypothetically the study had investigated twenty-five times as many peas as in the actual study, with 10,700 green and 3800 yellow. Determine a 95% confidence interval for this outcome. Place the error margin in the table below. The sample size is 25 times larger: How many times smaller is this error margin?

n         E
145
580
14,500

c) Write a sentence describing how the error margin changes when the sample size is k times larger. Check the solution to make sure you have the right result in mind as you go forward.

12. Jack conducts a student opinion poll and gets an error margin of 10% for his result. He is not happy. He wants 3%. How must he adjust his sample size? (Assume the confidence and sample proportion remain the same.)

13. A poll of 4000 people gives an error margin of 0.01. What would the error margin be for a similar poll of 800 people? (Assume the confidence and sample proportion remain the same.)

Summary

Here is the formula for the error margin:

$E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

At this point you ought to be able to look at the formula and deduce the following (assuming that all other factors stay the same):

The error margin increases when the confidence is increased. (This happens through the value z_{α/2}.)

The error margin decreases when the sample size is increased. (The sample size appears in the denominator of the formula.) In particular, increasing (decreasing) the sample size by a factor of k decreases (increases) the error margin by a factor of √k. (That's because the sample size appears inside the square root.)

The error margin does not depend on what is called a Success and what is called a Failure. (Exercise 9 parts c and d explicitly address this.)

14. Perform an investigation of the relationship between the sample prevalence of Success and error margin. Here's an exercise that will help you do this. What is the relation between the socio-economic status of parents and college graduation of their children? Different groups are sampled in order to make comparisons. For each socio-economic status, n = 400 children are sampled and tracked through adulthood. The number of the 400 who graduate from college is recorded.
a) Obtain a 95% CI for each socio-economic status. Determine the error margin for each interval. Place your results in the table below.
b) How do error margins compare for the cases p̂ = 0.10 and p̂ = 0.90? How about p̂ = 0.20 and p̂ = 0.80? p̂ = 0.30 and p̂ = 0.70? Why does this make sense?

c) Write a single sentence describing the relationship between the proportion and the error margin of the confidence interval.

Parents' Status    # of grads    95% CI    Error Margin
Welfare            40
Poor               80
Low Income         120
Middle Income      200
High Income        280
Wealthy            320
Super rich         360

15. Go back to problem 10. The 95% confidence interval is 0.702 < p < 0.774. The point estimate for the proportion of green peas is p̂ = 0.738, with error margin 0.036. State the 95% confidence interval for the proportion of peas that are yellow. You should be able to do so using only the results shown here and addition and subtraction.

16. A poll reveals that candidate D has 44.2% of sampled voters leaning towards D (error margin 3.5%). Remember: All media polls are done at 95% confidence unless stated otherwise.
a) Interpret this result.
b) Suppose there is only one other candidate, R. Give the 95% confidence interval for candidate R.
c) Suppose there are instead three candidates, R, D and U. Is it possible to give an estimate and error margin for candidate R?
d) In a two-candidate race, is it possible that D is actually ahead? (Hint: Suppose U doesn't stand for a candidate, but indicates "Undecided.")

17. A recent newspaper opinion poll found that 81% of Americans are in favor of a military drawdown in Iraq (error margin 4%). Interpret this statement. Include a confidence level.

Summary

In #14 above you confirmed that (assuming the sample size and confidence stay the same) the error margin is largest when p̂ = 0.5 and gets smaller as the proportion moves away from 0.5. This makes intuitive sense: When the prevalence is near 50% there is more uncertainty; when the prevalence is near 0 (or 1) there is more certainty.
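The pattern described in this summary can be checked numerically. The short Python sketch below is an illustration rather than part of the original exercises: the helper name `error_margin` is an assumption, and it relies on the standard library's `statistics.NormalDist` to supply the critical value instead of a printed Normal table. It tabulates the 95% error margin for n = 400 across several sample proportions.

```python
from math import sqrt
from statistics import NormalDist


def error_margin(p_hat, n, confidence=0.95):
    """Error margin z * sqrt(p_hat * (1 - p_hat) / n) for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


n = 400  # sample size used in the socio-economic status exercise
for p_hat in (0.10, 0.20, 0.30, 0.50, 0.70, 0.80, 0.90):
    print(f"p_hat = {p_hat:.2f}   E = {error_margin(p_hat, n):.4f}")
```

The printed column should rise toward p̂ = 0.5 and fall off symmetrically on either side, since p̂(1 − p̂) takes the same value for p̂ and 1 − p̂.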

18. At SUNY Oswego n = 125 students are randomly selected; 100 of them are opposed to a proposal that calls for the college to jam cell phone signals in classrooms. (This would prevent texting in class.) You can confirm that a 90% confidence interval is 0.741 < p < 0.859.
a) Give the values of the point estimate and error margin for the interval.
b) This survey was also conducted at Penn State University. The results of the survey were exactly the same: 100 of 125 sampled students opposed the proposal. What is the 90% confidence interval for Penn State?
c) It should be noted that Penn State has about 5 times as many students as SUNY Oswego. What impact does the population size have on the error margin for a confidence interval? Consult the formula: Where does the population size come into play?

19. Jake, John and Jaspar are conducting a study: What proportion of SUNY Oswego students stay in Oswego over the Halloween weekend?
a) Describe in words what p represents. Is p a parameter or a statistic? Would the value of p be easy to obtain?
b) They sample students randomly: 56 of 80 sampled students stay in Oswego. This gives p̂ = 0.70. Is this value a statistic or a parameter?
c) Jake determines correctly that the error margin for a 95% confidence interval is 0.10. The confidence interval is 0.60 < p < 0.80. John says, "I think we should use a 90% confidence level." If John's directive is followed, will the error margin increase or decrease?
d) Jaspar too is frustrated by these results. He's OK with the 90% confidence. But for him, the error margin of 10% is too large. Jaspar says, "I want an error margin of about 2%." Assuming the result falls at 70%, is a larger or smaller sample required to achieve an error margin of 0.02? What sample size is required to achieve an error margin of 0.02?

20. In one study of college students, 83.0% admitted to having cheated on a test, with an error margin of 7.0% (using 95% confidence). A second similar study found the same result of 83.0%; however, the sample size was three times as large. Is the error margin for the second study (again at 95% confidence) larger or smaller than 7.0%? What is the error margin for the second study?

21. Surveys of people were taken in three countries: Mexico, the United States, and Canada. The same number of people were surveyed within each country. In the U.S., 40% of people agreed with the statement "There is an urgent need to take action on global warming." In Mexico the result was 25%; in Canada 80%.
a) Does the size of the country's population have anything to do with the error margin?

b) Convince yourself that the answer to part a is No. For which of these countries is the error margin for a confidence interval the largest? The smallest? (Assume the same confidence level is used for all three results.)

22. You want a 95% confidence interval estimate with error margin 4% for the proportion of science majors who are left-handed. How many science majors do you sample?
a) Describe in words the parameter you are estimating. What symbol is it given?
b) Assume you have no idea what the prevalence of lefties is for this population. Use a guess of 0.5 to determine the required sample size.
c) In the general population, 10% of people are lefties. Use this value to determine the sample size.
d) Which of the answers from b or c is the better choice?
e) It turns out that 24 of 217 sampled science majors are lefties. Obtain the confidence interval. How does the error margin compare to 4%?

23. Suppose you undertook a study of the day of the week that babies are born. You are interested in the proportion of babies born on a weekend (Saturday or Sunday). Your goal is a 90% confidence interval with error margin no greater than 3.5%.
a) Explain why a guess of 0.50 is unreasonable.
b) What is a better value for this guess?
c) In fact, 25% is probably an adequate value for the guess. If all days are equally likely, then 2/7 = 28.6% should be born on weekends. However, in recent years there is more of a trend for doctors to induce labor, which usually happens on a weekday! Use 0.25 to obtain a sample size for this study.
d) If the actual prevalence is p̂ = 0.25, determine the confidence interval when the sample size from c is used. Identify the error margin: does it meet the goal of 0.035? What would such a result say about the 2/7 hypothesis?
e) If the actual prevalence is 0.20 and the sample size from c is used, how will the error margin compare to 0.035? Explain.

24. Consider a large city's mayoral race where there are two candidates.
a) Determine the required sample size for a media poll to estimate the percent of people who favor the Republican candidate with error margin 3%.
b) Does the required sample size depend on the population of the city?

25. What proportion of people die during summer (as officially defined)? You decide to investigate this issue by collecting data. How many obituaries would you examine in order to obtain a 98% confidence interval estimate with error margin of 1%?
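Every interval requested in the exercises above follows the same recipe: compute p̂ = X/n, look up the critical value for the chosen confidence, and form p̂ ± E. As a minimal sketch of that recipe (not the handout's own code), the Python below uses a hypothetical helper named `proportion_ci` and the standard library's `statistics.NormalDist` in place of a Normal table. It also enforces the requirement that the counts of Successes and Failures both be at least 10.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Approximate Normal-based confidence interval for a proportion.

    x is the number of Successes in a random sample of size n.
    Returns (p_hat, error_margin, lower, upper).
    """
    # The Normal approximation needs at least 10 Successes and 10 Failures.
    if min(x, n - x) < 10:
        raise ValueError("need at least 10 Successes and 10 Failures")
    p_hat = x / n
    # Two-sided critical value z_{alpha/2}, e.g. about 2.576 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, e, p_hat - e, p_hat + e


# Exercise 4 data: 295 girls among 325 births, 99% confidence.
p_hat, e, lower, upper = proportion_ci(295, 325, confidence=0.99)
print(f"p_hat = {p_hat:.3f}, E = {e:.3f}, interval = ({lower:.3f}, {upper:.3f})")
# prints: p_hat = 0.908, E = 0.041, interval = (0.866, 0.949)
```

The interval agrees with the (0.866, 0.949) reported in the solutions. The sketch does not apply the 20 Times Rule or any small-population adjustment, so it assumes the population is large relative to the sample.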

Solutions

1. a) i) p = 5146/8640 = 0.5956; ii) X = 60; iii) p̂ = 60/92 = 0.6522. b) p = 0.5956 is a parameter; p̂ = 0.6522 is a statistic. c) The mean is p = 0.5956. If we examined all possible samples of 142 students, determining for each sample the proportion who live more than 50 miles from home, the mean of these proportions is 0.5956. The standard deviation of these proportions is √(p(1 − p)/n).

2. a) p = 0.15. This is a parameter. X = 2. p̂ = 2/30 = 0.067. p = 0.15 is a parameter; p̂ = 0.067 is a statistic.

3. a) p̂ = 85/212 = 0.401
b)
c) E = 0.066
d)
e) 0.335 < p < 0.467
f) 95%.
g) p is the proportion of all adoptive parents who state "No Preference" for their child's gender.
h) p̂: Statistic; p: Parameter.
i) We are 95% confident that between 33.5% and 46.7% of all adoptive parents state "No Preference" for their child's gender.

4. (0.866, 0.949) or 0.866 < p < 0.949. These are equivalent and either is correct. You are 99% confident that between 86.6% and 94.9% of all babies conceived using XSORT will be girls.

5. (0.496, 0.514) or 0.496 < p < 0.514. I am 95% confident that between 49.6% and 51.4% of all deaths occurring in the two-week period surrounding Thanksgiving take place in the week before it. There is very little evidence supporting the theory. 0.5 is within this interval, indicating that the postpone-death theory is not conclusively supported by the data.

6. Results are shown rounded to three decimal places, which is typically fine for reporting Z scores. However, when you compute with Z scores, you should use as much accuracy as is possible. Since accuracy is automatically preserved by the spreadsheet function NORMSINV (even if you round the cell display), you should actually be using all decimal places of accuracy when you compute with a Z score.

C          50%      75%      90%      95%      98%      99%      99.9%    99.99%
z_{α/2}    0.674    1.150    1.645    1.960    2.326    2.576    3.291    3.891

7. a) (0.347, 0.432) or 0.347 < p < 0.432. I am 99% confident that between 34.7% and 43.2% of all jurors are Mexican-American.
b) Since this interval estimates the population proportion, and 0.791 is the actual value for this proportion, something is wrong. It must be the case that jury selection is not random, and that, in fact, there is systematic bias keeping Mexican-Americans from jury duty.

8. For all these, the estimated proportion is p̂ = 339/870 = 0.3897.

a) The 95% confidence interval is 0.357 < p < 0.422. The error margin is 0.032.
b) Error margins are stated in the table.

C    50%      75%      90%      95%      99%      99.9%
E    0.011    0.019    0.027    0.032    0.043    0.054

c) As the confidence increases, the error margin increases.

9. a) 4276(0.940) = 4019.44, so 4019 have telephones.
b) p̂ = 4019/4276 = 0.9399
c) The error margin is about 0.0094. (0.9305, 0.9493) or 0.9305 < p < 0.9493. I am 99% confident that the proportion of all households (the population of households) having a telephone is between 0.9305 and 0.9493. The error margin is a little short of 1%.
d) (0.0507, 0.0695) or 0.0507 < p < 0.0695. The error margins are the same. The intervals are essentially the same: One is expressed in terms of the proportion of households having a telephone; the other in terms of those not having a telephone.

10. a) 0.702 < p < 0.774 or (0.702, 0.774).
b) This does not refute the theory, as 0.75 is among the plausible values for p given in the interval (in which we have fairly high confidence).

11. a) For n = 145 the interval is 0.6664 < p < 0.8095. The error margin is 0.0716. This is twice as large as 0.0358, the error margin in the actual study.
b) The interval is 0.7308 < p < 0.7451. Error margin E = 0.0072 (0.00716 to be more precise). This is 5 times smaller than in the actual study.
c) If the sample size is increased to k times larger, the error margin decreases by √k times. Or, if the error margin is to be k times smaller, the sample size must be k² times larger.

12. Jack wants an error margin that is 10/3 = 3.33 times smaller. His sample size must be (10/3)² times larger: 3.33² = 11.1 times larger.

13. Since the sample size is 5 times smaller, the error margin will be √5 = 2.24 times larger. 0.01 × 2.24 = 0.0224.

14. a)

Parents' Status    # of grads    95% CI               Error Margin
Welfare            40            0.071 < p < 0.129    0.029
Poor               80            0.161 < p < 0.239    0.039
Low Income         120           0.255 < p < 0.345    0.045
Middle Income      200           0.451 < p < 0.549    0.049
High Income        280           0.655 < p < 0.745    0.045
Wealthy            320           0.761 < p < 0.839    0.039
Super rich         360           0.871 < p < 0.929    0.029

b) They are the same for each pair. A Success prevalence of 90% is equivalent to a Failure prevalence of 10%, so the error margins must be the same.
c) Error margin is largest for prevalence = 0.5 and drops (symmetrically) as the prevalence gets further from 0.5 on either side.

15. Yellow has prevalence p̂ = 1 − 0.738 = 0.262. So the interval is 0.262 ± 0.036. Or take 1 − 0.774 = 0.226 and 1 − 0.702 = 0.298 to get (0.226, 0.298).

16. a) I am 95% confident that between 40.7% and 47.7% of all voters lean towards D.
b) 55.8% ± 3.5%.
c) No. We don't know how the 55.8% is split up.
d) Yes. If there are, for instance, 20% undecided, then the result for R is around 35.8%.

17. This is a media poll, so the confidence is 95%. I am 95% confident that between 77% and 85% of all Americans favor a drawdown.

18. a) The point estimate is 0.800, the error margin is 0.059.
b) The interval at Penn State is exactly the same.
c) The population size does not play into this. The formula for error margin depends only upon the prevalence p̂ and the sample size n (along with the confidence level). This is an underappreciated fact about sampling and statistical analysis: As long as a population is large (at least 20 times bigger than the sample), its size is pretty much immaterial. What matters in most practical situations is the sample size.

19. a) p is the proportion of all SUNY Oswego students who stay in Oswego. It's a parameter. It'd be difficult to get this value: you'd have to census virtually every student.
b) 0.70 is a statistic; it describes a sample.
c) For a 90% confidence interval the error margin will be smaller. (See #8.)
d) To get an error margin that is 5 times smaller will require a sample size that is 5² = 25 times larger. (See the earlier exercises on sample size and error margin.) That's 2000 students.

20. The error margin for a larger sample size will be smaller. It will not be three times smaller. It will be √3 = 1.732 times smaller: 0.07/1.732 = 0.0404, about 4%. (See the earlier exercises on sample size and error margin.)

21. a) No.
b) The error margin is smallest for Canada and largest for the United States. (Go back and examine #14.)

22. a) The symbol is p. p = the proportion of all science majors at this university who are left-handed.
b) n = 1.96²(0.5)(0.5)/0.04² = 600.25. Select 601 science majors.
c) n = 1.96²(0.1)(0.9)/0.04² = 216.09. Select 217 science majors.
d) 217 is the better choice. The lefty rate for scientists is going to be fairly close to that for the general population. (Not only that, but the sample size is smaller!)
e) p̂ = 24/217 = 0.111; the interval is 0.069 < p < 0.152, with error margin 0.042. Pretty close to 0.04 for the error margin. It missed a little because the actual lefty rate was slightly closer to 0.5 than the guess of 0.10 that was used to determine the sample size.

23. a) The weekend constitutes 2 days out of 7. We wouldn't expect half of all births to occur in 2/7ths of all days.
b) A better guess would be 2/7 = 0.286 (anything from 0.25 to 0.30 is reasonable; anything outside of this is not).
c) n = 1.645²(0.25)(0.75)/0.035² = 414.2, so sample 415.
d) The interval is (0.215, 0.285). The error margin is 0.035, right on the target. The interval does not include p = 2/7 = 0.286. So we have some evidence that the 2/7 hypothesis is false.
e) It will be smaller. When the actual result falls further from 0.5, the error margin is smaller. (In fact, if 0.20 is the result, the error margin is 0.032.)

24. a) It's a media poll, so the confidence is 95%. The required sample size is then 1068. (This number is well known to pollsters. Many polls have 3% error margins because they use a sample size of around 1000.)
b) Absolutely not. Exercises 18 and 21 covered this.

25. You shouldn't guess anything except 0.25. The required sample size is then n = 2.326²(0.25)(0.75)/0.01² = 10,144.3, so sample 10,145 obituaries.
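The sample-size solutions above all come from solving the error margin formula for n, which gives n = z²·g(1 − g)/E², where g is the guessed prevalence, and then rounding up. The Python sketch below illustrates this under stated assumptions: the function name `required_sample_size` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` at full precision rather than from a rounded table entry.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(error_margin, guess=0.5, confidence=0.95):
    """Smallest n whose error margin is at most `error_margin`.

    `guess` is the assumed prevalence; 0.5 is the most conservative choice.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # unrounded critical value
    return ceil(z ** 2 * guess * (1 - guess) / error_margin ** 2)


print(required_sample_size(0.04))                                # 601
print(required_sample_size(0.04, guess=0.10))                    # 217
print(required_sample_size(0.035, guess=0.25, confidence=0.90))  # 415
print(required_sample_size(0.03))                                # 1068
```

Because the sketch keeps every decimal place of the critical value, its answer can differ by a few units from a hand calculation that uses a rounded z, as happens with the 98% level in exercise 25.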

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

The Margin of Error for Differences in Polls

The Margin of Error for Differences in Polls The Margin of Error for Differences in Polls Charles H. Franklin University of Wisconsin, Madison October 27, 2002 (Revised, February 9, 2007) The margin of error for a poll is routinely reported. 1 But

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Lesson 17: Margin of Error When Estimating a Population Proportion

Lesson 17: Margin of Error When Estimating a Population Proportion Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

More information

Review. March 21, 2011. 155S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results

Review. March 21, 2011. 155S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 7 Estimates and Sample Sizes 7 1 Review and Preview 7 2 Estimating a Population Proportion 7 3 Estimating a Population

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

p ˆ (sample mean and sample

p ˆ (sample mean and sample Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

Introduction to Hypothesis Testing

Introduction to Hypothesis Testing I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

More information

Binomial Sampling and the Binomial Distribution

Binomial Sampling and the Binomial Distribution Binomial Sampling and the Binomial Distribution Characterized by two mutually exclusive events." Examples: GENERAL: {success or failure} {on or off} {head or tail} {zero or one} BIOLOGY: {dead or alive}

More information

Point and Interval Estimates

Point and Interval Estimates Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

More information

6 3 The Standard Normal Distribution

6 3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

1.7 Graphs of Functions

1.7 Graphs of Functions 64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most

More information

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions.

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. Unit 1 Number Sense In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. BLM Three Types of Percent Problems (p L-34) is a summary BLM for the material

More information

6.3 Conditional Probability and Independence

6.3 Conditional Probability and Independence 222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

Mind on Statistics. Chapter 12

Mind on Statistics. Chapter 12 Mind on Statistics Chapter 12 Sections 12.1 Questions 1 to 6: For each statement, determine if the statement is a typical null hypothesis (H 0 ) or alternative hypothesis (H a ). 1. There is no difference

More information

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10 4:00 (GB 220) and Wednesdays 2:10 4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution James H. Steiger November 10, 00 1 Topics for this Module 1. The Binomial Process. The Binomial Random Variable. The Binomial Distribution (a) Computing the Binomial pdf (b) Computing

More information

What is a P-value? Ronald A. Thisted, PhD Departments of Statistics and Health Studies The University of Chicago

What is a P-value? Ronald A. Thisted, PhD Departments of Statistics and Health Studies The University of Chicago What is a P-value? Ronald A. Thisted, PhD Departments of Statistics and Health Studies The University of Chicago 8 June 1998, Corrections 14 February 2010 Abstract Results favoring one treatment over another

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

More information

Estimates and Sample Sizes

Estimates and Sample Sizes 7-1 Review and Preview 7-2 Estimating a Population Proportion 7-3 Estimating a Population Mean: s Known 7-4 Estimating a Population Mean: s Not Known 7-5 Estimating a Population Variance Estimates and

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test. Dr. Tom Pierce Radford University Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Chapter 1 Introduction to Correlation

Chapter 1 Introduction to Correlation Chapter 1 Introduction to Correlation Suppose that you woke up one morning and discovered that you had been given the gift of being able to predict the future. Suddenly, you found yourself able to predict,

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

CHAPTER 4 DIMENSIONAL ANALYSIS

CHAPTER 4 DIMENSIONAL ANALYSIS CHAPTER 4 DIMENSIONAL ANALYSIS 1. DIMENSIONAL ANALYSIS Dimensional analysis, which is also known as the factor label method or unit conversion method, is an extremely important tool in the field of chemistry.

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information
