
AP Statistics Fall exam review
Problems

1. A treatment that has no active ingredients is called a(n) ________.

2. A sample consisting of the entire population is called a(n) ________.

3. ________ is the tendency for a sample to differ from the corresponding population in some systematic way.

4. ________ bias occurs when the responses are not actually obtained from all individuals selected for inclusion in the sample.

5. The list of the objects or individuals in the population available for sampling is called the ________.

6. ________ random sampling is a method that independently selects simple random samples from each subgroup of the population.

7. In the article, "Reducing complex diets to simple rules: food selection by olive baboons," the authors compare the percent of protein in 7 plants eaten by Kenyan baboons and in 7 plants that were ignored (not eaten) by the baboons. They think that these animals choose plants high in protein rather than randomly select plants to eat. The percent protein for the preferred and ignored plants is listed below.

   Preferred   21.0   12.9   49.2   47.0   39.9   24.3   22.2
   Ignored     13.8   12.5   18.6   11.8   10.8   17.5   12.8

   a) Construct a comparative dotplot for protein content for the two groups of plants.
   b) Do the data support the researchers' views about food choice by the baboons? What specific aspects of the plot support your answer?

8. In the second half of 1861 the American Civil War was just beginning. The several Confederate States recruited soldiers and sent many of them to Virginia for training. The soldiers traveled in groups called "companies" and camped together when they arrived in Virginia. In their new surroundings the men were exposed to stress, strain, and disease, and as a result would be reported as "sick" on the bi-monthly company report. The histograms below display the numbers of companies with different percentages of men reported sick in June and in December. For example, about 250 companies reported between 0% and 10% sick at the end of June.

   (a) Describe the differences in the distributions of percentages of men reported sick at the end of June and at the end of December.
   (b) Some historians believe that as more companies were sent into Virginia later in 1861, more soldiers would become sick due to the crowded and unsanitary conditions. Considering both the shapes of the distributions and the vertical scales, does it appear that a higher proportion of companies reported more sickness in their ranks?

10. Investigators studied 8 coins known to have been produced by the mint in Rome in an attempt to identify a trace element profile for these coins, and have identified gold and lead as possible factors in identifying other coins as having been minted in Rome. The gold and lead content, measured as a % of weight of each coin, is given in the table below, and a scatter plot of these data is presented below.

   Gold % by Wt.   Lead % by Wt.
   0.22            0.41
   0.24            0.31
   0.20            0.89
   0.23            0.62
   0.18            0.41
   0.15            0.88
   0.17            0.67
   0.17            0.59

   a) What is the equation of the least squares best fit line?
   b) Sketch the best fit line on the scatter plot.
   c) What is the value of the correlation coefficient? Interpret this value.
   d) What is the value of r^2? Give an interpretation of this value.

12. Six candidates for a new position of vice-president for academic affairs have been selected. Three of the candidates are female. The candidates' years of experience are as follows:

   Candidate   Experience
   Female 1    7
   Female 2    9
   Female 3    4
   Male 1      8
   Male 2      10
   Male 3      3

   Suppose one of the candidates is selected at random. Define the following events:
   A = person selected has at least 6 years experience
   B = person selected is a female and has less than 4 years experience

   Find P(B)
   Find P(A)

13. A college math professor has surveyed his records and found the following frequency distribution of the grades he has assigned to the 1187 students who have taken his calculus classes.

   Grade       A     B     C     D     F
   Frequency   125   352   461   187   62

   What is the probability that a randomly selected student received an A from this professor?
   What is the probability that a randomly selected student received an A or a B from this professor?
   What is the probability that two independently selected students both received an A from this professor?
   What is the probability that two independently selected students both failed calculus?

14. At George Washington High School students are heavily involved in extra-curricular activities. Suppose that a student is selected at random from the students at this school. Let the events A, M, and S be defined as follows, with probabilities listed:

   A = student is active in the performing arts: P(A) = 0.20
   M = student is active in vocal or instrumental music: P(M) = 0.32
   S = student is active in sports: P(S) = 0.35
   P(M S) = 0.30

   Calculate each of the following: i) ii) iii)

16. In November 2002, Janet Napolitano, a Democrat, was elected Governor of Arizona, defeating Republican Matt Salmon and Independent Richard Mahoney. This was a surprising outcome, since there are more registered Republicans than Democrats in the state. The table below presents the results of a sample of voters in the election. Suppose that a voter is randomly chosen from these respondents.

                      Voters who are registered as...
   Voted for          D      R      I      Totals
   Napolitano (D)     184    42     56     282
   Salmon (R)         26     205    45     276
   Mahoney (I)        6      5      31     42
   Totals             216    252    132    600

   Use the information in the table to answer the questions below.
   a) What is the probability that a randomly chosen voter voted for Napolitano?
   b) What is the probability that a randomly chosen voter is a registered Democrat?
   c) What is the probability that a randomly chosen voter cast a vote for Napolitano, given that the selected voter is a Democrat?
   d) Commenting on this election, a local reporter said, "Napolitano won because she attracted a larger share of crossover voters." (A crossover voter is defined as one who votes differently than his or her party affiliation.) What is the probability that a randomly chosen voter cast a vote for Napolitano, given that he or she is a crossover voter?
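The probabilities in Problem 16 can all be read off the two-way table by restricting attention to the relevant cells: a conditional probability is the count satisfying both conditions divided by the count satisfying the given condition. The following is only an illustrative sketch; the dictionary layout and the helper names (`prob`, `voted_napolitano`, `is_democrat`, `is_crossover`) are assumptions made for this example and are not part of the original problem.

```python
from fractions import Fraction

# Counts from the Problem 16 table, stored as counts[candidate][registration].
counts = {
    "Napolitano": {"D": 184, "R": 42, "I": 56},
    "Salmon": {"D": 26, "R": 205, "I": 45},
    "Mahoney": {"D": 6, "R": 5, "I": 31},
}

# Party of each candidate, needed to decide whether a vote is a crossover vote.
party = {"Napolitano": "D", "Salmon": "R", "Mahoney": "I"}


def prob(event, given=lambda cand, reg: True):
    """Return P(event | given) as an exact fraction of table counts."""
    cells = [(cand, reg, n) for cand, row in counts.items() for reg, n in row.items()]
    denominator = sum(n for cand, reg, n in cells if given(cand, reg))
    numerator = sum(n for cand, reg, n in cells if given(cand, reg) and event(cand, reg))
    return Fraction(numerator, denominator)


def voted_napolitano(cand, reg):
    return cand == "Napolitano"


def is_democrat(cand, reg):
    return reg == "D"


def is_crossover(cand, reg):
    # A crossover voter votes for a candidate outside his or her registered party.
    return party[cand] != reg


p_a = prob(voted_napolitano)                # a) 282 / 600
p_b = prob(is_democrat)                     # b) 216 / 600
p_c = prob(voted_napolitano, is_democrat)   # c) 184 / 216
p_d = prob(voted_napolitano, is_crossover)  # d) 98 / 180

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}) {p} = {float(p):.3f}")
```

Exact fractions are used so that the reduced forms can be compared directly with hand calculations before rounding.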

18. In order to ensure the safety of school classrooms the local Fire Marshal does an inspection at Thomas Jefferson High School every month, looking for faulty wiring, overloaded circuits, etc. At TJHS the new Academic Wing has 5 math rooms, 10 science rooms, and 10 English rooms. The science rooms are divided into 8 biology and 2 chemistry rooms. Each month, the Fire Marshal randomly picks one of the rooms in the new wing to inspect. Define the following events:

   S = the event the selected room is a science room
   B = the event the selected room is a biology room
   M = the event the selected room is a math room
   E = the event the selected room is an English room
   C = the event the selected room is a chemistry room

   Calculate the probabilities of the events described below:
   a) P(S)
   b) P(M or E)
   c) P(E or B)
   d) P(S and not C)

19. While playing Monopoly, Andi estimated the probabilities of the non-zero rents according to the following probability distribution:

   Rent          $2     $14    $20    $100
   Probability   0.40   0.20   0.20   0.20

   Consider the random variable x = dollar amount in rent collected in a Monopoly roll.
   a) What is the mean of the random variable x?
   b) What is the standard deviation of the random variable x?

20. Bias is a serious problem that sometimes arises when one takes a sample. Explain what bias is. What is the difference between selection bias and non-response bias?

21. The two paragraphs below discuss aspects of two studies, each of which exhibits a bias. For each study, decide whether the problem is selection bias, response bias, or nonresponse bias. Explain your answer.

   a) One part of the Nurses' Health Study is concerned with possible causes of skin cancer. Nurses were asked about different behaviors and aspects of their health when they entered the study. Then, the nurses were given the questionnaire again if they were diagnosed with cancer. When the questionnaires were analyzed, the investigators discovered that after the nurses were diagnosed with cancer they tended to report a reduced ability to tan. It is thought that the shift in reporting might be caused by an awareness of their diagnosis.

   b) One part of the Demographic and Health Surveys Program is concerned with measures of malnutrition. Investigators measure physical aspects of growing children, and attempt to document the physical characteristics of a population at different ages. Sadly, in some countries many children die early, and thus a bias is introduced in the study when the investigators cannot collect the data from the deceased children.

22. We have distinguished two types of studies: observational and experimental. Briefly explain the essential difference(s) between these two types of study.

23. Bias, the tendency for samples to differ from the corresponding population in some systematic way, might be due to: (a) selection bias, (b) response bias, and/or (c) nonresponse bias. In a few sentences, discuss the differences among these different biases.

24. The following paragraph describes an actual study. After reading the description, determine whether the study is an observational study or an experiment. Justify your answer with specific references to the information in the study.

   "We compared paired daytime and night counts of wild brook trout, brown trout, and rainbow trout made by the same snorkelers in five streams during August 1994. Overall, we counted 109 trout in the daytime and 333 trout at night. We speculate that trout counted at night were present during the daytime but were hidden from view. Biologists should consider that trout behavior and susceptibility to being seen might vary a great deal between daytime and night, even during summer. In some streams, the majority of trout may not be seen during the daytime."

25. Three methods for random sampling are: (a) simple random sampling, (b) stratified random sampling, and (c) cluster sampling. In a few sentences, discuss the similarities and differences among these sampling methods. Specifically, what sampling circumstances would lead you to choose each of these methods?

26. The following paragraph describes an actual study. After reading the description, determine whether the study is an observational or experimental study. Justify your answer with specific references to the information in the study.

   "60 subjects tasted cola from two cups, one marked L, the other marked S, in two sessions separated by 2 weeks. For the first session, both cups contained Coke. For the second session, both cups contained Pepsi. Whether both cups contained Pepsi or Coke, subjects overwhelmingly reported cup S contained the better-tasting product. Position of the cups, which were on a table in front of the subject, was alternated for successive subjects, and the subject was always handed the left cup (facing the subject) first. Thirty students were tested in each session. Those in session two were questioned to ensure they had not participated in session one. In both sessions, no comments were made by the researcher concerning what was contained in the cups. Her only instructions to the subjects were: 'Please taste each of these and tell me the letter of the cup with the drink you prefer.'"

27. What are the four key concepts in experimental design?
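The three sampling methods named in Problem 25 differ mainly in what gets randomly selected: individuals, individuals within each group, or whole groups. The sketch below illustrates that mechanical difference in Python. It is not part of the original review; the function names, the made-up school of four classes, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def simple_random_sample(population, n, rng):
    """Select n individuals so that every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population: 4 classes of 5 students each (labels are invented).
groups = {f"class{c}": [f"class{c}-student{s}" for s in range(1, 6)] for c in range(1, 5)}
everyone = [student for members in groups.values() for student in members]

rng = random.Random(0)  # fixed seed only so that repeated runs give the same draw

srs = simple_random_sample(everyone, 4, rng)
stratified = stratified_sample(groups, 2, rng)
clustered = cluster_sample(groups, 2, rng)

print("Simple random sample:", srs)
print("Stratified sample:", stratified)
print("Cluster sample:", clustered)
```

Here the same grouping is reused as strata and as clusters purely to keep the example short; in practice strata are chosen to be internally similar and clusters to be internally varied.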

Fall exam review - Answer Section

1. Placebo

2. Census

3. Bias

4. Nonresponse

5. Sampling frame

6. Stratified

7. b) Yes, there are generally higher protein values for the preferred plants. All but one of the preferred plants have a protein content larger than the highest protein content in the ignored group.

8. (a) There were more companies reporting high percentages of sick soldiers in December. In June, over half of the companies reported 10% or fewer sick, while in December, the number with less than 10% sick was perhaps a third of the companies.
   (b) Yes, only about 6% of the companies had more than 40% sickness in June, but by December this had grown to about 8% of the companies. Also, the highest % sick in June was less than 60%, but in December it was more than 70%. And, as noted in part (a), the proportion of companies with very few sick was much lower in December.

10. a) Predicted lead % = 1.306 - 3.635(gold %)
    b)
    c) -0.549. This indicates a moderate negative linear relationship between % lead and % gold.
    d) 0.302. About 30% of the variability in lead content can be explained by the linear relationship with gold content.

12. P(B) = 0; P(A) = 4/6 = 0.667

13. 0.105; 0.402; 0.011; 0.0027

14. i) 0.5625 ii) 0.34 iii) 0.105

18. a) 10/25 b) 15/25 c) 18/25 d) 8/25

19. a) 27.6 b) 36.87

20. Bias is the tendency for a sample to differ from the corresponding population in some systematic way. Selection bias occurs when some part of the population is systematically excluded from the sample. Non-response bias, in contrast, occurs when responses are not actually obtained from all of the individuals who were selected for the sample.

21. a) This is an example of response bias, since the awareness of their diagnosis may have caused the nurses to change their response. It isn't non-response bias since the investigators were able to obtain responses from the nurses, and it isn't selection bias since they did not attempt to generalize to a larger population.
    b) This is an example of non-response bias, since some of the children selected for the study were not able to participate after they died. It is not selection bias since the children were not left out on purpose, and it isn't response bias since the researchers were unable to obtain responses in the first place.

22. In an experiment, researchers observe how a response variable behaves when they manipulate one or more factors. In an observational study, the researchers do not manipulate any factors. Instead, they observe characteristics of a subset of the members of one or more existing populations.

23. Selection bias occurs when some part of the population is systematically excluded from the sample. Non-response bias occurs when responses are not actually obtained from all individuals who were selected for the sample. With response bias, however, responses are obtained from the subjects, but the method of observation tends to produce values that systematically differ from the true population value in some way.

24. This study is an observational study since the explanatory variable was not manipulated by the researchers and the subjects were not randomly assigned to different treatments. Instead, the researchers simply observed the number of trout visible during the day and at night.

25. In simple random sampling, every individual and every possible sample of size n has an equal chance of being selected for the study. In stratified random sampling, the population is divided into non-overlapping homogeneous groups (called strata) and a simple random sample is selected from each stratum. In cluster sampling, the population is divided into non-overlapping (preferably heterogeneous) groups called clusters; a random sample of clusters is then selected, and every member of the selected clusters is studied. Cluster sampling works best when the population is already divided into easily identifiable groups that are heterogeneous (i.e., each cluster can reasonably be assumed to be representative of the entire population). Stratified random sampling works best when there are easily identified groups in the population that are anticipated to have very different responses to the question of interest. Simple random sampling is best when neither of the circumstances listed above is present.

26. This is an experiment since there was a planned intervention undertaken to observe the effects of an explanatory variable (the letter on the cup) on a response variable (cup preference).

27. Blocking, direct control, randomization, replication
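The numerical answers to Problems 10 and 19 can be checked with a few lines of code. The following is a minimal sketch in plain Python and is not part of the original review; the function names `least_squares` and `discrete_mean_sd` are made up for this illustration, and the data are copied from the problem statements.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def discrete_mean_sd(values, probs):
    """Return the mean and standard deviation of a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, math.sqrt(variance)


# Problem 10: gold and lead content (% by weight) of the 8 coins.
gold = [0.22, 0.24, 0.20, 0.23, 0.18, 0.15, 0.17, 0.17]
lead = [0.41, 0.31, 0.89, 0.62, 0.41, 0.88, 0.67, 0.59]
intercept, slope, r = least_squares(gold, lead)
print(f"lead = {intercept:.3f} + ({slope:.3f}) * gold, r = {r:.3f}, r^2 = {r * r:.3f}")

# Problem 19: Andi's estimated rent distribution.
mean, sd = discrete_mean_sd([2, 14, 20, 100], [0.40, 0.20, 0.20, 0.20])
print(f"mean = {mean:.1f}, sd = {sd:.2f}")
```

Rounded to the precision used in the answer section, the results agree with the values listed there for both problems.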