Statistics 102 Problem Set 4 Due Monday, 9 March 2015

This p-set is due Monday, March 9 at 3:00 pm. We will post the solutions at 3:30 pm, so no late p-sets will be accepted. The p-set is a bit longer than some previous p-sets, but doing these problems will help you prepare for the midterm on March. The midterm will also include material from earlier p-sets.

Problem set policies: Please provide concise but clear answers for each question; just writing the result of a calculation (e.g., "SD = 3.3") with no explanation is not sufficient. Each problem set is due by 5:00 pm on the due date; please deposit your problem set in the dropboxes outside Science Center 300, using the box labeled with your TF's name. If you do not have an assigned TF, use the box for the Head TF, Erik Otárola-Castillo.

We encourage you to discuss problems with other students (and, of course, with the course head and the TFs), but you must write your final answer in your own words. Solutions prepared "in committee" are not acceptable, and you will not have the benefit of having worked the problem when you take the examinations. If you do collaborate with classmates on a problem, please list your collaborators on your solution.

This p-set is the last one before the midterm exam. Material on this p-set consists of some of the topics that will appear on the midterm.

1. This problem is very similar to the clicker question used in the February 20 lecture (slide 0 in Unit 2). Suppose that the test for drug use mentioned in that problem has a false negative rate of 0.10; i.e., if an employee is a drug user, the test will incorrectly test negative with probability 0.10 and correctly indicate positive with probability 0.90. Now suppose the company HR department will test 4 employees who it correctly suspects are drug users.

(a) Using the algebraic method we used to calculate the answer to the clicker question, find the probability that at least one of these employees will correctly test positive.
(b) Optional part, not graded, no extra credit, but a good learning experience. Modify the R code we used in lecture to solve the clicker question to verify your answer to part (a). Important note: R coding at this level will not be covered on the midterm exam.
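For the optional check in part (b), one possible sketch is shown below. It is written in Python rather than R, and it is not the lecture code (which is not reproduced in this p-set). It assumes a false negative rate of 0.10, four tested employees who are all drug users, and independent test results; the function names, the number of trials, and the seed are illustrative choices.

```python
import random


def p_at_least_one_positive(n_users: int, false_negative_rate: float) -> float:
    """Exact probability that at least one of n_users drug users tests positive,
    assuming the test results are independent."""
    return 1.0 - false_negative_rate ** n_users


def simulate_at_least_one_positive(n_users: int, false_negative_rate: float,
                                   n_trials: int = 100_000, seed: int = 102) -> float:
    """Monte Carlo estimate of the same probability (seed is an arbitrary choice)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Each drug user tests positive with probability 1 - false_negative_rate.
        if any(rng.random() >= false_negative_rate for _ in range(n_users)):
            hits += 1
    return hits / n_trials


exact = p_at_least_one_positive(4, 0.10)
approx = simulate_at_least_one_positive(4, 0.10)
print(f"exact = {exact:.4f}, simulated = {approx:.4f}")
```

The simulated value should land close to the exact one; a large gap would suggest an error in either the algebra or the simulation.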

(a) There are 4 employees, all of whom use drugs. Assuming independence between the employees' results,

$$
\begin{aligned}
P(\text{at least one tested positive}) &= 1 - P(\text{none tested positive}) \\
&= 1 - P(\text{all tested negative}) \\
&= 1 - P(e_1 \text{ tested } -)\,P(e_2 \text{ tested } -)\,P(e_3 \text{ tested } -)\,P(e_4 \text{ tested } -) \\
&= 1 - (0.1)^4 = 1 - 0.0001 = 0.9999.
\end{aligned}
$$

2. This problem also is based on a clicker question, and is more realistic than the version we did in class. Here is the problem statement: An autosomal recessive condition affects 1 newborn in 10,000. If a parent of a child affected by this condition remarries, what is the risk of producing an affected child in the new marriage?

Unlike in class, however, find this probability under the assumption that homozygotes with this condition do mate and reproduce. You may find it easier to solve this by starting with a tree diagram, such as the one on slide 44. There are essentially 4 steps:

(a) What could have been the genotypes of the parents in the old marriage? That is, if $X$ = having an old affected child, what are $P(AaAa \mid X)$, $P(Aaaa \mid X)$, and $P(aaaa \mid X)$? Note: We do not need to look at cases involving $AA$, because such a parent could never produce an affected offspring.

(b) Given that, what are the probabilities of having specific genotypes for a single parent, $P(Aa \mid X)$ and $P(aa \mid X)$?

(c) Now that we have the probabilities in (b), what are the probabilities of the possible combinations of new marriages that could result in an affected child? That is, if $i'j'$ represents the genotype of a new partner, what are $P(Aa \times Aa')$, $P(Aa \times aa')$, $P(aa \times Aa')$, and $P(aa \times aa')$, given $X$?

(d) With these probabilities, defining $Y$ = having a new affected child, we can finally find $P(Y \mid X)$.

This could be better illustrated with a tree as follows. We want all the probabilities of the branches that end in $Y$:

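[Tree diagram: $X$ branches to the old-marriage genotypes $AaAa$, $Aaaa$, and $aaaa$; each of these branches to the genotype of the affected child's parent ($Aa$ or $aa$), then to the genotype of the new partner ($Aa'$ or $aa'$), and finally to $Y$.]

(a) First Branches

$$P(AaAa \mid X) = \frac{P(X \mid AaAa)\,P(AaAa)}{P(X)}$$

Assuming independent mating,

$$
\begin{aligned}
P(X) &= P(X \mid AaAa)\,P(AaAa) + P(X \mid Aaaa)\,P(Aaaa) + P(X \mid aaaa)\,P(aaaa) \\
&= \tfrac{1}{4}[P(Aa)]^2 + \tfrac{1}{2}\cdot 2\,[P(Aa)][P(aa)] + [P(aa)]^2 .
\end{aligned}
$$

Now, we know that this autosomal recessive condition affects 1/10,000 of newborns. Thus, from Hardy-Weinberg, we know that $P(aa) = q^2 = 1/10{,}000$, and thus $P(a) = q = 1/100$. $P(A) = p$ is therefore $1 - 1/100 = 99/100$. It follows that $P(Aa) = 2pq$ and $P(aa) = q^2$, so

$$P(X) = \tfrac{1}{4}[2pq]^2 + \tfrac{1}{2}\cdot 2\,[2pq][q^2] + [q^2]^2 = 0.0001 .$$

The conditional probabilities of the three old-marriage genotypes are then

$$P(AaAa \mid X) = \frac{P(X \mid AaAa)\,[P(Aa)]^2}{P(X)} \tag{1}$$

$$= \frac{\tfrac{1}{4}[2pq]^2}{\tfrac{1}{4}[2pq]^2 + \tfrac{1}{2}\cdot 2\,[2pq][q^2] + [q^2]^2} \tag{2}$$

$$P(Aaaa \mid X) = \frac{P(X \mid Aaaa)\cdot 2\,[P(Aa)][P(aa)]}{P(X)} \tag{3}$$

$$= \frac{\tfrac{1}{2}\cdot 2\,[2pq][q^2]}{\tfrac{1}{4}[2pq]^2 + \tfrac{1}{2}\cdot 2\,[2pq][q^2] + [q^2]^2} \tag{4}$$

$$P(aaaa \mid X) = \frac{P(X \mid aaaa)\,[P(aa)]^2}{P(X)} \tag{5}$$

$$= \frac{[q^2]^2}{\tfrac{1}{4}[2pq]^2 + \tfrac{1}{2}\cdot 2\,[2pq][q^2] + [q^2]^2} \tag{6}$$

(b) Second Branches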

Within each first branch, the parent who remarries has the following genotype probabilities:

$$P(Aa \mid \text{first-left branch}) = 1, \quad P(Aa \mid \text{first-middle branch}) = \tfrac{1}{2}, \quad P(aa \mid \text{first-middle branch}) = \tfrac{1}{2}, \quad P(aa \mid \text{first-right branch}) = 1 .$$

Thus, the probability of the affected-child parent being $Aa$ is $(2) + \tfrac{1}{2}(4)$, and the probability of the affected-child parent being $aa$ is $\tfrac{1}{2}(4) + (6)$.

(c) Third Branches

In the new marriage, assuming independent mating, we get the following probabilities:

$$P(Aa \times Aa') = \left((2) + \tfrac{1}{2}(4)\right)(2pq) \tag{7}$$

$$P(Aa \times aa') = \left((2) + \tfrac{1}{2}(4)\right)(q^2) \tag{8}$$

$$P(aa \times Aa') = \left(\tfrac{1}{2}(4) + (6)\right)(2pq) \tag{9}$$

$$P(aa \times aa') = \left(\tfrac{1}{2}(4) + (6)\right)(q^2) \tag{10}$$

(d) Fourth Branches

Now we look at the probability of having an affected child in a new marriage ($Y$), where every probability is conditional on $X$.

In the case of $Aa \times Aa'$, $P(Y \cap Aa \times Aa') = P(Y \mid Aa \times Aa')\,P(Aa \times Aa') = \tfrac{1}{4}(7)$.
In the case of $Aa \times aa'$, $P(Y \cap Aa \times aa') = P(Y \mid Aa \times aa')\,P(Aa \times aa') = \tfrac{1}{2}(8)$.
In the case of $aa \times Aa'$, $P(Y \cap aa \times Aa') = P(Y \mid aa \times Aa')\,P(aa \times Aa') = \tfrac{1}{2}(9)$.
In the case of $aa \times aa'$, $P(Y \cap aa \times aa') = P(Y \mid aa \times aa')\,P(aa \times aa') = (10)$.

The final answer is therefore

$$P(Y \mid X) = \tfrac{1}{4}(7) + \tfrac{1}{2}(8) + \tfrac{1}{2}(9) + (10),$$

where equations (7)-(10) are defined above. Substituting the values of $p$ and $q$ into the expression above, we get the final answer, $P(Y \mid X)$; with $p = 99/100$ and $q = 1/100$ the expression simplifies to $pq/2 + q^2 \approx 0.00505$.

3. This problem was stated at the beginning of Unit 2, and is stated again in slide 62 of Unit 2. Solve this problem algebraically, using the formula on slide 60; the use of the formula will be illustrated in lecture.

The National Cancer Institute estimates that approximately 3.65% of women in their 60s get breast cancer. A mammogram typically identifies a breast cancer about 85% of the time, and is correct 95% of the time when a woman does not have breast cancer.

If a woman in her 60s has a positive mammogram, what is the likelihood she has breast cancer?

The code for solving this computationally is on slides 65-67 and is posted on the web site, under resources. If you run the code (unchanged), you can use the result to check your algebraic solution.

By Bayes' rule,

$$P(\text{cancer} \mid + \text{ mammogram}) = \frac{P(+ \text{ mammogram} \mid \text{cancer})\,P(\text{cancer})}{P(+ \text{ mammogram})},$$

where

$$
\begin{aligned}
P(+ \text{ mammogram}) &= P(+ \text{ mammogram} \mid \text{cancer})\,P(\text{cancer}) + P(+ \text{ mammogram} \mid \text{NO cancer})\,P(\text{NO cancer}) \\
&= 0.85\,(0.0365) + (1 - 0.95)(1 - 0.0365).
\end{aligned}
$$

Thus,

$$P(\text{cancer} \mid + \text{ mammogram}) = \frac{0.85\,(0.0365)}{0.85\,(0.0365) + (1 - 0.95)(1 - 0.0365)} = 0.3917 .$$

Therefore, the probability that a woman has breast cancer, given that she is in her 60s and has a positive mammogram, is 0.3917.

4. A psychologist conducts a study on intelligence in which participants are asked to take an IQ test consisting of $n$ questions, each with $m$ choices.

(a) One thing the psychologist must be careful about when analyzing the results is accounting for lucky guesses. Suppose that for a given question a particular participant either knows the answer or guesses. He or she knows the correct answer with probability $p$, and does not know the answer (and therefore will have to guess) with probability $1 - p$. He/she guesses completely randomly. What is the conditional probability that the participant knew the answer to a question, given that he/she answered it correctly?

(b) About 1 in 1100 people have IQs over 150. If a subject receives a score greater than some specified amount, he/she is considered by the psychologist to have an IQ over 150. But the psychologist's test is not perfect. Although all individuals with IQ over 150 will definitely receive such a score, individuals with IQs less than 150 can also receive such scores about 0.1% of the time due to lucky guessing. Given that a subject in the study is labeled as having an IQ over 150, what is the probability that he/she actually has an IQ below 150?
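As a numerical check on the algebra in problem 3, the Bayes' rule calculation can be sketched in a few lines. This is written in Python rather than R and is not the code from slides 65-67; the function name `posterior` and its parameter names are hypothetical, and the inputs are the values stated in problem 3 (prior 3.65%, detection rate 85%, correct negative rate 95%).

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(condition | positive test) by Bayes' rule.

    prior       = P(condition)
    sensitivity = P(positive | condition)
    specificity = P(negative | no condition)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Values from problem 3: 3.65% prevalence, 85% detection, 95% correct negatives.
p_cancer_given_positive = posterior(prior=0.0365, sensitivity=0.85, specificity=0.95)
print(round(p_cancer_given_positive, 4))
```

The same helper applies to any two-outcome screening problem in which a prior and the two conditional error rates are given.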

(a) Let $A$ = knows the answer and $B$ = answered correctly. Then

$$P(\text{knows answer} \mid \text{answered correctly}) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)},$$

with

$$P(A) = p, \quad P(A^c) = 1 - p, \quad P(B \mid A) = 1, \quad P(B \mid A^c) = 1/m .$$

Therefore

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)} = \frac{p}{p + (1 - p)\,\frac{1}{m}} .$$

(b) Let $A$ = true IQ < 150 and $B$ = score indicates IQ > 150. We want $P(\text{true IQ} < 150 \mid \text{score indicates IQ} > 150) = P(A \mid B)$. By Bayes' rule,

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)},$$

with

$$P(A^c) = P(\text{IQ} > 150) = \frac{1}{1100}, \quad P(A) = 1 - P(A^c) = 1 - \frac{1}{1100} = \frac{1099}{1100}, \quad P(B \mid A) = 0.1\% = \frac{1}{1000}, \quad P(B \mid A^c) = 1 .$$

Therefore

$$P(A \mid B) = \frac{\frac{1}{1000}\cdot\frac{1099}{1100}}{\frac{1}{1000}\cdot\frac{1099}{1100} + 1\cdot\frac{1}{1100}} = 0.524 .$$

5. (FB, 3.). Suppose a birth defect has a recessive form of inheritance. In a study population, the recessive gene (a) initially has a prevalence of 25%. A member of the population has the birth defect if he/she has genotype (aa).

(a) In the general population, what is the probability that an individual will have the birth defect, assuming non-assortative mating?

Another study finds that after 10 generations (about 200 years), considerable inbreeding has occurred in the population. The general population now consists of two subpopulations (A and B) that make up 30% and 70% of the general population. In subpopulation A, the prevalence of the recessive gene is 40%; in subpopulation B the prevalence is 10%.

(b) Suppose that in 25% of marriages both parents are from subpopulation A, in 65% both are from B, and in 10%, one partner comes from A and one from B. What is the probability of a birth defect in the next generation?

(c) Suppose that a baby is born with a birth defect, but the genotypes of the baby's parents are unknown. What is the probability that the baby has both parents from A, both from B, or one from each subpopulation? Use Bayes' rule for this last part.

(a) $P(aa) = q^2 = [P(a)]^2 = 0.25^2 = 0.0625$

(b) Let $q_a = P(a)$ in A $= 0.40$ and $q_b = P(a)$ in B $= 0.10$. For the parents, let $P(AA)$ = probability both parents are from A $= 0.25$, $P(BB)$ = probability both are from B $= 0.65$, and $P(AB)$ = probability one is from A and one from B $= 0.10$. Then

$$P(aa) = P(AA)\,q_a^2 + P(BB)\,q_b^2 + P(AB)\,q_a q_b = 0.25(0.40)^2 + 0.65(0.10)^2 + 0.10(0.40)(0.10) = 0.0505 .$$

(c) By Bayes' rule,

$$P(AA \mid aa) = \frac{P(aa \mid AA)\,P(AA)}{P(aa)} = \frac{0.4^2 \times 0.25}{0.0505} = 0.79$$

$$P(BB \mid aa) = \frac{P(aa \mid BB)\,P(BB)}{P(aa)} = \frac{0.1^2 \times 0.65}{0.0505} = 0.13$$

$$P(AB \mid aa) = \frac{P(aa \mid AB)\,P(AB)}{P(aa)} = \frac{0.4 \times 0.1 \times 0.1}{0.0505} = 0.08$$

or, equivalently, $P(AB \mid aa) = 1 - P(AA \mid aa) - P(BB \mid aa) = 1 - 0.79 - 0.13 = 0.08$.

6. (IPS 6th edition, 4.42). Some traits of plants and animals depend on inheritance of a single gene. This is called Mendelian inheritance, after Gregor Mendel (1822-1884). The exercises are based on the following information about Mendelian inheritance of blood type.

Each of us has an ABO blood type, which describes whether two characteristics called A and B are present. Every human being has two blood type alleles (gene forms), one inherited from our mother and one from our father. Each of these alleles can be A, B, or O. Which two we inherit determines our blood type. The following table shows what our blood type is for each combination of two alleles:

Alleles inherited    Blood type
A and A              A
A and B              AB
A and O              A
B and B              B
B and O              B
O and O              O

We inherit each of a parent's two alleles with probability 0.5. We inherit independently from our mother and father.

Here is the problem statement: Hannah and Jacob both have alleles A and B.

(a) What blood types can their children have?

(b) What is the probability that their next child has each of these blood types?

(a) Parents both have one allele A and one allele B. Children can therefore have genotypes AA, AB, BA, and BB, resulting in blood types A, AB, AB, and B.

(b) From (a) we see that $P(\text{type A}) = \tfrac{1}{4}$, $P(\text{type AB}) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$, and $P(\text{type B}) = \tfrac{1}{4}$.

7. This problem also uses the information on Mendelian inheritance. Jasmine has alleles A and O. Joshua has alleles B and O.

(a) What is the probability that a child of these parents has blood type O?

(b) If Jasmine and Joshua have three children, what is the probability that all three have blood type O? What is the probability that the first child has blood type O and the next two do not?

(a) One half of the parents' alleles are O, one quarter are A, and the other quarter are B. Therefore, $P(O) = 0.5$, $P(A) = 0.25$, and $P(B) = 0.25$.

$$P(\text{blood type O}) = P(\text{genotype OO}) = P(O)^2 = 0.5^2 = 0.25$$

(b) Using the multiplication rule for independent events,

$$P(OO \times 3) = [P(OO)]^3 = 0.25^3 = \tfrac{1}{64} = 0.015625$$

$$P(\text{not } OO) = 1 - P(OO) = 1 - 0.25 = 0.75$$

$$P(OO \text{ and not } OO \times 2) = P(OO)\,[P(\text{not } OO)]^2 = 0.25 \times 0.75^2 = 0.14$$
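The answers to problems 6 and 7 can be checked by enumerating the four equally likely allele combinations a child can inherit. The sketch below is in Python; the function names are illustrative, and the allele-to-blood-type mapping is the table from problem 6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Blood type for each (sorted) pair of alleles, from the table in problem 6.
BLOOD_TYPE = {"AA": "A", "AB": "AB", "AO": "A", "BB": "B", "BO": "B", "OO": "O"}


def blood_type(allele_1: str, allele_2: str) -> str:
    """Blood type for an unordered pair of alleles."""
    return BLOOD_TYPE["".join(sorted((allele_1, allele_2)))]


def child_distribution(mother: str, father: str) -> dict:
    """Blood type probabilities for a child; each parent is a 2-letter allele string.

    Each parental allele is passed on with probability 1/2, independently,
    so each of the 4 (mother allele, father allele) pairs has probability 1/4.
    """
    dist = Counter()
    for m, f in product(mother, father):
        dist[blood_type(m, f)] += Fraction(1, 4)
    return dict(dist)


# Problem 6: Hannah and Jacob both have alleles A and B.
hannah_jacob = child_distribution("AB", "AB")

# Problem 7: Jasmine has A and O, Joshua has B and O.
p_o = child_distribution("AO", "BO")["O"]
p_all_three_o = p_o ** 3
p_first_o_next_two_not = p_o * (1 - p_o) ** 2

print(hannah_jacob)
print(p_o, p_all_three_o, p_first_o_next_two_not)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions in the written solutions.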

8. (Open Intro, 2.20). Assortative mating is a non-random mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise. The full study appears in Laeng, et al., Behavioral Ecology and Sociobiology (2007) pp 371-384.

                     Partner (female)
Self (male)    Blue   Brown   Green   Total
Blue             78      23      13     114
Brown            19      23      12      54
Green            11       9      16      36
Total           108      55      41     204

(a) What is the probability that a randomly chosen male respondent or his partner has blue eyes?

(b) What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?

(c) What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?

(d) Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.

(a) Let $A$: female has blue eyes; $B$: male has blue eyes. Then

$$P(A) = \frac{108}{204}, \quad P(B) = \frac{114}{204}, \quad P(A \cap B) = \frac{78}{204},$$

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{108 + 114 - 78}{204} = \frac{144}{204} \approx 0.706 .$$

(b) This is the conditional probability $P(\text{female has blue eyes} \mid \text{male has blue eyes}) = P(A \mid B)$.

$$P(A \mid B) = \frac{\text{male and female with blue eyes}}{\text{blue-eyed males}} = \frac{\text{row 1, column 1}}{\text{row 1 total}} = \frac{78}{114}$$

or

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{78/204}{114/204} = \frac{78}{114} \approx 0.684 .$$

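(c)

$$P(\text{female with blue eyes} \mid \text{male with brown eyes}) = \frac{\text{row 2, column 1}}{\text{row 2 total}} = \frac{19}{54} \approx 0.352$$

$$P(\text{female with blue eyes} \mid \text{male with green eyes}) = \frac{\text{row 3, column 1}}{\text{row 3 total}} = \frac{11}{36} \approx 0.306$$

(d) No, the events are not independent. If they were, conditioning on the male's eye color would not change the probability that the female has blue eyes, but

$$P(\text{female with blue eyes} \mid \text{male with blue eyes}) = \frac{78}{114} \approx 0.684 \;\neq\; P(\text{female with blue eyes}) = \frac{108}{204} \approx 0.529 .$$

Similar justifications based on green or brown eyes also receive full credit.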
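The calculations in problem 8 can be reproduced directly from the contingency table. The Python sketch below assumes the cell counts 78, 23, 13 / 19, 23, 12 / 11, 9, 16 (male rows by female columns, in the order blue, brown, green); the variable and function names are illustrative.

```python
from fractions import Fraction

# counts[male_color][female_color], assumed cell counts for the eye color table.
counts = {
    "blue": {"blue": 78, "brown": 23, "green": 13},
    "brown": {"blue": 19, "brown": 23, "green": 12},
    "green": {"blue": 11, "brown": 9, "green": 16},
}

total = sum(sum(row.values()) for row in counts.values())
male_blue = sum(counts["blue"].values())                   # row total
female_blue = sum(row["blue"] for row in counts.values())  # column total

# (a) P(male blue or female blue) by the general addition rule.
p_either_blue = Fraction(male_blue + female_blue - counts["blue"]["blue"], total)


def p_female_blue_given_male(color: str) -> Fraction:
    """(b), (c): P(female has blue eyes | male has the given eye color)."""
    return Fraction(counts[color]["blue"], sum(counts[color].values()))


# (d) Independence would require every conditional probability to equal the marginal.
p_female_blue = Fraction(female_blue, total)
independent = all(p_female_blue_given_male(c) == p_female_blue for c in counts)

print(float(p_either_blue))
print({c: float(p_female_blue_given_male(c)) for c in counts})
print(float(p_female_blue), independent)
```

Exact equality is a strict criterion for sample data; here the conditional probabilities differ from the marginal by enough that the conclusion in part (d) does not depend on that choice.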