Assignment 3, MATH 2560, Due November 16th

Question 1: all graphs and calculations have to be done using the computer.

The following table gives the 1999 payroll (rounded to the nearest million dollars) and the percentage of games won during the 1999 season by each of the American League baseball teams.

Team                    Total payroll ($ millions)   Percentage of games won
Anaheim Angels                      50                         43.2
Baltimore Orioles                   71                         48.1
Boston Red Sox                      72                         58.0
Chicago White Sox                   25                         46.6
Cleveland Indians                   74                         59.9
Detroit Tigers                      35                         42.9
Kansas City Royals                  17                         39.8
Minnesota Twins                     16                         39.4
New York Yankees                    88                         60.5
Oakland A's                         24                         53.7
Seattle Mariners                    44                         48.8
Tampa Bay Devil Rays                38                         42.6
Texas Rangers                       81                         51.6
Toronto Blue Jays                   48                         51.9

(a) Find the least squares regression line with total payroll as the independent variable and percentage of games won as the dependent variable.
(b) Give the residuals and verify that their sum is equal to 0. Plot these residuals versus total payroll. What do you observe?
(c) What proportion of the total variation is due to regression?
(d) Predict the percentage of games won for a team with a total payroll of $38 million.
(e) Compute the correlation coefficient between total payroll and percentage of games won. What happens to this correlation coefficient if payroll is expressed in dollars instead of millions of dollars?

Solution
Let us regress percentage of games won on total payroll.

MTB > set c1
DATA> 50 71 72 25 74 35 17 16 88 24 44 38 81 48
DATA> end
MTB > set c2
DATA> 43.2 48.1 58 46.6 59.9 42.9 39.8 39.4 60.5 53.7 48.8 42.6 51.6 51.9
DATA> end
MTB > regress c2 1 on c1
The regression equation is
C2 = 38.5 + 0.216 C1
Predictor      Coef     Stdev   t-ratio       p
Constant     38.549     3.061     12.60   0.000
C1          0.21568   0.05644      3.82   0.002

s = 4.998    R-sq = 54.9%    R-sq(adj) = 51.1%

Analysis of Variance
SOURCE       DF        SS        MS       F       p
Regression    1    364.72    364.72   14.60   0.002
Error        12    299.75     24.98
Total        13    664.47

Unusual Observations
Obs.     C1      C2     Fit   Stdev.Fit   Residual   St.Resid
 10    24.0   53.70   43.73        1.93       9.97      2.16R

R denotes an obs. with a large st. resid.

(a) The regression line is therefore C2 = 38.5 + 0.216 C1, that is, predicted percentage of games won = 38.5 + 0.216 × (total payroll in millions of dollars).

Let us now compute the fitted values $\hat{Y}_i$, $i = 1, \ldots, 14$, the residuals $e_i = Y_i - \hat{Y}_i$, $i = 1, \ldots, 14$, and plot the residuals $e_i$ versus $X_i$.

MTB > let C3=38.5+0.216*C1
MTB > write C3
49.300 53.836 54.052 43.900 54.484 46.060 42.172 41.956 57.508 43.684 48.004 46.708 55.996 48.868
MTB > let C4=C2-C3
MTB > write C4
-6.1000 -5.7360 3.9480 2.7000 5.4160 -3.1600 -2.3720 -2.5560 2.9920 10.0160 0.7960 -4.1080 -4.3960 3.0320
MTB > plot c4 c1
[Minitab character plot: residuals C4 (vertical axis) versus total payroll C1 (horizontal axis, 15 to 90)]

Let us verify that the sum of the residuals is 0, compute the correlation coefficient, and verify that the square of the correlation coefficient is equal to $R^2$ = R-sq = 54.9%.

MTB > sum C4
SUM = 0.47199
MTB > correlate c1 c2
Correlation of C1 and C2 = 0.741
MTB > let c5=0.741*0.741
MTB > write c5
0.549081

(b) The residuals are given by C4 above. The computed sum of the residuals, 0.47199, is not exactly zero because of round-off: the fitted values in C3 were computed with the rounded coefficients 38.5 and 0.216 instead of 38.549 and 0.21568. As shown in class, when the calculations are done exactly, the sum of the residuals is zero.
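As a cross-check on the Minitab session, here is a minimal Python sketch (standard library only) that recomputes the least squares line from the usual formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{Y} - b_1\bar{X}$, together with the residuals, $R^2 = SSR/SST$, the prediction at $38 million, and the correlation coefficient before and after converting payroll to dollars. The variable and function names are illustrative and not part of the assignment.

```python
# Data from the table in Question 1 (1999 American League season).
payroll = [50, 71, 72, 25, 74, 35, 17, 16, 88, 24, 44, 38, 81, 48]  # X, $ millions
pct_won = [43.2, 48.1, 58, 46.6, 59.9, 42.9, 39.8, 39.4, 60.5, 53.7,
           48.8, 42.6, 51.6, 51.9]                                   # Y, percent


def sums_of_squares(x, y):
    """Return (Sxx, Sxy, Syy), the centred sums of squares and cross-products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxx, sxy, syy


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    sxx, sxy, _ = sums_of_squares(x, y)
    b1 = sxy / sxx
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return b0, b1


def correlation(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxx, sxy, syy = sums_of_squares(x, y)
    return sxy / (sxx * syy) ** 0.5


b0, b1 = least_squares(payroll, pct_won)                      # (a)
fitted = [b0 + b1 * xi for xi in payroll]
residuals = [yi - fi for yi, fi in zip(pct_won, fitted)]      # (b)

y_bar = sum(pct_won) / len(pct_won)
sst = sum((yi - y_bar) ** 2 for yi in pct_won)                # total variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)                 # explained by regression
r_squared = ssr / sst                                         # (c)

prediction_38 = b0 + b1 * 38                                  # (d)

r = correlation(payroll, pct_won)                             # (e)
r_dollars = correlation([xi * 1_000_000 for xi in payroll], pct_won)

print(f"line: Y = {b0:.3f} + {b1:.5f} X")
print(f"sum of residuals: {sum(residuals):.2e}")
print(f"R^2 = {r_squared:.3f}, r = {r:.3f}, r with payroll in dollars = {r_dollars:.3f}")
print(f"predicted percentage at $38 million: {prediction_38:.2f}")
```

Because the sketch uses the unrounded coefficients (about 38.549 and 0.21568), its residuals sum to zero up to floating-point error, and its prediction at $38 million (about 46.75) differs slightly from the value 46.708 obtained with the rounded equation. Multiplying every payroll by $10^6$ leaves r unchanged.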
As for the residual plot, the residuals are scattered randomly about the line e = 0 with no visible pattern; the largest one, about 10 for observation 10 (the Oakland A's), is the observation Minitab flags as having a large standardized residual.

(c) The proportion of the total variation due to regression is R-sq = SSR/SST = 364.72/664.47 = 54.9%.

MTB > let c6=38.5+0.216*38
MTB > write c6
46.708

(d) The predicted percentage of games won for a team with a total payroll of $38 million is 46.708, that is, about 46.7%.

(e) The correlation coefficient is r = 0.741, and the output above confirms that its square, 0.549, is equal to R-sq = 54.9%. If payroll is expressed in dollars instead of millions of dollars, every $X_i$ is multiplied by $10^6$. The correlation coefficient does not depend on the units of measurement (it is unchanged by a positive linear rescaling of either variable), so r remains 0.741.

Question 2
The probability that a person favours genetic engineering is .55 and the probability that a person is against it is .45. Two persons are randomly selected, and it is observed whether they favour or oppose genetic engineering.
(a) Draw a tree diagram for this experiment.
Solution: From the root of the tree come two branches, Y (favours) and N (opposes), with respective probabilities .55 and .45. From each of Y and N come two further branches, Y and N, again with respective probabilities .55 and .45.
(b) Find the probability that at least one of the two persons favours genetic engineering.
Solution: The set of all possible outcomes is {YY, YN, NY, NN}. Since the two persons are selected independently, the probability that at least one of them favours genetic engineering is
$P(YY, YN, NY) = P(YY) + P(YN) + P(NY) = P(Y)P(Y) + P(Y)P(N) + P(N)P(Y) = .55^2 + .55(.45) + .45(.55) = .7975$.
Equivalently, $1 - P(NN) = 1 - .45^2 = .7975$.

Question 3
A player plays a game of roulette in a casino by betting on a single number each time. Since the wheel has 38 numbers, the probability that the player will win in a single play is 1/38. Note that each play of the game is independent of the previous plays.
(a) Find the probability that the player will win for the first time on the 10th play.
Solution: Let W denote winning and L losing. Since the plays are independent, the probability that the player will win for the first time on the 10th play is
$P(LLLLLLLLLW) = \left(\frac{37}{38}\right)^{9} \frac{1}{38} \approx 0.0207$.
(b) Find the probability that it takes the player more than 50 plays to win for the first time.
Solution:
$P(\text{first win on play } 51) + P(\text{first win on play } 52) + \cdots = 1 - \sum_{i=1}^{50} P(\text{first win on the } i\text{th play}) = 1 - \sum_{i=1}^{50} \left(\frac{37}{38}\right)^{i-1} \frac{1}{38}$.
Summing the geometric series, this equals $\left(\frac{37}{38}\right)^{50} \approx 0.2636$, which is simply the probability of losing each of the first 50 plays.
(c) The gambler claims that since he has one chance in 38 of winning each time he plays, he is certain to win at least once if he plays 38 times. Does this sound reasonable to you? Find the probability that he will win at least once in 38 plays.
Solution: No; the player cannot be certain to win. His probability of winning at least once in 38 plays is the probability of winning exactly once, plus the probability of winning exactly twice, and so on up to the probability of winning 38 times. This is also equal to 1 minus the probability of never winning:
$1 - \left(\frac{37}{38}\right)^{38} = 1 - 0.3629851 = 0.6370149$.
So the player has only a 63.7% chance of winning at least once!

Question 4
A hotel owner has determined that 83% of the hotel's guests eat either dinner or breakfast in the restaurant. Further investigation reveals that 30% of the guests eat dinner and 60% of the guests eat breakfast in the hotel restaurant.
(a) What proportion of the hotel guests eat both dinner and breakfast in the hotel restaurant?
Solution: Let B be the event of eating breakfast at the restaurant and D the event of eating dinner there. Then
$P(B \cup D) = .83 = P(B) + P(D) - P(B \cap D) = .6 + .3 - P(B \cap D)$,
so $P(B \cap D) = .9 - .83 = .07$.
(b) What proportion of the hotel guests eat neither dinner nor breakfast in the hotel restaurant?
Solution: By De Morgan's law, $P(B^c \cap D^c) = P((B \cup D)^c) = 1 - P(B \cup D) = 1 - .83 = .17$.
(c) What proportion of the hotel guests eat dinner but not breakfast in the hotel restaurant?
Solution: $P(D \cap B^c) = P(D) - P(D \cap B) = .3 - .07 = .23$.
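The probabilities in Questions 2 to 4 can be double-checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that the answers come out as exact fractions; the variable names are illustrative only. For Question 3 it also confirms that the 50-term sum in (b) collapses to $(37/38)^{50}$.

```python
from fractions import Fraction

# Question 2: P(Y) = .55 for each of two independently selected persons.
p_yes = Fraction(55, 100)
p_no = 1 - p_yes
at_least_one_yes = p_yes**2 + p_yes * p_no + p_no * p_yes
assert at_least_one_yes == 1 - p_no**2        # complement of NN

# Question 3: win probability 1/38 per play, plays independent.
p = Fraction(1, 38)
q = 1 - p
first_win_on_10th = q**9 * p                                        # (a)
more_than_50 = 1 - sum(q**(i - 1) * p for i in range(1, 51))        # (b)
assert more_than_50 == q**50                  # lose each of the first 50 plays
at_least_once_in_38 = 1 - q**38                                     # (c)

# Question 4: B = breakfast, D = dinner in the hotel restaurant.
p_b, p_d, p_b_or_d = Fraction(60, 100), Fraction(30, 100), Fraction(83, 100)
p_b_and_d = p_b + p_d - p_b_or_d              # (a) inclusion-exclusion
p_neither = 1 - p_b_or_d                      # (b) complement of B union D
p_d_not_b = p_d - p_b_and_d                   # (c) dinner with the overlap removed

for label, value in [
    ("Q2 at least one favours", at_least_one_yes),
    ("Q3(a) first win on 10th play", first_win_on_10th),
    ("Q3(b) more than 50 plays", more_than_50),
    ("Q3(c) at least one win in 38 plays", at_least_once_in_38),
    ("Q4(a) both meals", p_b_and_d),
    ("Q4(b) neither meal", p_neither),
    ("Q4(c) dinner but not breakfast", p_d_not_b),
]:
    print(f"{label}: {float(value):.4f}")
```

Working with fractions rather than floats makes the identities in the assertions hold exactly, so they act as checks on the algebra rather than on rounding.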
Question 5
Let X be the number of errors that appear on a randomly selected page of a book. The following table lists the probability distribution of X.

x       0     1     2     3     4
P(x)   .73   .16   .06   .04   .01

(a) Find the mean and standard deviation of X.
Solution: $\mu_X = 0(.73) + 1(.16) + 2(.06) + 3(.04) + 4(.01) = .44$ and
$\sigma_X^2 = (0 - .44)^2(.73) + (1 - .44)^2(.16) + (2 - .44)^2(.06) + (3 - .44)^2(.04) + (4 - .44)^2(.01) = 0.7264$.
Therefore $\sigma_X = \sqrt{0.7264} = 0.852291$.
(b) Two pages are selected at random. We denote by $X_1$ the number of errors on the first page and by $X_2$ the number of errors on the second page. We assume that the errors on different pages are independent. Find the mean and standard deviation of $X_1 + X_2$.
Solution: $\mu_{X_1+X_2} = \mu_{X_1} + \mu_{X_2} = .88$, $\sigma^2_{X_1+X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} = 2(0.7264) = 1.4528$ and $\sigma_{X_1+X_2} = \sqrt{1.4528} \approx 1.2053$.
(c) Two pages are selected at random. We denote by $X_1$ the number of errors on the first page and by $X_2$ the number of errors on the second page. We assume that the errors on different pages are not independent and that their correlation coefficient is $\rho = .3$. Find the mean and standard deviation of $X_1 + X_2$.
Solution: $\mu_{X_1+X_2} = .88$ and
$\sigma^2_{X_1+X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} + 2\rho\sigma_{X_1}\sigma_{X_2} = 1.4528 + 2(.3)(0.852291)^2 = 1.4528 + 0.43584 = 1.88864$,
so $\sigma_{X_1+X_2} = \sqrt{1.88864} \approx 1.3743$.

Question 6
A school teacher gives a 50-question multiple-choice exam in which each question has four choices. The scoring includes a penalty for guessing: each wrong answer costs 1/2 point. For example, if a student answers 35 questions correctly, eight incorrectly, and does not answer 7 questions, the total score for this student is 35 - (1/2)(8) = 31.
(a) What is the expected score of a student who answers 38 questions correctly and guesses on the other 12 questions? Assume that the student randomly chooses one of the four answers for each of the 12 guessed questions.
Solution: Let X be the number of points a student gets on a question for which he guesses the answer. If he guesses right, X = 1, and this happens with probability .25 since there are 4 choices for each question.
If he guesses wrong, X = -.5, and this happens with probability .75. Writing $X_1, \ldots, X_{12}$ for the points earned on the 12 guessed questions, the total score is $Y = 38 + X_1 + \cdots + X_{12}$, and therefore
$\mu_Y = 38 + 12\mu_X = 38 + 12(.25 - (.5)(.75)) = 38 + 12(-.125) = 36.5$.
(b) Does a student increase his expected score by guessing on a question if he has no idea what the correct answer is? Explain.
Solution: No. If the student does not guess and leaves the 12 questions unanswered, his score is 38. Each blind guess has expected value $\mu_X = -.125$ point, so guessing lowers the expected score (36.5 < 38). The expected mark is therefore higher if the student does not guess.
(c) Does a student increase her expected score by guessing on a question for which she can eliminate one of the wrong answers? Explain.
Solution: If the student can eliminate one of the wrong answers, then X takes the value 1 with probability 1/3 and the value -.5 with probability 2/3. In this case
$\mu_Y = 38 + 12\mu_X = 38 + 12\left(\frac{1}{3} - (.5)\frac{2}{3}\right) = 38 + 12(0) = 38$.
The expected score is the same as without guessing, so guessing neither increases nor decreases her expected score; it does not matter whether she guesses or not.
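The expected values and variances in Questions 5 and 6 can be checked the same way. This Python sketch stores each distribution as a dictionary from values to exact probabilities; the helper names `mean` and `variance` are illustrative. For Question 5(c) it uses $\sigma_{X_1}\sigma_{X_2} = \sigma_X^2$, which holds because both pages have the distribution of X.

```python
from fractions import Fraction
from math import sqrt


def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance: sum of (x - mu)^2 * P(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Question 5: number of errors X on a page.
errors = {0: Fraction(73, 100), 1: Fraction(16, 100), 2: Fraction(6, 100),
          3: Fraction(4, 100), 4: Fraction(1, 100)}
mu_x, var_x = mean(errors), variance(errors)
sd_x = sqrt(float(var_x))

mu_sum = 2 * mu_x                             # mean of X1 + X2 in both (b) and (c)
var_sum_indep = 2 * var_x                     # (b) independent: variances add
rho = Fraction(3, 10)
var_sum_corr = 2 * var_x + 2 * rho * var_x    # (c) sd1 * sd2 = var_x here

# Question 6: points X earned on one guessed question (wrong answer costs 1/2).
blind = {1: Fraction(1, 4), Fraction(-1, 2): Fraction(3, 4)}
one_eliminated = {1: Fraction(1, 3), Fraction(-1, 2): Fraction(2, 3)}
score_blind = 38 + 12 * mean(blind)           # 38 correct + 12 blind guesses
score_elim = 38 + 12 * mean(one_eliminated)   # one wrong choice ruled out

print(f"Q5(a): mean {float(mu_x)}, variance {float(var_x)}, sd {sd_x:.6f}")
print(f"Q5(b): mean {float(mu_sum)}, sd {sqrt(float(var_sum_indep)):.4f}")
print(f"Q5(c): mean {float(mu_sum)}, sd {sqrt(float(var_sum_corr)):.4f}")
print(f"Q6: blind guessing {float(score_blind)}, "
      f"one answer eliminated {float(score_elim)}, no guessing 38")
```

The standard deviations for Question 5(b) and (c) come out at about 1.2053 and 1.3743, and the expected scores for Question 6 are 36.5 with blind guessing and 38 when one wrong answer can be eliminated.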