How to Predict a Team's Winnings

THIS PAPER IS NOT TO BE REMOVED FROM THE EXAMINATION HALLS

University of London
BSc Examination 2012
BA1040 (BBA0040) +Enc
Business Administration
Business Statistics

Date: tba
Time: tba

DO NOT TURN OVER UNTIL TOLD TO BEGIN

Time allowed: TWO hours
Answer FOUR questions.
All questions carry equal marks.

Electronic calculators may be used. These should be of a hand-held, non-programmable (where relevant) type, and the name and model should be stated CLEARLY on the front of your answer book.

Appropriate statistical tables are attached; you may not necessarily need to use them all.

University of London 2012 UL12/

Question 1:
a) What is the difference between sampling with replacement and sampling without replacement? Give an example.
b) What is the difference between probability and non-probability sampling? Give an example.
c) In regression analysis, what is meant by the method of least squares?
d) What are the differences between a Type I error and a Type II error? Give an example.
e) What is the difference between parametric and non-parametric statistical methods? Give an example.
Sub-Total: 2
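For part (a), a minimal Python sketch of the distinction, using a hypothetical sampling frame of ten numbered units and the standard-library `random` module: `choices` draws with replacement, so the same unit can be picked more than once, while `sample` draws without replacement, so every selected unit is distinct.

```python
import random

# Hypothetical sampling frame of 10 numbered units (illustrative only)
population = list(range(1, 11))
rng = random.Random(42)  # seeded so the draws are reproducible

# With replacement: each unit goes back into the frame after it is drawn,
# so duplicates are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: once a unit is drawn it cannot be drawn again.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Because `choices` returns each unit to the frame, it can even draw more units than the frame holds, whereas `rng.sample(population, k=11)` would raise `ValueError`.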

Question 2:
Crazy Dave, a well-known baseball analyst, wants to determine which variables are important in predicting a team's wins in a given season. He has collected data on wins, earned run average (ERA) and runs scored for the 2008 season, together with several other team statistics (see below). League is coded 0 = American League, 1 = National League.

Team                League  Wins  E.R.A.  Runs Scored  Hits Allowed  Walks Allowed  Saves  Errors
Baltimore           0       68    5.13    782          1538          687            35     100
Boston              0       95    4.01    845          1369          548            47     85
Chicago White Sox   0       88    4.09    810          1469          457            33     108
Cleveland           0       81    4.45    805          1530          444            31     94
Detroit             0       74    4.90    821          1541          644            34     113
Kansas City         0       75    4.48    691          1473          515            44     96
Los Angeles Angels  0       100   3.99    765          1455          457            66     91
Minnesota           0       88    4.18    829          1563          403            42     108
New York Yankees    0       89    4.28    789          1478          489            42     83
Oakland             0       75    4.01    646          1364          576            33     98
Seattle             0       61    4.73    671          1544          626            36     99
Tampa Bay           0       97    3.82    774          1349          526            52     90
Texas               0       79    5.37    901          1647          625            36     132
Toronto             0       86    3.49    714          1330          467            44     84
Arizona             1       82    3.98    720          1403          451            39     113
Atlanta             1       72    4.46    753          1439          586            26     107
Chicago Cubs        1       97    3.87    855          1329          548            44     99
Cincinnati          1       74    4.55    704          1542          557            34     114
Colorado            1       74    4.77    747          1547          562            36     96
Florida             1       84    4.43    770          1421          586            36     117
Houston             1       86    4.36    712          1453          492            48     67
Los Angeles Dodgers 1       84    3.68    700          1381          480            35     101
Milwaukee           1       90    3.85    750          1415          528            45     101
New York Mets       1       89    4.07    799          1415          590            43     83
Philadelphia        1       92    3.88    799          1444          533            47     90
Pittsburgh          1       67    5.08    735          1631          657            34     107
St. Louis           1       86    4.19    779          1517          496            42     85
San Diego           1       63    4.41    637          1466          561            30     85
San Francisco       1       72    4.38    640          1416          652            41     96
Washington          1       59    4.66    641          1496          588            28     123
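The model in the Excel output is an ordinary least-squares fit of Wins on E.R.A. and Runs Scored. As a minimal sketch in plain Python (no statistics library assumed), the two-predictor coefficients can be computed from centred sums of squares and cross-products by solving the 2 x 2 normal equations; `fit_two_predictors` is a hypothetical helper name, and the other columns of the table are left out.

```python
# Wins, ERA and runs scored for the 30 teams, in the order of the table above
WINS = [68, 95, 88, 81, 74, 75, 100, 88, 89, 75, 61, 97, 79, 86,
        82, 72, 97, 74, 74, 84, 86, 84, 90, 89, 92, 67, 86, 63, 72, 59]
ERA = [5.13, 4.01, 4.09, 4.45, 4.90, 4.48, 3.99, 4.18, 4.28, 4.01, 4.73, 3.82, 5.37, 3.49,
       3.98, 4.46, 3.87, 4.55, 4.77, 4.43, 4.36, 3.68, 3.85, 4.07, 3.88, 5.08, 4.19, 4.41,
       4.38, 4.66]
RUNS = [782, 845, 810, 805, 821, 691, 765, 829, 789, 646, 671, 774, 901, 714,
        720, 753, 855, 704, 747, 770, 712, 700, 750, 799, 799, 735, 779, 637, 640, 641]


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # the fitted plane passes through the means
    return b0, b1, b2


b0, b_era, b_runs = fit_two_predictors(WINS, ERA, RUNS)
print(f"Predicted wins = {b0:.4f} + ({b_era:.4f}) * ERA + {b_runs:.6f} * Runs")
```

The coefficients it prints can be compared with the Coefficients column of the Excel output.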

Below is the Excel output of the model developed to predict the number of wins based on ERA and runs scored:

Regression Statistics
Multiple R          0.92320741
R Square            0.852311923
Adjusted R Square   0.841372065
Standard Error      4.405807111
Observations        30

ANOVA
              df    SS            MS            F             Significance F
Regression    2     3024.59932    1512.29966    77.90886822   6.11172E-12
Residual      27    524.1006801   19.4111363
Total         29    3548.7

              Coefficients   Standard Error   t Stat         P-value
Intercept     79.7718417     11.59984327      6.876975821    2.17647E-07
E.R.A.        -17.64487887   1.828349093      -9.650716562   3.02745E-10
Runs Scored   0.102716029    0.011989049      8.56748787     3.50588E-09

a) State the multiple regression equation for the above model (define your Y and X values clearly).
b) Interpret the meaning of the slopes in this equation.
c) Predict the number of wins for a team that has an ERA of 4.50 and has scored 750 runs.
d) Is there a significant relationship between the number of wins and the two independent variables (ERA and runs scored) at the 0.05 level of significance?
e) Interpret the R Square statistic above. (3 marks)
f) Why would the adjusted R Square be superior to the R Square? (2 marks)
Sub-Total: 2
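Several of the quantities asked for in parts (c) to (f) follow directly from the printed output. A minimal sketch in Python, with the coefficients and sums of squares copied from the Excel tables; `predict_wins` is a hypothetical helper name. The prediction plugs the values into Y-hat = b0 + b1*X1 + b2*X2, R Square is SSR/SST, the adjusted R Square penalises the extra predictor through the degrees of freedom, and F is MSR/MSE.

```python
# Figures copied from the Excel output for Question 2
B0, B_ERA, B_RUNS = 79.7718417, -17.64487887, 0.102716029
SSR, SSE, SST = 3024.59932, 524.1006801, 3548.7
N, K = 30, 2  # observations and number of independent variables


def predict_wins(era, runs):
    """Point prediction from the fitted equation Y-hat = b0 + b1*X1 + b2*X2."""
    return B0 + B_ERA * era + B_RUNS * runs


r_squared = SSR / SST
adj_r_squared = 1 - (1 - r_squared) * (N - 1) / (N - K - 1)
msr = SSR / K            # mean square for regression
mse = SSE / (N - K - 1)  # mean square error (residual)
f_stat = msr / mse

print(round(predict_wins(4.50, 750), 2))                 # → 77.41
print(round(r_squared, 4), round(adj_r_squared, 4))      # → 0.8523 0.8414
print(round(f_stat, 2))                                  # → 77.91
```

For part (d), the F statistic of about 77.91 is far above the critical value F(0.05; 2, 27) of roughly 3.35, which agrees with Significance F (6.11E-12) being well below 0.05.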

Question 3:
A survey conducted by the National Post, entitled "Send your infants to nursery", reports that children (aged 3 months to 5 years) who attend a playgroup or nursery scheme three or more mornings a week achieve higher academic levels in subsequent years than those who were kept at home or babysat in a relative's or friend's home.
a) What information would you want to know before you accepted the results of this survey? (12 marks)
b) Assume you are in charge of this study. Briefly explain how you would organise this research exercise. You should mention the sampling frame, the sampling method, the survey questions and the hypotheses you would test. (13 marks)
Sub-Total: 2

Question 4:
The following data represent total revenues (in millions of constant 2000 pounds) earned by a car rental agency over the 11-year period between 2000 and 2005:
4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0
a) Compute the 3-year moving averages for this annual time series.
b) Plot the original figures and the MA(3) figures in a rough diagram and use it to discuss the trend.
c) Interpret your results in simple management terms.
d) What other method(s) could you use to forecast the figures for 2006?
e) Explain what is meant by the Classical Multiplicative Time-Series Model. How, and why, would one want to deseasonalise a variable?
Sub-Total: 2
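For Question 4(a), a minimal sketch of the 3-year moving average in Python, applied to the eight revenue figures exactly as printed; `moving_average` is a hypothetical helper name. Each average is taken over three consecutive years and belongs to the middle year of that window.

```python
def moving_average(values, window=3):
    """Return the (len(values) - window + 1) averages of consecutive windows."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


revenues = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0]  # figures as listed in Question 4
ma3 = moving_average(revenues)
# Each MA(3) value belongs to the middle year of its window, so the first
# and last years of the series have no moving average.
print([round(v, 2) for v in ma3])  # → [5.33, 6.0, 7.0, 7.67, 7.33, 5.33]
```

For part (b), these six values would be plotted against the second to seventh years of the series, alongside the original figures.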

Question 5:
A survey of fuel consumption was conducted among drivers of sedans in 2009. The overall miles-per-gallon (MPG) figures for 2009 sedans priced under 20,000 are as follows:
27; 31; 30; 28; 27; 24; 29; 32; 32; 27; 26; 26; 25; 26; 25; 24
a) Compute the mean, median and mode. (3 marks)
b) Compute the variance.
c) Compute the standard deviation.
d) Compute the range.
e) Compute the coefficient of variation.
f) Are the data skewed? If so, how? (2 marks)
Sub-Total: 2

Question 6:
Approximately 5% of US families are millionaires (i.e. have a net worth in excess of $1 million). However, 30% of Microsoft's 31,000 employees are millionaires. If random samples of 100 Microsoft employees are selected, what proportion of the samples will have:
a) between 25% and 35% millionaires?
b) between 20% and 40% millionaires?
c) more than 40% millionaires?
d) If samples of size 50 are taken, how does this change your answers to (a)-(c)?
e) Explain intuitively why the normal distribution, which is a continuous distribution, can be used to make inferences about a dichotomous random process (such as the one described above).
Sub-Total: 2

END OF PAPER
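For Question 6, a sketch using the normal approximation to the sampling distribution of the sample proportion, with mean p = 0.30 and standard error sqrt(p(1 - p)/n). It assumes Python 3.8+ for `statistics.NormalDist`, applies no continuity correction, and ignores the finite population correction because a sample of 100 is a small fraction of 31,000 employees; `sampling_dist` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

P = 0.30  # proportion of millionaires among Microsoft employees (from the question)


def sampling_dist(n, p=P):
    """Normal approximation to the sampling distribution of the sample proportion.

    Assumes n*p and n*(1 - p) are both large enough (at least 5) and ignores
    the finite population correction.
    """
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


for n in (100, 50):
    dist = sampling_dist(n)
    between_25_35 = dist.cdf(0.35) - dist.cdf(0.25)  # part (a)
    between_20_40 = dist.cdf(0.40) - dist.cdf(0.20)  # part (b)
    above_40 = 1 - dist.cdf(0.40)                    # part (c)
    print(n, round(between_25_35, 4), round(between_20_40, 4), round(above_40, 4))
```

Halving the sample size widens the standard error by a factor of sqrt(2), so the probabilities of landing near 30% fall and the tail probability rises, which is the comparison part (d) asks for.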