



Assignment 3, MATH 2560, Due November 16th

Question 1 (all graphs and calculations have to be done using the computer)

The following table gives the 1999 payroll (rounded to the nearest million dollars) and the percentage of games won during the 1999 season by each of the American League baseball teams.

Team                     Total payroll (millions)   Percentage of games won
Anaheim Angels           50                         43.2
Baltimore Orioles        71                         48.1
Boston Red Sox           72                         58.0
Chicago White Sox        25                         46.6
Cleveland Indians        74                         59.9
Detroit Tigers           35                         42.9
Kansas City Royals       17                         39.8
Minnesota Twins          16                         39.4
New York Yankees         88                         60.5
Oakland A's              24                         53.7
Seattle Mariners         44                         48.8
Tampa Bay Devil Rays     38                         42.6
Texas Rangers            81                         51.6
Toronto Blue Jays        48                         51.9

(a) Find the least squares regression line with total payroll as the independent variable and percentage of games won as the dependent variable.
(b) Give the residuals and verify that their sum is equal to 0. Plot these residuals versus total payroll. What do you observe?
(c) What is the proportion of the total variation which is due to regression?
(d) Predict the percentage of games won for a team with a total payroll of $38 million.
(e) Compute the correlation coefficient between total payroll and percentage of games won. What happens to this correlation coefficient if payroll is expressed in dollars instead of millions of dollars?

Solution

Let us regress percentage of games won on total payroll.

    MTB > set c1
    DATA> 50 71 72 25 74 35 17 16 88 24 44 38 81 48
    DATA> end
    MTB > set c2
    DATA> 43.2 48.1 58 46.6 59.9 42.9 39.8 39.4 60.5 53.7 48.8 42.6 51.6 51.9
    DATA> end
    MTB > regress c2 1 on c1

    The regression equation is
    C2 = 38.5 + 0.216 C1

    Predictor      Coef     Stdev   t-ratio       p
    Constant     38.549     3.061     12.60   0.000
    C1          0.21568   0.05644      3.82   0.002

    s = 4.998    R-sq = 54.9%    R-sq(adj) = 51.1%

    Analysis of Variance
    SOURCE        DF       SS       MS       F       p
    Regression     1   364.72   364.72   14.60   0.002
    Error         12   299.75    24.98
    Total         13   664.47

    Unusual Observations
    Obs.     C1      C2     Fit  Stdev.Fit  Residual  St.Resid
      10   24.0   53.70   43.73       1.93      9.97     2.16R

    R denotes an obs. with a large st. resid.

(a) The regression line is therefore C2 = 38.5 + 0.216 C1.

Let us now compute the fitted values $\hat{Y}_i$, $i = 1, \ldots, 14$, and the residuals $e_i = Y_i - \hat{Y}_i$, $i = 1, \ldots, 14$, and plot the residuals $e_i$ versus $X_i$.

    MTB > let C3=38.5+0.216*C1
    MTB > write C3
    49.300 53.836 54.052 43.900 54.484 46.060 42.172
    41.956 57.508 43.684 48.004 46.708 55.996 48.868
    MTB > let C4=C2-C3
    MTB > write C4
    -6.1000 -5.7360 3.9480 2.7000 5.4160 -3.1600 -2.3720
    -2.5560 2.9920 10.0160 0.7960 -4.1080 -4.3960 3.0320
    MTB > plot c4 c1

    [Character plot of the residuals C4 (vertical axis, ticks at -6.0, 0.0 and 6.0) versus total payroll C1 (horizontal axis, ticks from 15 to 90).]

Let us verify that the sum of the residuals is 0, compute the correlation coefficient, and verify that the square of the correlation coefficient is equal to $R^2$ = R-sq = 54.9%.

    MTB > sum C4
    SUM = 0.47199
    MTB > correlate c1 c2
    Correlation of C1 and C2 = 0.741
    MTB > let c5=0.741*0.741
    MTB > write c5
    0.549081

(b) The residuals are given by C4 above. The computed sum of the residuals is 0.472 rather than zero because the fitted values were calculated from the rounded coefficients 38.5 and 0.216; we have shown in class that when the calculations are done precisely, the sum of the residuals is zero. The residual plot shows that the residuals are randomly distributed about the line $e = 0$.

(c) The proportion of total variation due to regression is given by R-sq = 364.72/664.47 = 54.9%.

    MTB > let c6=38.5+0.216*38
    MTB > write c6
    46.708

(d) The predicted percentage of games won if the total payroll is $38 million is 46.708.

(e) The correlation coefficient is $r = 0.741$, and the output above confirms that $R^2$ = R-sq = 54.9% is equal to its square. The correlation coefficient does not change if payroll is expressed in dollars instead of millions of dollars, because correlation is unaffected by a change of units.

Question 2

The probability that a person favors genetic engineering is .55 and the probability that a person is against it is .45. Two persons are randomly selected, and it is observed whether they favour or oppose genetic engineering.

(a) Draw a tree diagram for this experiment.

Solution: From the root of the tree come 2 branches, Y and N, with respective probabilities .55 and .45. Then out of Y and out of N come 2 further branches, Y and N, again with respective probabilities .55 and .45.

(b) Find the probability that at least one of the two persons favours genetic engineering.

Solution: The set of all possible outcomes is $\{YY, YN, NY, NN\}$. The probability that at least one of the persons favours genetic engineering is

\[ P(YY, YN, NY) = P(YY) + P(YN) + P(NY) = P(Y)P(Y) + P(Y)P(N) + P(N)P(Y) = .55^2 + .55(.45) + .45(.55) = .7975. \]

Question 3

A player plays a game of roulette in a casino by betting on a single number each time. Since the wheel has 38 numbers, the probability that the player will win in a single play is 1/38. Note that each play of the game is independent of the previous play.

(a) Find the probability that the player will win for the first time on the 10th play.

Solution: Let W denote winning and L losing. Since the plays are independent, the probability that the player will win for the first time on the 10th play is

\[ P(LLLLLLLLLW) = \left(\frac{37}{38}\right)^{9} \frac{1}{38} \approx 0.0207. \]

(b) Find the probability that it takes the player more than 50 plays to win for the first time.

Solution: The player needs more than 50 plays exactly when the first win comes on play 51 or later, so the probability is

\[ P(\text{first win on play 51}) + P(\text{first win on play 52}) + \cdots = 1 - \sum_{i=1}^{50} P(\text{first win on play } i) = 1 - \sum_{i=1}^{50} \left(\frac{37}{38}\right)^{i-1} \frac{1}{38} = \left(\frac{37}{38}\right)^{50} \approx 0.264. \]

(c) The gambler claims that since he has one chance in 38 of winning each time he plays, he is certain to win at least once if he plays 38 times. Does this sound reasonable to you? Find the probability that he will win at least once in 38 plays.

Solution: The player cannot be certain he will win, of course. His probability of winning at least once in 38 plays is the probability of winning once, plus the probability of winning twice, and so on up to the probability of winning 38 times. This is also equal to 1 minus the probability of never winning, which is

\[ 1 - \left(\frac{37}{38}\right)^{38} = 1 - 0.3629851 = 0.6370149. \]

So the player has only a 63.7% chance of winning at least once.

Question 4

A hotel owner has determined that 83% of the hotel's guests eat either dinner or breakfast in the restaurant. Further investigation reveals that 30% of the guests eat dinner and 60% of the guests eat breakfast in the hotel restaurant.

(a) What proportion of the hotel guests eat both dinner and breakfast in the hotel restaurant?

Solution: Let B be the event of eating breakfast at the restaurant and D the event of eating dinner there. Then

\[ P(B \cup D) = .83 = P(B) + P(D) - P(B \cap D) = .6 + .3 - P(B \cap D). \]

So $P(B \cap D) = .9 - .83 = .07$.

(b) What proportion of the hotel guests eat neither dinner nor breakfast in the hotel restaurant?

Solution: $P(B^c \cap D^c) = P((B \cup D)^c) = 1 - P(B \cup D) = 1 - .83 = .17$.

(c) What proportion of the hotel guests eat dinner but not breakfast in the hotel restaurant?

Solution: $P(D \cap B^c) = P(D) - P(D \cap B) = .3 - .07 = .23$.
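The probabilities in Questions 3 and 4 can be checked numerically. The following is a minimal Python sketch and is not part of the original Minitab session; the variable and function names (`first_win_on`, `part_a`, `both`, and so on) are illustrative choices, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

# Question 3: betting on a single number on a 38-number wheel.
p_win = Fraction(1, 38)   # probability of winning one play
p_lose = 1 - p_win        # probability of losing one play


def first_win_on(n):
    """P(first win occurs exactly on play n), assuming independent plays."""
    return p_lose ** (n - 1) * p_win


# (a) first win on the 10th play
part_a = first_win_on(10)

# (b) more than 50 plays needed: 1 minus P(first win on one of plays 1..50)
part_b = 1 - sum(first_win_on(i) for i in range(1, 51))

# (c) at least one win in 38 plays: 1 minus P(38 losses in a row)
part_c = 1 - p_lose ** 38

# Question 4: breakfast (B) and dinner (D) in the hotel restaurant.
p_b = Fraction(60, 100)
p_d = Fraction(30, 100)
p_b_or_d = Fraction(83, 100)

both = p_b + p_d - p_b_or_d   # inclusion-exclusion solved for P(B and D)
neither = 1 - p_b_or_d        # complement of the union
dinner_only = p_d - both      # D minus the overlap with B

print("Q3:", float(part_a), float(part_b), float(part_c))
print("Q4:", float(both), float(neither), float(dinner_only))
```

Computing part (b) by summing the first 50 terms, rather than using the closed form directly, mirrors the written solution and lets the result be compared against $(37/38)^{50}$.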

Question 5

Let X be the number of errors that appear on a randomly selected page of a book. The following table lists the probability distribution of X.

x       0     1     2     3     4
P(x)   .73   .16   .06   .04   .01

(a) Find the mean and standard deviation of X.

Solution:

\[ \mu_X = 0(.73) + 1(.16) + 2(.06) + 3(.04) + 4(.01) = .44 \]

and

\[ \sigma_X^2 = (0 - .44)^2(.73) + (1 - .44)^2(.16) + (2 - .44)^2(.06) + (3 - .44)^2(.04) + (4 - .44)^2(.01) = 0.7264. \]

Therefore $\sigma_X = \sqrt{0.7264} = 0.852291$.

(b) Two pages are selected at random. We denote by $X_1$ the number of errors on the first page and by $X_2$ the number of errors on the second page. We assume that the errors on different pages are independent. Find the mean and standard deviation of $X_1 + X_2$.

Solution: $\mu_{X_1 + X_2} = .44 + .44 = .88$. For independent variables the variances (not the standard deviations) add, so $\sigma_{X_1 + X_2}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 = 2(0.7264) = 1.4528$ and $\sigma_{X_1 + X_2} = \sqrt{1.4528} \approx 1.2053$.

(c) Two pages are selected at random. We denote by $X_1$ the number of errors on the first page and by $X_2$ the number of errors on the second page. We assume that the errors on different pages are not independent and that their correlation coefficient is $\rho = .3$. Find the mean and standard deviation of $X_1 + X_2$.

Solution: $\mu_{X_1 + X_2} = .88$ as before, and

\[ \sigma_{X_1 + X_2}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + 2\rho\,\sigma_{X_1}\sigma_{X_2} = 1.4528 + 2(.3)(0.852291)^2 = 1.4528 + 0.43584 = 1.88864, \]

so $\sigma_{X_1 + X_2} = \sqrt{1.88864} \approx 1.3743$.

Question 6

A school teacher gives a 50-question multiple choice exam in which each question has four choices. The scoring includes a penalty for guessing: each wrong answer costs 1/2 point. For example, if a student answers 35 questions correctly, eight incorrectly, and does not answer 7 questions, the total score for this student will be $35 - (1/2)8 = 31$.

(a) What is the expected score of a student who answers 38 questions correctly and guesses on the other 12 questions? Assume that the student randomly chooses one of the four answers for each of the 12 guessed questions.

Solution: Let X be the number of points a student gets on a question for which he guesses the answer. If he guesses right, $x = 1$, and this happens with probability .25 since there are 4 choices for each question. If he guesses wrong, $x = -.5$, and this happens with probability .75. The total number of points is $Y = 38 + X_1 + \cdots + X_{12}$, where $X_1, \ldots, X_{12}$ are the points from the 12 guesses, and therefore

\[ \mu_Y = 38 + 12\mu_X = 38 + 12(.25 - (.5)(.75)) = 38 + 12(-.125) = 36.5. \]

(b) Does a student increase his expected score by guessing on a question if he has no idea what the correct answer is? Explain.

Solution: No. If the student does not guess and leaves the 12 questions unanswered, his score is 38, whereas guessing gives an expected score of 36.5. Each blind guess has expected value $-.125$, so the expected mark is higher if the student does not guess.

(c) Does a student increase her expected score by guessing on a question for which she can eliminate one of the wrong answers? Explain.

Solution: If the student can eliminate one of the answers, then X takes the value 1 with probability 1/3 and the value $-.5$ with probability 2/3. In this case

\[ \mu_Y = 38 + 12\mu_X = 38 + 12\left(\frac{1}{3} - (.5)\frac{2}{3}\right) = 38. \]

As we can see, the expected score is the same in this case, so on average it does not matter whether the student guesses or not.
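The expected-value calculations in Questions 5 and 6 can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original solution: the helper names `mean`, `variance` and `expected_score` are hypothetical, and the code assumes, as the solution does, that both pages in Question 5 follow the same distribution, so that $\sigma_{X_1} = \sigma_{X_2}$.

```python
from fractions import Fraction
from math import sqrt


def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Question 5: number of errors on a randomly selected page.
errors = {
    0: Fraction(73, 100),
    1: Fraction(16, 100),
    2: Fraction(6, 100),
    3: Fraction(4, 100),
    4: Fraction(1, 100),
}
mu_x = mean(errors)
var_x = variance(errors)

rho = Fraction(3, 10)                       # correlation given in part (c)
var_sum_indep = 2 * var_x                   # (b): variances add under independence
var_sum_corr = 2 * var_x + 2 * rho * var_x  # (c): sigma_1 * sigma_2 equals var_x here


# Question 6: 1 point for a right answer, -1/2 point for a wrong one.
def expected_score(known, guesses, choices_left):
    """Expected total score when `guesses` questions are answered at random
    among `choices_left` equally likely options (assumed scoring rule above)."""
    p_right = Fraction(1, choices_left)
    per_guess = mean({Fraction(1): p_right, Fraction(-1, 2): 1 - p_right})
    return known + guesses * per_guess


print("Q5:", float(mu_x), float(var_x), sqrt(var_x))
print("Q5 sums:", sqrt(var_sum_indep), sqrt(var_sum_corr))
print("Q6:", float(expected_score(38, 12, 4)), float(expected_score(38, 12, 3)))
```

Passing `choices_left=4` corresponds to a blind guess and `choices_left=3` to a guess after one wrong answer has been eliminated, which makes the break-even point in part (c) of Question 6 easy to see.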