Central Limit Theorem and Its Applications to Baseball

by

Nicole Anderson

A project submitted to the Department of Mathematical Sciences in conformity with the requirements for Math 4301 (Honours Seminar)

Lakehead University
Thunder Bay, Ontario, Canada

Copyright (c) 2014 Nicole Anderson
Abstract

This honours project is on the Central Limit Theorem (CLT). The CLT is considered to be one of the most powerful theorems in all of statistics and probability. In probability theory, the CLT states that, given certain conditions, the sample mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed. In this project, a brief historical review of the CLT is provided, and some basic concepts, two proofs of the CLT and several properties are discussed. As an application, we discuss how to use the CLT to study the sampling distribution of the sample mean and hypothesis testing using baseball statistics.
Acknowledgements

I would like to thank my supervisor, Dr. Li, who helped me by sharing his knowledge and many resources to help make this paper come to life. I would also like to thank Dr. Adam Van Tuyl for all of his help with LaTeX, and for his support throughout this project. Thank you very much!
Contents

Abstract
Acknowledgements
Chapter 1. Introduction
  1. Historical Review of Central Limit Theorem
  2. Central Limit Theorem in Practice
Chapter 2. Preliminaries
  1. Definitions
  2. Central Limit Theorem
Chapter 3. Proofs of Central Limit Theorem
  1. Proof of Central Limit Theorem Using Moment Generating Functions
  2. Proof of Central Limit Theorem Using Characteristic Functions
Chapter 4. Applications of the Central Limit Theorem in Baseball
Chapter 5. Summary
Chapter 6. Appendix
Bibliography
CHAPTER 1
Introduction

1. Historical Review of Central Limit Theorem

The Central Limit Theorem, CLT for short, has been around for over 275 years and has many applications, especially in probability theory. Over the years, many mathematicians have proved the CLT in different cases and have therefore provided different versions of the theorem. The origins of the Central Limit Theorem can be traced to The Doctrine of Chances by Abraham de Moivre in 1738. De Moivre's book provided techniques for solving gambling problems, and in it he stated the theorem for Bernoulli trials and gave a proof for $p = \frac{1}{2}$. This was a very important discovery at the time, and it inspired many other mathematicians years later to revisit de Moivre's work and prove the theorem for other cases [7].

In 1812, Pierre Simon Laplace published his own book, Théorie Analytique des Probabilités, in which he generalized the theorem to $p \neq \frac{1}{2}$. He also gave a proof of his finding, although not a rigorous one. It was not until around 1901-1902 that the Central Limit Theorem became more general and a complete proof was given by Aleksandr Lyapunov. A more general statement of the Central Limit Theorem appeared in 1922, when Lindeberg showed that the random variables in the sequence need not be identically distributed; instead, they only need zero means with individual variances small compared to their sum [3]. Many other contributions to the statement of the theorem, as well as many different ways to prove it, began to surface around 1935, when both Lévy and Feller published their own independent papers regarding the Central Limit Theorem.

The Central Limit Theorem has had, and continues to have, a great impact on mathematics. The theorem is used not only in probability theory; it has also been extended and can be used in topology, analysis and many other fields of mathematics.

2. Central Limit Theorem in Practice

The Central Limit Theorem is a powerful theorem in statistics that allows us to make assumptions about a population. It states that, for a sufficiently large sample size, the distribution of the sample mean is approximately normal regardless of what the initial distribution looks like. Many applications, such as hypothesis testing, confidence intervals and estimation, use the Central Limit Theorem to make reasonable assumptions concerning the population, since it is often difficult to make such assumptions when the population is not normally distributed and the shape of its distribution is unknown.

The goal of this project is to focus on the Central Limit Theorem and its applications in statistics, and to answer three questions: Why is the Central Limit Theorem important? How can we prove the theorem? How can we apply the Central Limit Theorem in baseball?

Our paper is structured as follows. In Chapter 2 we first give key definitions that are important in understanding the Central Limit Theorem. Then we give three different statements of the Central Limit Theorem. Chapter 3 answers the second question by proving the Central Limit Theorem. We first give a proof using moment generating functions, and then a proof using characteristic functions. In Chapter 4 we answer the third question and show that the Central Limit Theorem can be used to answer the question "Is there such a thing as a home-field advantage in baseball?" by means of an important application known as hypothesis testing. Finally, Chapter 5 summarizes the results of the project and discusses future applications.
CHAPTER 2
Preliminaries

This chapter provides some basic definitions, as well as some examples, to help in understanding the various components of the Central Limit Theorem. Since the Central Limit Theorem has strong applications in probability and statistics, one must have a good understanding of some basic concepts concerning random variables, probability distributions, mean and variance, and the like.

1. Definitions

Several definitions must be understood before we give the statement of the Central Limit Theorem. The following definitions can be found in [12].

Definition 2.1. A population consists of the entire collection of observations with which we are concerned.

Definition 2.2. An experiment is a repeatable process with a set of possible outcomes.

Definition 2.3. A sample is a subset of the population.

Definition 2.4. A random sample is a sample of size $n$ in which all observations are taken at random and are assumed to be independent.

Definition 2.5. A random variable, denoted by $X$, is a function that associates a real number with every outcome of an experiment. We say $X$ is a discrete random variable if it can assume at most a finite or a countably infinite number of possible values. A random variable is continuous if it can assume any value in some interval or intervals of real numbers and the probability that it assumes any specific value is 0.

Example 2.6. Suppose we wish to know how well a baseball player performed this season by looking at how often they got on base. Define the random variable $X$ by
\[ X = \begin{cases} 1, & \text{if the hitter got on base,} \\ 0, & \text{if the hitter did not get on base.} \end{cases} \]
This is an example of a random variable with a Bernoulli distribution.
Definition 2.7. The probability distribution of a discrete random variable $X$ is a function $f$ that associates a probability with each possible value $x$ of $X$ and satisfies the following three properties:
1. $f(x) \geq 0$,
2. $\sum_x f(x) = 1$,
3. $P(X = x) = f(x)$,
where $P(X = x)$ refers to the probability that the random variable $X$ is equal to a particular value, denoted by $x$.

Definition 2.8. A probability density function for a continuous random variable $X$, denoted $f(x)$, is a function such that
1. $f(x) \geq 0$ for all $x$ in $\mathbb{R}$,
2. $\int_{-\infty}^{+\infty} f(x)\,dx = 1$,
3. $P(a < X < b) = \int_a^b f(x)\,dx$ for all $a < b$.

Definition 2.9. Let $X$ be a discrete random variable with probability distribution function $f(x)$. The expected value or mean of $X$, denoted $\mu$ or $E(X)$, is
\[ \mu = E(X) = \sum_x x f(x). \]

Example 2.10. We are interested in finding the expected number of home runs that Jose Bautista will hit next season based on his previous three seasons. To do this, we can compute the expected value of home runs based on his last three seasons.

Table 1. Jose Bautista's Yearly Home Runs

Year    Home Runs
2011    43
2012    27
2013    28
\[ \mu = E(X) = 43 f(43) + 27 f(27) + 28 f(28) = 43\left(\tfrac{1}{3}\right) + 27\left(\tfrac{1}{3}\right) + 28\left(\tfrac{1}{3}\right) = \frac{98}{3} \approx 33. \]
This tells us that, based on the past three seasons, Jose Bautista is expected to hit approximately 33 home runs in the 2014 season. These statistics are taken from [5].

Definition 2.11. Let $X$ be a random variable with mean $\mu$. The variance of $X$, denoted $\mathrm{Var}(X)$ or $\sigma^2$, is
\[ \sigma^2 = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2 = E(X^2) - \mu^2. \]

Definition 2.12. The standard deviation of a random variable $X$, denoted $\sigma$, is the positive square root of the variance.

Example 2.13. Using Alex Rodriguez's yearly triples from Table 2 below, compute the variance and standard deviation.
\[ E(X^2) = \overline{X^2} = \frac{\sum X^2}{20} = \frac{0^2 + 2^2 + 1^2 + \cdots + 0^2 + 1^2 + 0^2}{20} = \frac{96}{20} = 4.8 \]
\[ E(X) = \overline{X} = \frac{\sum X}{20} = \frac{0 + 2 + 1 + 3 + \cdots + 0 + 1 + 0}{20} = \frac{30}{20} = 1.5 \]
\[ \sigma^2 = E(X^2) - (E(X))^2 = 4.8 - (1.5)^2 = 2.55 \]
\[ \sigma = \sqrt{2.55} = 1.5968719422671 \approx 1.6 \]
These statistics are taken from [5].

Definition 2.14. A sampling distribution is the probability distribution of a statistic.

Definition 2.15. A continuous random variable $X$ is said to follow a Normal Distribution with mean $\mu$ and variance $\sigma^2$ if it has the probability density function
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}, \qquad -\infty < x < \infty. \]
We write $X \sim N(\mu, \sigma^2)$.

Example 2.16. Consider the batting averages of Major League Baseball players in the 2013 baseball season.
Table 2. Alex Rodriguez Stats 1994-2013

Year    AVG     Triples   Home Runs
1994    .204    0         0
1995    .232    2         5
1996    .358    1         36
1997    .300    3         23
1998    .310    5         42
1999    .285    0         42
2000    .316    2         41
2001    .318    1         52
2002    .300    2         57
2003    .298    6         47
2004    .286    2         36
2005    .321    1         48
2006    .290    1         35
2007    .314    0         54
2008    .302    0         35
2009    .286    1         30
2010    .270    2         30
2011    .276    0         16
2012    .272    1         18
2013    .244    0         7

These statistics are taken from [5].

Taking all of the players' batting averages, we can see in the graph that the averages follow a bell curve, which is characteristic of the normal distribution. We see that the majority of players have an average between .250 and .300, and that few players have an average between .200 and .225 or between .325 and .350. This gives a good example of how the normal distribution can help approximate even discrete random variables. Just by looking at the graph we can make some inferences about the population.

2. Central Limit Theorem

Over the years, many mathematicians have contributed to the Central Limit Theorem and its proof, and therefore many different statements of the theorem are accepted. The first statement of the theorem is widely known as the de Moivre-Laplace Theorem, which was the very first statement of the Central Limit Theorem.

Theorem 2.17. [3] Consider a sequence of Bernoulli trials with probability $p$ of success, where $0 < p < 1$. Let $S_n$ denote the number of successes in the first $n$ trials, $n \geq 1$. For any $a, b \in \mathbb{R} \cup \{\pm\infty\}$ with $a < b$,
\[ \lim_{n \to \infty} P\left( a \leq \frac{S_n - np}{\sqrt{np(1 - p)}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\,dz. \]

Another statement of the Central Limit Theorem was given by Lyapunov, which states:

Theorem 2.18. [8] Suppose $X_n$, $n \geq 1$, are independent random variables with mean 0. If
\[ \frac{\sum_{k=1}^{n} E(|X_k|^{\delta})}{s_n^{\delta}} \to 0 \]
for some $\delta > 2$, then
\[ \frac{S_n}{s_n} \xrightarrow{\text{distr}} N(0, 1), \]
where $S_n = X_1 + X_2 + \ldots + X_n$, $s_n = \sqrt{\sum_{k=1}^{n} E(X_k^2)}$, $n \geq 1$, and where $\xrightarrow{\text{distr}}$ represents convergence in distribution.

Before giving the final statement of the Central Limit Theorem, we must define what it means for random variables to be independent and identically distributed.

Definition 2.19. A sequence of random variables is said to be independent and identically distributed if all the random variables are mutually independent and each random variable has the same probability distribution.

We will now give the final statement of the Central Limit Theorem, a special case of the Lindeberg-Feller theorem. This statement is the one we will use throughout the rest of the paper.

Theorem 2.20. [8] Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2 > 0$. Then
\[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1), \]
where $S_n = X_1 + X_2 + \ldots + X_n$, $n \geq 1$, and $\xrightarrow{\text{distr}}$ represents convergence in distribution.
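Theorem 2.20 can be illustrated numerically. The following is only a minimal sketch, not part of the original project: it assumes Bernoulli trials with a hypothetical on-base probability `p = 0.3`, and the sample size, number of repetitions and seed are arbitrary choices. It simulates the standardized sum $(S_n - n\mu)/\sqrt{n\sigma^2}$ many times and checks that its mean, standard deviation and central 95% coverage are close to those of $N(0, 1)$.

```python
import math
import random
import statistics


def standardized_sums(n, trials, p=0.3, seed=0):
    """Simulate `trials` values of (S_n - n*mu) / sqrt(n*sigma^2) for Bernoulli(p)."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    mu, var = p, p * (1 - p)           # mean and variance of one Bernoulli trial
    values = []
    for _ in range(trials):
        # S_n: number of successes in n independent trials
        s_n = sum(1 for _ in range(n) if rng.random() < p)
        values.append((s_n - n * mu) / math.sqrt(n * var))
    return values


def fraction_within(values, a, b):
    """Fraction of simulated values falling in the closed interval [a, b]."""
    return sum(a <= v <= b for v in values) / len(values)


# Hypothetical settings: n = 400 plate appearances, 2000 simulated seasons.
z_values = standardized_sums(n=400, trials=2000)
print("mean:", statistics.fmean(z_values))
print("standard deviation:", statistics.pstdev(z_values))
print("fraction in [-1.96, 1.96]:", fraction_within(z_values, -1.96, 1.96))
```

If the approximation is good, the printed mean should be near 0, the standard deviation near 1, and the last fraction near 0.95. Because $S_n$ is discrete, the coverage will not match 0.95 exactly.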
CHAPTER 3
Proofs of Central Limit Theorem

There are many ways to prove the Central Limit Theorem. In this chapter we provide two proofs. The first proof uses moment generating functions, and the second uses characteristic functions. We first prove the Central Limit Theorem using moment generating functions.

1. Proof of Central Limit Theorem Using Moment Generating Functions

Before we give the proof of the Central Limit Theorem, it is important to discuss some basic definitions, properties and remarks concerning moment generating functions. First, we give the definition of a moment generating function:

Definition 3.1. The moment-generating function (MGF) of a random variable $X$ is defined to be
\[ M_X(t) = E(e^{tX}) = \begin{cases} \sum_x e^{tx} f(x), & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{+\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases} \]

Moments can also be found by differentiation.

Theorem 3.2. Let $X$ be a random variable with moment-generating function $M_X(t)$. We have
\[ \left. \frac{d^r M_X(t)}{dt^r} \right|_{t=0} = \mu_r, \]
where $\mu_r = E(X^r)$.

Remark 3.3. $\mu_r = E(X^r)$ is the $r$th moment about the origin of the random variable $X$. We can see that $\mu_1 = E(X)$ and $\mu_2 = E(X^2)$, which allows us to write the mean and variance in terms of moments.

Moment generating functions also have the following properties.

Theorem 3.4. $M_{a+bX}(t) = E(e^{t(a+bX)}) = e^{at} M_X(bt)$.

Proof. $M_{a+bX}(t) = E[e^{t(a+bX)}] = E(e^{at} e^{(bt)X}) = e^{at} E(e^{(bt)X}) = e^{at} M_X(bt)$.

Theorem 3.5. Let $X$ and $Y$ be independent random variables with moment-generating functions $M_X(t)$ and $M_Y(t)$ respectively. Then $M_{X+Y}(t) = M_X(t)\, M_Y(t)$.
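Proof.
\[ M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX})\, E(e^{tY}) \text{ (by independence of the random variables)} = M_X(t)\, M_Y(t). \]

Corollary 3.6. Let $X_1, X_2, \ldots, X_n$ be independent random variables. Then
\[ M_{X_1 + X_2 + \ldots + X_n}(t) = M_{X_1}(t)\, M_{X_2}(t) \cdots M_{X_n}(t). \]
The proof is nearly identical to the proof of the previous theorem.

To prove the Central Limit Theorem, it is necessary to know the moment generating function of the normal distribution:

Lemma 3.7. The moment generating function (MGF) of the normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ (i.e., $X \sim N(\mu, \sigma^2)$) is
\[ M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}. \]

Proof. First we find the MGF for the normal distribution with mean 0 and variance 1, i.e., $N(0, 1)$. If $Y \sim N(0, 1)$, then
\begin{align*}
M_Y(t) = E(e^{tY}) &= \int_{-\infty}^{+\infty} e^{ty} f(y)\,dy \\
&= \int_{-\infty}^{+\infty} e^{ty} \left( \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2} \right) dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{ty - \frac{1}{2} y^2}\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{\frac{1}{2} t^2 - \frac{1}{2}(y^2 - 2ty + t^2)}\,dy \\
&= e^{\frac{1}{2} t^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(y - t)^2}\,dy.
\end{align*}
But note that, by Definition 2.15, $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(y - t)^2}$ is the probability density function of the normal distribution with mean $t$ and variance 1, so its integral over the real line equals 1. So
\[ M_Y(t) = e^{\frac{1}{2} t^2}. \]
Now, if $X \sim N(\mu, \sigma^2)$, then $X = \mu + \sigma Y$, and by Theorem 3.4,
\[ M_X(t) = M_{\mu + \sigma Y}(t) = e^{\mu t} M_Y(\sigma t) = e^{\mu t} e^{\frac{1}{2} \sigma^2 t^2} = e^{\mu t + \frac{\sigma^2 t^2}{2}}. \]

Before we begin the proof of the Central Limit Theorem, we must recall the following fact from calculus:

Lemma 3.8. $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

Now we are ready to prove the Central Limit Theorem. We will prove the special case where $M_X(t)$ exists in a neighbourhood of 0.

Proof. (of Theorem 2.20) Let $Y_i = \frac{X_i - \mu}{\sigma}$ for $i = 1, 2, 3, \ldots$ and $R_n = Y_1 + Y_2 + \ldots + Y_n$. So we have
\[ \frac{S_n - n\mu}{\sqrt{\sigma^2}} = Y_1 + Y_2 + \ldots + Y_n = R_n. \]
So
\[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{R_n}{\sqrt{n}} = Z_n. \]
Since $R_n$ is the sum of independent random variables, we see that its moment generating function is
\[ M_{R_n}(t) = M_{Y_1}(t)\, M_{Y_2}(t) \cdots M_{Y_n}(t) = [M_Y(t)]^n \]
by Corollary 3.6. This is true because the $Y_i$ are independent and identically distributed. Now,
\[ M_{Z_n}(t) = M_{\frac{R_n}{\sqrt{n}}}(t) = E\left( e^{\frac{R_n}{\sqrt{n}} t} \right) = E\left( e^{R_n \left( \frac{t}{\sqrt{n}} \right)} \right) = M_{R_n}\left( \frac{t}{\sqrt{n}} \right) = \left[ M_Y\left( \frac{t}{\sqrt{n}} \right) \right]^n. \]
Taking the natural logarithm of each side,
\[ \ln M_{Z_n}(t) = n \ln M_Y\left( \frac{t}{\sqrt{n}} \right). \]
But note, using Lemma 3.8 together with $E(Y) = 0$ and $E(Y^2) = 1$, that
\begin{align*}
M_Y\left( \frac{t}{\sqrt{n}} \right) = E\left( e^{\frac{t}{\sqrt{n}} Y} \right) &= E\left( 1 + \frac{tY}{\sqrt{n}} + \frac{t^2 Y^2}{2n} + O\left( \frac{1}{n^{3/2}} \right) \right) \\
&= 1 + \frac{t\,E(Y)}{\sqrt{n}} + \frac{t^2 E(Y^2)}{2n} + O\left( \frac{1}{n^{3/2}} \right) \\
&= 1 + \frac{t^2}{2n} + O\left( \frac{1}{n^{3/2}} \right),
\end{align*}
where $O\left( \frac{1}{n^{\alpha}} \right)$ stands for a quantity satisfying
\[ \limsup_{n \to \infty} \frac{\left| O\left( \frac{1}{n^{\alpha}} \right) \right|}{\frac{1}{n^{\alpha}}} < \infty. \]
Then, since $\ln(1 + x) = x + O(x^2)$ as $x \to 0$,
\begin{align*}
\ln M_{Z_n}(t) &= n \ln\left( 1 + \frac{t^2}{2n} + O\left( \frac{1}{n^{3/2}} \right) \right) \\
&= n \left( \frac{t^2}{2n} + O\left( \frac{1}{n^{3/2}} \right) \right) \\
&= \frac{t^2}{2} + O\left( \frac{1}{n^{1/2}} \right).
\end{align*}
So we have
\[ \ln M_{Z_n}(t) = \frac{t^2}{2} + O\left( \frac{1}{n^{1/2}} \right), \]
and therefore
\[ M_{Z_n}(t) \to e^{\frac{t^2}{2}} \text{ as } n \to \infty, \]
which by Lemma 3.7 is the moment generating function of $N(0, 1)$. Thus $Z_n \xrightarrow{\text{distr}} N(0, 1)$, i.e.,
\[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1). \]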
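The limit $M_{Z_n}(t) \to e^{t^2/2}$ at the end of this proof can be checked numerically. The sketch below is an illustration only and rests on assumptions not in the original: it takes $X$ to be Bernoulli with a hypothetical $p = 0.3$, for which the MGF of the standardized variable $Y = (X - p)/\sqrt{p(1-p)}$ has a simple closed form, and evaluates $[M_Y(t/\sqrt{n})]^n$ for increasing $n$.

```python
import math


def mgf_standardized_bernoulli(t, p):
    """MGF of Y = (X - p) / sqrt(p(1 - p)) where X ~ Bernoulli(p)."""
    sigma = math.sqrt(p * (1 - p))
    # Y equals (1 - p)/sigma with probability p and -p/sigma with probability 1 - p.
    return p * math.exp(t * (1 - p) / sigma) + (1 - p) * math.exp(-t * p / sigma)


def mgf_z_n(t, n, p):
    """MGF of Z_n = R_n / sqrt(n), using M_{Z_n}(t) = [M_Y(t / sqrt(n))]^n."""
    return mgf_standardized_bernoulli(t / math.sqrt(n), p) ** n


t, p = 1.0, 0.3                      # hypothetical evaluation point and success probability
limit = math.exp(t ** 2 / 2)         # MGF of N(0, 1) at t, from Lemma 3.7
sample_sizes = (10, 100, 1000, 10000)
errors = [abs(mgf_z_n(t, n, p) - limit) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"n = {n:>5}: |M_Zn(t) - exp(t^2/2)| = {err:.6f}")
```

The gap should shrink roughly in proportion to $1/\sqrt{n}$, in line with the $O(n^{-1/2})$ remainder in the proof.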
2. Proof of Central Limit Theorem Using Characteristic Functions

Now we will prove the Central Limit Theorem another way, by looking at characteristic functions. Moment generating functions do not exist for all distributions, because some moments of a distribution may not be finite. In these instances, we look at another general function known as the characteristic function.

Definition 3.9. The characteristic function of a continuous random variable $X$ is
\[ C_X(t) = E(e^{itX}) = \int_{-\infty}^{+\infty} e^{itx} f(x)\,dx, \]
where $t$ is a real number and $i = \sqrt{-1}$.

$C_X(t)$ will always exist because $e^{itx}$ is a bounded function, that is, $|e^{itx}| = 1$ for all $t, x \in \mathbb{R}$, and so the integral exists. The characteristic function also has many properties similar to those of moment generating functions.

To prove the Central Limit Theorem using characteristic functions, we need to know the characteristic function of the normal distribution and the following lemma.

Lemma 3.10. Let $R_n$, $n \geq 1$, be a sequence of random variables. If, as $n \to \infty$,
\[ C_{R_n}(t) = E\left( e^{i R_n t} \right) \to e^{-\frac{t^2}{2}} \]
for all $t \in (-\infty, \infty)$, then $R_n \xrightarrow{\text{distr}} N(0, 1)$.

We can now prove the Central Limit Theorem using characteristic functions.

Proof. (of Theorem 2.20) Similar to the proof using moment generating functions, let $Y_i = \frac{X_i - \mu}{\sigma}$ for $i = 1, 2, 3, \ldots$ and let $R_n = Y_1 + Y_2 + \ldots + Y_n$, so
\[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{R_n}{\sqrt{n}} = Z_n, \]
where $S_n = X_1 + X_2 + \ldots + X_n$. Note that $R_n$ is the sum of independent random variables, so we see that the characteristic function of $R_n$ is
\[ C_{R_n}(t) = C_{Y_1}(t)\, C_{Y_2}(t) \cdots C_{Y_n}(t) = [C_Y(t)]^n, \]
since all the $Y_i$ are independent and identically distributed. Now,
\[ C_{Z_n}(t) = C_{\frac{R_n}{\sqrt{n}}}(t) = E\left[ e^{i \frac{R_n}{\sqrt{n}} t} \right] = E\left[ e^{i R_n \left( \frac{t}{\sqrt{n}} \right)} \right] = C_{R_n}\left( \frac{t}{\sqrt{n}} \right) = \left[ C_Y\left( \frac{t}{\sqrt{n}} \right) \right]^n. \]
Taking the natural logarithm of each side,
\[ \ln C_{Z_n}(t) = n \ln C_Y\left( \frac{t}{\sqrt{n}} \right). \]
Following the previous proof with some modifications, we note that
\[ C_Y\left( \frac{t}{\sqrt{n}} \right) = 1 - \frac{t^2}{2n} + O\left( \frac{1}{n^{3/2}} \right). \]
Then
\[ \ln C_{Z_n}(t) = n \ln\left( 1 - \frac{t^2}{2n} + O\left( \frac{1}{n^{3/2}} \right) \right). \]
Expanding the logarithm as in the previous proof, we see that
\[ \ln C_{Z_n}(t) = -\frac{t^2}{2} + O\left( \frac{1}{n^{1/2}} \right). \]
So, as $n \to \infty$,
\[ \ln C_{Z_n}(t) \to -\frac{t^2}{2} \quad \text{and} \quad C_{Z_n}(t) \to e^{-\frac{t^2}{2}}. \]
Thus by Lemma 3.10, we conclude that
\[ Z_n = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1). \]
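The characteristic-function argument also covers distributions whose moment generating function does not exist. As a hedged illustration that is not taken from the original project, the sketch below assumes $X$ follows Student's $t$ distribution with 3 degrees of freedom, which has variance 3 but no MGF away from $t = 0$. It relies on the known closed form $C_X(t) = (1 + \sqrt{3}|t|)\,e^{-\sqrt{3}|t|}$, so that the standardized variable $Y = X/\sqrt{3}$ has $C_Y(t) = (1 + |t|)\,e^{-|t|}$.

```python
import math


def cf_standardized_t3(t):
    """Characteristic function of Y = X / sqrt(3), X ~ Student's t with 3 d.f.

    Assumed closed form: C_Y(t) = (1 + |t|) * exp(-|t|), which is real-valued
    because the distribution is symmetric about 0.
    """
    a = abs(t)
    return (1 + a) * math.exp(-a)


def cf_z_n(t, n):
    """Characteristic function of Z_n, using C_{Z_n}(t) = [C_Y(t / sqrt(n))]^n."""
    return cf_standardized_t3(t / math.sqrt(n)) ** n


t = 1.5                               # hypothetical evaluation point
limit = math.exp(-t ** 2 / 2)         # characteristic function of N(0, 1) at t
sample_sizes = (10, 100, 1000, 10000)
errors = [abs(cf_z_n(t, n) - limit) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"n = {n:>5}: |C_Zn(t) - exp(-t^2/2)| = {err:.6f}")
```

For this choice, $C_Y(t/\sqrt{n}) = 1 - \frac{t^2}{2n} + O(n^{-3/2})$, matching the expansion used in the proof, so the printed gaps should decrease toward 0 as $n$ grows.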
CHAPTER 4
Applications of the Central Limit Theorem in Baseball

The Central Limit Theorem has many applications in probability theory and statistics, but one very interesting application is hypothesis testing. This chapter focuses on hypothesis testing and, in particular, answers the following question:

Problem 4.1. Is there such a thing as a home-field advantage in Major League Baseball?

Before we begin, there are a few definitions that must be understood.

Definition 4.2. A conjecture concerning one or more populations is known as a statistical hypothesis.

Definition 4.3. A null hypothesis is a hypothesis that we wish to test and is denoted $H_0$.

Definition 4.4. An alternative hypothesis represents the question to be answered in the hypothesis test and is denoted by $H_1$.

Remark 4.5. The null hypothesis $H_0$ opposes the alternative hypothesis $H_1$. $H_0$ is commonly seen as the complement of $H_1$.

Concerning our problem, the null hypothesis and the alternative hypothesis are:
$H_0$: There is no home-field advantage,
$H_1$: There is a home-field advantage.

When we do a hypothesis test, the goal is to determine whether we reject the null hypothesis or fail to reject it. If we reject $H_0$, we are in favour of $H_1$ because of sufficient evidence in the data. If we fail to reject $H_0$, then we have insufficient evidence in the data.

Definition 4.6. A test statistic is a quantity computed from a sample that is used to determine whether or not a hypothesis is rejected.

Definition 4.7. A critical value is a cut-off value that is compared to the test statistic to determine whether or not the null hypothesis is rejected.

Definition 4.8. The level of significance of a test is the probability that $H_0$ is rejected although it is true.

Definition 4.9. A z-score or z-value is a number that indicates how many standard deviations an element is away from the mean.
Definition 4.10. A confidence interval is an interval that contains an estimated range of values in which an unknown population parameter is likely to fall.

Remark 4.11. If the hypothesized value falls in the interval, then we fail to reject $H_0$, but if the hypothesized value is not in the interval, then we reject $H_0$.

Definition 4.12. A p-value is the lowest level of significance at which the test statistic is significant.

Remark 4.13. We reject $H_0$ if the p-value is very small, usually less than 0.05.

Now to return to our problem: is there such a thing as a home-field advantage? How can we test this notion?

In the 2013 Major League Baseball season, there were 2431 games played, and of those games, 1308 were won at home. This indicates that approximately 53.81% of the games played were won at home. We let our observed value be this value, so $\hat{p} = 0.5381$. It seems as though there is such a thing as a home-field advantage, but we must test this notion to be certain. To do this, we test the hypothesis that there is no such thing as a home-field advantage, so our null hypothesis is
\[ H_0 : p = 0.50. \]
That is, 50% of Major League Baseball games are won at home, and hence there is no home-field advantage. Our alternative hypothesis is
\[ H_1 : p > 0.50. \]
If there is no home-field advantage, then we would expect our proportion to be 0.50, since half of the games would be won at home and the other half on the road.

Before we test whether there is a home-field advantage, we must first satisfy four conditions: the independence assumption, the randomization condition, the 10% condition, and the success/failure condition. These conditions ensure that we can test our hypothesis. Each game is independent of the others, and one game does not affect how another game is played.
In some cases, when a key batter or pitcher is injured, the team may not do as well in the immediately following games, but roughly speaking the games played are independent of one another, and so our independence condition holds. Since many games have been played over the years, with roughly 2430 games each year, taking just one year of data accounts for our randomization condition. Also, the 2431 games played in the 2013 season are less than 10% of the total games played over the years that Major League Baseball has been around, so our 10% condition also holds, that is, the sample size is no more than 10% of the population. Finally, we must check that the number of games multiplied by our proportion of 0.50 is larger than 10. We have
\[ np = 2431(0.50) = 1215.5, \]
which is larger than 10 (and $n(1 - p)$ takes the same value), so our success/failure condition holds as well. Since all of these conditions are met, we are now able to use the Normal Distribution model to help us test our hypothesis.

We will test our hypothesis using two different methods: first a confidence interval, and second a p-value.

First, we test our hypothesis using a confidence interval. For testing
\[ H_0 : p = 0.50 \quad \text{vs.} \quad H_1 : p > 0.50 \]
at the 0.05 level of significance, we may construct a right-sided 95% confidence interval for $p$. If the hypothesized value $p = 0.50$ is in the interval, then we fail to reject $H_0$ at the 0.05 level of significance. If $p = 0.50$ is not in the interval, we reject $H_0$. The right-sided $100(1 - \alpha)\%$ confidence interval for $p$ for a large sample is given by
\[ \hat{p} - z_{\alpha} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p \leq 1, \]
where $\alpha$ is the level of significance. Since $n = 2431$, $\hat{p} = 0.5381$, and $\alpha = 0.05$, we see from the Normal Distribution table in the Appendix that $z_{0.05} = 1.645$. So a right-sided 95% confidence interval for $p$ is
\[ 0.5381 - 1.645 \sqrt{\frac{(0.5381)(1 - 0.5381)}{2431}} < p \leq 1 \]
\[ 0.5381 - 1.645(0.0101114) < p \leq 1 \]
\[ 0.5215 < p \leq 1. \]
Since $0.50 \notin (0.5215, 1]$, we reject $H_0 : p = 0.50$ in favour of $H_1 : p > 0.50$ at the 0.05 level of significance. That is, we have enough evidence to support that there is a home-field advantage, and that the home team wins more than 50% of the games played at home.

Now we use the p-value approach to test our hypothesis. We must find the z-value for testing our observed value. We use the following equation to do so:
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}, \]
where $q_0 = 1 - p_0$. Now, with $p_0 = 0.50$, $\hat{p} = 0.5381$, and $n = 2431$, we have
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.5381 - 0.5}{\sqrt{\frac{0.5 \cdot 0.5}{2431}}} = \frac{0.0381}{0.010140923} = 3.76. \]
This results in a p-value $< 0.0001$.

Since the p-value is less than 0.05, we reject $H_0$. That is, the data support the claim that the home team wins more than 50% of the time, and hence that there is such a thing as a home-field advantage in Major League Baseball.

We have shown, taking all of the games played in the 2013 Major League Baseball season, that there is a home-field advantage, but is there a difference between the American League and the National League? Do both leagues have a home-field advantage? We test this notion using a $100(1 - \alpha)\%$ confidence interval at the 0.01 level of significance. This allows us to be 99% confident of our results.

In the 2013 season, the National League played 1211 games, and 660 of those games were won at home. This indicates that approximately 54.5% of the games were won at home. As above, we let the observed value be $\hat{p} = 0.545$ and test the same hypothesis, that is,
\[ H_0 : p = 0.50 \quad \text{vs.} \quad H_1 : p > 0.50. \]
Since $n = 1211$, $\hat{p} = 0.545$ and $\alpha = 0.01$, we can see from the Normal Distribution table in the Appendix that $z_{0.01} = 2.33$. So a right-sided 99% confidence interval for $p$ is
\[ 0.545 - 2.33 \sqrt{\frac{(0.545)(1 - 0.545)}{1211}} < p \leq 1 \]
\[ 0.545 - 2.33(0.014309744) < p \leq 1 \]
\[ 0.5117 < p \leq 1. \]
Since $0.50 \notin (0.5117, 1]$, we reject $H_0 : p = 0.50$ in favour of $H_1 : p > 0.50$. So we can conclude that the National League has a home-field advantage.

Will the same be true for the American League? We again test the same hypothesis, using a 99% confidence interval for the American League. In the 2013 season, the American League played slightly more games than the National League: 1220 games, of which 648 were won at home. This indicates that approximately 53.11% of the games played were won at home.
Once again, let our observed value be $\hat{p} = 0.5311$. Testing the same hypothesis as above, we see that a right-sided 99% confidence interval for $p$ is
\[ 0.5311 - 2.33 \sqrt{\frac{(0.5311)(1 - 0.5311)}{1220}} < p \leq 1 \]
\[ 0.5311 - 2.33(0.01428724) < p \leq 1 \]
\[ 0.4978 < p \leq 1. \]
Since $0.50 \in (0.4978, 1]$, we fail to reject $H_0 : p = 0.50$. That is, we do not have enough evidence to support that there is a home-field advantage in the American League.

By testing these hypotheses for the National League and the American League, we can confidently state that there is a home-field advantage in the National League, but we cannot say the same for the American League based on the 2013 Major League Baseball season.
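The confidence-interval and p-value calculations in this chapter can be reproduced with a short script. This is a sketch under stated assumptions rather than the method used to produce the figures above: the function names are hypothetical, and the critical values come from Python's `statistics.NormalDist` instead of the table in the Appendix, so $z_{0.01}$ is about 2.326 rather than the rounded 2.33, and the unrounded $\hat{p}$ is used. Results therefore agree with the text only to about three decimal places.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # N(0, 1)


def right_sided_lower_bound(wins, games, alpha):
    """Lower end of the right-sided 100(1 - alpha)% confidence interval for p."""
    p_hat = wins / games
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha)        # critical value z_alpha
    return p_hat - z_alpha * math.sqrt(p_hat * (1 - p_hat) / games)


def one_sided_z_test(wins, games, p0=0.5):
    """z-value and p-value for H0: p = p0 against H1: p > p0."""
    p_hat = wins / games
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / games)
    return z, 1 - STANDARD_NORMAL.cdf(z)                # upper-tail probability


# Home wins and games played in 2013, as quoted in the text.
lb_mlb = right_sided_lower_bound(1308, 2431, alpha=0.05)
z_mlb, p_value_mlb = one_sided_z_test(1308, 2431)
lb_nl = right_sided_lower_bound(660, 1211, alpha=0.01)
lb_al = right_sided_lower_bound(648, 1220, alpha=0.01)

for name, lower in (("MLB (95%)", lb_mlb), ("NL (99%)", lb_nl), ("AL (99%)", lb_al)):
    decision = "reject H0" if lower > 0.5 else "fail to reject H0"
    print(f"{name}: ({lower:.4f}, 1] -> {decision}")
print(f"MLB z = {z_mlb:.2f}, p-value = {p_value_mlb:.6f}")
```

Comparing each lower bound with 0.50 gives the same decisions as in the text: reject $H_0$ for all of Major League Baseball and for the National League, and fail to reject for the American League.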
CHAPTER 5
Summary

The Central Limit Theorem is very powerful in mathematics and has numerous applications in probability theory as well as statistics. In this paper, we have stated the Central Limit Theorem, proved the theorem in two different ways, one using moment generating functions and another using characteristic functions, and finally shown an application of the Central Limit Theorem by using hypothesis testing to answer the question "Is there such a thing as a home-field advantage?"

We found the moment generating function of the normal distribution and used it to prove the Central Limit Theorem, by showing that the moment generating function of the standardized sum converges to that of the standard normal distribution. We then adapted the first proof to characteristic functions, noting that moment generating functions are not always defined, and once again arrived at the same conclusion, proving the Central Limit Theorem.

In our final chapter, we used statistics from the 2013 baseball season, together with confidence intervals and a p-value, to show that there is indeed such a thing as a home-field advantage in Major League Baseball. We also showed that we can come to the same conclusion about the National League, but that we do not have enough evidence to show that there is a home-field advantage in the American League.

In the future, it may be interesting to apply this method to other sports such as hockey or football, although we must make sure that we have a sufficiently large sample size to obtain accurate results. Other applications of the Central Limit Theorem, as well as other properties such as convergence rates, may also be interesting areas of study for the future.
CHAPTER 6
Appendix
Bibliography

[1] Albert, Jim. Teaching Statistics Using Baseball. Washington, DC: The Mathematical Association of America, 2003.
[2] Characteristic Functions and the Central Limit Theorem. University of Waterloo. Chapter 6. Web. http://sas.uwaterloo.ca/~dlmcleis/s901/chapt6.pdf.
[3] Dunbar, Steven R. The de Moivre-Laplace Central Limit Theorem. Topics in Probability Theory and Stochastic Processes. http://www.math.unl.edu
[4] Lesigne, Emmanuel. Heads or Tails: An Introduction to Limit Theorems in Probability, Vol. 28 of Student Mathematical Library. American Mathematical Society, 2005.
[5] ESPN.com. 2013. ESPN Internet Ventures. Web. http://espn.go.com/mlb.
[6] Filmus, Yuval. Two Proofs of the Central Limit Theorem. Jan/Feb 2010. Lecture. www.cs.toronto.edu/~yuvalf/clt.pdf
[7] Grinstead, Charles M., and J. Laurie Snell. Central Limit Theorem. Introduction to Probability. Dartmouth College. 325-364. Web. http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/chapter9.pdf.
[8] Hildebrand, A.J. The Central Limit Theorem. Lecture. http://www.math.uiuc.edu/~hildebr/370/408clt.pdf
[9] Introduction to The Central Limit Theorem. The Theory of Inference. NCSSM Statistics Leadership Institute Notes. Web. http://courses.ncssm.edu/math/stat_Inst/PDFS/SEC_4_f.pdf
[10] Krylov, N.V. An Undergraduate Lecture on The Central Limit Theorem. Lecture. www.math.umn.edu/~krylov/clt1.pdf
[11] Moment Generating Functions. Chapter 6. Web. http://www.am.qub.ac.uk/users/g.gribakin/sor/chap6.pdf.
[12] Walpole, Ronald E., Raymond H. Myers, Sharon L. Myers, and Keying Ye. Probability & Statistics for Engineers & Scientists. Prentice Hall, 2012.