Central Limit Theorem and Its Applications to Baseball


by Nicole Anderson

A project submitted to the Department of Mathematical Sciences in conformity with the requirements for Math 4301 (Honours Seminar)

Lakehead University, Thunder Bay, Ontario, Canada

Copyright (c) 2014 Nicole Anderson

Abstract

This honours project is on the Central Limit Theorem (CLT). The CLT is considered to be one of the most powerful theorems in all of statistics and probability. In probability theory, the CLT states that, given certain conditions, the sample mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed. In this project, a brief historical review of the CLT is provided, and some basic concepts, two proofs of the CLT and several properties are discussed. As an application, we discuss how to use the CLT to study the sampling distribution of the sample mean and hypothesis testing using baseball statistics.

Acknowledgements

I would like to thank my supervisor, Dr. Li, who helped me by sharing his knowledge and many resources to help make this paper come to life. I would also like to thank Dr. Adam Van Tuyl for all of his help with LaTeX, and support throughout this project. Thank you very much!

Contents

Abstract
Acknowledgements
Chapter 1. Introduction
    1. Historical Review of Central Limit Theorem
    2. Central Limit Theorem in Practice
Chapter 2. Preliminaries
    1. Definitions
    2. Central Limit Theorem
Chapter 3. Proofs of Central Limit Theorem
    1. Proof of Central Limit Theorem Using Moment Generating Functions
    2. Proof of Central Limit Theorem Using Characteristic Functions
Chapter 4. Applications of the Central Limit Theorem in Baseball
Chapter 5. Summary
Chapter 6. Appendix
Bibliography

CHAPTER 1
Introduction

1. Historical Review of Central Limit Theorem

The Central Limit Theorem, CLT for short, has been around for over 275 years and has many applications, especially in the world of probability theory. Many mathematicians over the years have proved the CLT in many different cases, and have therefore provided different versions of the theorem. The origins of the Central Limit Theorem can be traced to The Doctrine of Chances by Abraham de Moivre in 1738. Abraham de Moivre's book provided techniques for solving gambling problems, and in this book he provided a statement of the theorem for Bernoulli trials as well as a proof for $p = \frac{1}{2}$. This was a very important discovery at the time, which inspired many other mathematicians years later to look at de Moivre's previous work and continue to prove it for other cases. [7]

In 1812, Pierre Simon Laplace published his own book titled Théorie Analytique des Probabilités, in which he generalized the theorem for $p \neq \frac{1}{2}$. He also gave a proof, although not a rigorous one, for his finding. It was not until around 1901-1902 that the Central Limit Theorem became more generalized and a complete proof was given by Aleksandr Lyapunov. A more general statement of the Central Limit Theorem appeared in 1922 when Lindeberg gave the statement, "the sequence of random variables need not be identically distributed, instead the random variables only need zero means with individual variances small compared to their sum" [3]. Many other contributions to the statement of the theorem, as well as many different ways to prove the theorem, began to surface around 1935, when both Lévy and Feller published their own independent papers regarding the Central Limit Theorem.

The Central Limit Theorem has had, and continues to have, a great impact in the world of mathematics. Not only was the theorem used in probability theory, but it was also expanded and can be used in topology, analysis and many other fields in mathematics.

2. Central Limit Theorem in Practice

The Central Limit Theorem is a powerful theorem in statistics that allows us to make assumptions about a population. It states that, for a sufficiently large sample size, the distribution of the sample mean will be approximately normal regardless of what the initial distribution looks like, provided that distribution has a finite mean and variance. Many applications, such as hypothesis testing, confidence intervals and estimation, use the Central Limit Theorem to make reasonable assumptions concerning the population, since it is often difficult to make such assumptions when the population is not normally distributed and the shape of its distribution is unknown.

The goal of this project is to focus on the Central Limit Theorem and its applications in statistics, as well as to answer the questions, "Why is the Central Limit Theorem important?", "How can we prove the theorem?" and "How can we apply the Central Limit Theorem in baseball?"

Our paper is structured as follows. In Chapter 2 we will first give key definitions that are important in understanding the Central Limit Theorem. Then we will give three different statements of the Central Limit Theorem. Chapter 3 will answer the second question by proving the Central Limit Theorem. We will first give a proof using moment generating functions, and then we will give a proof using characteristic functions. In Chapter 4 we will answer the third question and show that the Central Limit Theorem can be used to answer the question, "Is there such a thing as a home-field advantage in baseball?" by using an important application known as hypothesis testing. Finally, Chapter 5 will summarize the results of the project and discuss future applications.

CHAPTER 2
Preliminaries

This chapter will provide some basic definitions, as well as some examples, to help understand the various components of the Central Limit Theorem. Since the Central Limit Theorem has strong applications in probability and statistics, one must have a good understanding of some basic concepts concerning random variables, probability distributions, mean and variance, and the like.

1. Definitions

There are many definitions that must first be understood before we give the statement of the Central Limit Theorem. The following definitions can be found in [12].

Definition 2.1. A population consists of the entire collection of observations with which we are concerned.

Definition 2.2. An experiment is a repeatable process with a set of possible outcomes.

Definition 2.3. A sample is a subset of the population.

Definition 2.4. A random sample is a sample of size $n$ in which all observations are taken at random and are assumed to be independent.

Definition 2.5. A random variable, denoted by $X$, is a function that associates a real number with every outcome of an experiment. We say $X$ is a discrete random variable if it can assume at most a finite or a countably infinite number of possible values. A random variable is continuous if it can assume any value in some interval or intervals of real numbers and the probability that it assumes any specific value is 0.

Example 2.6. Suppose we wish to know how well a baseball player performed this season by looking at how often they got on base. Define the random variable $X$ by

$$X = \begin{cases} 1, & \text{if the hitter got on base,} \\ 0, & \text{if the hitter did not get on base.} \end{cases}$$

This is an example of a random variable with a Bernoulli distribution.
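The random variable of Example 2.6 can be sketched in a few lines of Python. This is only an illustration: the outcome labels, the function names and the on-base probability of 0.35 are assumptions made for the sketch and do not come from the data used in this paper.

```python
import random

# Hypothetical outcome labels, chosen only for illustration.
REACHED_BASE = {"single", "double", "triple", "home run", "walk"}


def on_base_indicator(outcome: str) -> int:
    """The random variable X of Example 2.6: 1 if the hitter got on base, else 0."""
    return 1 if outcome in REACHED_BASE else 0


def simulate_plate_appearances(p: float, n: int, seed: int = 0) -> list[int]:
    """Draw n independent Bernoulli(p) values; p is an assumed on-base probability."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


if __name__ == "__main__":
    print(on_base_indicator("walk"), on_base_indicator("strikeout"))
    xs = simulate_plate_appearances(p=0.35, n=10_000)
    # The observed frequency of 1s should be close to the assumed p.
    print(sum(xs) / len(xs))
```

The sample frequency printed at the end is itself a sample mean of Bernoulli variables, which is the kind of quantity the Central Limit Theorem describes.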

Definition 2.7. The probability distribution of a discrete random variable $X$ is a function $f$ that associates a probability with each possible value $x$ of $X$ and satisfies the following three properties:

1. $f(x) \geq 0$,
2. $\sum_x f(x) = 1$,
3. $P(X = x) = f(x)$,

where $P(X = x)$ refers to the probability that the random variable $X$ is equal to a particular value, denoted by $x$.

Definition 2.8. A probability density function for a continuous random variable $X$, denoted $f(x)$, is a function such that

1. $f(x) \geq 0$ for all $x$ in $\mathbb{R}$,
2. $\int_{-\infty}^{+\infty} f(x)\,dx = 1$,
3. $P(a < X < b) = \int_a^b f(x)\,dx$ for all $a < b$.

Definition 2.9. Let $X$ be a discrete random variable with probability distribution function $f(x)$. The expected value or mean of $X$, denoted $\mu$ or $E(X)$, is

$$\mu = E(X) = \sum_x x f(x).$$

Example 2.10. We are interested in finding the expected number of home runs that Jose Bautista will hit next season based on his previous three seasons. To do this, we can compute the expected value of home runs based on his last three seasons.

Table 1. Jose Bautista's Yearly Home Runs

Year    Home Runs
2011    43
2012    27
2013    28

$$\mu = E(X) = 43 f(43) + 27 f(27) + 28 f(28) = 43\left(\tfrac{1}{3}\right) + 27\left(\tfrac{1}{3}\right) + 28\left(\tfrac{1}{3}\right) = \tfrac{98}{3} \approx 33.$$

This tells us that, based on the past three seasons, Jose Bautista is expected to hit approximately 33 home runs in the 2014 season. These statistics are taken from [5].

Definition 2.11. Let $X$ be a random variable with mean $\mu$. The variance of $X$, denoted $\mathrm{Var}(X)$ or $\sigma^2$, is

$$\sigma^2 = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2 = E(X^2) - \mu^2.$$

Definition 2.12. The standard deviation of a random variable $X$, denoted $\sigma$, is the positive square root of the variance.

Example 2.13. Using Alex Rodriguez's yearly triples from Table 2 below, compute the variance and standard deviation.

$$E(X^2) = \frac{\sum X^2}{20} = \frac{0^2 + 2^2 + 1^2 + \cdots + 0^2 + 1^2 + 0^2}{20} = \frac{96}{20} = 4.8$$

$$E(X) = \frac{\sum X}{20} = \frac{0 + 2 + 1 + 3 + \cdots + 0 + 1 + 0}{20} = \frac{30}{20} = 1.5$$

$$\sigma^2 = E(X^2) - (E(X))^2 = 4.8 - (1.5)^2 = 2.55$$

$$\sigma = \sqrt{2.55} = 1.5968719422671 \approx 1.6$$

These statistics are taken from [5].

Definition 2.14. A sampling distribution is the probability distribution of a statistic.

Definition 2.15. A continuous random variable $X$ is said to follow a Normal Distribution with mean $\mu$ and variance $\sigma^2$ if it has a probability density function

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}, \quad -\infty < x < \infty.$$

We write $X \sim N(\mu, \sigma^2)$.

Example 2.16. Consider the batting averages of Major League Baseball players in the 2013 baseball season.

Table 2. Alex Rodriguez Stats 1994-2013

Year    AVG     Triples    Home Runs
1994    .204    0          0
1995    .232    2          5
1996    .358    1          36
1997    .300    3          23
1998    .310    5          42
1999    .285    0          42
2000    .316    2          41
2001    .318    1          52
2002    .300    2          57
2003    .298    6          47
2004    .286    2          36
2005    .321    1          48
2006    .290    1          35
2007    .314    0          54
2008    .302    0          35
2009    .286    1          30
2010    .270    2          30
2011    .276    0          16
2012    .272    1          18
2013    .244    0          7

These statistics are taken from [5].

Taking all of their batting averages, we can see in the graph that the averages follow a bell curve, which is characteristic of the normal distribution. We see that the majority of players have an average between .250 and .300, and that few players have an average between .200 and .225, or between .325 and .350. This gives a good example of how the normal distribution can help approximate even discrete random variables. Just by looking at the graph we can make some inferences about the population.

2. Central Limit Theorem

Over the years, many mathematicians have contributed to the Central Limit Theorem and its proof, and therefore many different statements of the theorem are accepted. The first statement of the theorem is widely known as the de Moivre-Laplace Theorem, which was the very first statement of the Central Limit Theorem.

Theorem 2.17. [3] Consider a sequence of Bernoulli trials with probability $p$ of success, where $0 < p < 1$. Let $S_n$ denote the number of successes in the first $n$ trials, $n \geq 1$. For any $a, b \in \mathbb{R} \cup \{\pm\infty\}$ with $a < b$,

$$\lim_{n \to \infty} P\left(a \leq \frac{S_n - np}{\sqrt{np(1-p)}} \leq b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\,dz.$$

Another statement of the Central Limit Theorem was given by Lyapunov, which states:

Theorem 2.18. [8] Suppose $X_n$, $n \geq 1$, are independent random variables with mean 0 and

$$\frac{\sum_{k=1}^{n} E(|X_k|^{\delta})}{s_n^{\delta}} \to 0$$

for some $\delta > 2$. Then

$$\frac{S_n}{s_n} \xrightarrow{\text{distr}} N(0, 1),$$

where $S_n = X_1 + X_2 + \cdots + X_n$, $s_n = \sqrt{\sum_{k=1}^{n} E(X_k^2)}$, $n \geq 1$, and where $\xrightarrow{\text{distr}}$ represents convergence in distribution.

Before giving the final statement of the Central Limit Theorem, we must define what it means for random variables to be independent and identically distributed.

Definition 2.19. A sequence of random variables is said to be independent and identically distributed if all random variables are mutually independent, and if each random variable has the same probability distribution.

We will now give the final statement of the Central Limit Theorem, a special case of the Lindeberg-Feller theorem. This statement is the one we will use throughout the rest of the paper.

Theorem 2.20. [8] Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2 > 0$. Then

$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1),$$

where $S_n = X_1 + X_2 + \cdots + X_n$, $n \geq 1$, and $\xrightarrow{\text{distr}}$ represents convergence in distribution.
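The statement of Theorem 2.20 can be illustrated with a small simulation. The sketch below is not part of the original project: it assumes, purely for illustration, that the $X_i$ follow an Exponential(1) distribution (a skewed distribution with $\mu = 1$ and $\sigma^2 = 1$), and it compares the empirical distribution of the standardized sum with the standard normal distribution at one point. The function names and the choice of sample sizes are likewise assumptions.

```python
import math
import random
from statistics import NormalDist


def standardized_sums(n: int, reps: int, seed: int = 0) -> list[float]:
    """Simulate reps values of (S_n - n*mu) / sqrt(n*sigma^2) for Exponential(1) draws."""
    rng = random.Random(seed)
    mu, sigma2 = 1.0, 1.0  # mean and variance of the Exponential(1) distribution
    values = []
    for _ in range(reps):
        s_n = sum(rng.expovariate(1.0) for _ in range(n))
        values.append((s_n - n * mu) / math.sqrt(n * sigma2))
    return values


def empirical_cdf(values: list[float], x: float) -> float:
    """Fraction of simulated values that are <= x."""
    return sum(v <= x for v in values) / len(values)


if __name__ == "__main__":
    target = NormalDist().cdf(1.0)  # P(Z <= 1) for Z ~ N(0, 1)
    for n in (1, 10, 100):
        zs = standardized_sums(n, reps=4000)
        print(n, round(empirical_cdf(zs, 1.0), 4), round(target, 4))
```

Under these assumptions the empirical proportions should approach the normal value as $n$ grows, which is what convergence in distribution asserts at the point $x = 1$.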

CHAPTER 3
Proofs of Central Limit Theorem

There are many ways to prove the Central Limit Theorem. In this chapter we will provide two proofs of the Central Limit Theorem. The first proof uses moment generating functions, and the second uses characteristic functions. We will first prove the Central Limit Theorem using moment generating functions.

1. Proof of Central Limit Theorem Using Moment Generating Functions

Before we give the proof of the Central Limit Theorem, it is important to discuss some basic definitions, properties and remarks concerning moment generating functions. First, we will give the definition of a moment generating function as follows:

Definition 3.1. The moment-generating function (MGF) of a random variable $X$ is defined to be

$$M_X(t) = E(e^{tX}) = \begin{cases} \sum_x e^{tx} f(x), & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{+\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Moments can also be found by differentiation.

Theorem 3.2. Let $X$ be a random variable with moment-generating function $M_X(t)$. We have

$$\left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0} = \mu_r,$$

where $\mu_r = E(X^r)$.

Remark 3.3. $\mu_r = E(X^r)$ describes the $r$th moment about the origin of the random variable $X$. We can see then that $\mu_1 = E(X)$ and $\mu_2 = E(X^2)$, which therefore allows us to write the mean and variance in terms of moments: $\mu = \mu_1$ and $\sigma^2 = \mu_2 - \mu_1^2$.

Moment generating functions also have the following properties.

Theorem 3.4. $M_{a+bX}(t) = E(e^{t(a+bX)}) = e^{at} M_X(bt)$.

Proof. $M_{a+bX}(t) = E[e^{t(a+bX)}] = E(e^{at} e^{(bt)X}) = e^{at} E(e^{(bt)X}) = e^{at} M_X(bt)$.

Theorem 3.5. Let $X$ and $Y$ be independent random variables with moment-generating functions $M_X(t)$ and $M_Y(t)$ respectively. Then $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$.

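Proof. $M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) \cdot E(e^{tY})$ (by independence of the random variables) $= M_X(t) \cdot M_Y(t)$.

Corollary 3.6. Let $X_1, X_2, \ldots, X_n$ be independent random variables. Then $M_{X_1 + X_2 + \cdots + X_n}(t) = M_{X_1}(t) \cdot M_{X_2}(t) \cdots M_{X_n}(t)$.

The proof is nearly identical to the proof of the previous theorem.

To prove the Central Limit Theorem, it is necessary to know the moment generating function of the normal distribution:

Lemma 3.7. The moment generating function (MGF) of the normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ (i.e., $X \sim N(\mu, \sigma^2)$) is

$$M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}.$$

Proof. First we will find the MGF for the normal distribution with mean 0 and variance 1, i.e., $N(0, 1)$. If $Y \sim N(0, 1)$, then

$$\begin{aligned}
M_Y(t) = E(e^{tY}) &= \int_{-\infty}^{+\infty} e^{ty} f(y)\,dy \\
&= \int_{-\infty}^{+\infty} e^{ty} \left(\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}\right) dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{ty - \frac{1}{2} y^2}\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{\frac{1}{2} t^2 - \frac{1}{2}(y^2 - 2ty + t^2)}\,dy \\
&= e^{\frac{1}{2} t^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(y - t)^2}\,dy.
\end{aligned}$$

But note that, by Definition 2.15, $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(y - t)^2}$ is the probability density function of a normal distribution with mean $t$ and variance 1, so its integral over the real line equals 1. So

$$M_Y(t) = e^{\frac{1}{2} t^2}.$$

Now, if $X \sim N(\mu, \sigma^2)$, then $X = \mu + \sigma Y$, and by Theorem 3.4,

$$M_X(t) = M_{\mu + \sigma Y}(t) = e^{\mu t} M_Y(\sigma t) = e^{\mu t} e^{\frac{1}{2} \sigma^2 t^2} = e^{\mu t + \frac{\sigma^2 t^2}{2}}.$$

Before we begin the proof of the Central Limit Theorem, we must recall the following fact from calculus:

Lemma 3.8. $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

Now we are ready to prove the Central Limit Theorem. We will prove the special case where $M_X(t)$ exists in a neighbourhood of 0.

Proof (of Theorem 2.20). Let $Y_i = \frac{X_i - \mu}{\sigma}$ for $i = 1, 2, 3, \ldots$ and $R_n = Y_1 + Y_2 + \cdots + Y_n$. So we have

$$\frac{S_n - n\mu}{\sqrt{\sigma^2}} = Y_1 + Y_2 + \cdots + Y_n = R_n.$$

So

$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{R_n}{\sqrt{n}} = Z_n.$$

Since $R_n$ is the sum of independent random variables, we see that its moment generating function is

$$M_{R_n}(t) = M_{Y_1}(t) M_{Y_2}(t) \cdots M_{Y_n}(t) = [M_Y(t)]^n$$

by Corollary 3.6. We note that this is true because the $Y_i$ are independent and identically distributed. Now,

$$M_{Z_n}(t) = M_{R_n/\sqrt{n}}(t) = E\left(e^{\frac{R_n}{\sqrt{n}} t}\right) = E\left(e^{R_n \left(\frac{t}{\sqrt{n}}\right)}\right) = M_{R_n}\left(\frac{t}{\sqrt{n}}\right) = \left[M_Y\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$

Taking the natural logarithm of each side,

$$\ln M_{Z_n}(t) = n \ln M_Y\left(\frac{t}{\sqrt{n}}\right).$$

But note, using Lemma 3.8 together with $E(Y) = 0$ and $E(Y^2) = 1$, that

$$\begin{aligned}
M_Y\left(\frac{t}{\sqrt{n}}\right) = E\left(e^{\frac{t}{\sqrt{n}} Y}\right) &= E\left(1 + \frac{tY}{\sqrt{n}} + \frac{t^2 Y^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right) \\
&= 1 + \frac{t^2 E(Y^2)}{2n} + O\left(\frac{1}{n^{3/2}}\right) \\
&= 1 + \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right),
\end{aligned}$$

where $O\left(\frac{1}{n^{\alpha}}\right)$ stands for a term satisfying $\limsup_{n \to \infty} \left|O\left(\frac{1}{n^{\alpha}}\right)\right| \big/ \frac{1}{n^{\alpha}} < \infty$. Then, since $\ln(1 + x) = x + O(x^2)$ as $x \to 0$,

$$\begin{aligned}
\ln M_{Z_n}(t) &= n \ln\left(1 + \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right) \\
&= n \left(\frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right) \\
&= \frac{t^2}{2} + O\left(\frac{1}{n^{1/2}}\right).
\end{aligned}$$

We see that $\ln M_{Z_n}(t) = \frac{t^2}{2} + O\left(\frac{1}{n^{1/2}}\right) \to \frac{t^2}{2}$, so we have

$$M_{Z_n}(t) \to e^{\frac{t^2}{2}} \quad \text{as } n \to \infty,$$

which by Lemma 3.7 is the moment generating function of $N(0, 1)$. Thus $Z_n \xrightarrow{\text{distr}} N(0, 1)$, i.e.,

$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1).$$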
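The limit used at the end of this proof, $[M_Y(t/\sqrt{n})]^n \to e^{t^2/2}$, can be checked numerically in a case where $M_Y$ has a closed form. The sketch below assumes, for illustration only, that the $X_i$ are Bernoulli($p$) with $p = 0.3$; the function names and the chosen values of $t$ and $n$ are hypothetical and are not taken from the original project.

```python
import math


def mgf_standardized_bernoulli(t: float, p: float) -> float:
    """MGF of Y = (X - p) / sqrt(p(1-p)) for X ~ Bernoulli(p), from Definition 3.1."""
    sigma = math.sqrt(p * (1.0 - p))
    # Y takes the value (0 - p)/sigma with probability 1 - p and (1 - p)/sigma with probability p.
    return (1.0 - p) * math.exp(t * (0.0 - p) / sigma) + p * math.exp(t * (1.0 - p) / sigma)


def mgf_z_n(t: float, n: int, p: float) -> float:
    """MGF of Z_n = R_n / sqrt(n), computed as [M_Y(t / sqrt(n))]^n."""
    return mgf_standardized_bernoulli(t / math.sqrt(n), p) ** n


if __name__ == "__main__":
    t, p = 1.0, 0.3
    limit = math.exp(t * t / 2.0)  # MGF of N(0, 1) at t, by Lemma 3.7
    for n in (10, 100, 10_000):
        value = mgf_z_n(t, n, p)
        print(n, value, abs(value - limit))
```

Under these assumptions the gap to $e^{t^2/2}$ should shrink roughly like $1/\sqrt{n}$, in line with the $O(1/n^{1/2})$ term in the proof.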

2. Proof of Central Limit Theorem Using Characteristic Functions

Now we will prove the Central Limit Theorem another way by looking at characteristic functions. Moment generating functions do not exist for all distributions. This is because some moments of the distributions are not finite. In these instances, we look at another general function known as the characteristic function.

Definition 3.9. The characteristic function of a continuous random variable $X$ is

$$C_X(t) = E(e^{itX}) = \int_{-\infty}^{+\infty} e^{itx} f(x)\,dx,$$

where $t$ is a real number and $i = \sqrt{-1}$.

$C_X(t)$ will always exist because $e^{itx}$ is a bounded function, that is, $|e^{itx}| = 1$ for all $t, x \in \mathbb{R}$, and so the integral exists. The characteristic function also has many properties similar to those of moment generating functions.

To prove the Central Limit Theorem using characteristic functions, we need to know the characteristic function of the normal distribution and the following convergence result.

Lemma 3.10. Let $R_n$, $n \geq 1$, be a sequence of random variables. If, as $n \to \infty$,

$$C_{R_n}(t) = E\left(e^{i R_n t}\right) \to e^{-\frac{t^2}{2}}$$

for all $t \in (-\infty, \infty)$, then $R_n \xrightarrow{\text{distr}} N(0, 1)$.

We can now prove the Central Limit Theorem using characteristic functions.

Proof (of Theorem 2.20). Similar to the proof using moment generating functions, let $Y_i = \frac{X_i - \mu}{\sigma}$ for $i = 1, 2, 3, \ldots$ and let $R_n = Y_1 + Y_2 + \cdots + Y_n$, so

$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{R_n}{\sqrt{n}} = Z_n,$$

where $S_n = X_1 + X_2 + \cdots + X_n$. Note that $R_n$ is the sum of independent random variables, so we see that the characteristic function of $R_n$ is

$$C_{R_n}(t) = C_{Y_1}(t) C_{Y_2}(t) \cdots C_{Y_n}(t) = [C_Y(t)]^n,$$

since all the $Y_i$ are independent and identically distributed. Now,

$$C_{Z_n}(t) = C_{R_n/\sqrt{n}}(t) = E\left[e^{i \frac{R_n}{\sqrt{n}} t}\right] = E\left[e^{i R_n \left(\frac{t}{\sqrt{n}}\right)}\right] = C_{R_n}\left(\frac{t}{\sqrt{n}}\right) = \left[C_Y\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$

Taking the natural logarithm of each side,

$$\ln C_{Z_n}(t) = n \ln C_Y\left(\frac{t}{\sqrt{n}}\right).$$

We can note from the previous proof, with some modifications (each power of $t$ now carries a factor of $i$, and $i^2 = -1$), that

$$C_Y\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right).$$

Then,

$$\ln C_{Z_n}(t) = n \ln\left(1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right).$$

Using the expansion $\ln(1 + x) = x + O(x^2)$ as $x \to 0$, we see that

$$\ln C_{Z_n}(t) = -\frac{t^2}{2} + O\left(\frac{1}{n^{1/2}}\right).$$

So, as $n \to \infty$, $\ln C_{Z_n}(t) \to -\frac{t^2}{2}$ and

$$C_{Z_n}(t) \to e^{-\frac{t^2}{2}} \quad \text{as } n \to \infty.$$

Thus by Lemma 3.10, we conclude that

$$Z_n = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{\text{distr}} N(0, 1).$$
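To see why the characteristic function argument is useful, it may help to check the limit $[C_Y(t/\sqrt{n})]^n \to e^{-t^2/2}$ for a distribution that has no moment generating function. The sketch below is an illustration added here, not part of the original project: it assumes that the $X_i$ follow Student's $t$ distribution with 3 degrees of freedom, which has mean 0, variance 3, and characteristic function $e^{-\sqrt{3}|t|}(1 + \sqrt{3}|t|)$, so that the standardized variable $Y = X/\sqrt{3}$ has $C_Y(t) = e^{-|t|}(1 + |t|)$. The function names are hypothetical.

```python
import math


def cf_standardized_t3(t: float) -> float:
    """Characteristic function of Y = X / sqrt(3), where X is Student t with 3 degrees of freedom.

    The distribution is symmetric about 0, so the characteristic function is real-valued.
    """
    a = abs(t)
    return math.exp(-a) * (1.0 + a)


def cf_z_n(t: float, n: int) -> float:
    """Characteristic function of Z_n = R_n / sqrt(n), computed as [C_Y(t / sqrt(n))]^n."""
    return cf_standardized_t3(t / math.sqrt(n)) ** n


def cf_standard_normal(t: float) -> float:
    """Characteristic function of N(0, 1), the limit appearing in Lemma 3.10."""
    return math.exp(-t * t / 2.0)


if __name__ == "__main__":
    t = 1.0
    for n in (100, 10_000, 1_000_000):
        print(n, cf_z_n(t, n), abs(cf_z_n(t, n) - cf_standard_normal(t)))
```

Under this assumption the moment generating function argument of the previous section does not apply, yet the characteristic functions still converge to $e^{-t^2/2}$, as Lemma 3.10 requires.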

CHAPTER 4
Applications of the Central Limit Theorem in Baseball

The Central Limit Theorem has many applications in probability theory and statistics, but one very interesting application is known as hypothesis testing. This chapter will focus on the application of hypothesis testing, and in particular, answer the following question:

Problem 4.1. Is there such a thing as a home-field advantage in Major League Baseball?

Before we begin, there are a few definitions that must be understood.

Definition 4.2. A conjecture concerning one or more populations is known as a statistical hypothesis.

Definition 4.3. A null hypothesis is a hypothesis that we wish to test and is denoted $H_0$.

Definition 4.4. An alternative hypothesis represents the question to be answered in the hypothesis test and is denoted by $H_1$.

Remark 4.5. The null hypothesis $H_0$ opposes the alternative hypothesis $H_1$. $H_0$ is commonly seen as the complement of $H_1$.

Concerning our problem, the null hypothesis and the alternative hypothesis are:

$H_0$: There is no home-field advantage,
$H_1$: There is a home-field advantage.

When we do a hypothesis test, the goal is to determine whether we reject the null hypothesis or fail to reject it. If we reject $H_0$, we are in favour of $H_1$ because of sufficient evidence in the data. If we fail to reject $H_0$, then we have insufficient evidence in the data.

Definition 4.6. A test statistic is a quantity computed from a sample that is used to determine whether or not a hypothesis is rejected.

Definition 4.7. A critical value is a cut-off value that is compared to the test statistic to determine whether or not the null hypothesis is rejected.

Definition 4.8. The level of significance of a test is the probability that $H_0$ is rejected although it is true.

Definition 4.9. A z-score or z-value is a number that indicates how many standard deviations an element is away from the mean.

Definition 4.10. A confidence interval is an interval that contains an estimated range of values in which an unknown population parameter is likely to fall.

Remark 4.11. If the hypothesized value of the parameter falls into the interval, then we fail to reject $H_0$, but if it is not in the interval, then we reject $H_0$.

Definition 4.12. A p-value is the lowest level of significance at which the observed value of the test statistic is significant.

Remark 4.13. We reject $H_0$ if the p-value is very small, usually less than 0.05.

Now to return to our problem, is there such a thing as a home-field advantage? How can we test this notion? In the 2013 Major League Baseball season, there were 2431 games played, and of those games, 1308 were won at home. This indicates that approximately 53.81% of the games played were won at home. We will let our observed value be this value, so $\hat{p} = 0.5381$. It seems as though there is such a thing as a home-field advantage, but we must test this notion to be certain. To do this, we will test the hypothesis that there is no such thing as a home-field advantage, so our null hypothesis will be

$$H_0 : p = 0.50.$$

That is, 50% of the Major League Baseball games are won at home, hence there is no home-field advantage. Our alternative hypothesis will be

$$H_1 : p > 0.50.$$

If there is no home-field advantage, then we would expect our proportion to be 0.50, since half of the games would be won at home and the other half on the road.

Before we begin to compute whether there is such a thing as a home-field advantage, we must first satisfy four conditions: the independence assumption, the randomization condition, the 10% condition, and the success/failure condition. These conditions will assure that we can test our hypothesis.

Each game is independent of the others, and one game does not affect how another game is played. In some cases, when a key batter or pitcher is injured, the team may not do as well in the immediate upcoming games, but roughly speaking the games played are generally independent of one another, and so our independence condition holds.

Since there have been many games played over the years, each year having roughly 2430 games, it can be seen that taking just one year to observe the data will account for our randomization condition. Also, as stated above, we can see that the 2431 games played in the 2013 season are less than 10% of the total games played over the years that Major League Baseball has been around, so our 10% condition also holds true, that is, the sample size is no more than 10% of the population. Finally, we must check that the number of games multiplied by our proportion of 0.50 is larger than 10. So we have

$$np_0 = 2431(0.50) = 1215.5,$$

which is larger than 10 (and $n(1 - p_0)$ takes the same value), so our success/failure condition holds as well.

Since all of these conditions are met, we are now able to use the Normal Distribution model to help us test our hypothesis. We will test our hypothesis using two different methods: the first using a confidence interval, and the second using a p-value.

First, we will test our hypothesis using a confidence interval. For testing $H_0 : p = 0.50$ vs. $H_1 : p > 0.50$ at the 0.05 level of significance, we may construct a right-sided 95% confidence interval for $p$. If the hypothesized value $p = 0.50$ is in the interval, then we fail to reject $H_0$ at the 0.05 level of significance. If $p = 0.50$ is not in the interval, we reject $H_0$. The right-sided $100(1 - \alpha)\%$ confidence interval for $p$ for a large sample is given by

$$\hat{p} - z_{\alpha} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p \leq 1,$$

where $\alpha$ is the level of significance. Since $n = 2431$, $\hat{p} = 0.5381$, and $\alpha = 0.05$, we see from the Normal Distribution table in the Appendix that $z_{0.05} = 1.645$. So a right-sided 95% confidence interval for $p$ is

$$0.5381 - 1.645 \sqrt{\frac{(0.5381)(1 - 0.5381)}{2431}} < p \leq 1$$

$$0.5381 - 1.645(0.0101114) < p \leq 1$$

$$0.5215 < p \leq 1.$$

Since $0.50 \notin (0.5215, 1]$, we reject $H_0 : p = 0.50$ in favour of $H_1 : p > 0.50$ at the 0.05 level of significance. That is, we have enough evidence to support that there is a home-field advantage, and that the home team wins more than 50% of the games played at home.

Now we will use the p-value approach to test our hypothesis. We must find the z-value for testing our observed value. We use the following equation to do so:

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}},$$

where $q_0 = 1 - p_0$. Now, with $p_0 = 0.50$, $\hat{p} = 0.5381$, and $n = 2431$, we have

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.5381 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{2431}}} = \frac{0.0381}{0.010140923} = 3.76.$$

This results in a p-value $< 0.0001$.

So we can conclude, since the p-value $< 0.0001$ is less than 0.05, that we reject $H_0$. That is, the data seem to support that the home team wins more than 50% of the time, and hence there is such a thing as a home-field advantage in Major League Baseball.

We have shown, taking all of the games played in the 2013 Major League Baseball season, that there is a home-field advantage, but is there a difference between the American League and the National League? Do both leagues have a home-field advantage? We will test this notion using a $100(1 - \alpha)\%$ confidence interval at the 0.01 level of significance. This will allow us to be 99% confident of our results.

In the 2013 season, the National League played 1211 games, and 660 of those games were won at home. So this indicates that approximately 54.5% of the games were won at home. As above, we will let the observed value be $\hat{p} = 0.545$ and we will test the same hypothesis, that is,

$$H_0 : p = 0.50 \quad \text{vs.} \quad H_1 : p > 0.50.$$

Since $n = 1211$, $\hat{p} = 0.545$ and $\alpha = 0.01$, we can see from the Normal Distribution table in the Appendix that $z_{0.01} = 2.33$. So a right-sided 99% confidence interval for $p$ is

$$0.545 - 2.33 \sqrt{\frac{(0.545)(1 - 0.545)}{1211}} < p \leq 1$$

$$0.545 - 2.33(0.014309744) < p \leq 1$$

$$0.5117 < p \leq 1.$$

Since $0.50 \notin (0.5117, 1]$, we reject $H_0 : p = 0.50$ in favour of $H_1 : p > 0.50$. So we can conclude that the National League has a home-field advantage.

Will the same be true for the American League? We will again test the same hypothesis, using a 99% confidence interval for the American League. In the 2013 season, the American League played slightly more games than the National League. They played 1220 games and of those games, 648 were won at home. So this indicates that approximately 53.11% of the games played were won at home. Once again, let our observed value be $\hat{p} = 0.5311$. Testing the same hypothesis as above, we see that a 99% confidence interval for $p$ is

$$0.5311 - 2.33 \sqrt{\frac{(0.5311)(1 - 0.5311)}{1220}} < p \leq 1$$

$$0.5311 - 2.33(0.01428724) < p \leq 1$$

$$0.4978 < p \leq 1.$$

Since $0.50 \in (0.4978, 1]$, we fail to reject $H_0 : p = 0.50$. That is, we do not have enough evidence to support that there is a home-field advantage in the American League.

By testing these hypotheses for the National League and the American League, we can confidently state that there is a home-field advantage in the National League, but we cannot say the same thing for the American League based on the 2013 Major League Baseball season.
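The calculations in this chapter can be reproduced with a short script. The sketch below uses only the game counts quoted above (1308 of 2431, 660 of 1211, 648 of 1220); the function names are hypothetical. It assumes exact normal quantiles from Python's standard library rather than the rounded table values 1.645 and 2.33, and unrounded sample proportions, so the results may differ from the hand calculations in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def right_sided_ci_lower(wins: int, games: int, alpha: float) -> float:
    """Lower end of the right-sided 100(1 - alpha)% confidence interval for p."""
    p_hat = wins / games
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha quantile of N(0, 1)
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / games)
    return p_hat - z_alpha * standard_error


def one_sided_z_test(wins: int, games: int, p0: float = 0.5) -> tuple[float, float]:
    """z-value and p-value for testing H0: p = p0 against H1: p > p0."""
    p_hat = wins / games
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / games)
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


if __name__ == "__main__":
    # All of Major League Baseball, 2013, at the 0.05 level of significance.
    print("MLB lower bound:", right_sided_ci_lower(1308, 2431, alpha=0.05))
    print("MLB z and p-value:", one_sided_z_test(1308, 2431))
    # National League and American League, 2013, at the 0.01 level of significance.
    for league, wins, games in (("NL", 660, 1211), ("AL", 648, 1220)):
        lower = right_sided_ci_lower(wins, games, alpha=0.01)
        decision = "reject H0" if lower > 0.5 else "fail to reject H0"
        print(league, "lower bound:", lower, decision)
```

The decision rule in the loop is the one stated in Remark 4.11: $H_0$ is rejected exactly when the hypothesized value 0.50 lies below the lower end of the interval.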

CHAPTER 5
Summary

The Central Limit Theorem is very powerful in the world of mathematics and has numerous applications in probability theory as well as statistics. In this paper, we have stated the Central Limit Theorem, proved the theorem two different ways, one using moment generating functions and another using characteristic functions, and finally showed an application of the Central Limit Theorem by using hypothesis testing to answer the question, "Is there such a thing as a home-field advantage?"

We derived the moment generating function of the normal distribution, and used this to prove the Central Limit Theorem by showing that the moment generating function of the standardized sum converges to that of the standard normal distribution. We then adapted the first proof to characteristic functions, noting that moment generating functions are not always defined, and once again arrived at the same conclusion, proving the Central Limit Theorem.

In our final chapter, we used statistics from the 2013 baseball season, together with confidence intervals and a p-value, to show that there is indeed such a thing as a home-field advantage in Major League Baseball. We also showed that we can come to the same conclusion about the National League, but we do not have enough evidence to show that there is a home-field advantage in the American League.

In the future, it may be interesting to use this application on other sports such as hockey or football, although we must make sure that we have a sufficiently large sample size to have accurate results. Other applications of the Central Limit Theorem, as well as other properties such as convergence rates, may also be interesting areas of study for the future.

CHAPTER 6
Appendix

Bibliography

[1] Albert, Jim. Teaching Statistics Using Baseball. Washington, DC: The Mathematical Association of America, 2003.
[2] Characteristic Functions and the Central Limit Theorem. University of Waterloo. Chapter 6. Web. http://sas.uwaterloo.ca/ dlmcleis/s901/chapt6.pdf.
[3] Dunbar, Steven R. The de Moivre-Laplace Central Limit Theorem. Topics in Probability Theory and Stochastic Processes. http://www.math.unl.edu
[4] Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems in Probability, Vol 28 of Student Mathematical Library. American Mathematical Society, 2005.
[5] ESPN.com. 2013. ESPN Internet Ventures. Web. http://espn.go.com/mlb.
[6] Filmus, Yuval. Two Proofs of the Central Limit Theorem. Jan/Feb 2010. Lecture. www.cs.toronto.edu/ yuvalf/clt.pdf
[7] Grinstead, Charles M., and J. Laurie Snell. Central Limit Theorem. Introduction to Probability. Dartmouth College. 325-364. Web. http://www.dartmouth.edu/chance/teaching aids/books articles/probability book/chapter9.pdf.
[8] Hildebrand, A.J. The Central Limit Theorem. Lecture. http://www.math.uiuc.edu/hildebr/370/408clt.pdf
[9] Introduction to The Central Limit Theorem. The Theory of Inference. NCSSM Statistics Leadership Institute Notes. Web. http://courses.ncssm.edu/math/stat Inst/PDFS/SEC 4 f.pdf
[10] Krylov, N.V. An Undergraduate Lecture on The Central Limit Theorem. Lecture. www.math.umn.edu/ krylov/clt1.pdf
[11] Moment Generating Functions. Chapter 6. Web. http://www.am.qub.ac.uk/users/g.gribakin/sor/chap6.pdf.
[12] Walpole, Ronald E., Raymond H. Myers, Sharon L. Myers, and Keying Ye. Probability & Statistics For Engineers & Scientists. Prentice Hall. 2012.