3. Continuous Random Variables


1 3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such variables are normally measured according to a scale. Examples of continuous random variables: age, height, weight, time, air pressure. Such variables are normally only measured to a given accuracy (e.g. the age of a person is normally given to the nearest year). 1 / 112

2 3.1 The notion of a density function Suppose X is a continuous random variable. Let f_δ(x) = P(x < X < x + δ)/δ. This is the probability that X lies in an interval of length δ divided by the length of the interval, i.e. it can be thought of as the average probability density on the interval (x, x + δ). 2 / 112

3 The notion of a density function Let f_X(x) = lim_{δ→0} P(x < X < x + δ)/δ. Then f_X(x) is the probability density function of the random variable X. If it is clear which variable we are talking about, then the subscript may be left out. Likely values of X correspond to regions where the density function is large; unlikely values of X correspond to regions where the density function is small. 3 / 112

4 3.2 Properties of a density function A density function f(x) of a random variable X satisfies 2 conditions: 1) f(x) ≥ 0 for all x; 2) ∫_{−∞}^{∞} f(x)dx = 1. The second condition simply states that the total area under the density curve is 1. 4 / 112

5 The support of a continuous random variable The support of a continuous random variable X, S_X, is the set of values for which f(x) > 0. We have ∫_{S_X} f(x)dx = 1. In general, we only have to integrate over intervals where the density function is positive. 5 / 112

6 Density curves and probability The probability that X lies between a and b is the area under the density curve between x = a and x = b. 6 / 112

7 Density curves and probability Hence, P(a < X < b) = ∫_a^b f(x)dx. In particular, 1. P(X > a) = ∫_a^{∞} f(x)dx; 2. P(X < b) = ∫_{−∞}^b f(x)dx. Note that for any constant a, P(X = a) = 0. 7 / 112
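The probability-as-area identity above can be checked numerically. A minimal sketch (Python, standard library only); the helper name `prob_between` and the density f(x) = 2x on [0, 1] are hypothetical illustrations, not taken from these notes:

```python
def prob_between(f, a, b, n=100_000):
    """Approximate P(a < X < b) = the integral of f from a to b (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Hypothetical density: f(x) = 2x on [0, 1], 0 elsewhere.
f = lambda x: 2 * x if 0 <= x <= 1 else 0.0

total = prob_between(f, 0, 1)    # total area under the density; should be 1
p = prob_between(f, 0.5, 1)      # P(0.5 < X < 1); here exactly 1 - 0.5**2 = 0.75
```

The same routine also verifies the second density condition, since integrating over the whole support should return 1.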

8 3.4 Expected value of a continuous random variable The expected value of a random variable X with density function f(x) is E(X) = µ_X = ∫_{S_X} x f(x)dx, i.e. we integrate over the interval(s) where the density function is positive. If a distribution is symmetric about x = x_0, then (as long as the expected value exists) E(X) = x_0. The expected value of the function g(X) is E[g(X)] = ∫_{S_X} g(x)f(x)dx. The k-th moment of X is given by E(X^k), where E[X^k] = ∫_{S_X} x^k f(x)dx. 8 / 112

9 3.5 Variance of a continuous random variable The variance of X is given by σ_X² = Var(X) = E[(X − µ)²] = ∫_{S_X} (x − µ)² f(x)dx. It can be shown that σ_X² = E(X²) − E(X)². The proof of this is analogous to the one presented for the case of a discrete random variable. σ_X is the standard deviation of the random variable X. Note that these formulas are analogous to the definitions for discrete random variables; the only change is that the summations become integrals. All the properties of E(X) and Var(X) given in Chapter 2 hold for continuous distributions, e.g. Var(aX + b) = a² Var(X). 9 / 112

10 3.6 The Cumulative Distribution Function and Quantiles of a distribution The cumulative distribution function of a continuous random variable X is denoted F_X. By definition, F_X(x) = P(X ≤ x) = P(X < x) = ∫_{−∞}^x f_X(t)dt, where f_X is the density function. Differentiating this equation, we obtain F′_X(x) = f_X(x). Suppose S_X = [a, b], where a and b are finite. For x ≤ a, F_X(x) = 0. Also, for x ≥ b, F_X(x) = 1. 10 / 112
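Both directions of this relationship (the CDF as an integral of the density, and the density as the derivative of the CDF) can be illustrated numerically. A sketch (Python; the helper `make_cdf` and the density f(x) = 2x on [0, 1] are hypothetical examples, not from the notes):

```python
def make_cdf(f, lower=0.0, n=50_000):
    """Return F(x) = the integral of f from `lower` to x (midpoint rule);
    assumes the density is 0 below `lower`."""
    def F(x):
        if x <= lower:
            return 0.0
        h = (x - lower) / n
        return sum(f(lower + (i + 0.5) * h) for i in range(n)) * h
    return F

f = lambda x: 2 * x if 0 <= x <= 1 else 0.0   # hypothetical density
F = make_cdf(f)

# For this density F(x) = x^2 on [0, 1], so F(0.5) = 0.25, and a central
# difference of F recovers the density: f(0.5) = 1.0.
slope = (F(0.51) - F(0.49)) / 0.02
```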

11 The Quantiles of a distribution For 0 < p < 1, the p-quantile of a continuous random variable, q_p, satisfies F_X(q_p) = p. q_{0.5} is the median of X. q_{0.25} and q_{0.75} are called the lower and upper quartiles of X, respectively. If the support S_X is an interval, then all quantiles are uniquely defined. 11 / 112

12 Relation between the mean and the median for a continuous distribution If a continuous random variable X has a distribution which is symmetric around x_0, then q_{0.5} = x_0 and, as long as E[X] exists, E[X] = q_{0.5} = x_0. Many continuous distributions have a long right-hand tail (e.g. the distribution of wages, the exponential and gamma distributions [see later]). For such distributions, the mean is greater than the median, i.e. in everyday language the average (median) person earns less than the average (understood as the mean) wage. For distributions with a long left-hand tail, the median is greater than the mean. 12 / 112

13 Example 3.1 Suppose the random variable X has density function f(x) = cx on the interval [0, 5] and f(x) = 0 outside this interval. 1. Calculate the value of the constant c. 2. Calculate the probability that (X − 2)² ≥ 1. 3. Calculate E(X) and σ_X. 4. Derive the cumulative distribution function of X. 5. Calculate the median, lower quartile and upper quartile of this distribution. 13 / 112

14 Example We use the fact that ∫_{S_X} f(x)dx = 1. Hence ∫_0^5 cx dx = 1 ⇒ 0.5c[x²]_0^5 = 1 ⇒ 0.5c × 25 = 1 ⇒ c = 2/25 = 0.08. 14 / 112

15 Example In order to find P[(X − 2)² ≥ 1], we first transform the problem into one of the form P(X ∈ A). Solving graphically or algebraically, (X − 2)² ≥ 1 ⟺ X ≤ 1 or X ≥ 3. These two events are mutually exclusive. 15 / 112

16 Example 3.1 Since we only need to integrate over intervals where the density function is positive (between 0 and 5), P(X ≤ 1 ∪ X ≥ 3) = P(X ≤ 1) + P(X ≥ 3) = ∫_0^1 cx dx + ∫_3^5 cx dx. Thus, P(X ≤ 1 ∪ X ≥ 3) = [0.04x²]_0^1 + [0.04x²]_3^5 = 0.04 + (1 − 0.36) = 0.68. 16 / 112

17 Example We have E(X) = ∫_0^5 x f(x)dx = ∫_0^5 cx² dx = (c/3)[x³]_0^5 = 0.08 × 125/3 = 10/3 ≈ 3.33. 17 / 112

18 Example 3.1 To calculate the standard deviation, we first calculate the variance. We use Var(X) = E(X²) − E(X)². We have E(X²) = ∫_0^5 x² f(x)dx = ∫_0^5 0.08x³ dx = [0.02x⁴]_0^5 = 25/2. Hence, Var(X) = 25/2 − (10/3)² = 25/18. Hence, σ_X = √(25/18) ≈ 1.179. 18 / 112

19 Example The support of X is [0, 5]. It follows that for x ≤ 0, F_X(x) = 0. Also, for x ≥ 5, F_X(x) = 1. For 0 < x < 5, since f_X(x) = 0 for x < 0, F(x) = ∫_{−∞}^x f(t)dt = ∫_0^x f(t)dt = ∫_0^x 0.08t dt = [0.04t²]_0^x = 0.04x². 19 / 112

20 Example The median, q_{0.5}, satisfies F(q_{0.5}) = 0.5. Note that for x ≤ 0, F(x) = 0 and for x ≥ 5, F(x) = 1. Hence, any quantile must lie in the interval (0, 5). F(q_{0.5}) = 0.04q_{0.5}² = 0.5 ⇒ q_{0.5}² = 12.5 ⇒ q_{0.5} = ±√12.5. Since the median must be positive, it follows that q_{0.5} = √12.5 ≈ 3.536. 20 / 112

21 Example 3.1 Similarly, the lower quartile, q_{0.25}, satisfies F(q_{0.25}) = 0.04q_{0.25}² = 0.25 ⇒ q_{0.25}² = 6.25 ⇒ q_{0.25} = 2.5. The upper quartile, q_{0.75}, satisfies F(q_{0.75}) = 0.04q_{0.75}² = 0.75 ⇒ q_{0.75}² = 18.75 ⇒ q_{0.75} = √18.75 ≈ 4.33. 21 / 112
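All five answers to Example 3.1 can be confirmed in a few lines using the closed-form CDF F(x) = 0.04x² derived above. A sketch (Python):

```python
import math

c = 2 / 25                         # normalising constant from part 1
F = lambda x: 0.04 * x * x         # CDF on [0, 5]

p = F(1) + (1 - F(3))              # P(X <= 1 or X >= 3)
EX = c * 5 ** 3 / 3                # E(X): the integral of c x^2 is c x^3 / 3
EX2 = c * 5 ** 4 / 4               # E(X^2): the integral of c x^3 is c x^4 / 4
var = EX2 - EX ** 2
median = math.sqrt(0.5 / 0.04)     # solve 0.04 q^2 = 0.5
q25 = math.sqrt(0.25 / 0.04)
q75 = math.sqrt(0.75 / 0.04)
```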

22 3.7 Standard continuous distributions 3.7.1 The uniform distribution on the interval [a, b]. We write X ∼ U[a, b]. [Figure: the density is constant at 1/(b − a) on [a, b] and 0 elsewhere.] 22 / 112

23 The uniform distribution The area under the density function (a rectangle) is 1. The width of this rectangle is (b − a) and the height is f(x). Hence, for x ∈ [a, b], (b − a)f(x) = 1 ⇒ f(x) = 1/(b − a). Otherwise, f(x) = 0. 23 / 112

24 The uniform distribution By symmetry, E(X) is the mid-point of the interval, i.e. E(X) = (a + b)/2. Suppose a calculator calculates to k decimal places. The rounding error involved in a calculation may be assumed to be uniform on the interval [−5 × 10^{−(k+1)}, 5 × 10^{−(k+1)}]. 24 / 112

25 Example 3.2 Suppose the length of the side of a square is chosen from the uniform distribution on [0, 3]. Calculate 1. the probability that the length of the side is between 2 and 4; 2. the expected area of this square. 25 / 112

26 Example Let X be the length of the side of the square. The density function is f(x) = 1/3 for x ∈ [0, 3]; otherwise, f(x) = 0. Hence, P(2 < X < 4) = ∫_2^4 f(x)dx = ∫_2^3 (1/3)dx, since there is no density on the interval [3, 4]. Thus, P(2 < X < 4) = [x/3]_2^3 = 1 − 2/3 = 1/3. Alternatively, a geometric argument could be used to find the appropriate area under the density curve. 26 / 112

27 Example If X is the length of the side of a square, then the area is X². The expected area is E(X²): E(X²) = ∫_0^3 x² f(x)dx = ∫_0^3 (x²/3)dx = [x³/9]_0^3 = 3. 27 / 112
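Both answers in Example 3.2 can also be checked by simulation. A sketch (Python, seeded for reproducibility; the sample size and tolerances are our choices):

```python
import random

random.seed(42)
N = 100_000
sides = [random.uniform(0, 3) for _ in range(N)]   # draws from U[0, 3]

p_hat = sum(2 < x < 4 for x in sides) / N          # estimates P(2 < X < 4) = 1/3
mean_area = sum(x * x for x in sides) / N          # estimates E(X^2) = 3
```

With 100,000 draws the Monte Carlo standard errors are small enough that both estimates land well within the tolerances below.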

28 3.7.2 The exponential distribution The density function of an exponential random variable with parameter λ is given by f(x) = λe^{−λx} for x ≥ 0 and f(x) = 0 for x < 0. We write X ∼ Exp(λ). 28 / 112

29 The exponential distribution This distribution may be used to model the time between the arrival of telephone calls. λ is the rate at which calls arrive (i.e. the expected length of time between calls is 1/λ). The parameter λ as defined here is called the rate parameter. 29 / 112

30 The exponential distribution It should be noted that sometimes the parameter of the exponential distribution is given as the expected value, i.e. here the expected time between calls. Denote this parameter by θ (= 1/λ). We then have f(x) = e^{−x/θ}/θ. I will use the rate, rather than the expected value, as the parameter. 30 / 112

31 The exponential, geometric and Poisson distributions The probability of a call coming in some small unit of time (say millisecond) is small. Consider a succession of milliseconds and think that if a call comes in during a particular millisecond then we have a success. The arrival time of the first call is thus the time to the first success in such a series of experiments. It follows that the exponential distribution is a continuous analogue of the geometric distribution. It will be shown during the tutorials that the exponential distribution also has the memoryless property. 31 / 112

32 The exponential, geometric and Poisson distributions From this interpretation of the exponential distribution, we can see that there is also a connection between the exponential distribution and the Poisson distribution. Since the probability of a call arriving in a short period of time is small, if we consider a large number of short periods, the number of calls arriving will have a Poisson distribution. Namely, if the time between observations, X, has an Exp(λ) distribution, then the number of observations in time t has a Poisson(λt) distribution. Since λ is the rate at which calls come in per unit time, λt is the expected number of calls to arrive in time t. 32 / 112

33 Example 3.3 The average number of calls coming into a call centre is 3/minute. Calculate 1) the probability that the time between two calls is greater than k mins; 2) t, where t is the time such that the length of time between two calls is less than t with probability 0.8, i.e. t = q_{0.8}. 33 / 112

34 Example Let X be the time between calls; we have X ∼ Exp(3), f(x) = 3e^{−3x} for x ≥ 0; otherwise, f(x) = 0. Note that the units of time are minutes. P(X > k) = ∫_k^∞ f(x)dx = ∫_k^∞ 3e^{−3x} dx = [−e^{−3x}]_k^∞ = e^{−3k}. 34 / 112

35 Example We need to find t such that P(X < t) = 0.8. This means that 0.8 = ∫_0^t f(x)dx = ∫_0^t 3e^{−3x} dx = [−e^{−3x}]_0^t = 1 − e^{−3t}. Hence, e^{−3t} = 0.2. Taking logarithms, −3t = ln(0.2) ⇒ t = −ln(0.2)/3 ≈ 0.5365 mins ≈ 32.19 secs. 35 / 112
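Both parts of Example 3.3 follow directly from the survival function P(X > k) = e^{−3k}. A sketch (Python; the helper name `surv` is ours):

```python
import math

lam = 3.0                                # rate: calls per minute

def surv(k):
    """P(X > k) for X ~ Exp(lam)."""
    return math.exp(-lam * k)

t = -math.log(0.2) / lam                 # q_0.8: solve 1 - exp(-3t) = 0.8
seconds = 60 * t                         # the same quantile in seconds
```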

36 3.7.3 The normal (Gaussian) distribution X has a normal distribution with expected value (mean) µ and variance σ² if f(x) = (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)]. We write X ∼ N(µ, σ²). This is the very commonly met bell-shaped distribution. Much of the theory of statistics is based upon the properties of this distribution. The normal distribution will be the subject of much of the rest of this chapter. The normal distribution with expected value 0 and variance 1, N(0, 1), is called the standard normal distribution. 36 / 112

37 The normal (Gaussian) distribution 37 / 112

38 3.7.4 The Gamma distribution Suppose the random variable X has a gamma distribution with shape parameter α and rate parameter β. We write X ∼ Γ(α, β). In this case, the density function of X is f(x) = β^α x^{α−1} e^{−βx}/Γ(α), for x > 0. If α is a positive integer, then Γ(α) = (α − 1)!. 38 / 112

39 The Gamma distribution 39 / 112

40 Relation of the Gamma distribution to the exponential distribution Note that if α = 1, then the density function given above reduces to the density function of the exponential distribution, i.e. the Γ(1, β) distribution is the Exp(β) distribution. Moreover, if α is a positive integer and X ∼ Γ(α, β), then X has the same distribution as the sum of α independent random variables, each with an Exp(β) distribution. In this case, X can be thought of as the time till the α-th call when calls come in at random with a constant rate. 40 / 112

41 Relation of the Gamma distribution to the exponential distribution If X ∼ Γ(α, β), then for any constant k > 0, kX ∼ Γ(α, β/k). In particular, if X ∼ Exp(λ), then Y = kX ∼ Exp(λ/k) (by multiplying the time to a call by k, the call rate is divided by k). 41 / 112

42 Relation of the Gamma distribution to the standard normal distribution Assume that X_1, X_2, ..., X_ν are independent random variables from the standard normal distribution. Let Y = X_1² + X_2² + ... + X_ν². The distribution of Y is called the Chi-squared distribution with ν degrees of freedom. We write Y ∼ χ²(ν). This distribution is the same as the Γ(ν/2, 1/2) distribution. The Chi-squared distribution is often encountered in statistical problems. α is called the shape parameter of the Gamma distribution, since as α increases the distribution becomes more symmetric. This is related to the central limit theorem (see later in this chapter). 42 / 112

43 Expected value and variance of standard continuous distributions

Distribution | Expected value | Variance
N(µ, σ²)     | µ              | σ²
Exp(λ)       | 1/λ            | 1/λ²
Γ(α, β)      | α/β            | α/β²
U[a, b]      | (a + b)/2      | (b − a)²/12

The expected value is sometimes referred to as the mean. However, it should not be confused with the sample mean. The derivation of some of these results is considered in the tutorials. 43 / 112

44 The Cauchy Distribution The standard Cauchy distribution has density function f(x) = 1/(π(1 + x²)), x ∈ ℝ. This distribution is symmetric around 0 and has a similar shape to the normal distribution (however, it is less peaked/more spread out). 44 / 112

45 The Cauchy Distribution Note that this does indeed define a probability distribution, since f(x) > 0 for all x ∈ ℝ. Also, ∫_{−∞}^{∞} f(x)dx = (1/π) ∫_{−∞}^{∞} dx/(1 + x²) = (1/π)[tan^{−1}(x)]_{−∞}^{∞} = (1/π)[π/2 − (−π/2)] = 1. 45 / 112

46 The Cauchy Distribution However, the expected value is undefined for this distribution, since E(X) = ∫_{−∞}^{∞} x f(x)dx = (1/π) ∫_{−∞}^{∞} x dx/(1 + x²) = [(1/(2π)) ln(1 + x²)]_{−∞}^{∞}. This integral is undefined, as ln(1 + x²) is unbounded as x tends to ∞ or −∞. 46 / 112

47 3.8 Two Inequalities Markov's Inequality Assume that X is a non-negative random variable. Then P(X > k) ≤ E(X)/k. Chebyshev's Inequality P(|X − E(X)| > kσ) ≤ 1/k². 47 / 112

48 Proof of Markov's Inequality for Continuous Distributions Since X is assumed to be non-negative, we have E(X) = ∫_0^∞ x f(x)dx = ∫_0^k x f(x)dx + ∫_k^∞ x f(x)dx. Note that i) ∫_0^k x f(x)dx ≥ 0 and ii) ∫_k^∞ x f(x)dx ≥ k ∫_k^∞ f(x)dx = kP(X > k). 48 / 112

49 Proof of Markov's Inequality for Continuous Distributions It follows that E(X) ≥ kP(X > k) ⇒ P(X > k) ≤ E(X)/k. 49 / 112

50 Proof of Chebyshev's Inequality for Continuous Distributions We have Var(X) = σ² = ∫_{−∞}^{∞} (x − E[X])² f(x)dx = ∫_{|x−E(X)|≤kσ} (x − E[X])² f(x)dx + ∫_{|x−E(X)|>kσ} (x − E[X])² f(x)dx. The first of these integrals is non-negative and ∫_{|x−E(X)|>kσ} (x − E[X])² f(x)dx ≥ k²σ² ∫_{|x−E(X)|>kσ} f(x)dx = k²σ² P(|X − E(X)| > kσ). 50 / 112

51 Proof of Chebyshev's Inequality for Continuous Distributions It follows that σ² ≥ k²σ² P(|X − E(X)| > kσ) ⇒ P(|X − E(X)| > kσ) ≤ 1/k². 51 / 112

52 Example I throw a coin 100 times. Let X be the number of heads. i) Using Markov s inequality find an upper bound on P(X > 70). ii) Using Chebyshev s inequality find a lower bound on P(30 X 70). iii) Using your answer to ii) and the symmetry of the distribution of X, obtain a better upper bound on P(X > 70). 52 / 112

53 Example We have X ∼ Bin(100, 0.5). Thus E(X) = 50, Var(X) = 25. i) Using Markov's inequality, P(X > k) ≤ E(X)/k ⇒ P(X > 70) ≤ 50/70 = 5/7 ≈ 0.714. 53 / 112

54 Example ii) Note that P(30 ≤ X ≤ 70) = P(|X − E(X)| ≤ 4σ). Using Chebyshev's inequality, P(|X − E(X)| > kσ) ≤ 1/k². We have P(|X − E(X)| > 4σ) ≤ 1/16 ⇒ P(|X − E(X)| ≤ 4σ) = 1 − P(|X − E(X)| > 4σ) ≥ 1 − 1/16 = 15/16. 54 / 112

55 Example Using the symmetry of the distribution of X around 50, we have P(X > 70) = P(X < 30). Hence, P(|X − E(X)| > 4σ) = P(X < 30) + P(X > 70) = 2P(X > 70) ≤ 1/16. It follows that P(X > 70) ≤ 1/32. 55 / 112
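The three bounds in this coin example reduce to one-line computations. A sketch (Python; the variable names are ours):

```python
from math import sqrt

n, p = 100, 0.5
EX, var = n * p, n * p * (1 - p)        # 50 and 25
sigma = sqrt(var)                        # 5

markov = EX / 70                         # Markov: P(X > 70) <= 5/7
k = (70 - EX) / sigma                    # 70 lies k = 4 standard deviations above the mean
cheb = 1 - 1 / k ** 2                    # Chebyshev: P(30 <= X <= 70) >= 15/16
sym = (1 / k ** 2) / 2                   # symmetry: P(X > 70) <= 1/32
```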

56 Jensen's Inequalities Suppose g is a convex function. It follows that E[g(X)] ≥ g(E[X]). Note that since g(X) = X² is a convex function, we have E[g(X)] = E[X²] ≥ g(E[X]) = E[X]². Suppose that h is a concave function. It follows that E[h(X)] ≤ h(E[X]). Since h(X) = ln X is a concave function, we have E[h(X)] = E[ln X] ≤ h(E[X]) = ln(E[X]). 56 / 112

57 3.9 The Normal Distribution and the Central Limit Theorem The importance of the normal distribution results from the central limit theorem, which explains why this bell shaped distribution is so often observed in nature. 57 / 112

58 3.8.1 The standard normal distribution The density function cannot be integrated algebraically. Hence, tables for the standard normal distribution are used in order to calculate probabilities associated with the normal distribution. A standard normal random variable has expected value 0 and standard deviation equal to 1. Such a random variable is denoted by Z i.e. Z N(0, 1). 58 / 112

59 Using tables for the standard normal distribution The table for the standard normal distribution used in this course gives probabilities of the form P(Z > k) for k 0 [note that other tables may give P(Z < k)]. Of course, often we have to calculate probabilities of events which take a different form. In order to do this we use the following 3 rules. These follow from the interpretation of the probability of an event as the appropriate area under the density curve. 59 / 112

60 1. The law of complementarity The law of complementarity P(Z < k) = 1 − P(Z > k). It should be noted that P(Z = k) = 0. The area under the density curve is 1, hence P(Z < k) + P(Z > k) = 1, i.e. P(Z < k) = 1 − P(Z > k). This is a general rule for continuous distributions. 60 / 112

61 The law of complementarity 61 / 112

62 2. The law of symmetry The law of symmetry Since the standard normal distribution is symmetric about 0, P(Z < −k) = P(Z > k). This is used to calculate probabilities when the constant is negative. This law is specific to distributions which are symmetric around 0. 62 / 112

63 The law of symmetry 63 / 112

64 3. The interval rule The interval rule P(a < Z < b) = P(Z > a) − P(Z > b). This rule is general for continuous distributions. 64 / 112

65 Reading the table for the standard normal distribution In order to read P(Z > k), where k is given to 2 decimal places, we find the row corresponding to the digits either side of the decimal point and the column corresponding to the second place after the decimal point. The table on the next slide illustrates a fragment of the table. 65 / 112

66 Reading the table for the standard normal distribution For example, P(Z > 1.22) = 0.1112. Since P(Z > k) is decreasing in k, we assume that for k > 4, P(Z > k) ≈ 0. 66 / 112

67 Example 3.4 Calculate i) P(Z > 1.76); ii) P(Z < −0.18); iii) P(Z > −0.83); iv) P(−0.43 < Z < 1.36). 67 / 112

68 Example 3.4 i) This can be read directly from the table (row corresponding to 1.7, column corresponding to 0.06): P(Z > 1.76) = 0.0392. ii) This is a probability in the left-hand tail of the distribution. We use the law of symmetry: P(Z < −0.18) = P(Z > 0.18) = 0.4286. In general, when we have a negative constant, we first use the law of symmetry to obtain a positive constant. 68 / 112

69 Example 3.4 iii) In some cases, neither the law of symmetry nor the law of complementarity transforms the calculation immediately into the correct form [P(Z > k) where k ≥ 0]. In this case we have to use both rules. Here, using the law of symmetry, P(Z > −0.83) = P(Z < 0.83). Using the law of complementarity, P(Z < 0.83) = 1 − P(Z > 0.83) = 1 − 0.2033 = 0.7967. 69 / 112

70 Example 3.4 When we have to calculate something of the form P(a < Z < b), we always use the interval rule: P(−0.43 < Z < 1.36) = P(Z > −0.43) − P(Z > 1.36). To calculate the first probability, we first use symmetry. The second probability can be read directly: P(Z > −0.43) − P(Z > 1.36) = P(Z < 0.43) − P(Z > 1.36) = 1 − P(Z > 0.43) − P(Z > 1.36) = 1 − 0.3336 − 0.0869 = 0.5795. 70 / 112
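Instead of tables, standard normal tail probabilities can be computed from the error function, since P(Z > k) = ½ erfc(k/√2). A sketch (Python; the helper name `P_gt` is ours) reproducing the four answers of Example 3.4:

```python
import math

def P_gt(k):
    """P(Z > k) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

p1 = P_gt(1.76)                # i)   approximately 0.0392
p2 = P_gt(0.18)                # ii)  P(Z < -0.18) by symmetry: approximately 0.4286
p3 = 1 - P_gt(0.83)            # iii) P(Z > -0.83): approximately 0.7967
p4 = P_gt(-0.43) - P_gt(1.36)  # iv)  interval rule: approximately 0.5795
```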

71 Reading the table for the standard normal distribution Sometimes it is necessary to find the number k for which P(Z > k) = p, where p ≤ 0.5. In this case we find the value closest to p in the heart of the table, and the value of k is read from the appropriate row and column. The rules of complementarity and symmetry may be needed to obtain the desired form, i.e. P(Z > k) = p, where p ≤ 0.5. 71 / 112

72 Example 3.5 Find the value of k satisfying P(Z > k) = 0.83. 72 / 112

73 Example 3.5 Since P(Z > 0) = 0.5, it is clear that k < 0. 73 / 112

74 Example 3.5 First we use the law of complementarity to obtain a suitable value for p: P(Z < k) = 1 − P(Z > k) ⇒ P(Z < k) = 0.17. Now we use the law of symmetry to obtain the required form: P(Z < k) = P(Z > −k) = 0.17. 74 / 112

75 Example 3.5 The number closest to 0.17 in the heart of the table is 0.1711. This is in the row corresponding to 0.9 and the column corresponding to 0.05. Hence, we have P(Z > −k) = 0.17; P(Z > 0.95) ≈ 0.17. Thus, −k ≈ 0.95, i.e. k ≈ −0.95. 75 / 112
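Reading the table "backwards" is a root-finding problem: we solve P(Z > k) = p. Since P(Z > k) is strictly decreasing in k, bisection works. A sketch (Python; the helpers `P_gt` and `solve_tail` are ours):

```python
import math

def P_gt(k):
    """P(Z > k) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def solve_tail(p, lo=-10.0, hi=10.0):
    """Find k with P(Z > k) = p by bisection (P_gt is decreasing in k)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if P_gt(mid) > p:      # tail still too big: k lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k = solve_tail(0.83)           # Example 3.5: k is approximately -0.95
```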

76 3.8.2 Standardisation of a normal random variable Clearly, the technique used in the previous subsection only works for a standard normal random variable. How do we calculate appropriate probabilities for a general normal distribution, i.e. X ∼ N(µ, σ²)? The first step is to standardise the variable. 76 / 112

77 Standardisation of a normal random variable If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1). Subtracting the expected value first centres the distribution around 0, and then division by the standard deviation shrinks the dispersion of the distribution to that of the standard normal distribution. It should be noted that such standardisation is specific to the normal distribution. 77 / 112

78 Transformations of normal random variables In general, if X ∼ N(µ, σ²), then Y = aX + b also has a normal distribution. In particular, Y ∼ N(aµ + b, a²σ²). The sum of independent normal random variables is also normally distributed. 78 / 112

79 Transformations of normal random variables Moreover, any linear combination of independent normally distributed random variables has a normal distribution. Note that if X_1, X_2, ..., X_n are independent random variables and {α_i}_{i=1}^n is a set of constants, then E[α_1X_1 + α_2X_2 + ... + α_nX_n] = α_1E[X_1] + α_2E[X_2] + ... + α_nE[X_n] and Var[α_1X_1 + α_2X_2 + ... + α_nX_n] = α_1²Var[X_1] + α_2²Var[X_2] + ... + α_n²Var[X_n]. After appropriate standardisation of such a sum, we can calculate the appropriate probabilities as before. 79 / 112

80 Example 3.6 The height of male students is normal with an expected value of 175cm and variance of 144cm 2. The height of female students is normal with an expected value of 165cm and variance of 81cm 2. a) What is the probability that a randomly picked male student is i) taller than 190cm ii) between 163 and 181cm? iii) taller than a randomly chosen female student? b) 10% of male students are shorter than what height? 80 / 112

81 Example 3.6 Let X and Y denote the height of a male and female student, respectively. i) We must calculate P(X > 190). First we standardise: P(X > 190) = P((X − µ)/σ > (190 − 175)/12) = P(Z > 1.25). This can now be read directly from the table: P(X > 190) = P(Z > 1.25) = 0.1056. 81 / 112

82 Example 3.6 ii) We must calculate P(163 < X < 181). Again, we first standardise: P(163 < X < 181) = P((163 − 175)/12 < (X − µ)/σ < (181 − 175)/12) = P(−1 < Z < 0.5). Using the interval rule, P(−1 < Z < 0.5) = P(Z > −1) − P(Z > 0.5). 82 / 112

83 Example 3.6 Using symmetry for the first probability, P(Z > −1) − P(Z > 0.5) = P(Z < 1) − P(Z > 0.5) = 1 − P(Z > 1) − P(Z > 0.5) = 1 − 0.1587 − 0.3085 = 0.5328. 83 / 112

84 Example 3.6 iii) We must calculate P(X > Y). This can be rewritten as P(X − Y > 0). We first must derive the distribution of U = X − Y. Since both the male and the female are chosen at random, we may assume that X and Y are independent. It follows that U = X − Y has a normal distribution. E[U] = E[X − Y] = E[X] − E[Y] = 10; Var[U] = Var[X + (−Y)] = Var[X] + (−1)²Var[Y] = 144 + 81 = 225 = 15². 84 / 112

85 Example 3.6 Thus U ∼ N(10, 15²). We must calculate P(U > 0). Standardising, P(U > 0) = P((U − 10)/15 > −10/15) ≈ P(Z > −0.67). Using symmetry and then the law of complementarity, P(Z > −0.67) = P(Z < 0.67) = 1 − P(Z > 0.67) = 1 − 0.2514 = 0.7486. 85 / 112

86 Example 3.6 b) We have to find k such that P(X < k) = 0.1. First we standardise: P((X − µ)/σ < (k − 175)/12) = 0.1. Thus P(Z < c) = 0.1, where c = (k − 175)/12. Since we have a left-hand tail probability, i.e. P(Z < c) < 0.5, we use the law of symmetry: P(Z > −c) = P(Z < c) = 0.1. 86 / 112

87 Example 3.6 The value closest to 0.1 in the heart of the table is 0.1003, in the row corresponding to 1.2 and the column corresponding to 0.08. Hence, P(Z > −c) = 0.1; P(Z > 1.28) ≈ 0.1. Thus, −c ≈ 1.28, i.e. c ≈ −1.28. Since c = (k − 175)/12, we have (k − 175)/12 = −1.28 ⇒ k = 175 − 15.36 = 159.64. Thus, 10% of the population of male students are shorter than 159.6cm. 87 / 112
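All four parts of Example 3.6 can be reproduced without tables via the erfc-based tail function. A sketch (Python; the helper `P_gt` is ours, and the 10% point z ≈ 1.2816 is the standard normal quantile used in place of the table lookup). Part iii) comes out as about 0.7475 rather than 0.7486 because the text rounds 10/15 to 0.67 before consulting the table:

```python
import math

def P_gt(k):
    """P(Z > k) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

mu_m, sd_m = 175, 12                               # male heights: N(175, 144)
mu_f, sd_f = 165, 9                                # female heights: N(165, 81)

p_i = P_gt((190 - mu_m) / sd_m)                    # a) i)  P(X > 190)
p_ii = P_gt((163 - mu_m) / sd_m) - P_gt((181 - mu_m) / sd_m)
sd_u = math.hypot(sd_m, sd_f)                      # sd of U = X - Y: sqrt(144 + 81) = 15
p_iii = P_gt((0 - 10) / sd_u)                      # a) iii) P(U > 0)
height = mu_m - 1.2816 * sd_m                      # b) 10% point: mean - 1.2816 sd
```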

88 3.8.3 The central limit theorem Suppose I throw a coin once. The distribution of the number of heads, X, is P(X = 0) = 0.5; P(X = 1) = 0.5, i.e. nothing like a bell-shaped distribution. However, suppose I throw the coin a large number of times, say k times. I am reasonably likely to get around k/2 heads, but the probability of getting either a large or a small number of heads (with respect to k/2) is very small. The distribution of the number of heads thrown, X, has a bell-like shape (i.e. similar to the normal distribution). 88 / 112

89 The central limit theorem This is a particular case of the central limit theorem. Note that X can be written as X = X_1 + X_2 + ... + X_n, where X_i = 1 if the i-th toss results in heads and X_i = 0 if the i-th toss results in tails. 89 / 112

90 The central limit theorem (CLT) Suppose X = X_1 + X_2 + ... + X_n, where n is large and the X_i are independent random variables. Then X is approximately normally distributed, i.e. X ∼ N(µ, σ²) approximately, where µ = E(X) = Σ_{i=1}^n E(X_i) and σ² = Var(X) = Σ_{i=1}^n Var(X_i). This approximation is good if n ≥ 30, the variances of the X_i are comparable and the distributions of the X_i are reasonably symmetric. If the distributions of the X_i are clearly asymmetric, then this approximation will be less accurate. 90 / 112

91 Example 3.7 n independent observations are taken from the exponential distribution with expected value 1 (note that the sum of these random variables has a gamma distribution with parameters α = n and β = 1). Using an appropriate approximation, estimate the probability that the mean of these observations (the sample mean X̄) is between 0.9 and 1.1 when i) n = 30, ii) n = 100. 91 / 112

92 Example 3.7 i) For n = 30, P(0.9 < X̄ < 1.1) = P(0.9 < Σ_{i=1}^{30} X_i / 30 < 1.1) = P(27 < Σ_{i=1}^{30} X_i < 33). Since X_i ∼ Exp(1), we have E(X_i) = Var(X_i) = 1. Therefore, E(Σ_{i=1}^{30} X_i) = Σ_{i=1}^{30} E(X_i) = 30. Since the observations are independent, Var(Σ_{i=1}^{30} X_i) = Σ_{i=1}^{30} Var(X_i) = 30. 92 / 112

93 Example 3.7 Using the central limit theorem, S = Σ_{i=1}^{30} X_i ∼ N(30, 30) approximately. Standardising, P(27 < S < 33) = P((27 − 30)/√30 < (S − µ)/σ < (33 − 30)/√30) ≈ P(−0.55 < Z < 0.55). 93 / 112

94 Example 3.7 Using the interval rule, P(−0.55 < Z < 0.55) = P(Z > −0.55) − P(Z > 0.55) = P(Z < 0.55) − P(Z > 0.55) = [1 − P(Z > 0.55)] − P(Z > 0.55) = 1 − 2 × 0.2912 = 0.4176. 94 / 112

95 Example 3.7 ii) For n = 100, P(0.9 < X̄ < 1.1) = P(0.9 < Σ_{i=1}^{100} X_i / 100 < 1.1) = P(90 < Σ_{i=1}^{100} X_i < 110). Since X_i ∼ Exp(1), we have E(X_i) = Var(X_i) = 1, so E(Σ_{i=1}^{100} X_i) = Σ_{i=1}^{100} E(X_i) = 100. Since the observations are independent, Var(Σ_{i=1}^{100} X_i) = Σ_{i=1}^{100} Var(X_i) = 100. 95 / 112

96 Example 3.7 Using the central limit theorem, S = Σ_{i=1}^{100} X_i ∼ N(100, 100) approximately. Standardising, P(90 < S < 110) = P((90 − 100)/10 < (S − µ)/σ < (110 − 100)/10) = P(−1 < Z < 1). 96 / 112

97 Example 3.7 Using the interval rule, P(−1 < Z < 1) = P(Z > −1) − P(Z > 1) = P(Z < 1) − P(Z > 1) = [1 − P(Z > 1)] − P(Z > 1) = 1 − 2 × 0.1587 = 0.6826. 97 / 112

98 The relation between the central limit theorem and sampling Note 1: As the sample size grows, the probability of the sample mean being close to the expected value (the theoretical mean) increases. Note 2: For the example above, the exact probabilities can be calculated (using a computer), since the sum of the variables has a gamma distribution. 98 / 112

99 The relation between the central limit theorem and sampling The exact probabilities (which can be calculated using a computer, since the sum of the variables has a gamma distribution) differ somewhat from the CLT estimates of 0.4176 and 0.6826. As the number of observations increases, the approximation using the CLT becomes more accurate. Since the exponential distribution is clearly asymmetric, the approximation using the CLT is relatively poor. 99 / 112
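The quality of the CLT approximation in Example 3.7 can also be examined by simulation instead of the exact gamma computation. A sketch (Python, seeded; the helper `p_mean_in` and the replication count are ours):

```python
import random

random.seed(1)

def p_mean_in(n, lo=0.9, hi=1.1, reps=20_000):
    """Monte Carlo estimate of P(lo < sample mean of n Exp(1) draws < hi)."""
    hits = 0
    for _ in range(reps):
        mean = sum(random.expovariate(1.0) for _ in range(n)) / n
        hits += lo < mean < hi
    return hits / reps

p30 = p_mean_in(30)     # the CLT estimate was 0.4176
p100 = p_mean_in(100)   # the CLT estimate was 0.6826
```

As expected, the probability of the sample mean lying in (0.9, 1.1) grows with the sample size.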

100 Proportion of observations from a normal distribution within one standard deviation of the mean Note 3: After standardisation, the constants indicate the number of standard deviations from the mean (a negative sign indicates deviations below the mean). Here, P(−1 < Z < 1) = 0.6826 shows that if X comes from a normal distribution, the probability of being within one standard deviation of the mean is just over 2/3. Similarly, P(−2 < Z < 2) = 0.9544. Thus, with a probability of just over 0.95, an observation from a normal distribution will be less than 2 standard deviations from the mean. 100 / 112

101 3.8.4 The normal approximation to the binomial distribution Suppose n is large and X ∼ Bin(n, p). Then X ∼ N(µ, σ²) approximately, where µ = np, σ² = np(1 − p). This approximation is used when n ≥ 30 and 0.1 ≤ p ≤ 0.9. For values of p outside this range, the Poisson approximation tends to work better. 101 / 112

102 The continuity correction for the normal approximation to the binomial distribution It should be noted that X has a discrete distribution, but we are using a continuous distribution in the approximation. For example, suppose we wanted to estimate the probability of obtaining exactly k heads when we throw a coin n times. This probability will in general be positive. However, if we use the normal approximation without an appropriate correction, we cannot sensibly estimate P(X = k) [for continuous distributions P(X = k) = 0]. 102 / 112

103 The continuity correction for the normal approximation to the binomial distribution Suppose the random variable X takes only integer values and has an approximately normal distribution. In order to estimate P(X = k), we use the continuity correction. This uses the fact that when k is an integer, P(X = k) = P(k − 0.5 < X < k + 0.5). 103 / 112

104 Example 3.8 Suppose a coin is tossed 36 times. Using CLT, estimate the probability that exactly 20 heads are thrown. 104 / 112

105 Example 3.8 Let X be the number of heads. We have X ∼ Bin(36, 0.5). Hence, E(X) = np = 36 × 0.5 = 18 and Var(X) = np(1 − p) = 36 × 0.5 × 0.5 = 9. It follows that X ∼ N(18, 9) approximately. We wish to estimate P(X = 20). Using the continuity correction, P(X = 20) = P(19.5 < X < 20.5) = P((19.5 − 18)/3 < (X − µ)/σ < (20.5 − 18)/3) ≈ P(0.5 < Z < 0.83) = P(Z > 0.5) − P(Z > 0.83). 105 / 112

106 Example 3.8 Hence, P(Z > 0.5) − P(Z > 0.83) = 0.3085 − 0.2033 = 0.1052. 106 / 112
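For Example 3.8 the exact binomial probability is available, so the continuity-corrected normal estimate can be compared against it. A sketch (Python; the helper `P_gt` is ours):

```python
import math

def P_gt(k):
    """P(Z > k) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

n, p = 36, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))      # 18 and 3

exact = math.comb(n, 20) * p ** 20 * (1 - p) ** 16         # P(X = 20) exactly
approx = P_gt((19.5 - mu) / sigma) - P_gt((20.5 - mu) / sigma)
```

The two values agree to about three decimal places, which illustrates why the continuity correction is worth the extra step.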

107 The continuity correction for the normal approximation to the binomial distribution This continuity correction can be adapted to problems in which we have to estimate the probability that the number of successes is in a given interval, e.g. P(15 ≤ X < 21) = P(X = 15) + P(X = 16) + ... + P(X = 20) = P(14.5 < X < 15.5) + ... + P(19.5 < X < 20.5) = P(14.5 < X < 20.5). 107 / 112

108 Example 3.8 A die is thrown 180 times. Estimate the probability that 1) at least 35 sixes are thrown 2) between 27 and 33 sixes are thrown (inclusively). 108 / 112

109 Example 3.8 Let X be the number of sixes. We have X ∼ Bin(180, 1/6), E(X) = np = 180 × 1/6 = 30 and Var(X) = np(1 − p) = 180 × 1/6 × 5/6 = 25. 109 / 112

110 Example 3.8 i) Using the continuity correction, P(X ≥ 35) = P(X = 35) + P(X = 36) + ... = P(34.5 < X < 35.5) + P(35.5 < X < 36.5) + ... = P(X > 34.5). Standardising, P(X > 34.5) = P((X − µ)/σ > (34.5 − 30)/5) = P(Z > 0.9) = 0.1841. 110 / 112

111 Example 3.8 ii) Using the continuity correction, P(27 ≤ X ≤ 33) = P(X = 27) + P(X = 28) + ... + P(X = 33) = P(26.5 < X < 27.5) + ... + P(32.5 < X < 33.5) = P(26.5 < X < 33.5). Standardising, P(26.5 < X < 33.5) = P((26.5 − 30)/5 < (X − µ)/σ < (33.5 − 30)/5) = P(−0.7 < Z < 0.7) = P(Z > −0.7) − P(Z > 0.7) = P(Z < 0.7) − P(Z > 0.7) = 1 − 2 × 0.2420 = 0.5160. 111 / 112
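The same exact-versus-approximate comparison can be made for the die example, where the exact binomial sums are cheap to compute. A sketch (Python; the helpers `P_gt` and `binom_pmf` are ours):

```python
import math

def P_gt(k):
    """P(Z > k) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 180, 1 / 6
mu, sigma = n * p, math.sqrt(n * p * (1 - p))      # 30 and 5

approx_i = P_gt((34.5 - mu) / sigma)                         # P(X >= 35), corrected
exact_i = sum(binom_pmf(n, p, k) for k in range(35, n + 1))
approx_ii = P_gt((26.5 - mu) / sigma) - P_gt((33.5 - mu) / sigma)
exact_ii = sum(binom_pmf(n, p, k) for k in range(27, 34))
```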

112 The normal approximation to the binomial It should be noted that the normal approximation to the binomial is most accurate when n is large and p is close to 0.5. This is due to the fact that X = X_1 + X_2 + ... + X_n, where the X_i are independent Bernoulli(p) (0–1) random variables. The distribution of X_i is symmetric when p = 0.5. 112 / 112


6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1]. Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

More information

Random variables, probability distributions, binomial random variable

Random variables, probability distributions, binomial random variable Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

More information

Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture 6: Discrete & Continuous Probability and Random Variables Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015 D. Alex Hughes (Math Camp) Lecture 6: Discrete & Continuous Probability and Random September

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X Week 6 notes : Continuous random variables and their probability densities WEEK 6 page 1 uniform, normal, gamma, exponential,chi-squared distributions, normal approx'n to the binomial Uniform [,1] random

More information

Lecture Notes 1. Brief Review of Basic Probability

Lecture Notes 1. Brief Review of Basic Probability Probability Review Lecture Notes Brief Review of Basic Probability I assume you know basic probability. Chapters -3 are a review. I will assume you have read and understood Chapters -3. Here is a very

More information

Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

More information

Math 431 An Introduction to Probability. Final Exam Solutions

Math 431 An Introduction to Probability. Final Exam Solutions Math 43 An Introduction to Probability Final Eam Solutions. A continuous random variable X has cdf a for 0, F () = for 0 <

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

ST 371 (IV): Discrete Random Variables

ST 371 (IV): Discrete Random Variables ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8. Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

More information

Probability Distributions

Probability Distributions Learning Objectives Probability Distributions Section 1: How Can We Summarize Possible Outcomes and Their Probabilities? 1. Random variable 2. Probability distributions for discrete random variables 3.

More information

Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

More information

MAS108 Probability I

MAS108 Probability I 1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

More information

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit? ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

More information

Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density HW MATH 461/561 Lecture Notes 15 1 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density and marginal densities f(x, y), (x, y) Λ X,Y f X (x), x Λ X,

More information

Math 461 Fall 2006 Test 2 Solutions

Math 461 Fall 2006 Test 2 Solutions Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1 Lecture 2: Discrete Distributions, Normal Distributions Chapter 1 Reminders Course website: www. stat.purdue.edu/~xuanyaoh/stat350 Office Hour: Mon 3:30-4:30, Wed 4-5 Bring a calculator, and copy Tables

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i ) Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

Solving Quadratic Equations

Solving Quadratic Equations 9.3 Solving Quadratic Equations by Using the Quadratic Formula 9.3 OBJECTIVES 1. Solve a quadratic equation by using the quadratic formula 2. Determine the nature of the solutions of a quadratic equation

More information

Section 1.3 P 1 = 1 2. = 1 4 2 8. P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., = 1 2 4.

Section 1.3 P 1 = 1 2. = 1 4 2 8. P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., = 1 2 4. Difference Equations to Differential Equations Section. The Sum of a Sequence This section considers the problem of adding together the terms of a sequence. Of course, this is a problem only if more than

More information

6 3 The Standard Normal Distribution

6 3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

More information

Math 370, Actuarial Problemsolving Spring 2008 A.J. Hildebrand. Practice Test, 1/28/2008 (with solutions)

Math 370, Actuarial Problemsolving Spring 2008 A.J. Hildebrand. Practice Test, 1/28/2008 (with solutions) Math 370, Actuarial Problemsolving Spring 008 A.J. Hildebrand Practice Test, 1/8/008 (with solutions) About this test. This is a practice test made up of a random collection of 0 problems from past Course

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

6 PROBABILITY GENERATING FUNCTIONS

6 PROBABILITY GENERATING FUNCTIONS 6 PROBABILITY GENERATING FUNCTIONS Certain derivations presented in this course have been somewhat heavy on algebra. For example, determining the expectation of the Binomial distribution (page 5.1 turned

More information

Lecture 8: More Continuous Random Variables

Lecture 8: More Continuous Random Variables Lecture 8: More Continuous Random Variables 26 September 2005 Last time: the eponential. Going from saying the density e λ, to f() λe λ, to the CDF F () e λ. Pictures of the pdf and CDF. Today: the Gaussian

More information

Review of Random Variables

Review of Random Variables Chapter 1 Review of Random Variables Updated: January 16, 2015 This chapter reviews basic probability concepts that are necessary for the modeling and statistical analysis of financial data. 1.1 Random

More information

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles... MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability

More information

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

More information

Introduction to Probability

Introduction to Probability Introduction to Probability EE 179, Lecture 15, Handout #24 Probability theory gives a mathematical characterization for experiments with random outcomes. coin toss life of lightbulb binary data sequence

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. Part 3: Discrete Uniform Distribution Binomial Distribution

Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. Part 3: Discrete Uniform Distribution Binomial Distribution Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS Part 3: Discrete Uniform Distribution Binomial Distribution Sections 3-5, 3-6 Special discrete random variable distributions we will cover

More information

Example. A casino offers the following bets (the fairest bets in the casino!) 1 You get $0 (i.e., you can walk away)

Example. A casino offers the following bets (the fairest bets in the casino!) 1 You get $0 (i.e., you can walk away) : Three bets Math 45 Introduction to Probability Lecture 5 Kenneth Harris aharri@umich.edu Department of Mathematics University of Michigan February, 009. A casino offers the following bets (the fairest

More information

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Math 151. Rumbos Spring 2014 1. Solutions to Assignment #22

Math 151. Rumbos Spring 2014 1. Solutions to Assignment #22 Math 151. Rumbos Spring 2014 1 Solutions to Assignment #22 1. An experiment consists of rolling a die 81 times and computing the average of the numbers on the top face of the die. Estimate the probability

More information

Application. Outline. 3-1 Polynomial Functions 3-2 Finding Rational Zeros of. Polynomial. 3-3 Approximating Real Zeros of.

Application. Outline. 3-1 Polynomial Functions 3-2 Finding Rational Zeros of. Polynomial. 3-3 Approximating Real Zeros of. Polynomial and Rational Functions Outline 3-1 Polynomial Functions 3-2 Finding Rational Zeros of Polynomials 3-3 Approximating Real Zeros of Polynomials 3-4 Rational Functions Chapter 3 Group Activity:

More information

A Tutorial on Probability Theory

A Tutorial on Probability Theory Paola Sebastiani Department of Mathematics and Statistics University of Massachusetts at Amherst Corresponding Author: Paola Sebastiani. Department of Mathematics and Statistics, University of Massachusetts,

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Sums of Independent Random Variables

Sums of Independent Random Variables Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

More information

Math 120 Final Exam Practice Problems, Form: A

Math 120 Final Exam Practice Problems, Form: A Math 120 Final Exam Practice Problems, Form: A Name: While every attempt was made to be complete in the types of problems given below, we make no guarantees about the completeness of the problems. Specifically,

More information

Math/Stats 342: Solutions to Homework

Math/Stats 342: Solutions to Homework Math/Stats 342: Solutions to Homework Steven Miller (sjm1@williams.edu) November 17, 2011 Abstract Below are solutions / sketches of solutions to the homework problems from Math/Stats 342: Probability

More information

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is. Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,

More information

Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

More information

You flip a fair coin four times, what is the probability that you obtain three heads.

You flip a fair coin four times, what is the probability that you obtain three heads. Handout 4: Binomial Distribution Reading Assignment: Chapter 5 In the previous handout, we looked at continuous random variables and calculating probabilities and percentiles for those type of variables.

More information

THE BINOMIAL DISTRIBUTION & PROBABILITY

THE BINOMIAL DISTRIBUTION & PROBABILITY REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

More information

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions. Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

Probability. Distribution. Outline

Probability. Distribution. Outline 7 The Normal Probability Distribution Outline 7.1 Properties of the Normal Distribution 7.2 The Standard Normal Distribution 7.3 Applications of the Normal Distribution 7.4 Assessing Normality 7.5 The

More information

WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

More information

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0

More information

Without data, all you are is just another person with an opinion.

Without data, all you are is just another person with an opinion. OCR Statistics Module Revision Sheet The S exam is hour 30 minutes long. You are allowed a graphics calculator. Before you go into the exam make sureyou are fully aware of the contents of theformula booklet

More information

Section 6.1 Joint Distribution Functions

Section 6.1 Joint Distribution Functions Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

Chapter 5. Random variables

Chapter 5. Random variables Random variables random variable numerical variable whose value is the outcome of some probabilistic experiment; we use uppercase letters, like X, to denote such a variable and lowercase letters, like

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

2. Discrete random variables

2. Discrete random variables 2. Discrete random variables Statistics and probability: 2-1 If the chance outcome of the experiment is a number, it is called a random variable. Discrete random variable: the possible outcomes can be

More information

Chapter 4 - Lecture 1 Probability Density Functions and Cumul. Distribution Functions

Chapter 4 - Lecture 1 Probability Density Functions and Cumul. Distribution Functions Chapter 4 - Lecture 1 Probability Density Functions and Cumulative Distribution Functions October 21st, 2009 Review Probability distribution function Useful results Relationship between the pdf and the

More information

7 CONTINUOUS PROBABILITY DISTRIBUTIONS

7 CONTINUOUS PROBABILITY DISTRIBUTIONS 7 CONTINUOUS PROBABILITY DISTRIBUTIONS Chapter 7 Continuous Probability Distributions Objectives After studying this chapter you should understand the use of continuous probability distributions and the

More information

STA 256: Statistics and Probability I
