Continuous Random Variables
COMP 245 STATISTICS
Dr N A Heard

Contents
1 Continuous Random Variables
1.1 Introduction
1.2 Probability Density Functions
1.3 Transformations
2 Mean, Variance and Quantiles
2.1 Expectation
2.2 Variance
2.3 Quantiles
3 Continuous Distributions
3.1 Uniform
3.2 Exponential
3.3 Gaussian
1 Continuous Random Variables

1.1 Introduction

Definition
Suppose again we have a random experiment with sample space $S$ and probability measure $P$. Recall our definition of a random variable as a mapping $X : S \to \mathbb{R}$ from the sample space $S$ to the real numbers, inducing a probability measure $P_X(B) = P\{X^{-1}(B)\}$, $B \subseteq \mathbb{R}$.

We define the random variable $X$ to be (absolutely) continuous if $\exists\, f_X : \mathbb{R} \to \mathbb{R}$ s.t.
$$P_X(B) = \int_B f_X(x)\,dx, \quad \forall B \subseteq \mathbb{R}, \qquad (1)$$
in which case $f_X$ is referred to as the probability density function (pdf) of $X$.

Comments
A connected sequence of comments:
- One consequence of this definition is that the probability of any singleton set $B = \{x\}$, $x \in \mathbb{R}$, is zero for a continuous random variable: $P_X(X = x) = P_X(\{x\}) = 0$.
- This in turn implies that any countable set $B = \{x_1, x_2, \ldots\} \subseteq \mathbb{R}$ will have zero probability measure for a continuous random variable, since $P_X(X \in B) = P_X(X = x_1) + P_X(X = x_2) + \cdots$.
- This automatically implies that the range of a continuous random variable will be uncountable.
- This tells us that a random variable cannot be both discrete and continuous.

Examples
The following quantities would typically be modelled with continuous random variables. They are measurements of time, distance and other phenomena that can, at least in theory, be determined to an arbitrarily high degree of accuracy.
- The height or weight of a randomly chosen individual from a population.
- The duration of this lecture.
- The volume of fuel consumed by a bus on its route.
- The total distance driven by a taxi cab in a day.

Note that there are obvious ways to discretise all of the above examples.
1.2 Probability Density Functions

pdf as the derivative of the cdf
From (1), we notice that the cumulative distribution function (cdf) for a continuous random variable $X$ is given by
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt, \quad \forall x \in \mathbb{R}.$$
This expression leads us to a definition of the pdf of a continuous rv $X$ for which we already have the cdf; by the Fundamental Theorem of Calculus we find the pdf of $X$ to be given by
$$f_X(x) = \frac{d}{dx} F_X(x) \quad \text{or} \quad F_X'(x).$$

Properties of a pdf
Since the pdf is the derivative of the cdf, and because we know that a cdf is non-decreasing, the pdf will always be non-negative.
So, in the same way as we did for cdfs and discrete pmfs, we have the following checklist to ensure $f_X$ is a valid pdf.
pdf:
1. $f_X(x) \geq 0$, $\forall x \in \mathbb{R}$;
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

Interval Probabilities
Suppose we are interested in whether a continuous rv $X$ lies in an interval $(a, b]$. From the definition of a continuous random variable, this is given by
$$P_X(a < X \leq b) = \int_a^b f_X(x)\,dx.$$
That is, the area under the pdf between $a$ and $b$.
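The checklist and the interval probability can be checked numerically. The Python sketch below is illustrative only: the helper names (`integrate`, `example_pdf`, `example_cdf`) and the density $f(x) = 2x$ on (0, 1) are hypothetical choices, not part of the notes, and a midpoint rule stands in for exact integration. A central difference of the cdf also illustrates $f_X = F_X'$.

```python
# Sketch: numerical checks of the two pdf conditions, an interval probability,
# and f = F' for a hypothetical density f(x) = 2x on (0, 1).

def integrate(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def example_pdf(x):
    """Hypothetical density: 2x on (0, 1), zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def example_cdf(x):
    """Its cdf: 0 below 0, x^2 on [0, 1], 1 above 1."""
    return min(max(x, 0.0), 1.0) ** 2

# Condition 1 (spot check on a grid only): f(x) >= 0.
grid = [-1.0 + 3.0 * i / 1000 for i in range(1001)]
nonnegative = all(example_pdf(x) >= 0.0 for x in grid)

# Condition 2: the pdf integrates to 1 over a range covering its support.
total = integrate(example_pdf, -1.0, 2.0)

# Interval probability P(a < X <= b) as the area under the pdf.
p_interval = integrate(example_pdf, 0.25, 0.5)   # exact: 0.5**2 - 0.25**2 = 0.1875

# pdf as the derivative of the cdf, via a central difference at x = 0.3.
step = 1e-6
deriv = (example_cdf(0.3 + step) - example_cdf(0.3 - step)) / (2 * step)  # exact: 0.6
```

The non-negativity check only inspects a finite grid, so it can reveal a violation but cannot prove the condition holds everywhere.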
[Figure: a pdf $f(x)$ with the area $P(a < X < b)$ between $a$ and $b$ shaded.]

Further Comments:
- Besides still being a non-decreasing function satisfying $F_X(-\infty) = 0$, $F_X(\infty) = 1$, the cdf $F_X$ of a continuous random variable $X$ is also (absolutely) continuous.
- For a continuous rv, since $P(X = x) = 0$, we have $F(x) = P(X \leq x) = P(X < x)$.
- For small $\delta$, $f_X(x)\,\delta$ is approximately the probability that $X$ takes a value in the small interval, say, $[x, x + \delta)$.
- Since the density (pdf) $f_X(x)$ is not itself a probability, then unlike the pmf of a discrete rv we do not require $f_X(x) \leq 1$.
- From (1) it is clear that the pdf of a continuous rv $X$ completely characterises its distribution, so we often just specify $f_X$.

Example
Suppose we have a continuous random variable $X$ with probability density function
$$f(x) = \begin{cases} c x^2, & 0 < x < 3 \\ 0, & \text{otherwise} \end{cases}$$
for some unknown constant $c$.

Questions
1. Determine $c$.
2. Find the cdf of $X$.
3. Calculate $P(1 < X < 2)$.

Solutions
1. To find $c$: we must have
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_0^3 c x^2\,dx = c\left[\frac{x^3}{3}\right]_0^3 = 9c \implies c = \frac{1}{9}.$$
2. 
$$F(x) = \int_{-\infty}^{x} f(u)\,du = \begin{cases} 0, & x < 0 \\ \int_0^x \frac{u^2}{9}\,du = \frac{x^3}{27}, & 0 \leq x \leq 3 \\ 1, & x > 3. \end{cases}$$
3. $P(1 < X < 2) = F(2) - F(1) = \frac{8}{27} - \frac{1}{27} = \frac{7}{27} \approx 0.2593$.

[Figure: the pdf $f(x)$ (left panel) and the cdf $F(x)$ (right panel) of this example, plotted for $-1 \leq x \leq 5$.]

1.3 Transformations

Transforming random variables
Suppose we have a continuous random variable $X$ and wish to consider the transformed random variable $Y = g(X)$ for some function $g : \mathbb{R} \to \mathbb{R}$, s.t. $g$ is continuous and strictly monotonic (so $g^{-1}$ exists).

Suppose $g$ is monotonic increasing. Then for $y \in \mathbb{R}$, $Y \leq y \iff X \leq g^{-1}(y)$. So
$$F_Y(y) = P_Y(Y \leq y) = P_X(X \leq g^{-1}(y)) = F_X(g^{-1}(y)).$$
By the chain rule of differentiation,
$$f_Y(y) = F_Y'(y) = f_X\{g^{-1}(y)\}\, g^{-1\prime}(y).$$
Note $g^{-1\prime}(y) = \frac{d}{dy} g^{-1}(y)$ is positive since we assumed $g$ was increasing.

If we had $g$ monotonic decreasing, $Y \leq y \iff X \geq g^{-1}(y)$ and
$$F_Y(y) = P_X(X \geq g^{-1}(y)) = 1 - F_X(g^{-1}(y)).$$
So by comparison with before, we would have
$$f_Y(y) = F_Y'(y) = -f_X\{g^{-1}(y)\}\, g^{-1\prime}(y),$$
with $g^{-1\prime}(y)$ always negative.

So overall, for $Y = g(X)$ we have
$$f_Y(y) = f_X\{g^{-1}(y)\}\, \left|g^{-1\prime}(y)\right|. \qquad (2)$$
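Formula (2) can be sketched as a higher-order function. This is a minimal illustration under stated assumptions: `transformed_pdf` and `integrate` are hypothetical helpers, and $g(x) = x^2$ (increasing on (0, 3)) is applied to the running example density $x^2/9$, which by (2) gives $f_Y(y) = \sqrt{y}/18$ on (0, 9). Since $Y \leq 4 \iff X \leq 2$, the area under $f_Y$ up to 4 should match $F_X(2) = 8/27$.

```python
import math

def transformed_pdf(f_x, g_inv, g_inv_prime):
    """Density of Y = g(X) for strictly monotonic g, following (2):
    f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y) / dy|."""
    return lambda y: f_x(g_inv(y)) * abs(g_inv_prime(y))

def f_x(x):
    """Running example density: x^2 / 9 on (0, 3), zero elsewhere."""
    return x * x / 9.0 if 0.0 < x < 3.0 else 0.0

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Y = X^2 is increasing on (0, 3): g^{-1}(y) = sqrt(y), used here for y > 0 only.
f_y = transformed_pdf(f_x, math.sqrt, lambda y: 0.5 / math.sqrt(y))

total = integrate(f_y, 0.0, 9.0)        # should be close to 1
p_y_le_4 = integrate(f_y, 0.0, 4.0)     # should be close to F_X(2) = 8/27
```

The absolute value in `transformed_pdf` is what lets the same function serve both the increasing and the decreasing case.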
2 Mean, Variance and Quantiles

2.1 Expectation

E(X)
For a continuous random variable $X$ we define the mean or expectation of $X$,
$$\mu_X \text{ or } E_X(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$
More generally, for a function of interest of the random variable $g : \mathbb{R} \to \mathbb{R}$ we have
$$E_X\{g(X)\} = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.$$

Linearity of Expectation
Clearly, for continuous random variables we again have linearity of expectation
$$E(aX + b) = aE(X) + b, \quad \forall a, b \in \mathbb{R},$$
and, for two functions $g, h : \mathbb{R} \to \mathbb{R}$, additivity of expectation
$$E\{g(X) + h(X)\} = E\{g(X)\} + E\{h(X)\}.$$

2.2 Variance

Var(X)
The variance of a continuous random variable $X$ is given by
$$\sigma_X^2 \text{ or } \mathrm{Var}_X(X) = E\{(X - \mu_X)^2\} = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx,$$
and again it is easy to show that
$$\mathrm{Var}_X(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu_X^2 = E(X^2) - \{E(X)\}^2.$$
For a linear transformation $aX + b$ we again have $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$, $\forall a, b \in \mathbb{R}$.
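The definitions of $E\{g(X)\}$ and $\mathrm{Var}(X)$ are integrals, so they can be approximated by quadrature. The sketch below is an illustration under assumptions of our own: `integrate` and `expectation` are hypothetical helpers, a midpoint rule replaces exact integration, and the density $f(x) = 2x$ on (0, 1) is chosen only because its moments are easy ($E(X) = 2/3$, $\mathrm{Var}(X) = 1/18$). It checks that both variance formulas agree and that $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def expectation(g, pdf, lo, hi):
    """E{g(X)}: integral of g(x) f_X(x) over the support [lo, hi]."""
    return integrate(lambda x: g(x) * pdf(x), lo, hi)

def pdf(x):
    """Hypothetical density for illustration: 2x on (0, 1)."""
    return 2.0 * x

lo, hi = 0.0, 1.0
mean = expectation(lambda x: x, pdf, lo, hi)                        # exact: 2/3
var_def = expectation(lambda x: (x - mean) ** 2, pdf, lo, hi)       # E{(X - mu)^2}
var_short = expectation(lambda x: x * x, pdf, lo, hi) - mean ** 2   # E(X^2) - {E(X)}^2

# Linear transformation aX + b, computed directly from the definitions.
a, b = 3.0, -1.0
mean_lin = expectation(lambda x: a * x + b, pdf, lo, hi)
var_lin = expectation(lambda x: (a * x + b - mean_lin) ** 2, pdf, lo, hi)
```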
2.3 Quantiles

$Q_X(\alpha)$
Recall we defined the lower and upper quartiles and median of a sample of data as points (¼, ¾, ½)-way through the ordered sample.
For $\alpha \in [0, 1]$ and a continuous random variable $X$ we define the $\alpha$-quantile of $X$, $Q_X(\alpha)$, as
$$Q_X(\alpha) = \min_{q \in \mathbb{R}} \{q : F_X(q) = \alpha\}.$$
If $F_X$ is invertible then $Q_X(\alpha) = F_X^{-1}(\alpha)$.
In particular, the median of a random variable $X$ is $Q_X(0.5)$. That is, the median is a solution to the equation $F_X(x) = \frac{1}{2}$.

Example (continued)
Again suppose we have a continuous random variable $X$ with probability density function given by
$$f(x) = \begin{cases} x^2/9, & 0 < x < 3 \\ 0, & \text{otherwise.} \end{cases}$$

Questions
1. Calculate $E(X)$.
2. Calculate $\mathrm{Var}(X)$.
3. Calculate the median of $X$.

Solutions
1. $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^3 \frac{x^3}{9}\,dx = \left[\frac{x^4}{36}\right]_0^3 = \frac{3^4}{36} = 2.25$.
2. $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^3 \frac{x^4}{9}\,dx = \left[\frac{x^5}{45}\right]_0^3 = \frac{3^5}{45} = 5.4$.
So $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = 5.4 - 2.25^2 = 0.3375$.
3. From earlier, $F(x) = \frac{x^3}{27}$ for $0 < x < 3$. Setting $F(x) = \frac{1}{2}$ and solving, we get
$$\frac{x^3}{27} = \frac{1}{2} \iff x = \sqrt[3]{\frac{27}{2}} = \frac{3}{\sqrt[3]{2}} \approx 2.3811$$
for the median.
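All the answers for this running example can be cross-checked in a few lines. The sketch below is a hedged illustration, not the notes' method: it evaluates the polynomial integrals exactly with Python's `fractions.Fraction`, and finds the median by bisection on the cdf (a numerical stand-in for solving $F(x) = 1/2$ by hand). The names `F` and `median_by_bisection` are hypothetical.

```python
from fractions import Fraction

# Running example f(x) = c x^2 on (0, 3). Every integral here is a polynomial,
# so it is evaluated exactly from its antiderivative with Fraction arithmetic.
c = 1 / (Fraction(3) ** 3 / 3)          # 1 = c [x^3 / 3]_0^3 = 9c, so c = 1/9

def F(x):
    """cdf on [0, 3]: c x^3 / 3 = x^3 / 27."""
    return c * Fraction(x) ** 3 / 3

p_1_2 = F(2) - F(1)                     # P(1 < X < 2)
mean = c * Fraction(3) ** 4 / 4         # E(X) = [c x^4 / 4]_0^3
second = c * Fraction(3) ** 5 / 5       # E(X^2) = [c x^5 / 5]_0^3
var = second - mean ** 2                # Var(X) = E(X^2) - {E(X)}^2

def median_by_bisection(cdf, lo, hi, tol=1e-12):
    """Solve cdf(x) = 1/2 on [lo, hi] by bisection; cdf must be non-decreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

median = median_by_bisection(lambda x: float(F(x)), 0.0, 3.0)
```

Here `float(var)` is 0.3375 and `float(p_1_2)` is about 0.2593, matching the hand calculations. Bisection only needs a non-decreasing cdf, so the same idea applies when $F_X$ has no closed-form inverse.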
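3 Continuous Distributions

3.1 Uniform

$U(a, b)$
Suppose $X$ is a continuous random variable with probability density function
$$f(x) = \begin{cases} \frac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise,} \end{cases}$$
and hence corresponding cumulative distribution function
$$F(x) = \begin{cases} 0, & x \leq a \\ \frac{x - a}{b - a}, & a < x < b \\ 1, & x \geq b. \end{cases}$$
Then $X$ is said to follow a uniform distribution on the interval $(a, b)$ and we write $X \sim U(a, b)$.

Example: U(0,1)
[Figure: the pdf $f(x)$ of U(0,1), equal to 1 on (0, 1), and its cdf $F(x)$, rising linearly from 0 to 1 over (0, 1).]
Notice from the cdf that the quantiles of U(0,1) are the special case where $Q(\alpha) = \alpha$.

Relationship between U(a, b) and U(0,1)
Suppose $X \sim U(0, 1)$, so $F_X(x) = x$, $0 \leq x \leq 1$. For $a < b \in \mathbb{R}$, if $Y = a + (b - a)X$ then $Y \sim U(a, b)$.
[Figure: the interval [0, 1] for $X$ mapped linearly onto the interval [a, b] for $Y$.]

Proof: We first observe that for any $y \in (a, b)$,
$$Y \leq y \iff a + (b - a)X \leq y \iff X \leq \frac{y - a}{b - a}.$$
From this we find $Y \sim U(a, b)$, since
$$F_Y(y) = P(Y \leq y) = P\left(X \leq \frac{y - a}{b - a}\right) = F_X\left(\frac{y - a}{b - a}\right) = \frac{y - a}{b - a}.$$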
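The relationship $Y = a + (b - a)X$ is exactly how $U(a, b)$ draws can be produced from a U(0,1) generator. A minimal sketch, assuming Python's `random.Random.random()` as the U(0,1) source; `uniform_ab` and `uniform_quantile` are hypothetical names, and the seed and the values $a = 2$, $b = 5$, $y = 3.2$ are arbitrary. It compares the empirical cdf of the transformed draws with $(y - a)/(b - a)$.

```python
import random

def uniform_ab(a, b, rng):
    """Y = a + (b - a) X with X ~ U(0, 1) drawn from rng.random()."""
    return a + (b - a) * rng.random()

def uniform_quantile(alpha, a=0.0, b=1.0):
    """Q(alpha) = a + (b - a) alpha, which reduces to alpha for U(0, 1)."""
    return a + (b - a) * alpha

rng = random.Random(245)                 # fixed seed, for reproducibility only
a, b = 2.0, 5.0
ys = [uniform_ab(a, b, rng) for _ in range(100_000)]

# Empirical cdf at y compared with the exact F_Y(y) = (y - a) / (b - a).
y = 3.2
ecdf = sum(v <= y for v in ys) / len(ys)
exact = (y - a) / (b - a)
```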
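Mean and Variance of U(a, b)
To find the mean of $X \sim U(a, b)$,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \left[\frac{x^2}{2(b - a)}\right]_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{(b - a)(b + a)}{2(b - a)} = \frac{a + b}{2}.$$
Similarly we get $\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{(b - a)^2}{12}$, so
$$\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12}.$$

3.2 Exponential

$\mathrm{Exp}(\lambda)$
Suppose now $X$ is a random variable taking value on $\mathbb{R}^+ = [0, \infty)$ with pdf
$$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0,$$
for some $\lambda > 0$.
Then $X$ is said to follow an exponential distribution with rate parameter $\lambda$ and we write $X \sim \mathrm{Exp}(\lambda)$.
Straightforward integration between 0 and $x$ leads to the cdf,
$$F(x) = 1 - e^{-\lambda x}, \quad x \geq 0.$$
The mean and variance are given by
$$\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}.$$

Example: Exp(1), Exp(0.5) & Exp(0.2) pdfs
[Figure: pdfs $f(x)$ of Exp(λ) for λ = 1, 0.5 and 0.2, plotted for $0 \leq x \leq 10$.]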
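Because the exponential cdf inverts in closed form, $F(x) = u \iff x = -\log(1 - u)/\lambda$, Exp(λ) draws can be generated from U(0,1) draws. This inverse-cdf sampling step is our own addition (it is not in the notes), and `exp_cdf`, `exp_sample`, the seed and $\lambda = 0.5$ are hypothetical choices. The sample mean and variance should land near $1/\lambda = 2$ and $1/\lambda^2 = 4$.

```python
import math
import random
import statistics

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_sample(lam, rng):
    """Inverse-cdf draw: solve F(x) = u for u ~ U(0, 1), so x = -log(1 - u) / lam."""
    return -math.log(1.0 - rng.random()) / lam   # 1 - u lies in (0, 1], so log is safe

lam = 0.5
rng = random.Random(1)                   # fixed seed, for reproducibility only
xs = [exp_sample(lam, rng) for _ in range(200_000)]
sample_mean = statistics.fmean(xs)       # compare with 1 / lam = 2
sample_var = sum((v - sample_mean) ** 2 for v in xs) / (len(xs) - 1)  # compare with 4
```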
Example: Exp(0.2), Exp(0.5) & Exp(1) cdfs
[Figure: cdfs $F(x)$ of Exp(λ) for λ = 0.2, 0.5 and 1, plotted for $0 \leq x \leq 10$.]

Lack of Memory Property
First notice that from the exponential distribution cdf equation we have $P(X > x) = e^{-\lambda x}$.
An important (and not always desirable) characteristic of the exponential distribution is the so-called lack of memory property.
For $x, s > 0$, consider the conditional probability $P(X > x + s \mid X > s)$ of the additional magnitude of an exponentially distributed random variable given we already know it is greater than $s$.
Since the event $\{X > x + s\}$ is contained in $\{X > s\}$,
$$P(X > x + s \mid X > s) = \frac{P(X > x + s)}{P(X > s)},$$
which, when $X \sim \mathrm{Exp}(\lambda)$, gives
$$P(X > x + s \mid X > s) = \frac{e^{-\lambda(x + s)}}{e^{-\lambda s}} = e^{-\lambda x},$$
again an exponential distribution with parameter $\lambda$.
So if we think of the exponential variable as the time to an event, then knowledge that we have waited time $s$ for the event tells us nothing about how much longer we will have to wait: the process has no memory.

Examples
Exponential random variables are often used to model the time until occurrence of a random event where there is an assumed constant risk ($\lambda$) of the event happening over time, and so are frequently used as a simplest model, for example, in reliability analysis. So examples include:
- the time to failure of a component in a system;
- the time until we find the next mistake on my slides;
- the distance we travel along a road until we find the next pothole;
- the time until the next job arrives at a database server.
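The lack of memory property can be seen in simulation: among exponential draws that already exceed $s$, the fraction that survive a further $x$ should match the unconditional $P(X > x) = e^{-\lambda x}$. A hedged sketch, assuming Python's `random.Random.expovariate(lambd)` (which takes the rate) as the Exp(λ) sampler; the seed and the values $\lambda = 0.5$, $s = 2$, $x = 1$ are arbitrary.

```python
import math
import random

rng = random.Random(2024)               # fixed seed, for reproducibility only
lam, s, x = 0.5, 2.0, 1.0
# random.Random.expovariate(lambd) samples Exp(lambd), lambd being the rate.
draws = [rng.expovariate(lam) for _ in range(200_000)]

# Condition on X > s, then look at the chance of surviving a further x.
survivors = [d for d in draws if d > s]
p_cond = sum(d > x + s for d in survivors) / len(survivors)
# Unconditional P(X > x) for comparison; both should be near e^{-lam x}.
p_uncond = sum(d > x for d in draws) / len(draws)
exact = math.exp(-lam * x)
```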
Link with Poisson Distribution
Notice the duality between some of the exponential rv examples and those we saw for a Poisson distribution. In each case, "number of events" has been replaced with "time between events".

Claim:
If events in a random process occur according to a Poisson distribution with rate $\lambda$ then the time between events has an exponential distribution with rate parameter $\lambda$.

Proof:
Suppose we have some random event process such that $\forall x > 0$, the number of events occurring in $[0, x]$, $N_x$, follows a Poisson distribution with rate parameter $\lambda x$, so $N_x \sim \mathrm{Poi}(\lambda x)$. Such a process is known as a homogeneous Poisson process.
Let $X$ be the time until the first event of this process arrives. Then we notice that
$$P(X > x) \equiv P(N_x = 0) = \frac{(\lambda x)^0 e^{-\lambda x}}{0!} = e^{-\lambda x},$$
and hence $X \sim \mathrm{Exp}(\lambda)$. The same argument applies for all subsequent inter-arrival times.

3.3 Gaussian

$N(\mu, \sigma^2)$
Suppose $X$ is a random variable taking value on $\mathbb{R}$ with pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\},$$
for some $\mu \in \mathbb{R}$, $\sigma > 0$. Then $X$ is said to follow a Gaussian or normal distribution with mean $\mu$ and variance $\sigma^2$, and we write $X \sim N(\mu, \sigma^2)$.
The cdf of $X \sim N(\mu, \sigma^2)$ is not analytically tractable for any $(\mu, \sigma)$, so we can only write
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left\{-\frac{(t - \mu)^2}{2\sigma^2}\right\} dt.$$
Example: N(0,1), N(2,1) & N(0,4) pdfs
[Figure: pdfs $f(x)$ of N(0,1), N(2,1) and N(0,4), plotted for $-4 \leq x \leq 6$.]

Example: N(0,1), N(2,1) & N(0,4) cdfs
[Figure: cdfs $F(x)$ of N(0,1), N(2,1) and N(0,4), plotted for $-4 \leq x \leq 6$.]

N(0, 1)
Setting $\mu = 0$, $\sigma = 1$ and $Z \sim N(0, 1)$ gives the special case of the standard normal, with simplified density
$$f(z) \equiv \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}.$$
Again for the cdf, we can only write
$$F(z) \equiv \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{t^2}{2}}\,dt.$$

Statistical Tables
Since the cdf, and therefore any probabilities, associated with a normal distribution are not analytically available, numerical integration procedures are used to find approximate probabilities.
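A small sketch of such a numerical procedure, offered as an illustration rather than how any particular table was produced: `Phi_by_quadrature` integrates $\phi$ with a midpoint rule from a finite lower limit of $-10$ (an assumption; the mass below it is negligible), and `Phi` uses the standard identity $\Phi(z) = \tfrac{1}{2}\{1 + \operatorname{erf}(z/\sqrt{2})\}$ with Python's `math.erf`. All three function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf through the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_by_quadrature(z, lower=-10.0, n=200_000):
    """Midpoint-rule integral of phi over [lower, z]; the mass below -10 is negligible."""
    h = (z - lower) / n
    return h * sum(phi(lower + (i + 0.5) * h) for i in range(n))
```

For example, `Phi_by_quadrature(1.0)` and `Phi(1.0)` agree to many decimal places, and both round to the tabulated 0.841.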
In particular, statistical tables contain values of the standard normal cdf $\Phi(z)$ for a range of values $z \in \mathbb{R}$, and the quantiles $\Phi^{-1}(\alpha)$ for a range of values $\alpha \in (0, 1)$. Linear interpolation is used for approximation between the tabulated values.
But why just tabulate N(0, 1)? We will now see how all normal distribution probabilities can be related back to probabilities from a standard normal distribution.

Linear Transformations of Normal Random Variables
Suppose we have $X \sim N(\mu, \sigma^2)$. Then it is also true that for any constants $a \neq 0$ and $b \in \mathbb{R}$, the linear combination $aX + b$ also follows a normal distribution. More precisely,
$$X \sim N(\mu, \sigma^2) \implies aX + b \sim N(a\mu + b, a^2\sigma^2), \quad \forall a \neq 0,\ b \in \mathbb{R}.$$
(Note that the mean and variance parameters of this transformed distribution follow from the general results for expectation and variance of any random variable under linear transformation.)
In particular, this allows us to standardise any normal rv,
$$X \sim N(\mu, \sigma^2) \implies \frac{X - \mu}{\sigma} \sim N(0, 1).$$

Standardising Normal Random Variables
So if $X \sim N(\mu, \sigma^2)$ and we set $Z = \frac{X - \mu}{\sigma}$, then since $\sigma > 0$ we can first observe that for any $x \in \mathbb{R}$,
$$X \leq x \iff \frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma} \iff Z \leq \frac{x - \mu}{\sigma}.$$
Therefore we can write the cdf of $X$ in terms of $\Phi$,
$$F_X(x) = P(X \leq x) = P\left(Z \leq \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$

Table of Φ
z     Φ(z)    z     Φ(z)    z     Φ(z)    z       Φ(z)
0.0   0.5     0.9   0.816   1.8   0.964   2.8     0.997
0.1   0.540   1.0   0.841   1.9   0.971   3.0     0.998
0.2   0.579   1.1   0.864   2.0   0.977   3.5     0.9998
0.3   0.618   1.2   0.885   2.1   0.982   1.282   0.9
0.4   0.655   1.3   0.903   2.2   0.986   1.645   0.95
0.5   0.691   1.4   0.919   2.3   0.989   1.96    0.975
0.6   0.726   1.5   0.933   2.4   0.992   2.326   0.99
0.7   0.758   1.6   0.945   2.5   0.994   2.576   0.995
0.8   0.788   1.7   0.955   2.6   0.995   3.09    0.999
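Standardising can be checked against a library normal cdf. The sketch below assumes Python's `statistics.NormalDist` (its `cdf` and `inv_cdf` methods) as a stand-in for printed tables; `normal_cdf` is a hypothetical helper, and $\mu = 200$, $\sigma = 16$, $x = 224$ are arbitrary values. It rebuilds a few rows of the table of Φ, one tabulated quantile, and confirms that $\Phi((x - \mu)/\sigma)$ agrees with evaluating the $N(\mu, \sigma^2)$ cdf directly.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_cdf(x, mu, sigma):
    """F_X(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2), sigma > 0."""
    return std_normal.cdf((x - mu) / sigma)

# A few rows of the table of Phi, rounded to three decimal places.
table = {z: round(std_normal.cdf(z), 3) for z in (0.5, 1.0, 1.5, 2.0, 2.5)}

# Tabulated quantiles come from the inverse cdf, e.g. Phi^{-1}(0.975).
q975 = std_normal.inv_cdf(0.975)

# Standardising agrees with evaluating the N(mu, sigma^2) cdf directly.
mu, sigma, x = 200.0, 16.0, 224.0
direct = NormalDist(mu, sigma).cdf(x)
via_phi = normal_cdf(x, mu, sigma)
```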
Using Table of Φ
First of all notice that $\Phi(z)$ has been tabulated for $z > 0$. This is because the standard normal pdf $\phi$ is symmetric about 0, so $\phi(-z) = \phi(z)$. For the cdf $\Phi$, this means
$$\Phi(z) = 1 - \Phi(-z).$$
So for example, $\Phi(-1.2) = 1 - \Phi(1.2) \approx 1 - 0.885 = 0.115$.
Similarly, if $Z \sim N(0, 1)$ and we want $P(Z > z)$, then for example $P(Z > 1.5) = 1 - P(Z \leq 1.5) = 1 - \Phi(1.5)$.

Important Quantiles of N(0, 1)
We will often have cause to use the 97.5% and 99.5% quantiles of N(0, 1), given by $\Phi^{-1}(0.975)$ and $\Phi^{-1}(0.995)$.
$\Phi(1.96) \approx 97.5\%$. So with 95% probability an N(0, 1) rv will lie in $[-1.96, 1.96]$ ($\approx [-2, 2]$).
$\Phi(2.58) \approx 99.5\%$. So with 99% probability an N(0, 1) rv will lie in $[-2.58, 2.58]$.
More generally, for $\alpha \in (0, 1)$ and defining $z_{1-\alpha/2}$ to be the $(1 - \alpha/2)$ quantile of N(0, 1), if $Z \sim N(0, 1)$ then
$$P_Z(Z \in [-z_{1-\alpha/2}, z_{1-\alpha/2}]) = 1 - \alpha.$$
More generally still, if $X \sim N(\mu, \sigma^2)$, then
$$P_X(X \in [\mu - \sigma z_{1-\alpha/2}, \mu + \sigma z_{1-\alpha/2}]) = 1 - \alpha,$$
and hence $[\mu - \sigma z_{1-\alpha/2}, \mu + \sigma z_{1-\alpha/2}]$ gives a $(1 - \alpha)$ probability region for $X$ centred around $\mu$. This can be rewritten as
$$P_X(|X - \mu| \leq \sigma z_{1-\alpha/2}) = 1 - \alpha.$$

Example
An analogue signal received at a detector (measured in microvolts) may be modelled as a Gaussian random variable $X \sim N(200, 256)$.
1. What is the probability that the signal will exceed 240 µV?
2. What is the probability that the signal is larger than 240 µV given that it is greater than 210 µV?

Solutions:
1. $P(X > 240) = 1 - P(X \leq 240) = 1 - \Phi\left(\frac{240 - 200}{\sqrt{256}}\right) = 1 - \Phi(2.5) \approx 0.00621$.
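2. Since $\{X > 240\} \subset \{X > 210\}$,
$$P(X > 240 \mid X > 210) = \frac{P(X > 240)}{P(X > 210)} = \frac{1 - \Phi\left(\frac{240 - 200}{\sqrt{256}}\right)}{1 - \Phi\left(\frac{210 - 200}{\sqrt{256}}\right)} \approx 0.02335.$$

The Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be $n$ independent and identically distributed (i.i.d.) random variables from any probability distribution, each with mean $\mu$ and variance $\sigma^2$.
From before we know
$$E\left(\sum_{i=1}^n X_i\right) = n\mu, \qquad \mathrm{Var}\left(\sum_{i=1}^n X_i\right) = n\sigma^2.$$
First notice
$$E\left(\sum_{i=1}^n X_i - n\mu\right) = 0, \qquad \mathrm{Var}\left(\sum_{i=1}^n X_i - n\mu\right) = n\sigma^2.$$
Dividing by $\sqrt{n}\,\sigma$,
$$E\left(\frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma}\right) = 0, \qquad \mathrm{Var}\left(\frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma}\right) = 1.$$
But we can now present the following, astonishing result:
$$\lim_{n \to \infty} \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} \sim \Phi,$$
that is, the cdf of the standardised sum converges to the standard normal cdf $\Phi$.
This can also be written as
$$\lim_{n \to \infty} \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \Phi, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^n X_i}{n}.$$
Or finally, for large $n$ we have approximately
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or} \quad \sum_{i=1}^n X_i \sim N(n\mu, n\sigma^2).$$
We note here that although all these approximate distributional results hold irrespective of the distribution of the $\{X_i\}$, in the special case where $X_i \sim N(\mu, \sigma^2)$ these distributional results are, in fact, exact. This is because the sum of independent normally distributed random variables is also normally distributed.

Example
Consider the simplest example, that $X_1, X_2, \ldots$ are i.i.d. Bernoulli($p$) discrete random variables taking value 0 or 1. Then the $\{X_i\}$ each have mean $\mu = p$ and variance $\sigma^2 = p(1 - p)$. By definition, we know that for any $n$,
$$\sum_{i=1}^n X_i \sim \mathrm{Binomial}(n, p).$$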
But now, by the Central Limit Theorem (CLT), we also have for large $n$ that approximately
$$\sum_{i=1}^n X_i \sim N(n\mu, n\sigma^2) \equiv N(np, np(1 - p)).$$
So for large $n$,
$$\mathrm{Binomial}(n, p) \approx N(np, np(1 - p)).$$
Notice that the LHS is a discrete distribution, and the RHS is a continuous distribution.

Binomial(10,½) pmf & N(5,2.5) pdf
[Figure: pmf $p(x)$ of Binomial(10, 0.5) alongside pdf $f(x)$ of N(5, 2.5), plotted for $0 \leq x \leq 10$.]

Binomial(100,½) pmf & N(50,25) pdf
[Figure: pmf $p(x)$ of Binomial(100, 0.5) alongside pdf $f(x)$ of N(50, 25), plotted for $20 \leq x \leq 80$.]
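A simulation can make the CLT concrete. The sketch below rests on choices of our own: Exp(1) summands (deliberately skewed, so clearly not normal), $n = 100$, 20,000 replications and a fixed seed are arbitrary, and `standardised_sum` and `exp1` are hypothetical helper names. It compares the empirical cdf of $(\sum X_i - n\mu)/(\sqrt{n}\,\sigma)$ with Φ at a few points; with $n = 100$ the gaps should be at most a few hundredths, coming from residual skewness and Monte Carlo noise.

```python
import math
import random
from statistics import NormalDist

def standardised_sum(draw, mu, sigma, n, rng):
    """(X_1 + ... + X_n - n mu) / (sqrt(n) sigma) for n i.i.d. draws."""
    total = sum(draw(rng) for _ in range(n))
    return (total - n * mu) / (math.sqrt(n) * sigma)

def exp1(rng):
    """One Exp(1) draw: mean 1, variance 1, and strongly skewed."""
    return rng.expovariate(1.0)

rng = random.Random(7)                  # fixed seed, for reproducibility only
n, reps = 100, 20_000
zs = [standardised_sum(exp1, 1.0, 1.0, n, rng) for _ in range(reps)]

# Largest gap between the empirical P(Z <= z) and Phi(z) at a few points.
Phi = NormalDist().cdf
max_gap = max(abs(sum(v <= z for v in zs) / reps - Phi(z)) for z in (-1.0, 0.0, 1.0))
```

Increasing `n` shrinks the skewness-driven part of the gap roughly like $1/\sqrt{n}$, while increasing `reps` reduces only the Monte Carlo noise.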
Binomial(1000,½) pmf & N(500,250) pdf
[Figure: pmf $p(x)$ of Binomial(1000, 0.5) alongside pdf $f(x)$ of N(500, 250), plotted for $400 \leq x \leq 600$.]

So suppose $X$ was the number of heads found on 1000 tosses of a fair coin, and we were interested in $P(X \leq 490)$.
Using the binomial distribution pmf, we would need to calculate
$$P(X \leq 490) = p_X(0) + p_X(1) + p_X(2) + \cdots + p_X(490) \ (!) \ (\approx 0.27).$$
However, using the CLT we have approximately $X \sim N(500, 250)$ and so
$$P(X \leq 490) \approx \Phi\left(\frac{490 - 500}{\sqrt{250}}\right) = \Phi(-0.632) = 1 - \Phi(0.632) \approx 0.26.$$

Log-Normal Distribution
Suppose $X \sim N(\mu, \sigma^2)$, and consider the transformation $Y = e^X$.
Then if $g(x) = e^x$, $g^{-1}(y) = \log(y)$ and $g^{-1\prime}(y) = \frac{1}{y}$.
Then by (2) we have
$$f_Y(y) = \frac{1}{\sigma y\sqrt{2\pi}} \exp\left[-\frac{\{\log(y) - \mu\}^2}{2\sigma^2}\right], \quad y > 0,$$
and we say $Y$ follows a log-normal distribution.

Example: LN(0,1), LN(2,1) & LN(0,4) pdfs
[Figure: pdfs $f(x)$ of LN(0,1), LN(2,1) and LN(0,4), plotted for $0 \leq x \leq 10$.]
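Both calculations on this page can be checked numerically. A hedged sketch, assuming Python 3.8+ for `math.comb` and `statistics.NormalDist`: the exact binomial sum is done with integer coefficients, the CLT value uses the $N(500, 250)$ cdf without any continuity correction (matching the text), and `lognormal_pdf` is a hypothetical helper applying (2) as $f_Y(y) = f_X(\log y)/y$. The final midpoint-rule sum over (0, 50) is only a sanity check that the density integrates to nearly 1.

```python
import math
from statistics import NormalDist

# Coin example: X ~ Binomial(1000, 1/2). Exact P(X <= 490) by summing the pmf;
# math.comb keeps the binomial coefficients as exact integers.
n, k = 1000, 490
exact = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
# CLT approximation X ~ N(np, np(1 - p)) = N(500, 250), as in the notes.
approx = NormalDist(mu=500.0, sigma=math.sqrt(250.0)).cdf(k)

def lognormal_pdf(y, mu, sigma):
    """Y = e^X with X ~ N(mu, sigma^2): by (2), f_Y(y) = f_X(log y) / y for y > 0."""
    if y <= 0.0:
        return 0.0
    return NormalDist(mu, sigma).pdf(math.log(y)) / y

# The log-normal density should integrate to (almost) 1; the mass of LN(0, 1)
# above y = 50 is about 5e-5, so a midpoint rule on (0, 50) comes close.
h = 50.0 / 100_000
mass = h * sum(lognormal_pdf((i + 0.5) * h, 0.0, 1.0) for i in range(100_000))
```

The exact sum comes out near 0.27 and the CLT value near 0.26, in line with the figures quoted above.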