Transformations and Expectations of Random Variables

$X \sim F_X(x)$: a random variable $X$ distributed with CDF $F_X$. Any function $Y = g(X)$ is also a random variable. If both $X$ and $Y$ are continuous random variables, can we find a simple way to characterize $F_Y$ and $f_Y$ (the CDF and PDF of $Y$) in terms of the CDF and PDF of $X$?

For the CDF:
$$F_Y(y) = P_Y(Y \le y) = P_Y(g(X) \le y) = P_X(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} : g(x) \le y\}} f_X(s)\,ds,$$
where $\mathcal{X}$ is the sample space for $X$. For the PDF: $f_Y(y) = \frac{d}{dy} F_Y(y)$.

Caution: we need to consider the support of $Y$. Consider several examples:

1. $X \sim U[-1,1]$ and $Y = \exp(X)$. That is:
$$f_X(x) = \begin{cases} \frac{1}{2} & \text{if } x \in [-1,1] \\ 0 & \text{otherwise,} \end{cases} \qquad F_X(x) = \frac{x}{2} + \frac{1}{2}, \quad \text{for } x \in [-1,1].$$
Then
$$F_Y(y) = \mathrm{Prob}(\exp(X) \le y) = \mathrm{Prob}(X \le \log y) = F_X(\log y) = \frac{1}{2} + \frac{1}{2}\log y, \quad \text{for } y \in [e^{-1}, e].$$
Be careful about the bounds of the support!
$$f_Y(y) = \frac{d}{dy} F_Y(y) = f_X(\log y)\,\frac{1}{y} = \frac{1}{2y}, \quad \text{for } y \in [e^{-1}, e].$$
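As a quick numerical sanity check (not part of the original notes), the derived CDF $F_Y(y) = \frac12 + \frac12\log y$ can be compared against a Monte Carlo simulation of $Y=\exp(X)$; the sample size and seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)   # X ~ U[-1, 1]
y = np.exp(x)                              # Y = exp(X), supported on [1/e, e]

def F_Y(q):
    # Derived CDF: F_Y(y) = 1/2 + (1/2) log y for y in [1/e, e]
    return 0.5 + 0.5 * np.log(q)

# Empirical CDF should match the derived CDF at interior points of the support
for q in [0.5, 1.0, 2.0]:
    assert abs(np.mean(y <= q) - F_Y(q)) < 0.01
```

The agreement at several points of the support confirms that the $1/y$ Jacobian factor in $f_Y$ is correct.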
2. $X \sim U[-1,1]$ and $Y = X^2$.
$$F_Y(y) = \mathrm{Prob}(X^2 \le y) = \mathrm{Prob}(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) = 2F_X(\sqrt{y}) - 1,$$
using, by symmetry, $F_X(-\sqrt{y}) = 1 - F_X(\sqrt{y})$. Hence
$$f_Y(y) = \frac{d}{dy} F_Y(y) = 2 f_X(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}, \quad \text{for } y \in [0,1].$$

As the first example above showed, it is easy to derive the CDF and PDF of $Y$ when $g(\cdot)$ is a strictly monotonic function:

Theorems 2.1.3, 2.1.5: When $g(\cdot)$ is a strictly increasing function, then
$$F_Y(y) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)), \qquad f_Y(y) = f_X(g^{-1}(y))\,\frac{d}{dy}g^{-1}(y),$$
using the chain rule. Note: by the inverse function theorem, $\frac{d}{dy}g^{-1}(y) = 1 \big/ \left[g'(x)\right]_{x = g^{-1}(y)}$.

When $g(\cdot)$ is a strictly decreasing function, then
$$F_Y(y) = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X(g^{-1}(y)), \qquad f_Y(y) = -f_X(g^{-1}(y))\,\frac{d}{dy}g^{-1}(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|,$$
again using the chain rule.

These are the change-of-variables formulas for transformations of univariate random variables. Thm 2.1.8 generalizes this to piecewise monotonic transformations.
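The $Y = X^2$ example is a transformation that is monotone only piecewise, yet the derived CDF $F_Y(y) = \sqrt{y}$ (for $X \sim U[-1,1]$) is easy to verify by simulation. This is an illustrative check of my own, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200_000)   # X ~ U[-1, 1]
y = x**2                                   # Y = X^2, supported on [0, 1]

# Derived CDF: F_Y(y) = 2 F_X(sqrt(y)) - 1 = sqrt(y) for y in [0, 1]
for q in [0.04, 0.25, 0.81]:
    assert abs(np.mean(y <= q) - np.sqrt(q)) < 0.01
```

Note that the density $f_Y(y) = 1/(2\sqrt{y})$ is unbounded as $y \to 0$, even though the CDF is perfectly well behaved; the simulation checks the CDF, which sidesteps that singularity.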
Here is a special case of a transformation:

Thm 2.1.10: Let $X$ have a continuous CDF $F_X(\cdot)$ and define the random variable $Y = F_X(X)$. Then $Y \sim U[0,1]$, i.e., $F_Y(y) = y$ for $y \in [0,1]$.

Note: all that is required is that the CDF $F_X$ is continuous, not that it must be strictly increasing. The result also goes through when $F_X$ is continuous but has flat parts (cf. discussion in CB, pg. 34).

Expected value (Definition 2.2.1): The expected value, or mean, of a random variable $g(X)$ is
$$E\,g(X) = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ continuous} \\ \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ discrete,} \end{cases}$$
provided that the integral or the sum exists.

The expectation is a linear operator (just like integration), so that
$$E\left[\sum_{i=1}^{n} \alpha_i\, g_i(X) + b\right] = \sum_{i=1}^{n} \alpha_i\, E g_i(X) + b.$$

Note: the expectation is a population average, i.e., you average values of the random variable $g(X)$, weighting by the population density $f_X(x)$. A statistical experiment yields sample observations $X_1, X_2, \ldots, X_n \sim F_X$. From these sample observations, we can calculate the sample average $\bar{X}_n \equiv \frac{1}{n}\sum_i X_i$. In general, $\bar{X}_n \ne EX$. But under some conditions, as $n \to \infty$, $\bar{X}_n \to EX$ in some sense (which we discuss later).

The expected value is a commonly used measure of the central tendency of a random variable $X$.

Example: but the mean may not exist. Consider a Cauchy random variable with density $f(x) = \frac{1}{\pi(1+x^2)}$ for $x \in (-\infty, \infty)$. Note that
$$\int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\,dx = \lim_{a \to -\infty} \int_{a}^{0} \frac{x}{\pi(1+x^2)}\,dx + \lim_{b \to \infty} \int_{0}^{b} \frac{x}{\pi(1+x^2)}\,dx = \lim_{a \to -\infty} \frac{1}{2\pi}\Big[\log(1+x^2)\Big]_a^0 + \lim_{b \to \infty} \frac{1}{2\pi}\Big[\log(1+x^2)\Big]_0^b = -\infty + \infty,$$
which is undefined, so the Cauchy mean does not exist.
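The probability integral transform (Thm 2.1.10) is easy to see in simulation. As a hedged sketch of my own (the exponential distribution here is just a convenient choice with a closed-form CDF):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=200_000)   # X ~ Exp(1), CDF F_X(x) = 1 - e^{-x}
u = 1.0 - np.exp(-x)                           # Y = F_X(X)

# Thm 2.1.10: Y should be U[0, 1], i.e. P(Y <= q) = q for q in [0, 1]
for q in [0.1, 0.5, 0.9]:
    assert abs(np.mean(u <= q) - q) < 0.01
```

Read in reverse, this is also the standard recipe for generating draws from any continuous distribution with an invertible CDF: take $U \sim U[0,1]$ and set $X = F_X^{-1}(U)$.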
Other measures:

1. Median: $\mathrm{med}(X) = m$ such that $F_X(m) = 0.5$. Robust to outliers, and has a nice invariance property: for $Y = g(X)$ with $g(\cdot)$ monotonic increasing, $\mathrm{med}(Y) = g(\mathrm{med}(X))$.

2. Mode: $\mathrm{Mode}(X) = \arg\max_x f_X(x)$, the point where the density is highest.

Moments: an important class of expectations.

For each integer $n$, the $n$-th (uncentred) moment of $X \sim F_X(\cdot)$ is $\mu'_n \equiv EX^n$. The $n$-th centred moment is $\mu_n \equiv E(X - \mu)^n = E(X - EX)^n$. (It is centred around the mean $EX$.)

For $n = 2$: $\mu_2 = E(X - EX)^2$ is the variance of $X$; $\sqrt{\mu_2}$ is the standard deviation.

Important formulas:
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}\,X$ (variance is not a linear operator);
$\mathrm{Var}\,X = E(X^2) - (EX)^2$: an alternative formula for the variance.

The moments of a random variable are summarized in the moment generating function.

Definition: the moment-generating function (MGF) of $X$ is $M_X(t) \equiv E\exp(tX)$, provided that the expectation exists in some neighborhood $t \in [-h, h]$ of zero. This is also called the Laplace transform. Specifically:
$$M_X(t) = \begin{cases} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{for } X \text{ continuous} \\ \sum_{x} e^{tx} P(X = x) & \text{for } X \text{ discrete.} \end{cases}$$

The uncentred moments of $X$ are generated from this function by
$$EX^n = M_X^{(n)}(0) \equiv \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0},$$
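The two variance formulas above can be verified exactly on a small discrete distribution, where the expectations are finite sums. The particular values and probabilities below are an arbitrary example of mine:

```python
import numpy as np

# A small discrete random variable: support points and probabilities
vals  = np.array([0.0, 1.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])

EX  = np.sum(vals * probs)               # E X
EX2 = np.sum(vals**2 * probs)            # E X^2
var = np.sum((vals - EX)**2 * probs)     # Var X by its definition

# Alternative formula: Var X = E(X^2) - (E X)^2
assert np.isclose(var, EX2 - EX**2)

# Var(aX + b) = a^2 Var X  (check with a = 3, b = 7)
a, b = 3.0, 7.0
EY    = np.sum((a * vals + b) * probs)
var_y = np.sum((a * vals + b - EY)**2 * probs)
assert np.isclose(var_y, a**2 * var)
```

Note how the shift $b$ drops out entirely: adding a constant moves the mean but not the spread, which is exactly why variance is not a linear operator.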
which is the $n$-th derivative of the MGF, evaluated at $t = 0$.

Example: standard normal distribution:
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\big((x-t)^2 - t^2\big)\right)dx = \exp\left(\frac{t^2}{2}\right)\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(x-t)^2\right)dx = \exp\left(\frac{t^2}{2}\right),$$
where the last integral on the RHS is over the density function of $N(t,1)$, which integrates to one.

First moment: $EX = M_X'(0) = \left. t\exp(\frac12 t^2)\right|_{t=0} = 0$. Second moment: $EX^2 = M_X''(0) = \left.\exp(\frac12 t^2) + t^2\exp(\frac12 t^2)\right|_{t=0} = 1$.

In many cases, the MGF can characterize a distribution. But a problem is that it may not exist (e.g., the Cauchy distribution). For a RV $X$, is its distribution uniquely determined by its moment generating function?

Thm 2.3.11: For $X \sim F_X$ and $Y \sim F_Y$, if $M_X$ and $M_Y$ exist and $M_X(t) = M_Y(t)$ for all $t$ in some neighborhood of zero, then $F_X(u) = F_Y(u)$ for all $u$.

Note that if the MGF exists, then it characterizes a random variable with an infinite number of moments (because the MGF is infinitely differentiable). The converse is not necessarily true: a random variable can have all of its moments and yet no MGF (e.g., the log-normal random variable: $X \sim N(0,1)$, $Y = \exp(X)$).

Characteristic function: The characteristic function of a random variable $g(X)$ is defined as
$$\phi_{g(X)}(t) = E\exp(itg(X)) = \int_{-\infty}^{\infty} \exp(itg(x))\,f(x)\,dx,$$
where $f(x)$ is the density for $X$. This is also called the Fourier transform.

Features of the characteristic function:
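The closed form $M_X(t) = \exp(t^2/2)$ for the standard normal can be checked by computing $E\,e^{tX}$ as a numerical integral. This is a sketch of my own; the grid bounds $\pm 12$ are an assumption that comfortably covers the integrand's effective support for the $t$ values tested.

```python
import numpy as np

# Numerically verify M_X(t) = exp(t^2/2) for X ~ N(0, 1)
xs = np.linspace(-12.0, 12.0, 400_001)
dx = xs[1] - xs[0]
phi = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

for t in [0.0, 0.5, 1.0]:
    M = np.sum(np.exp(t * xs) * phi) * dx       # Riemann-sum approximation of E e^{tX}
    assert abs(M - np.exp(t**2 / 2)) < 1e-6
```

The completing-the-square step in the derivation is doing exactly what the integral does here: the factor $e^{t^2/2}$ comes out, and what remains integrates to one.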
- The CF always exists. This follows from the equality $e^{itx} = \cos(tx) + i\sin(tx)$: both the real and imaginary parts of the integrand are bounded functions.
- Consider a symmetric density function, with $f(-x) = f(x)$ (symmetric around zero). Then the resulting $\phi(t)$ is real-valued and symmetric around zero.
- The CF completely determines the distribution of $X$ (every CDF has a unique characteristic function).
- Let $X$ have characteristic function $\phi_X(t)$. Then $Y = aX + b$ has characteristic function $\phi_Y(t) = e^{ibt}\phi_X(at)$.
- For $X$ and $Y$ independent, with characteristic functions $\phi_X(t)$ and $\phi_Y(t)$: $\phi_{X+Y}(t) = \phi_X(t)\phi_Y(t)$.
- $\phi(0) = 1$.
- For a given characteristic function $\phi_X(t)$ such that $\int_{-\infty}^{\infty} |\phi_X(t)|\,dt < \infty$, the corresponding density $f_X(x)$ is given by the inverse Fourier transform:
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_X(t)\exp(-itx)\,dt.$$

Example: the $N(0,1)$ distribution, with density $f(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$. Take as given that the characteristic function of $N(0,1)$ is
$$\phi_{N(0,1)}(t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\exp\left(itx - x^2/2\right)dx = \exp(-t^2/2). \tag{1}$$
Hence the inversion formula yields
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-t^2/2)\exp(-itx)\,dt.$$
Now making the substitution $z = -t$, we get
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left(ixz - z^2/2\right)dz = \frac{1}{\sqrt{2\pi}}\,\phi_{N(0,1)}(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2) = f_{N(0,1)}(x), \quad \text{(use Eq. (1))}.$$
Here $|\cdot|$ denotes the modulus of a complex number: for $x + iy$, we have $|x + iy| = \sqrt{x^2 + y^2}$.
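The inversion computation above can be reproduced numerically: plugging $\phi(t) = e^{-t^2/2}$ into the inverse Fourier transform should recover the $N(0,1)$ density. This numerical check is my own addition, with the integration grid chosen wide enough that the Gaussian tails are negligible.

```python
import numpy as np

# Numerically invert the characteristic function phi(t) = exp(-t^2/2) of N(0, 1)
ts = np.linspace(-12.0, 12.0, 400_001)
dt = ts[1] - ts[0]
phi_t = np.exp(-ts**2 / 2)

for x in [0.0, 1.0, 2.0]:
    # f(x) = (1/2pi) * integral of phi(t) exp(-itx) dt
    f = (np.sum(phi_t * np.exp(-1j * ts * x)).real) * dt / (2 * np.pi)
    target = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    assert abs(f - target) < 1e-6
```

Taking the real part is harmless here: the imaginary part of the integral vanishes because $\phi(t)$ is real and symmetric, exactly as the symmetry bullet above predicts.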
The characteristic function also summarizes the moments of a random variable. Specifically, note that the $h$-th derivative of $\phi(t)$ is
$$\phi^{(h)}(t) = i^h \int_{-\infty}^{\infty} g(x)^h \exp(itg(x))\,f(x)\,dx. \tag{2}$$
Hence, assuming the $h$-th moment, denoted $\mu^h_{g(X)} \equiv E[g(X)]^h$, exists, it is equal to $\mu^h_{g(X)} = \phi^{(h)}(0)/i^h$.

Hence, assuming that the required moments exist, we can expand the characteristic function around $t = 0$ to get:
$$\phi(t) = 1 + it\,\mu^1_{g(X)} + \frac{(it)^2}{2}\,\mu^2_{g(X)} + \cdots + \frac{(it)^k}{k!}\,\mu^k_{g(X)} + o(t^k).$$

Cauchy distribution, cont'd: The characteristic function of the Cauchy distribution is $\phi(t) = \exp(-|t|)$. This is not differentiable at $t = 0$, which by Eq. (2) reflects the fact that its mean does not exist. Hence the expansion of the characteristic function is invalid in this case.
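Even though the Cauchy distribution has no mean, its characteristic function $\phi(t) = e^{-|t|}$ exists and can be estimated by Monte Carlo, since $\cos(tX)$ is bounded. A quick simulation check of my own (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_cauchy(size=1_000_000)

# phi(t) = E[exp(itX)] = exp(-|t|); the imaginary part E[sin(tX)] vanishes by symmetry,
# so the real part E[cos(tX)] is all that needs checking
for t in [0.5, 1.0, 2.0]:
    est = np.mean(np.cos(t * x))
    assert abs(est - np.exp(-abs(t))) < 0.01
```

This is a nice illustration of why the CF is the more robust tool: the same Cauchy sample has a wildly unstable sample mean, yet its empirical characteristic function converges without trouble because the integrand is bounded.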