Change of Continuous Random Variable




All you are responsible for from this lecture is how to implement the Engineer's Way (see Section 2) to compute how the probability density function changes when we make a change of random variable from a continuous random variable $X$ to $Y$ by a strictly increasing change of variable $y = h(x)$. So for the purpose of surviving Stat 400 you can start reading at Section 2. I give two examples following the statement of the theorem and its proof(s) where the method gives the correct result, then I give an example where it does not work when $h(x)$ is not one-to-one.

Let $X$ be a continuous random variable. I will assume for convenience that the set of values taken by $X$ is the entire real line $\mathbb{R}$. If the set of values taken by $X$ is an interval, for example $[0, 1]$, the formula for the change of density is the same, but we do not yet know the interval where the new density will be nonzero (the support). We will treat this point later.

Let $y = h(x)$ be a real-valued strictly increasing function (so $h$ is one-to-one). Since $h$ is one-to-one it has an inverse function $x = g(y)$. We want to define a new random variable $Y = h(X)$. There is only one possible definition; to find it we pretend $Y$ exists and compute, for each pair of numbers $c$ and $d$ with $c < d$, what the $Y$-probability $P(c \le Y \le d)$ has to be in terms of an $X$-probability:

$$P(c \le Y \le d) = P(c \le h(X) \le d) = P(g(c) \le g(h(X)) \le g(d)) = P(a \le X \le b).$$

Here we define $a$ and $b$ by $g(c) = a$ and $g(d) = b$, or equivalently $h(a) = c$ and $h(b) = d$. Note that the last equality holds because $g(h(X)) = X$, since $g$ is the inverse of $h$. Note that $c \le h(X) \le d \iff g(c) \le g(h(X)) \le g(d)$ because $h$ is strictly increasing and the inverse of a strictly increasing function is strictly increasing, so $g$ preserves inequalities; that is, $a \le b \le c \implies g(a) \le g(b) \le g(c)$. So the above calculation forces us to define the new random variable $Y$ by

$$P(c \le Y \le d) := P(a \le X \le b) \tag{1}$$

where $a = g(c)$ and $b = g(d)$. In other words, the probability that the new random variable $Y$ will be in an interval $[c, d]$ is defined to be the probability that the old random variable $X$ will be in the transformed interval $[g(c), g(d)] = [a, b]$. To begin with, Equation (1) is just a rule for assigning a real number to each interval $[c, d]$. It turns out (it follows from the second proof of the next theorem) that this formula (1) defines a probability measure $P$ on the line. In other words, if we define $P$ as above then $P$ satisfies the axioms for a probability measure. It also follows from the second proof that the new random variable $Y$ (with probabilities defined using Equation (1)) is continuous, with a new density function related to the density function of the original random variable $X$ in a simple way via the inverse $g(y)$ of the function $h(x)$. The fact that $Y$ as defined by (1) is continuous follows from either proof of the next theorem. The point of this lecture is to see how this works and to show you how to make explicit computations in examples.
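The defining identity (1) is easy to see in a simulation. The sketch below is my own illustration, not part of the notes: I take $X \sim N(0,1)$ and the strictly increasing $h(x) = e^x$ (so $g(y) = \ln y$), and check on simulated data that the event $\{c \le Y \le d\}$ is exactly the event $\{g(c) \le X \le g(d)\}$.

```python
import math
import random

random.seed(1)

# Hypothetical example of Equation (1): X ~ N(0,1), h(x) = e^x, g(y) = ln y.
N = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [math.exp(x) for x in xs]          # Y = h(X)

c, d = 0.5, 2.0
p_Y = sum(c <= y <= d for y in ys) / N              # P(c <= Y <= d)
p_X = sum(math.log(c) <= x <= math.log(d) for x in xs) / N  # P(g(c) <= X <= g(d))
print(f"P(c<=Y<=d) = {p_Y:.4f}, P(g(c)<=X<=g(d)) = {p_X:.4f}")
```

The two empirical probabilities agree sample by sample, because $c \le e^x \le d$ holds if and only if $\ln c \le x \le \ln d$.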

1 The Theoretical Justification of the Engineer's Way

I will give two proofs of the formula for $f_Y(y)$. The first proof assumes that Equation (1) does in fact define a continuous random variable. It proceeds in two stages. First, we compute the cdf $F_Y$ of the new random variable $Y$ in terms of $F_X$. Then, to find the density function $f_Y(y)$ of the new random variable $Y$, we differentiate the cdf: $f_Y(y) = \frac{d}{dy} F_Y(y)$. The second proof uses the change of variable theorem from calculus. Don't let the next proof(s) scare you; you won't be tested on them. But they justify the Engineer's Way, a simple rule to compute the probability density function of the new random variable $Y$ in terms of the probability density function of the original random variable $X$.

Theorem 1.1 Suppose $X$ is continuous with probability density function $f_X(x)$. Let $y = h(x)$ with $h$ a strictly increasing continuously differentiable function with inverse $x = g(y)$. Then $Y = h(X)$ defined by (1) is continuous with probability density function $f_Y(y)$ given by

$$f_Y(y) = f_X(g(y))\, g'(y). \tag{2}$$

Proof. We will give two proofs of Equation (2). The first proof has the advantage that it is easier to understand and gives a formula for the new cdf as well, but it involves a tricky point: the appearance of the constant $C$. To state it loosely, the problem is that we might not have $g(-\infty) = -\infty$ (we state this problem carefully in terms of limits below). The second proof of Equation (2) uses the change of variable theorem. It has the advantage of giving a direct computation of $P(c \le Y \le d)$. From this formula we see that Equation (1) does in fact define a probability measure and moreover the associated random variable $Y$ is continuous.
Indeed, in the second proof we show directly, by applying the change of variable formula to $P(a \le X \le b)$, that we have

$$P(c \le Y \le d) = \int_c^d f_X(g(y))\, g'(y)\, dy. \tag{3}$$

Equation (3) means that Equation (1) does in fact define a probability measure and the corresponding random variable $Y$ is continuous with probability density function $f_X(g(y))\, g'(y)$.

First proof. We first compute $F_Y(y)$ in terms of $F_X(x)$. There is a tricky point here. There is no reason why $\lim_{y \to -\infty} g(y) = -\infty$. But the limit does exist (I leave that to you). Suppose $\lim_{y \to -\infty} g(y) = L$. Then

$$F_Y(y) = P(Y \le y) = P(-\infty < Y \le y) = P(-\infty < h(X) \le y) = P(L \le X \le g(y)) = F_X(g(y)) - F_X(L) = F_X(g(y)) - C.$$

So we get

$$F_Y(y) = F_X(g(y)) - C \tag{4}$$

where (roughly) $C = F_X(g(-\infty))$. Note that the third equality holds because $g(y)$ is also strictly increasing (because the inverse of a strictly increasing function is strictly increasing), so $g$ preserves inequalities; that is, $a \le b \implies g(a) \le g(b)$. So apply $g$ to each side of the inequality $h(X) \le y$ to get $g(h(X)) \le g(y)$. But $g(h(X)) = X$ since $g \circ h = \mathrm{Id}$, because $g$ is the inverse of $h$. Next we differentiate the function on the right of Equation (4) with respect to $y$ using the Chain Rule to get $f_Y(y)$ (since the derivative of the cdf $F_Y(y)$ with respect to $y$ is the pdf $f_Y(y)$):

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\bigl(F_X(g(y)) - C\bigr) = F_X'(g(y))\, g'(y). \tag{5}$$

But since $F_X'(x) = f_X(x)$ for any number $x$, we get $F_X'(g(y)) = f_X(g(y))$, and substituting into the last term of Equation (5) we get $f_Y(y) = f_X(g(y))\, g'(y)$. This completes the first proof of the Engineer's Way.

Second proof. Let $c, d$ be real numbers with $c < d$. By definition $P(c \le Y \le d) = P(a \le X \le b)$ with $a = g(c)$ and $b = g(d)$. But since $X$ is continuous with density function $f_X$ we have

$$P(a \le X \le b) = \int_a^b f_X(x)\, dx = \int_{g(c)}^{g(d)} f_X(x)\, dx = \int_c^d f_X(g(y))\, g'(y)\, dy.$$

The last equality is the change of variable theorem for definite integrals. So we get: for every $c, d \in \mathbb{R}$ with $c < d$ we have

$$P(c \le Y \le d) = \int_c^d f_X(g(y))\, g'(y)\, dy.$$

But this says that $f_X(g(y))\, g'(y)$ is the probability density function of $Y$.

I didn't prove this theorem in class. The previous proofs are probably a little hard for many of you right now, but they justify what I called the Engineer's Way in class.

The Engineer's Way from Class. Here is the way I stated the Engineer's Way in class. Start with $f_X(x)\, dx$. Substitute $x = g(y)$ for the $x$ in $f_X(x)$ and the $x$ in $dx$ to get $f_X(g(y))\, dg(y)$. Now use $dg(y) = g'(y)\, dy$ to get $f_X(g(y))\, g'(y)\, dy$. Then I told you that the function $f_X(g(y))\, g'(y)$ multiplying $dy$ is the probability density function of the new (transformed) random variable $Y$. This is the Engineer's Way from class.
But the function $f_X(g(y))\, g'(y)$ really is the density function of the new random variable $Y$, according to the theorem above. So the simple rule works. There is only one problem: to implement the Engineer's Way given $X$, $Y$ and $h$, you have to compute the inverse function $x = g(y)$ to $y = h(x)$. This amounts to solving the equation

$$h(x) = y \tag{6}$$

for $x$ in terms of $y$. This can be impossible to do. However, the functions I give you on tests will be easy to invert.
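Formula (2) is also easy to test numerically. The sketch below is my own illustration (the function names are mine, not from the notes): it computes $f_Y(y) = f_X(g(y))\,g'(y)$ with $g'(y)$ approximated by a central difference, for the example $f_X(x) = 2x$ on $[0,1]$ with $h(x) = \sqrt{x}$, where the exact answer (worked out in Section 2.1) is $f_Y(y) = 4y^3$.

```python
# Numerical sketch of the Engineer's Way: f_Y(y) = f_X(g(y)) * g'(y),
# with g'(y) approximated by a central difference.

def f_Y(y, f_X, g, eps=1e-6):
    """Density of Y = h(X) via the change-of-variable formula (2)."""
    g_prime = (g(y + eps) - g(y - eps)) / (2 * eps)
    return f_X(g(y)) * g_prime

# Example: f_X(x) = 2x on [0,1], h(x) = sqrt(x), so g(y) = y^2.
f_X = lambda x: 2 * x if 0 <= x <= 1 else 0.0
g = lambda y: y ** 2

# Compare against the exact density 4y^3 at a few interior points.
for y in [0.2, 0.5, 0.9]:
    print(f"y={y}: engineer's way {f_Y(y, f_X, g):.6f}, exact {4 * y ** 3:.6f}")
```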

2 How to Implement the Engineer's Way

Here is where you should start reading for the purpose of preparing for tests. I will work out two examples.

2.1 An easy example

Suppose $X$ has the linear density, so

$$f_X(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

We will make the change of variable $y = \sqrt{x}$, so $h(x) = \sqrt{x}$, and we want to compute the density function of the random variable $Y = \sqrt{X}$. The inverse function to $h(x)$ is given by $x = y^2 = g(y)$. Now we implement the Engineer's Way.

Step One: Multiply the density by $dx$ to get $f_X(x)\, dx = 2x\, dx$.

Step Two: Find the inverse function $g(y)$ to $h(x) = \sqrt{x}$; that is, solve Equation (6) for $x$ in terms of $y$ in the equation $\sqrt{x} = y$. The solution is clearly $x = y^2$, so $g(y) = y^2$.

Step Three: Using the formula $x = y^2$, rewrite $f_X(x)\, dx = 2x\, dx$ in terms of $y$. Substituting $y^2$ for $x$ in both places we get

$$f_X(g(y))\, dg(y) = 2y^2\, d(y^2) = 2y^2 \cdot 2y\, dy = 4y^3\, dy.$$

Step Four: The Engineer's Way tells us the result must be the new density function $f_Y(y)$ of $Y$ multiplied by $dy$, and hence $f_Y(y)\, dy = 4y^3\, dy$, so $f_Y(y) = 4y^3$.

Step Five: Find the support of $Y$: roughly, the domain where $f_Y$ is nonzero (see Section 4). From Section 4, we know that the support is $[h(0), h(1)] = [\sqrt{0}, \sqrt{1}] = [0, 1]$. On the complement of $[0, 1]$, $f_Y(y)$ is zero. So $Y$ has the cubic density

$$f_Y(y) = \begin{cases} 4y^3, & 0 \le y \le 1 \\ 0, & \text{otherwise.} \end{cases}$$
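The answer $f_Y(y) = 4y^3$ can be sanity-checked by simulation (my own check, not part of the notes): the cdf of $X$ is $F_X(x) = x^2$, so $X = \sqrt{U}$ for $U \sim U(0,1)$ by the inverse-cdf method, hence $Y = \sqrt{X} = U^{1/4}$. If $f_Y(y) = 4y^3$ then $E(Y) = \int_0^1 y \cdot 4y^3\, dy = 4/5$.

```python
import random

random.seed(0)

N = 200_000
# Inverse-cdf sampling: F_X(x) = x^2 on [0,1], so X = sqrt(U),
# and therefore Y = sqrt(X) = U^(1/4).
samples_Y = [random.random() ** 0.25 for _ in range(N)]

mean_Y = sum(samples_Y) / N
print(f"simulated E(Y) = {mean_Y:.4f}, predicted 4/5 = 0.8")
```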

2.2 A much more important example

We will use the Engineer's Way to prove that standardizing a general normal random variable produces a standard normal random variable. This is a result we use over and over in the course, so it is nice to understand why it is true. Note that we will be using $z$ instead of $y$ in what follows. We will use the change of variable $z = h(x) = \frac{x - \mu}{\sigma}$, hence $x = g(z) = \sigma z + \mu$.

Theorem 2.1 Suppose $X \sim N(\mu, \sigma^2)$. Then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

Proof. We have

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}.$$

Now we apply the Engineer's Way step by step. This is what you need to learn to do.

Step One: Multiply the density by $dx$ to get

$$f_X(x)\, dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx.$$

Step Two: Find the inverse function $g(z)$ to $h(x) = \frac{x-\mu}{\sigma}$; that is, solve Equation (6) for $x$ in terms of $z$ in the equation $\frac{x-\mu}{\sigma} = z$. The solution is clearly $x = \sigma z + \mu$, so $g(z) = \sigma z + \mu$.

Step Three: Using the formulas $x = \sigma z + \mu$ and $z = \frac{x-\mu}{\sigma}$, rewrite $f_X(x)\, dx$ in terms of $z$. Noting that since $z = \frac{x-\mu}{\sigma}$, the argument $-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2$ of the exponential function is in fact just $-\frac{1}{2} z^2$, we get

$$f_X(g(z))\, dg(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}\, d(\sigma z + \mu) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}\, \sigma\, dz.$$

Step Four: Cancel the $\sigma$'s to get

$$f_X(g(z))\, dg(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz.$$

Step Five: We have $h(-\infty) = -\infty$ and $h(\infty) = \infty$, so the support of $Z$ is still $\mathbb{R} = (-\infty, \infty)$. But the whole point of the Engineer's Way is that the function multiplying $dz$ is the density function $f_Z(z)$ of the new random variable $Z$. So

$$f_Z(z)\, dz = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz.$$

Cancelling the $dz$'s we get

$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty.$$

But the right-hand side is the density function of a standard normal random variable, so $Z$ has the standard normal distribution.
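The cancellation of the $\sigma$'s in Step Four can be seen numerically as well. This sketch (my own, with my own function names) computes $f_Z(z) = f_X(g(z))\,g'(z)$ with $g(z) = \sigma z + \mu$ and $g'(z) = \sigma$, and confirms the result is the standard normal density no matter which $\mu$ and $\sigma$ we start from.

```python
import math

def f_X(x, mu, sigma):
    """General normal density N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def f_Z(z, mu, sigma):
    """Density of Z = (X - mu)/sigma via the Engineer's Way:
    g(z) = sigma*z + mu, so g'(z) = sigma."""
    return f_X(sigma * z + mu, mu, sigma) * sigma

phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# mu and sigma below are arbitrary illustrative values; the output
# should match phi(z) regardless of them.
for z in [-1.5, 0.0, 2.0]:
    print(f"z={z}: f_Z = {f_Z(z, 10.0, 3.0):.6f}, phi = {phi(z):.6f}")
```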

Remark 2.2 Don't forget to substitute $g(z)$ for the $x$ in the $dx$.

I will now show that the Engineer's Way does not always give the right answer if $h(x)$ is not one-to-one.

3 A Quadratic Change of Variable

We will now prove the following theorem, which is very important in statistics.

Definition 3.1 A random variable $X$ is said to have the chi-squared distribution with $\nu$ degrees of freedom, abbreviated $X \sim \chi^2(\nu)$, if

$$f_X(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1}\, e^{-x/2}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

We are now going to prove a theorem which is very important in statistics: the square of a standard normal random variable has the chi-squared distribution with one degree of freedom. This amounts to solving Section 4.4, Problem 71 in the text. In terms of equations:

Theorem 3.2 If $Z \sim N(0, 1)$ then $Y = Z^2 \sim \chi^2(1)$.

We first note that if $\nu = 1$ (and changing $x$ to $y$ and $X$ to $Y$ in Equation (7)) we have

$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{2}\,\Gamma(1/2)}\, y^{-1/2}\, e^{-y/2}, & y \ge 0 \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

Substituting $\Gamma(1/2) = \sqrt{\pi}$ in Equation (8) we obtain

$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\, y^{-1/2}\, e^{-y/2}, & y \ge 0 \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$

So we have to get the density function on the right-hand side of Equation (9) when we make the change of variable $y = z^2$, starting with the standard normal density

$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}.$$

Note that the change of variable is two-to-one, so there is no guarantee that the Engineer's Way will work, and in fact it gives $\frac{1}{2}$ times the correct answer (try it). So the answer is off by a factor of $\frac{1}{2}$. It is no coincidence that the map $h(z) = z^2$ is two-to-one. So let's prove the theorem.

Proof. The idea of the proof (what I called the Careful Way in class) is to first compute the cdf $F_Y(y)$ of the transformed random variable $Y = Z^2$ in terms of the cdf of the original random variable $Z$. Recall we have denoted the cdf of the standard normal random variable by $\Phi(z)$. Once we have the cdf $F_Y(y)$ of $Y$, we can get the pdf $f_Y(y)$ by differentiating it with respect to $y$:

$$f_Y(y) = \frac{d}{dy} F_Y(y).$$

Away we go. We have

$$F_Y(y) = P(Y \le y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = 2\Phi(\sqrt{y}) - 1.$$

Here the last equality comes from what I called the handy formula for the probability that a standard normal random variable is between $\pm a$: $P(-a \le Z \le a) = 2\Phi(a) - 1$. In fact, the key step is the next-to-last equality. The point is that we can solve the nonlinear inequality $y^2 \le c$ for $y$ easily. Indeed we have

$$y^2 \le c \iff -\sqrt{c} \le y \le \sqrt{c}. \tag{10}$$

Thus we have our desired expression

$$F_Y(y) = 2\Phi(\sqrt{y}) - 1.$$

Now we have to differentiate this equation with respect to $y$, using that the derivative of $\Phi$ at $z$ is the standard normal density $\Phi'(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$. First we get without effort

$$\frac{d}{dy} F_Y(y) = \frac{d}{dy}\bigl[2\Phi(\sqrt{y}) - 1\bigr] = 2\,\frac{d}{dy}\bigl[\Phi(\sqrt{y})\bigr].$$

Now comes the hard part, the chain rule part. We use the chain rule to get the first equality below. In the middle terms the notation $\bigl(e^{-\frac{z^2}{2}}\bigr)_{z=\sqrt{y}}$ means you take the function $e^{-\frac{z^2}{2}}$ and evaluate it at $z = \sqrt{y}$, which gives $e^{-\frac{y}{2}}$. So

$$2\,\frac{d}{dy}\bigl[\Phi(\sqrt{y})\bigr] = 2\,\Phi'(\sqrt{y})\,\frac{d}{dy}\bigl[\sqrt{y}\bigr] = 2\left[\frac{1}{\sqrt{2\pi}}\Bigl(e^{-\frac{z^2}{2}}\Bigr)_{z=\sqrt{y}}\right]\left[\frac{1}{2\sqrt{y}}\right] = 2\left[\frac{1}{\sqrt{2\pi}}\, e^{-\frac{y}{2}}\right]\left[\frac{1}{2\sqrt{y}}\right] = \frac{1}{\sqrt{2\pi}}\, y^{-1/2}\, e^{-\frac{y}{2}}.$$

But this last expression is the pdf of a chi-squared random variable with one degree of freedom (compare with Equation (9)).

I don't expect many people to understand the next remark, but I'll put it in for those who have taken some more advanced math courses.

Remark 3.3 The fact that the support (see the next definition) of $f_Y$ is $[0, \infty)$ is because the support of $f_Z$ was $(-\infty, \infty)$ and the image of $(-\infty, \infty)$ under the map $h(z) = z^2$ is $[0, \infty)$. The support of the new density is always the image of the support of the old density under the change of variable map.
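The "try it" above can be carried out numerically. This sketch (mine, not from the notes) applies the Engineer's Way blindly using only the branch $g(y) = \sqrt{y}$ of the two-to-one map $h(z) = z^2$, and shows the result is exactly half the $\chi^2(1)$ density of Equation (9).

```python
import math

phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def chi2_1_pdf(y):
    """Chi-squared density with one degree of freedom, Equation (9)."""
    return y ** -0.5 * math.exp(-y / 2) / math.sqrt(2 * math.pi)

def naive_engineers_way(y):
    """Engineer's Way applied blindly with the single branch g(y) = sqrt(y),
    ignoring that h(z) = z^2 is two-to-one: phi(sqrt(y)) * (1 / (2 sqrt(y)))."""
    return phi(math.sqrt(y)) * (0.5 * y ** -0.5)

for y in [0.5, 1.0, 2.0]:
    print(f"y={y}: chi2(1) pdf {chi2_1_pdf(y):.6f}, naive {naive_engineers_way(y):.6f}")
```

Doubling the naive answer (one copy for each branch $\pm\sqrt{y}$) recovers the correct density.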

4 How the End-Points Change under h(x)

We begin this section with a very useful definition (you will learn the definition of closure in Math 410).

Definition 4.1 The support of a function $f$ on the real line is the closure of the set of all points $x$ where $f(x)$ is nonzero.

In all our examples of density functions the set of points where $f$ is nonzero is either a single closed interval $[a, b]$, a single open interval $(a, b)$, a single half-open interval $(a, b]$ or $[a, b)$, or one of $[0, \infty)$, $(0, \infty)$, $(-\infty, \infty)$. Taking the closure just adds the missing end points. So, in the first four cases the support is $[a, b]$, in the next two the support is $[0, \infty)$, and in the last it is $(-\infty, \infty)$.

We now state

Theorem 4.2 Suppose the density function $f_X(x)$ has support the interval $[a, b]$ and $y = h(x)$ with $h$ strictly increasing. Then the support of the density function $f_Y(y)$ is the (image) interval $[h(a), h(b)]$.

Proof. The next proof is not quite correct, but it gives the main idea. For convenience we assume $h'(x)$, and hence $g'(y)$, is never zero (this isn't true for the strictly increasing function $h(x) = x^3$, but I want to keep things easy here). Suppose also for convenience that we are in the first case. Then since $f_Y(y) = f_X(g(y))\, g'(y)$ and $g'(y)$ is never zero, we find that

$$f_Y(y) \ne 0 \iff f_X(g(y)) \ne 0 \iff a \le g(y) \le b \iff h(a) \le h(g(y)) \le h(b).$$

The last step follows since $h$ is (strictly) increasing and any increasing function preserves inequalities. But $h(g(y)) = y$ since $g$ is the inverse function to $h$. Hence we obtain

$$f_Y(y) \ne 0 \iff h(a) \le y \le h(b).$$

In other words, the set where $f_Y(y)$ is nonzero is exactly the closed interval $[h(a), h(b)]$.

Remark 4.3 The point is that $h$ maps the set (no matter what it is) where $f_X$ is nonzero to the set where $f_X \circ g$ is nonzero.

5 Linear change of a uniform random variable

We now do an example. Suppose $X \sim U(0, 1)$, that is, $X$ has the uniform distribution on $[0, 1]$, so

$$f_X(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

Let $y = h(x) = ax + b$ with $a > 0$, so $x = g(y) = \frac{y - b}{a}$. So we are making the linear change of random variable $Y = aX + b$.
So the support of $f_X$ is the interval $[0, 1]$. Now $h(0) = b$ and $h(1) = a + b$, so the support of the density function of the transformed random variable $Y = aX + b$ is $[b, a + b]$ by Theorem 4.2. We now compute the density of $Y$. Assuming $y \in [b, a + b]$ we have

$$f_Y(y)\, dy = f_X\!\left(\frac{y - b}{a}\right) d\!\left(\frac{y - b}{a}\right) = 1 \cdot \frac{1}{a}\, dy.$$

So

$$f_Y(y) = \begin{cases} \frac{1}{a}, & b \le y \le a + b \\ 0, & \text{otherwise.} \end{cases}$$

We have proved

Theorem 5.1 The linear change $y = ax + b$ of a random variable $X$ with uniform distribution on $[0, 1]$ produces a random variable $Y$ with uniform distribution on $[b, a + b]$.

6 The Law of the Unconscious Statistician

Theorem 6.1 Suppose $X$ is a continuous random variable with density $f_X(x)$ with support $[a, b]$. Suppose we change variables to $Y = h(X)$. Then the expected value of the new random variable $Y$ can be computed from the density of the original random variable $X$ according to the formula

$$E(Y) = \int_a^b h(x)\, f_X(x)\, dx. \tag{11}$$

The theorem gets its name because a statistician who didn't know what he was doing would get the right answer by plugging $h(x)$ into the integral on the right-hand side of Equation (11), and would thereby unconsciously compute the expected value of the new random variable $Y$. One of the main points of the theorem is that you can compute $E(Y)$ without computing $f_Y(y)$. I want to emphasize that given $y = h(x)$ it can be impossible to solve for the inverse function $x = g(y)$, so you can't use the Engineer's Way. Even in this case you can still compute $E(Y)$.

7 Comparison with the Discrete Case

The corresponding result for how the probability mass function changes under a one-to-one change of variable $y = h(x)$ is very easy.

Theorem 7.1 Suppose $X$ is a discrete random variable with probability mass function $p_X(x)$. Suppose $h(x)$ is a one-to-one function. Put $Y = h(X)$. Then

$$p_Y(y) = p_X(g(y)). \tag{12}$$

There is no factor of $g'(y)$ multiplying $p_X(g(y))$ in the discrete case.

Proof. By definition

$$p_Y(y) = P(Y = y) = P(h(X) = y) = P(X = g(y)) = p_X(g(y)).$$

The reason the continuous case is so difficult is that in the continuous case $P(Y = y) = 0$ for all $y$. The density function $f_Y(y)$ does not have a description as the probability of an event involving $Y$. Also, it is not true in the discrete case that $p_Y(y) = \frac{d}{dy} F_Y(y)$. So there is no chain-rule way to compute $p_Y(y)$.
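Theorem 7.1 is simple enough to check directly in code. The example below is my own (the pmf and the map $h$ are hypothetical): $X$ is uniform on $\{1, 2, 3, 4\}$ and $h(x) = 2x + 1$, so pushing the pmf forward through $h$ should satisfy $p_Y(y) = p_X(g(y))$ with no extra $g'(y)$ factor.

```python
# Discrete change of variable (Theorem 7.1): for one-to-one h,
# p_Y(y) = p_X(g(y)), with no g'(y) factor.

p_X = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # hypothetical pmf
h = lambda x: 2 * x + 1                       # one-to-one on the integers
g = lambda y: (y - 1) // 2                    # inverse of h

p_Y = {h(x): p for x, p in p_X.items()}       # push the pmf forward through h

# Verify Equation (12) pointwise.
for y, p in p_Y.items():
    assert p == p_X[g(y)]
print(p_Y)
```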