2. Discrete Random Variables and Expectation


In tossing two dice we are often interested in the sum of the dice rather than their separate values. The sample space in tossing two dice consists of 36 events of equal probability, given by the ordered pairs of numbers $\{(1,1), (1,2), \dots, (6,6)\}$. If the quantity we are interested in is the sum of the two dice, then we are interested in 11 events (of unequal probability). Any such function from the sample space to the real numbers is called a random variable.

MAT-72306 RandAl, Spring 2015, 22-Jan-15

2.1. Random Variables and Expectation

Definition 2.1: A random variable (RV) $X$ on a sample space $\Omega$ is a real-valued function on $\Omega$; that is, $X\colon \Omega \to \mathbb{R}$. A discrete random variable is a RV that takes on only a finite or countably infinite number of values.

For a discrete RV $X$ and a real value $a$, the event "$X = a$" includes all the basic events of the sample space in which $X$ assumes the value $a$. I.e., "$X = a$" represents the set $\{s \in \Omega : X(s) = a\}$.

We denote the probability of that event by
$\Pr(X = a) = \sum_{s \in \Omega : X(s) = a} \Pr(s)$.

If $X$ is the RV representing the sum of the two dice, the event $X = 4$ corresponds to the set of basic events $\{(1,3), (2,2), (3,1)\}$. Hence
$\Pr(X = 4) = \frac{3}{36} = \frac{1}{12}$.

Definition 2.2: Two RVs $X$ and $Y$ are independent if and only if
$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$
for all values $x$ and $y$. Similarly, RVs $X_1, X_2, \dots, X_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$ and any values $x_i$, $i \in I$,
$\Pr\bigl(\bigcap_{i \in I} (X_i = x_i)\bigr) = \prod_{i \in I} \Pr(X_i = x_i)$.

Definition 2.3: The expectation of a discrete RV $X$, denoted by $E[X]$, is given by
$E[X] = \sum_i i \Pr(X = i)$,
where the summation is over all values $i$ in the range of $X$. The expectation is finite if $\sum_i |i| \Pr(X = i)$ converges; otherwise, it is unbounded.

E.g., the expectation of the RV $X$ representing the sum of two dice is
$E[X] = \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \cdots + \frac{1}{36} \cdot 12 = 7$.

As an example of where the expectation of a discrete RV is unbounded, consider a RV $X$ that takes on the value $2^i$ with probability $1/2^i$ for $i = 1, 2, \dots$. The expected value of $X$ is
$E[X] = \sum_{i=1}^{\infty} \frac{1}{2^i} \, 2^i = \sum_{i=1}^{\infty} 1$,
which expresses that $E[X]$ is unbounded.
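The two-dice expectation can be checked by direct enumeration of the 36 equally likely outcomes. The following snippet (not part of the original slides) applies Definition 2.3 with exact rational arithmetic:

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two dice and compute
# E[X] for X = sum of the dice, directly from Definition 2.3.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
expectation = sum(Fraction(i + j, 36) for i, j in outcomes)

# Also check one of the point probabilities: Pr(X = 4) = 3/36 = 1/12.
pr_sum_4 = Fraction(sum(1 for i, j in outcomes if i + j == 4), 36)

print(expectation)  # 7
print(pr_sum_4)     # 1/12
```

Using `Fraction` avoids floating-point noise, so the results match the hand calculation exactly.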

2.1.1. Linearity of Expectations

By this property, the expectation of the sum of RVs is equal to the sum of their expectations.

Theorem 2.1 [Linearity of Expectations]: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations,
$E\bigl[\sum_{i=1}^{n} X_i\bigr] = \sum_{i=1}^{n} E[X_i]$.

Proof: We prove the statement for two random variables $X$ and $Y$ (the general case follows by induction). The summations that follow are understood to be over the ranges of the corresponding RVs:
$E[X + Y] = \sum_i \sum_j (i + j) \Pr((X = i) \cap (Y = j))$
$= \sum_i \sum_j i \Pr((X = i) \cap (Y = j)) + \sum_i \sum_j j \Pr((X = i) \cap (Y = j))$
$= \sum_i i \sum_j \Pr((X = i) \cap (Y = j)) + \sum_j j \sum_i \Pr((X = i) \cap (Y = j))$
$= \sum_i i \Pr(X = i) + \sum_j j \Pr(Y = j) = E[X] + E[Y]$.
The first equality follows from Definition 1.2. The penultimate equality uses Theorem 1.6, the law of total probability.

Let us now compute the expected sum of two standard dice. Let $X = X_1 + X_2$, where $X_i$ represents the outcome of die $i$ for $i = 1, 2$. Then
$E[X_i] = \sum_{j=1}^{6} \frac{1}{6} j = \frac{7}{2}$.
Applying the linearity of expectations, we have
$E[X] = E[X_1] + E[X_2] = 7$.

Linearity of expectations holds for any collection of RVs, even if they are not independent.

Lemma 2.2: For any constant $c$ and discrete RV $X$, $E[cX] = cE[X]$.
Proof: The lemma is obvious for $c = 0$. For $c \ne 0$,
$E[cX] = \sum_j j \Pr(cX = j) = c \sum_j \frac{j}{c} \Pr\bigl(X = \frac{j}{c}\bigr) = c \sum_k k \Pr(X = k) = cE[X]$.
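The claim that linearity needs no independence can be illustrated with a deliberately dependent pair of dice RVs. In this sketch (illustrative, not from the slides), $X_2 = 7 - X_1$ is completely determined by $X_1$, yet the expectations still add:

```python
from fractions import Fraction

# X1 is one fair die; X2 = 7 - X1 is fully dependent on X1.
# Linearity still gives E[X1 + X2] = E[X1] + E[X2] = 7/2 + 7/2 = 7.
die = range(1, 7)
e_x1 = sum(Fraction(i, 6) for i in die)             # E[X1] = 7/2
e_x2 = sum(Fraction(7 - i, 6) for i in die)         # E[X2] = 7/2
e_sum = sum(Fraction(i + (7 - i), 6) for i in die)  # E[X1 + X2] = 7
print(e_x1, e_x2, e_sum)
```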

2.1.2. Jensen's Inequality

Let us choose the length $X$ of a side of a square uniformly at random from the range $[1, 99]$. What is the expected value of the area? We can write this as $E[X^2]$. It is tempting to think of this as being equal to $(E[X])^2$, but a simple calculation shows that this is not correct. In fact,
$(E[X])^2 = 50^2 = 2500$,
whereas
$E[X^2] = \frac{9950}{3} \approx 3317 > 2500$.

More generally, $E[X^2] \ge (E[X])^2$. Consider $Y = (X - E[X])^2$. The RV $Y$ is nonnegative and hence its expectation must also be nonnegative:
$0 \le E[Y] = E[(X - E[X])^2] = E[X^2 - 2X E[X] + (E[X])^2]$
$= E[X^2] - 2E[X E[X]] + (E[X])^2$
$= E[X^2] - (E[X])^2$.
To obtain the penultimate line, use the linearity of expectations. To obtain the last line, use Lemma 2.2 to simplify $E[X E[X]] = E[X] \cdot E[X]$.
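The square-area numbers above can be verified exactly, taking $X$ uniform on the integers $1, \dots, 99$ (a sketch, not part of the original notes):

```python
from fractions import Fraction

# Side length X uniform on the integers 1..99:
# compare (E[X])^2 with E[X^2].
xs = range(1, 100)
e_x = sum(Fraction(x, 99) for x in xs)       # E[X]   = 50
e_x2 = sum(Fraction(x * x, 99) for x in xs)  # E[X^2] = 9950/3
print(e_x ** 2, e_x2, e_x2 > e_x ** 2)
```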

The fact that $E[X^2] \ge (E[X])^2$ is an example of Jensen's inequality. Jensen's inequality shows that, for any convex function $f$, we have $E[f(X)] \ge f(E[X])$.

Definition 2.4: A function $f$ is said to be convex if, for any $x_1, x_2$ and $0 \le \lambda \le 1$,
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$.

Lemma 2.3: If $f$ is a twice differentiable function, then $f$ is convex if and only if $f''(x) \ge 0$.

Theorem 2.4 [Jensen's Inequality]: If $f$ is a convex function, then $E[f(X)] \ge f(E[X])$.
Proof: We prove the theorem assuming that $f$ has a Taylor expansion. Let $\mu = E[X]$. By Taylor's theorem there is a value $c$ such that
$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(c)(x - \mu)^2}{2} \ge f(\mu) + f'(\mu)(x - \mu)$,
since $f''(c) \ge 0$ by convexity. Taking expectations and applying linearity of expectations and Lemma 2.2 yields:
$E[f(X)] \ge E[f(\mu) + f'(\mu)(X - \mu)] = f(\mu) + f'(\mu)(E[X] - \mu) = f(\mu) = f(E[X])$.

2.2. The Bernoulli and Binomial Random Variables

We run an experiment that succeeds with probability $p$ and fails with probability $1 - p$. Let $Y$ be a RV such that
$Y = 1$ if the experiment succeeds, and $Y = 0$ otherwise.
The variable $Y$ is called a Bernoulli or an indicator random variable. Note that, for a Bernoulli RV,
$E[Y] = p \cdot 1 + (1 - p) \cdot 0 = p = \Pr(Y = 1)$.

If we, e.g., flip a fair coin and consider heads a success, then the expected value of the corresponding indicator RV is 1/2.

Consider a sequence of $n$ independent coin flips. What is the distribution of the number of heads in the entire sequence? More generally, consider a sequence of $n$ independent experiments, each of which succeeds with probability $p$. If we let $X$ represent the number of successes in the $n$ experiments, then $X$ has a binomial distribution.

Definition 2.5: A binomial RV $X$ with parameters $n$ and $p$, denoted by $B(n, p)$, is defined by the following probability distribution on $j = 0, 1, 2, \dots, n$:
$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}$.
I.e., the binomial RV (BRV) equals $j$ when there are exactly $j$ successes and $n - j$ failures in $n$ independent experiments, each of which is successful with probability $p$.

Definition 2.5 ensures that the BRV is a valid probability function (Definition 1.2):
$\sum_{j=0}^{n} \Pr(X = j) = \sum_{j=0}^{n} \binom{n}{j} p^j (1 - p)^{n - j} = \bigl(p + (1 - p)\bigr)^n = 1$.
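The normalization check at the end of Definition 2.5 is easy to confirm numerically. This sketch (parameters $n = 10$, $p = 0.3$ chosen for illustration) implements the pmf and sums it:

```python
from math import comb

# Binomial pmf from Definition 2.5; summing it over j = 0..n mirrors
# the binomial-theorem argument (p + (1-p))^n = 1.
def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 10, 0.3
total = sum(binom_pmf(n, p, j) for j in range(n + 1))
print(round(total, 12))  # 1.0
```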

We want to gather data about the packets going through a router, e.g., to know the approximate fraction of packets from a certain source or of a certain type. We store a random subset or sample of the packets for later analysis. If each packet is stored with probability $p$ and $n$ packets go through the router each day, then the number of sampled packets each day is a BRV $X$ with parameters $n$ and $p$. To know how much memory is necessary for such a sample, we determine the expectation of $X$.

If $X$ is a BRV with parameters $n$ and $p$, then $X$ is the number of successes in $n$ trials, where each trial is successful with probability $p$. Define a set of $n$ indicator RVs $X_1, \dots, X_n$, where $X_i = 1$ if the $i$th trial is successful and 0 otherwise. Clearly, $E[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$, and so, by the linearity of expectations,
$E[X] = \sum_{i=1}^{n} E[X_i] = np$.
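The indicator-variable argument gives $E[X] = np$ without touching the pmf; the same value also falls out of the pmf directly. A small check with exact arithmetic (illustrative parameters, not from the slides):

```python
from fractions import Fraction
from math import comb

# Exact E[X] for X ~ B(n, p), computed from the pmf of Definition 2.5,
# agrees with the indicator-variable answer np.
n, p = 10, Fraction(3, 10)
e_x = sum(j * comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1))
print(e_x)  # 3
```

Here $np = 10 \cdot 3/10 = 3$, and the pmf sum returns exactly that.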

2.3. Conditional Expectation

Definition 2.6:
$E[Y \mid Z = z] = \sum_y y \Pr(Y = y \mid Z = z)$,
where the summation is over all $y$ in the range of $Y$.

The conditional expectation of a RV is, like the (unconditional) expectation, a weighted sum of the values it assumes. Now each value is weighted by the conditional probability that the variable assumes that value.

Suppose that we independently roll two standard six-sided dice. Let $X_1$ be the number that shows on the first die, $X_2$ the number on the second die, and $X$ the sum of the numbers on the two dice. Then
$E[X \mid X_1 = 2] = \sum_x x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$.

As another example, consider $E[X_1 \mid X = 5]$:
$E[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{\Pr\bigl((X_1 = x) \cap (X = 5)\bigr)}{\Pr(X = 5)} = \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}$.

Lemma 2.5: For any RVs $X$ and $Y$,
$E[X] = \sum_y \Pr(Y = y) \, E[X \mid Y = y]$,
where the sum is over all values $y$ in the range of $Y$ and all of the expectations exist.
Proof:
$\sum_y \Pr(Y = y) \, E[X \mid Y = y] = \sum_y \Pr(Y = y) \sum_x x \Pr(X = x \mid Y = y)$
$= \sum_x \sum_y x \Pr(X = x \mid Y = y) \Pr(Y = y)$
$= \sum_x \sum_y x \Pr\bigl((X = x) \cap (Y = y)\bigr)$
$= \sum_x x \Pr(X = x) = E[X]$.
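The value $E[X_1 \mid X = 5] = 5/2$ can be confirmed by enumerating the conditioning event, as in this short sketch (not part of the original notes):

```python
from fractions import Fraction

# E[X1 | X = 5] for two fair dice, by enumerating the event X = 5.
pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]
event = [(i, j) for i, j in pairs if i + j == 5]  # 4 equally likely pairs
e_cond = Fraction(sum(i for i, j in event), len(event))
print(e_cond)  # 5/2
```

The event consists of $(1,4), (2,3), (3,2), (4,1)$, each with conditional probability $1/4$, so the average of the first coordinates is $10/4 = 5/2$.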

The linearity of expectations also extends to conditional expectations.

Lemma 2.6: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations and for any RV $Y$,
$E\bigl[\sum_{i=1}^{n} X_i \mid Y = y\bigr] = \sum_{i=1}^{n} E[X_i \mid Y = y]$.

Confusingly, the term conditional expectation is also used to refer to the following RV.

Definition 2.7: The expression $E[Y \mid Z]$ is a RV $f(Z)$ that takes on the value $E[Y \mid Z = z]$ when $Z = z$.

Note that $E[Y \mid Z]$ is not a real value; it is actually a function of the RV $Z$. Hence $E[Y \mid Z]$ is itself a function from the sample space to the real numbers and can therefore be thought of as a RV.

In the previous example of rolling two dice,
$E[X \mid X_1] = \sum_x x \Pr(X = x \mid X_1) = \sum_{j=1}^{6} (X_1 + j) \frac{1}{6} = X_1 + \frac{7}{2}$.
We see that $E[X \mid X_1]$ is a RV whose value depends on $X_1$.

If $E[Y \mid Z]$ is a RV, then it makes sense to consider its expectation $E[E[Y \mid Z]]$. We found that $E[X \mid X_1] = X_1 + 7/2$. Thus,
$E[E[X \mid X_1]] = E\bigl[X_1 + \tfrac{7}{2}\bigr] = E[X_1] + \frac{7}{2} = \frac{7}{2} + \frac{7}{2} = 7 = E[X]$.

More generally,

Theorem 2.7: $E[Y] = E[E[Y \mid Z]]$.
Proof: From Definition 2.7 we have $E[E[Y \mid Z]] = E[f(Z)]$, where $f(Z)$ takes on the value $E[Y \mid Z = z]$ when $Z = z$. Hence
$E[E[Y \mid Z]] = \sum_z E[Y \mid Z = z] \Pr(Z = z)$.
The right-hand side equals $E[Y]$ by Lemma 2.5.
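Theorem 2.7 can be checked for the dice example by averaging $E[X \mid X_1 = z]$ over the six values of $z$; the result is $E[X] = 7$. A minimal sketch (not part of the original notes):

```python
from fractions import Fraction

# Tower property for the dice example: averaging E[X | X1 = z]
# over z = 1..6 recovers E[X] = 7.
def e_x_given_x1(z):
    # E[X | X1 = z] = sum over the second die of (z + j)/6 = z + 7/2
    return sum(Fraction(z + j, 6) for j in range(1, 7))

tower = sum(Fraction(1, 6) * e_x_given_x1(z) for z in range(1, 7))
print(tower)  # 7
```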

Consider a program that includes one call to a process $S$. Assume that each call to process $S$ recursively spawns new copies of the process $S$, where the number of new copies is a BRV with parameters $n$ and $p$. We assume that these random variables are independent for each call to $S$. What is the expected number of copies of the process $S$ generated by the program?

To analyze this recursive spawning process, we use the idea of generations. The initial process is in generation 0. Otherwise, we say that a process is in generation $i$ if it was spawned by another process in generation $i - 1$. Let $Y_i$ denote the number of processes in generation $i$. Since we know that $Y_0 = 1$, the number of processes in generation 1 has a binomial distribution. Thus,
$E[Y_1] = np$.

Similarly, suppose we knew that the number of processes in generation $i - 1$ was $y_{i-1}$, so $Y_{i-1} = y_{i-1}$. Then
$E[Y_i \mid Y_{i-1} = y_{i-1}] = np \, y_{i-1}$.
Applying Theorem 2.7, we can compute the expected size of the $i$th generation inductively. We have
$E[Y_i] = E[E[Y_i \mid Y_{i-1}]] = E[np \, Y_{i-1}] = np \, E[Y_{i-1}]$.
By induction on $i$, and using the fact that $Y_0 = 1$, we then obtain
$E[Y_i] = (np)^i$.

The expected total number of copies of process $S$ generated by the program is given by
$E\bigl[\sum_{i \ge 0} Y_i\bigr] = \sum_{i \ge 0} E[Y_i] = \sum_{i \ge 0} (np)^i$.
If $np \ge 1$, then the expectation is unbounded; if $np < 1$, the expectation is $1/(1 - np)$. The expected number of processes generated by the program is bounded if and only if the expected number of processes spawned by each process is less than 1. This is a simple example of a branching process, a probabilistic paradigm extensively studied in probability theory.
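The geometric series $\sum_{i \ge 0} (np)^i$ converges quickly when $np < 1$. A small numeric sketch with the illustrative choice $np = 1/2$ (not from the slides) shows the partial sums approaching $1/(1 - np) = 2$:

```python
from fractions import Fraction

# Expected generation sizes are E[Y_i] = (np)^i; their sum converges
# to 1/(1 - np) when np < 1.  Here np = 1/2 as an illustration.
np_ = Fraction(1, 2)
partial = sum(np_**i for i in range(60))  # partial sum of (np)^i
limit = 1 / (1 - np_)                     # closed form 1/(1 - np)
print(limit, float(partial))
```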

2.4. The Geometric Distribution

Let us flip a coin until it lands on heads. What is the distribution of the number of flips? This is an example of a geometric distribution. It arises when we perform a sequence of independent trials until the first success, where each trial succeeds with probability $p$.

Definition 2.8: A geometric RV $X$ with parameter $p$ is given by the following probability distribution on $n = 1, 2, \dots$:
$\Pr(X = n) = (1 - p)^{n-1} p$.

Geometric RVs are said to be memoryless because the probability that you will reach your first success $n$ trials from now is independent of the number of failures you have experienced. Informally, one can ignore past failures: they do not change the distribution of the number of future trials until the first success. Formally, we have the following.

Lemma 2.8: For a geometric RV $X$ with parameter $p$ and for $n > 0$,
$\Pr(X = n + k \mid X > k) = \Pr(X = n)$.

When a RV takes values in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \dots\}$, there is an alternative formula for calculating its expectation.

Lemma 2.9: Let $X$ be a discrete RV that takes on only nonnegative integer values. Then
$E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$.
Proof:
$\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = E[X]$.

For a geometric RV $X$ with parameter $p$,
$\Pr(X \ge i) = \sum_{n=i}^{\infty} (1 - p)^{n-1} p = (1 - p)^{i-1}$.
Hence
$E[X] = \sum_{i=1}^{\infty} (1 - p)^{i-1} = \frac{1}{1 - (1 - p)} = \frac{1}{p}$.
Thus, for a fair coin where $p = 1/2$, on average it takes two flips to see the first heads.
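Both routes to $E[X] = 1/p$ can be compared numerically. This sketch (illustrative, with $p = 1/2$ and a truncated sum) evaluates the definition $\sum_n n \Pr(X = n)$ and Lemma 2.9's tail sum side by side:

```python
# Two ways to approximate E[X] for a geometric RV with p = 1/2:
# (1) the definition  sum_n n * Pr(X = n) with Pr(X = n) = (1-p)^(n-1) p,
# (2) Lemma 2.9's tail sum of Pr(X >= i) = (1-p)^(i-1).
# Both approach 1/p = 2; truncation error beyond N is negligible.
p = 0.5
N = 200
by_definition = sum(n * (1 - p)**(n - 1) * p for n in range(1, N + 1))
by_tails = sum((1 - p)**(i - 1) for i in range(1, N + 1))
print(round(by_definition, 9), round(by_tails, 9))  # 2.0 2.0
```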

Let us now find the expectation of a geometric RV $X$ with parameter $p$ using conditional expectations and the memoryless property of geometric RVs. Recall that $X$ corresponds to the number of flips until the first heads, given that each flip is heads with probability $p$. Let $Y = 0$ if the first flip is tails and $Y = 1$ if the first flip is heads. By the identity from Lemma 2.5,
$E[X] = \Pr(Y = 0) \, E[X \mid Y = 0] + \Pr(Y = 1) \, E[X \mid Y = 1] = (1 - p) E[X \mid Y = 0] + p \, E[X \mid Y = 1]$.

If $Y = 1$ then $X = 1$, so $E[X \mid Y = 1] = 1$. If $Y = 0$, then $X > 1$. In this case, let the number of remaining flips (after the first flip, until the first heads) be $Z$. Then, by the linearity of expectations,
$E[X \mid Y = 0] = E[Z + 1] = E[Z] + 1$.
By the memoryless property of geometric RVs, $Z$ is also a geometric RV with parameter $p$. Hence $E[Z] = E[X]$, since they both have the same distribution. We therefore have
$E[X] = (1 - p)(E[X] + 1) + p \cdot 1 = (1 - p)E[X] + 1$,
which yields $E[X] = 1/p$.
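The final step solves a fixed-point equation; one can confirm that $E = 1/p$ indeed satisfies the recurrence $E = (1 - p)(E + 1) + p$ for several exact values of $p$ (a quick sketch, not from the slides):

```python
from fractions import Fraction

# Check that E = 1/p satisfies the memoryless-derivation recurrence
# E = (1 - p)(E + 1) + p, exactly, for a few sample values of p.
for p in (Fraction(1, 2), Fraction(1, 6), Fraction(3, 10)):
    e = 1 / p
    assert e == (1 - p) * (e + 1) + p * 1
print("recurrence holds for all sampled p")
```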

2.4.1. Example: Coupon Collector's Problem

Each box of cereal contains one of $n$ different coupons. Once you obtain one of every type of coupon, you can send in for a prize. The coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, and you do not collaborate with others to collect coupons. How many boxes of cereal must you buy before you obtain at least one of every type of coupon?

Let $X$ be the number of boxes bought until at least one of every type of coupon is obtained. If $X_i$ is the number of boxes bought while you had exactly $i - 1$ different coupons, then clearly
$X = \sum_{i=1}^{n} X_i$.
The advantage of breaking $X$ into a sum of random variables $X_i$, $i = 1, \dots, n$, is that each $X_i$ is a geometric RV. When exactly $i - 1$ coupons have been found, the probability of obtaining a new coupon is
$p_i = 1 - \frac{i - 1}{n}$.

Hence, $X_i$ is a geometric RV with parameter $p_i$:
$E[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$.
Using the linearity of expectations, we have that
$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$.

The summation $H(n) = \sum_{i=1}^{n} 1/i$ is known as the harmonic number.

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} 1/i$ satisfies $H(n) = \ln n + \Theta(1)$.

Thus, for the coupon collector's problem, the expected number of random coupons required to obtain all $n$ coupons is $n \ln n + \Theta(n)$.
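The exact expectation $n H(n)$ and its $n \ln n$ estimate can be compared directly; their gap is $n(H(n) - \ln n)$, which is $\Theta(n)$ as Lemma 2.10 predicts. A sketch with the illustrative choice $n = 50$:

```python
from fractions import Fraction
from math import log

# Exact coupon-collector expectation E[X] = n * H(n) versus the
# n ln n estimate.  The difference n*(H(n) - ln n) stays between 0 and n.
n = 50
h_n = sum(Fraction(1, i) for i in range(1, n + 1))  # harmonic number H(n)
e_boxes = n * h_n                                   # exact E[X]
print(float(e_boxes), n * log(n))
```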

Given the first and second moments, one can compute the variance and standard deviation of the RV. Intuitively, the variance and standard deviation offer a measure of how far the RV is likely to be from its expectation.

Definition 3.2: The variance of a RV $X$ is
$\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
The standard deviation of a RV $X$ is
$\sigma[X] = \sqrt{\mathrm{Var}[X]}$.

The two forms of the variance in the definition are equivalent, as is easily seen by using the linearity of expectations. Keeping in mind that $E[X]$ is a constant, we have
$E[(X - E[X])^2] = E[X^2 - 2X E[X] + (E[X])^2] = E[X^2] - 2E[X]E[X] + (E[X])^2 = E[X^2] - (E[X])^2$.
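The equivalence of the two variance forms can be checked concretely for a single fair die (a sketch, not part of the original notes):

```python
from fractions import Fraction

# Variance of one fair die, computed from both forms in Definition 3.2:
# the centered form E[(X - E[X])^2] and the moment form E[X^2] - (E[X])^2.
die = range(1, 7)
e_x = sum(Fraction(i, 6) for i in die)  # E[X] = 7/2
var_centered = sum(Fraction(1, 6) * (i - e_x)**2 for i in die)
var_moments = sum(Fraction(i * i, 6) for i in die) - e_x**2
print(var_centered, var_moments)  # 35/12 35/12
```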