Lecture 4: More on Continuous Random Variables and Functions of Random Variables


Lecture 4: More on Continuous Random Variables and Functions of Random Variables
ELE 525: Random Processes in Information Systems
Hisashi Kobayashi, Department of Electrical Engineering, Princeton University
September 25, 2013
Textbook: Hisashi Kobayashi, Brian L. Mark and William Turin, Probability, Random Processes and Statistical Analysis (Cambridge University Press, 2012)
Copyright Hisashi Kobayashi 2013

If F_XY(x, y) is everywhere continuous and possesses a second partial derivative everywhere, we define the joint PDF by

f_XY(x, y) = ∂²F_XY(x, y) / ∂x ∂y.

The conditional distribution function of RV Y, given X = x, is

F_Y|X(y | x) = ∫_{-∞}^{y} f_XY(x, y′) dy′ / f_X(x), for x such that f_X(x) > 0.

The conditional expectation of X given Y is defined by

E[X | Y = y] = ∫_{-∞}^{∞} x f_X|Y(x | y) dx,

where

f_X|Y(x | y) = f_XY(x, y) / f_Y(y).

The law of iterated expectations holds:

E[E[X | Y]] = E[X].

The conditional expectation is the best estimate of X as a function of Y in the minimum mean square error (MMSE) sense (see Section 22.1.3, pp. 649-651).
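Both properties are easy to check numerically. The following is a minimal simulation sketch (illustrative, not from the textbook): for the jointly normal pair X = ρY + √(1-ρ²)Z with Y, Z independent standard normals, E[X | Y] = ρY, so the sample mean of ρY should match that of X, and any competing function of Y (here an arbitrary cubic) should incur a larger mean square error.

# Minimal sketch: law of iterated expectations and the MMSE property,
# using X = rho*Y + sqrt(1 - rho^2)*Z with Y, Z independent standard normals.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
y = rng.standard_normal(1_000_000)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(y.size)

cond_exp = rho * y                           # E[X | Y] evaluated at each sample of Y
print(cond_exp.mean(), x.mean())             # both approach E[X] = 0

print(np.mean((x - cond_exp) ** 2))          # the MMSE, ~ 1 - rho^2 = 0.64
print(np.mean((x - (rho * y + 0.1 * y**3)) ** 2))  # any other g(Y) does worse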

4.3.1 Bivariate normal (or Gaussian) distribution

The standard bivariate normal distribution is defined by

φ(u_1, u_2; ρ) = 1 / (2π√(1-ρ²)) exp( -(u_1² - 2ρ u_1 u_2 + u_2²) / (2(1-ρ²)) ), |ρ| < 1.

When ρ = 0, the RVs U_1 and U_2 are said to be uncorrelated, and

φ(u_1, u_2; 0) = φ(u_1) φ(u_2).

Thus, the bivariate normal variables are independent when they are uncorrelated. (Two uncorrelated RVs are not necessarily independent, unless they are jointly normal RVs.) The conditional PDF of U_2 given U_1 = u_1 can be computed as

f(u_2 | u_1) = φ(u_1, u_2; ρ) / φ(u_1) = 1 / √(2π(1-ρ²)) exp( -(u_2 - ρ u_1)² / (2(1-ρ²)) ),

which is also a normal distribution, with mean ρ u_1 and variance 1-ρ².

Define RVs X_1 and X_2 by

X_1 = σ_1 U_1 + μ_1, X_2 = σ_2 U_2 + μ_2.

Then the joint PDF of X_1 and X_2 is

f_X1X2(x_1, x_2) = 1 / (2π σ_1 σ_2 √(1-ρ²)) exp( -q(x_1, x_2) / 2 ),

where

q(x_1, x_2) = 1/(1-ρ²) [ (x_1-μ_1)²/σ_1² - 2ρ(x_1-μ_1)(x_2-μ_2)/(σ_1 σ_2) + (x_2-μ_2)²/σ_2² ].

Adopt a vector notation:

x = (x_1, x_2)ᵀ, μ = (μ_1, μ_2)ᵀ.

Then

f_X(x) = 1 / (2π (det C)^{1/2}) exp( -(x - μ)ᵀ C⁻¹ (x - μ) / 2 ),

where C is the covariance matrix, given by

C = [ σ_1²  ρσ_1σ_2 ; ρσ_1σ_2  σ_2² ],

and

det C = σ_1² σ_2² (1-ρ²).
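As an illustration, here is a minimal sketch (the parameter values are arbitrary) of generating samples with this joint PDF by the linear construction X = μ + LU, where L is the Cholesky factor of C and U is a pair of independent standard normal RVs; the sample mean and covariance should approach μ and C.

# Minimal sketch: sample the bivariate normal via X = mu + L*U, C = L L^T.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 0.5, 0.8        # assumed values for illustration
C = np.array([[sigma1**2,          rho*sigma1*sigma2],
              [rho*sigma1*sigma2,  sigma2**2        ]])

L = np.linalg.cholesky(C)                  # lower-triangular factor of C
U = rng.standard_normal((100_000, 2))      # independent standard normal pairs
X = mu + U @ L.T

print(X.mean(axis=0))                      # ~ mu
print(np.cov(X, rowvar=False))             # ~ C
print(np.linalg.det(C), sigma1**2 * sigma2**2 * (1 - rho**2))  # det C check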

A family of PDFs (or PMFs) of the form

f(x; θ) = h(x) exp( η(θ) T(x) - A(θ) )

is called an exponential family. The function T(x) is called the sufficient statistic. The subfamily obtained with the natural parameterization η(θ) = θ,

f(x; θ) = h(x) exp( θ T(x) - A(θ) ),

is called the canonical (or natural) exponential family. The exponential family of distributions includes the exponential, gamma, normal, Poisson, and binomial distributions, among others.
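As a quick check of the definition (a standard manipulation, not transcribed from the slides), the Bernoulli PMF with success probability θ can be put into this form:

θ^x (1-θ)^{1-x} = exp( x ln(θ/(1-θ)) + ln(1-θ) ), x ∈ {0, 1},

so that h(x) = 1, T(x) = x, η(θ) = ln(θ/(1-θ)) (the log-odds), and A(θ) = -ln(1-θ).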


Suppose that an observed sample X is drawn from a certain family of distributions specified by parameter θ. The Bayesian treats this parameter as a RV Θ, which is assigned a prior PDF π(θ) = f_Θ(θ). If the RV X is a discrete RV, we have from Bayes' theorem (2.63)

f_Θ|X(θ | x) = P(X = x | θ) π(θ) / ∫ P(X = x | θ′) π(θ′) dθ′.

If the RV X is a continuous RV,

f_Θ|X(θ | x) = f(x | θ) π(θ) / ∫ f(x | θ′) π(θ′) dθ′.

The conditional PDF f(x | θ) is called the likelihood function when it is viewed as a function of θ for given x, and is denoted as

L_x(θ) = f(x | θ).

Then the posterior distribution can be written as

f_Θ|X(θ | x) = L_x(θ) π(θ) / ∫ L_x(θ′) π(θ′) dθ′.

For certain choices of the prior distribution, the posterior distribution has the same mathematical form as the prior distribution. Such a prior distribution is called a conjugate prior (distribution) of the given likelihood function.

Example 4.4: The Bernoulli distribution and its conjugate prior, the beta distribution.

Write the probability of success as θ (instead of p). Define the binary variable X_i, which takes on 1 or 0 depending on whether the ith trial is a success (s) or failure (f). Then, we can write

P(X_i = x_i | θ) = θ^{x_i} (1-θ)^{1-x_i}, x_i ∈ {0, 1}.

For n independent trials we observe the data x = (x_1, x_2, …, x_n). The likelihood function of θ given x is

L_x(θ) = ∏_{i=1}^{n} θ^{x_i} (1-θ)^{1-x_i} = θ^k (1-θ)^{n-k}, where k = Σ_{i=1}^{n} x_i.   (4.139)

As a prior distribution, consider the beta distribution:

π(θ) = Beta(θ; α, β) = θ^{α-1} (1-θ)^{β-1} / B(α, β), 0 ≤ θ ≤ 1,

where α and β are called prior hyperparameters (cf. the model parameter θ).


[Figure: beta prior densities Beta(θ; α, β) for several hyperparameter pairs. Note: in panel (b), the rightmost curve corresponds to (α, β) = (5, 2).]

The beta function is related to the gamma function (see (4.31) of p. 78):

B(α, β) = Γ(α) Γ(β) / Γ(α + β).

The mean and variance of this prior distribution are

E[Θ] = α / (α + β), Var[Θ] = αβ / ( (α + β)² (α + β + 1) ).

The posterior probability can be evaluated as

f_Θ|X(θ | x) ∝ L_x(θ) π(θ) ∝ θ^{k+α-1} (1-θ)^{n-k+β-1}.

Thus, the posterior probability is also a beta distribution, Beta(θ; α_1, β_1),

where we call α_1 = α + k and β_1 = β + n - k the posterior hyperparameters. The posterior mean is a weighted average of the prior mean and

θ̂ = k/n,

which is the maximum likelihood estimate (MLE) of θ, i.e., the value that maximizes the likelihood function L_x(θ) of (4.139):

E[Θ | x] = (α + k) / (α + β + n) = w α/(α + β) + (1 - w) θ̂, where w = (α + β) / (α + β + n).

As the sample size n increases, the weight on the prior mean diminishes, whereas the weight on the MLE approaches one. This behavior illustrates how Bayesian inference generally works.

For a likelihood function that belongs to the exponential family, i.e.,

L_x(θ) = h(x) exp( η(θ) T(x) - A(θ) ),

conjugate priors can be constructed as follows:

π(θ; α, β) ∝ exp( η(θ) α - β A(θ) );

then the posterior distribution takes the form

f_Θ|X(θ | x) ∝ exp( η(θ) (α + T(x)) - (β + 1) A(θ) ),

i.e., α_1 = α + T(x), and β_1 = β + 1.
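The update rule is short enough to state in code. Below is a minimal sketch (the true θ and the prior hyperparameters are arbitrary choices) of the Beta-Bernoulli update, confirming numerically that the posterior mean equals the weighted average of the prior mean and the MLE.

# Minimal sketch: Beta(alpha, beta) prior + k successes in n Bernoulli trials
# gives the posterior Beta(alpha + k, beta + n - k).
import numpy as np

rng = np.random.default_rng(2)
theta_true = 0.7                            # assumed, for simulation only
alpha, beta = 2.0, 2.0                      # assumed prior hyperparameters

x = rng.binomial(1, theta_true, size=200)   # n independent Bernoulli trials
n, k = x.size, x.sum()

alpha1, beta1 = alpha + k, beta + n - k     # posterior hyperparameters
w = (alpha + beta) / (alpha + beta + n)     # weight on the prior mean
prior_mean = alpha / (alpha + beta)
mle = k / n

print(alpha1 / (alpha1 + beta1))            # posterior mean
print(w * prior_mean + (1 - w) * mle)       # identical weighted-average form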

5 Functions of Random Variables and Their Distributions

5.1 Functions of One Random Variable

Consider Y = g(X), where X is a RV and g(·) is a mapping from R to R. Then Y is also a RV with

F_Y(y) = P[Y ≤ y] = P[X ∈ I_y], where I_y = {x : g(x) ≤ y}.

Then, when g is strictly monotone with inverse x = g⁻¹(y),

f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|, where d g⁻¹(y)/dy = 1 / g′(g⁻¹(y)).

Example 5.2: Square-law detector. Consider Y = g(X) = X². Then

F_Y(y) = P[X² ≤ y] = F_X(√y) - F_X(-√y), y ≥ 0.

By differentiating this,

f_Y(y) = ( f_X(√y) + f_X(-√y) ) / (2√y), y > 0.

An alternative way to derive the above PDF: note that y = x² has two solutions,

x_1 = √y and x_2 = -√y.

Then,

f_Y(y) = f_X(x_1)/|g′(x_1)| + f_X(x_2)/|g′(x_2)| = ( f_X(√y) + f_X(-√y) ) / (2√y).


Generalization of the previous example: Suppose that for given y, y = g(x) has multiple solutions x_1, x_2, …, x_m, where the number of solutions, m, depends on y. So we write it as m(y). If g′(x) is continuous and nonzero at all these m(y) points, then

f_Y(y) = Σ_{i=1}^{m(y)} f_X(x_i) / |g′(x_i)|.
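As a numerical check of this formula (a simulation sketch, not part of the slides), take X standard normal and Y = X², whose two roots x = ±√y give f_Y(y) = (f_X(√y) + f_X(-√y))/(2√y); a histogram of simulated samples should agree with the formula.

# Minimal sketch: Monte Carlo check of the multiple-roots formula for Y = X^2.
import numpy as np

rng = np.random.default_rng(3)
y_samples = rng.standard_normal(1_000_000) ** 2

f_X = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal PDF
f_Y = lambda y: (f_X(np.sqrt(y)) + f_X(-np.sqrt(y))) / (2 * np.sqrt(y))

# Compare a histogram density estimate with the formula at a few points.
hist, edges = np.histogram(y_samples, bins=200, range=(0.01, 4), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
for i in (10, 50, 100, 150):
    print(f"y={mids[i]:.2f}  histogram={hist[i]:.4f}  formula={f_Y(mids[i]):.4f}")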

Let Z = g(X, Y). Then

F_Z(z) = P[(X, Y) ∈ D_z] = ∬_{D_z} f_XY(x, y) dx dy, where D_z = {(x, y) : g(x, y) ≤ z}.

Example 5.3: Sum of two RVs. Consider Z = X + Y. Then

F_Z(z) = ∬_{x+y≤z} f_XY(x, y) dx dy.

We can represent

D_z = ⋃_y ΔD_z(y), where ΔD_z(y) = {(X, Y) : y < Y ≤ y + dy, -∞ < X ≤ z - y}

is a horizontal strip of width dy.

Thus,

F_Z(z) = ∫_{-∞}^{∞} [ ∫_{-∞}^{z-y} f_XY(x, y) dx ] dy.

Consider Leibniz's rule (5.94):

d/dz ∫_{a(z)}^{b(z)} h(x, z) dx = h(b(z), z) b′(z) - h(a(z), z) a′(z) + ∫_{a(z)}^{b(z)} ∂h(x, z)/∂z dx.

Then

d/dz ∫_{-∞}^{z-y} f_XY(x, y) dx = f_XY(z - y, y).

Thus,

f_Z(z) = dF_Z(z)/dz = ∫_{-∞}^{∞} f_XY(z - y, y) dy.

If X and Y are independent, then

f_Z(z) = ∫_{-∞}^{∞} f_X(z - y) f_Y(y) dy,

i.e., f_Z is the convolution of f_X and f_Y: f_Z = f_X * f_Y.
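A quick numerical illustration (not from the textbook; the uniform distribution is an arbitrary choice): for X, Y i.i.d. Uniform(0, 1), the convolution f_X * f_Y is the triangular density on (0, 2), and a simulated histogram of Z = X + Y should match the numerically computed convolution.

# Minimal sketch: f_Z = f_X * f_Y for two independent Uniform(0, 1) RVs.
import numpy as np

rng = np.random.default_rng(4)
z = rng.random(1_000_000) + rng.random(1_000_000)   # samples of Z = X + Y

dz = 0.01
grid = np.arange(0, 1, dz)
f_X = np.ones_like(grid)                   # Uniform(0, 1) density on its support
f_Z = np.convolve(f_X, f_X) * dz           # numerical convolution on (0, 2)

hist, edges = np.histogram(z, bins=50, range=(0, 2), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
conv_at_mids = np.interp(mids, np.arange(f_Z.size) * dz, f_Z)
print(np.max(np.abs(hist - conv_at_mids)))  # small discretization error only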

Assume that g(x, y) and h(x, y) are continuous and differentiable functions. Given (U, V) = (u, v), there are multiple solutions (X, Y) = (x_i, y_i), i = 1, 2, …, m, such that

u = g(x_i, y_i), v = h(x_i, y_i).

Let the inverse mapping be

x_i = p_i(u, v), y_i = q_i(u, v).

[Figure: (a) an infinitesimal rectangle ABCD in the (u, v) plane; (b) its image A′B′C′D′ in the (x, y) plane under the inverse mapping.]

Note: In the above figure (a), B, C and D should be labeled as B, D and C, respectively. In (b), C and D should be labeled as D and C, respectively.

The probability that (U, V) falls in the rectangle ABCD:

f_UV(u, v) du dv = Σ_{i=1}^{m} f_XY(x_i, y_i) |ΔA_i|,

where |ΔA_i| is the area of A′B′C′D′. Recall the formula (Problem 5.17) for the area S of a triangle defined by (x_1, y_1), (x_2, y_2) and (x_3, y_3):

S = (1/2) | (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1) |.

Then, approximating A′B′C′D′ by a parallelogram whose sides are the increments of (p_i, q_i),

|ΔA_i| = | ∂p_i/∂u ∂q_i/∂v - ∂p_i/∂v ∂q_i/∂u | du dv.

Define the Jacobian matrix of the mapping p_i(u, v) and q_i(u, v):

J_i(u, v) = [ ∂p_i/∂u  ∂p_i/∂v ; ∂q_i/∂u  ∂q_i/∂v ].

Then

f_UV(u, v) = Σ_{i=1}^{m} f_XY(x_i, y_i) |det J_i(u, v)|.

The determinant det J is called the Jacobian or Jacobian determinant. If we define the Jacobian matrix of the original mapping by

K(x, y) = [ ∂g/∂x  ∂g/∂y ; ∂h/∂x  ∂h/∂y ],

then det J_i(u, v) = 1 / det K(x_i, y_i), so that

f_UV(u, v) = Σ_{i=1}^{m} f_XY(x_i, y_i) / |det K(x_i, y_i)|.

Example 5.6: Two linear transformations. g(X, Y) = aX + bY and h(X, Y) = cX + dY, (ad - bc ≠ 0). Thus,

f_UV(u, v) = f_XY(x, y) / |ad - bc|,

where

x = (du - bv)/(ad - bc), y = (av - cu)/(ad - bc).

Consider a special case, a = b = c = 1 and d = 0, i.e., U = X + Y and V = X. Then

f_UV(u, v) = f_XY(v, u - v), and hence f_U(u) = ∫_{-∞}^{∞} f_XY(v, u - v) dv.

If we set a = b = d = 1 and c = 0, i.e., U = X + Y and V = Y, then

f_UV(u, v) = f_XY(u - v, v), and hence f_U(u) = ∫_{-∞}^{∞} f_XY(u - v, v) dv,

in agreement with the convolution result of Example 5.3.
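As a sanity check (an illustrative sketch, assuming independent standard normal X and Y), the first special case can be verified numerically: f_UV(u, v) = f_XY(v, u - v) should coincide with the bivariate normal density of (U, V) = (X + Y, X), whose covariance matrix is [ 2 1 ; 1 1 ].

# Minimal sketch: Jacobian formula vs. direct bivariate normal density for
# U = X + Y, V = X with X, Y independent standard normals (|ad - bc| = 1).
import numpy as np

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # standard normal PDF

def f_UV_jacobian(u, v):
    # f_XY(x, y) = phi(x)*phi(y) evaluated at x = v, y = u - v
    return phi(v) * phi(u - v)

def f_UV_direct(u, v):
    # zero-mean bivariate normal density of (U, V) with covariance [[2,1],[1,1]]
    C = np.array([[2.0, 1.0], [1.0, 1.0]])
    w = np.array([u, v])
    q = w @ np.linalg.inv(C) @ w
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(np.linalg.det(C)))

for u, v in [(0.0, 0.0), (1.0, 0.5), (-1.2, 0.3)]:
    print(f_UV_jacobian(u, v), f_UV_direct(u, v))        # the pairs agree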