Correlation in Random Variables

Lecture 11, Spring 2002

Correlation in Random Variables

Suppose that an experiment produces two random variables, X and Y. What can we say about the relationship between them? One of the best ways to visualize the possible relationship is to plot the (X, Y) pairs produced by several trials of the experiment. An example of correlated samples is shown at the right.

Joint Density Function

The joint behavior of X and Y is fully captured in the joint probability distribution. For a continuous distribution,

E[X^m Y^n] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^m y^n f_{XY}(x, y) \, dx \, dy

For discrete distributions,

E[X^m Y^n] = \sum_{x \in S_x} \sum_{y \in S_y} x^m y^n P(x, y)

Covariance Function

The covariance function is a number that measures the common variation of X and Y. It is defined as

cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]

The covariance is determined by the difference between E[XY] and E[X]E[Y]. If X and Y were statistically independent then E[XY] would equal E[X]E[Y] and the covariance would be zero. The covariance of a random variable with itself is equal to its variance:

cov(X, X) = E[(X - E[X])^2] = var(X)
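As a numerical check on these identities, the following Python sketch (illustrative only; the lecture itself uses IDL) estimates the covariance from samples and confirms that cov(X, X) = var(X) and that the covariance of independent variables is near zero:

```python
import random
from statistics import variance

random.seed(0)

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    # Sample covariance E[(X - E[X])(Y - E[Y])] with the 1/(n-1) divisor.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

X = [random.gauss(0, 1) for _ in range(10000)]
Z = [random.gauss(0, 1) for _ in range(10000)]

print(abs(cov(X, X) - variance(X)) < 1e-9)   # cov(X, X) equals var(X)
print(abs(cov(X, Z)) < 0.1)                  # independent, so near zero
```

Both tests print True; the second is a statistical statement, so the tolerance is loose.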

Correlation Coefficient

The covariance can be normalized to produce what is known as the correlation coefficient, ρ:

\rho = \frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}

The correlation coefficient is bounded by -1 \le \rho \le 1. It will have value ρ = 0 when the covariance is zero and value ρ = ±1 when X and Y are perfectly correlated or anti-correlated.
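The boundary cases ρ = ±1 can be verified directly in a short Python sketch (not part of the original lecture): an exact linear relation Y = aX + b gives ρ = 1 when a > 0 and ρ = -1 when a < 0, regardless of the data.

```python
import math, random

random.seed(1)

def corrcoef(xs, ys):
    # rho = cov(X, Y) / sqrt(var(X) * var(Y)); common divisors cancel.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    cxx = sum((x - mx) ** 2 for x in xs)
    cyy = sum((y - my) ** 2 for y in ys)
    return cxy / math.sqrt(cxx * cyy)

X = [random.gauss(0, 1) for _ in range(1000)]
Y_perfect = [3 * x + 2 for x in X]   # exact linear relation, positive slope
Y_anti = [-x for x in X]             # exact linear relation, negative slope

print(round(corrcoef(X, Y_perfect), 6))   # 1.0
print(round(corrcoef(X, Y_anti), 6))      # -1.0
```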

Autocorrelation Function

The autocorrelation function is very similar to the covariance function. It is defined as

R(X, Y) = E[XY] = cov(X, Y) + E[X]E[Y]

It retains the mean values in the calculation of the value. The random variables are orthogonal if R(X, Y) = 0.
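The identity R(X, Y) = cov(X, Y) + E[X]E[Y] holds exactly for sample moments when both sides use the 1/n divisor, as this Python sketch (illustrative, with arbitrary hypothetical parameters) shows:

```python
import random

random.seed(2)

n = 5000
X = [random.gauss(1.0, 1.0) for _ in range(n)]
Y = [0.5 * x + random.gauss(0.5, 1.0) for x in X]

mx, my = sum(X) / n, sum(Y) / n
R = sum(x * y for x, y in zip(X, Y)) / n                          # E[XY]
c = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n            # cov(X, Y)

print(abs(R - (c + mx * my)) < 1e-9)   # True: E[XY] = cov(X, Y) + E[X]E[Y]
```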

Normal Distribution

f_{XY}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left[ -\frac{\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2}{2(1-\rho^2)} \right]
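As a sanity check on the density formula, a small Python sketch (not from the lecture) evaluates it and confirms that with ρ = 0 the joint density factors into the product of the two univariate normal marginals:

```python
import math

def bvn_pdf(x, y, mux, muy, sx, sy, rho):
    # Bivariate normal density in the form given above.
    zx = (x - mux) / sx
    zy = (y - muy) / sy
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (2 * (1 - rho * rho))
    return math.exp(-q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))

def norm_pdf(x, mu, s):
    # Univariate normal density.
    return math.exp(-((x - mu) / s) ** 2 / 2) / (s * math.sqrt(2 * math.pi))

# With rho = 0 the variables are independent and the joint density factors.
p_joint = bvn_pdf(0.3, -0.7, 0.0, 0.0, 1.0, 2.0, 0.0)
p_prod = norm_pdf(0.3, 0.0, 1.0) * norm_pdf(-0.7, 0.0, 2.0)
print(abs(p_joint - p_prod) < 1e-12)   # True
```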

Normal Distribution

The orientation of the elliptical contours is along the line y = x if ρ > 0 and along the line y = -x if ρ < 0. The contours are circles, and the variables are uncorrelated, if ρ = 0. The center of the ellipse is (\mu_x, \mu_y).

Linear Estimation

The task is to construct a rule for the prediction Ŷ of Y based on an observation of X. If the random variables are correlated then this should yield a better result, on average, than just guessing. We are encouraged to select a linear rule when we note that the sample points tend to fall about a sloping line:

Ŷ = aX + b

where a and b are parameters to be chosen to provide the best results. We would expect a to correspond to the slope and b to the intercept.

Minimize Prediction Error

To find a means of calculating the coefficients from a set of sample points, construct the prediction error

\varepsilon = E[(Y - \hat{Y})^2]

We want to choose a and b to minimize ε. Therefore, compute the appropriate derivatives and set them to zero:

\frac{\partial\varepsilon}{\partial a} = -2E\left[(Y - \hat{Y})\frac{\partial\hat{Y}}{\partial a}\right] = 0

\frac{\partial\varepsilon}{\partial b} = -2E\left[(Y - \hat{Y})\frac{\partial\hat{Y}}{\partial b}\right] = 0

These can be solved for a and b in terms of the expected values, and the expected values can themselves be estimated from the sample set.

Prediction Error Equations

The above conditions on a and b are equivalent to

E[(Y - \hat{Y})X] = 0

E[Y - \hat{Y}] = 0

The prediction error Y - Ŷ must be orthogonal to X, and the expected prediction error must be zero. Substituting Ŷ = aX + b leads to a pair of equations to be solved for a and b:

E[(Y - aX - b)X] = E[XY] - aE[X^2] - bE[X] = 0

E[Y - aX - b] = E[Y] - aE[X] - b = 0

\begin{bmatrix} E[X^2] & E[X] \\ E[X] & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} E[XY] \\ E[Y] \end{bmatrix}

Prediction Error

a = \frac{cov(X, Y)}{var(X)} \qquad b = E[Y] - \frac{cov(X, Y)}{var(X)} E[X]

The prediction error with these parameter values is

\varepsilon = (1 - \rho^2) var(Y)

When the correlation coefficient ρ = ±1 the error is zero, meaning that perfect prediction can be made. When ρ = 0 the variance in the prediction is as large as the variation in Y, and the predictor is of no help at all. For intermediate values of ρ, whether positive or negative, the predictor reduces the error.
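For sample moments the relation ε = (1 - ρ²)var(Y) is an exact algebraic identity, not just an approximation. A Python sketch (illustrative, with made-up data) verifies it by fitting a and b from the formulas above and computing the mean squared residual:

```python
import math

# Small hypothetical data set.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
varx = sum((x - mx) ** 2 for x in X) / n
vary = sum((y - my) ** 2 for y in Y) / n
cxy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n

a = cxy / varx            # slope
b = my - a * mx           # intercept
rho = cxy / math.sqrt(varx * vary)

# Mean squared prediction error of Yhat = a*X + b.
mse = sum((y - (a * x + b)) ** 2 for x, y in zip(X, Y)) / n

print(abs(mse - (1 - rho ** 2) * vary) < 1e-12)   # True
```

The identity follows by expanding E[(Y - aX - b)²] with b = E[Y] - aE[X] and substituting a = cov(X, Y)/var(X).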

Linear Predictor Program

The program lp.pro computes the coefficients [a, b] as well as the covariance matrix C and the correlation coefficient, ρ. The covariance matrix is

C = \begin{bmatrix} var(X) & cov(X, Y) \\ cov(X, Y) & var(Y) \end{bmatrix}

Usage example:

N=100
X=Randomn(seed,N)
Z=Randomn(seed,N)
Y=2*X-1+0.2*Z
p=lp(X,Y,C,rho)
print,'Predictor Coefficients=',p
print,'Covariance matrix'
print,C
print,'Correlation Coefficient=',rho

Program lp

function lp,X,Y,C,rho,mux,muy
;Compute the linear predictor coefficients such that
;Yhat=aX+b is the minimum mse estimate of Y based on X.

;Shorten the X and Y vectors to the length of the shorter.
n=n_elements(X) < n_elements(Y)
X=(X[0:n-1])[*]
Y=(Y[0:n-1])[*]

;Compute the mean value of each.
mux=total(X)/n
muy=total(Y)/n

Program lp (continued)

;Compute the covariance matrix.
V=[[X-mux],[Y-muy]]
C=V##transpose(V)/(n-1)

;Compute the predictor coefficient and constant.
a=C[0,1]/C[0,0]
b=muy-a*mux

;Compute the correlation coefficient.
rho=C[0,1]/sqrt(C[0,0]*C[1,1])

return,[a,b]
END

Example

Predictor Coefficients= 1.99598 -1.09500
Correlation Coefficient= 0.812836
Covariance Matrix
0.950762 1.89770
1.89770 5.73295
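The lp routine can be mirrored in plain Python. The sketch below is a hypothetical translation (the lecture's program is IDL, and a different random seed is used here, so the printed numbers will not match the slide's output exactly); with the same model Y = 2X - 1 + 0.2Z, the recovered slope and intercept should land close to 2 and -1.

```python
import math, random

def lp(X, Y):
    """Minimum-MSE linear predictor Yhat = a*X + b, mirroring lp.pro."""
    n = min(len(X), len(Y))          # shorten to the shorter vector
    X, Y = X[:n], Y[:n]
    mux = sum(X) / n
    muy = sum(Y) / n
    cxx = sum((x - mux) ** 2 for x in X) / (n - 1)
    cyy = sum((y - muy) ** 2 for y in Y) / (n - 1)
    cxy = sum((x - mux) * (y - muy) for x, y in zip(X, Y)) / (n - 1)
    a = cxy / cxx                    # slope
    b = muy - a * mux                # intercept
    rho = cxy / math.sqrt(cxx * cyy) # correlation coefficient
    return a, b, rho

random.seed(42)
N = 10000
X = [random.gauss(0, 1) for _ in range(N)]
Y = [2 * x - 1 + 0.2 * random.gauss(0, 1) for x in X]

a, b, rho = lp(X, Y)
print(a, b, rho)   # a near 2, b near -1, rho near 1
```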

IDL Regress Function

IDL provides a number of routines for the analysis of data. The function REGRESS does multiple linear regression. Compute the predictor coefficient a and constant b by

a=regress(X,Y,const=b)
print,[a,b]
1.99598 -1.09500

New Concept

Introduction to Random Processes. Today we will just introduce the basic ideas.

Random Processes

Random Process

A random variable is a function X(e) that maps the set of experiment outcomes to the set of numbers. A random process is a rule that maps every outcome e of an experiment to a function X(t, e). A random process is usually conceived of as a function of time, but there is no reason not to consider random processes that are functions of other independent variables, such as spatial coordinates. The function X(u, v, e) would be a function whose value depends on the location (u, v) and the outcome e, and could be used to represent random variations in an image.

Random Process

The domain of e is the set of outcomes of the experiment. We assume that a probability distribution is known for this set. The domain of t is a set, T, of real numbers. If T is the real axis then X(t, e) is a continuous-time random process. If T is the set of integers then X(t, e) is a discrete-time random process. We will often suppress the display of the variable e and write X(t) for a continuous-time RP and X[n] or X_n for a discrete-time RP.

Random Process

A RP is a family of functions, X(t, e). Imagine a giant strip-chart recorder in which each pen is identified with a different e. This family of functions is traditionally called an ensemble. A single function X(t, e_k) is selected by the outcome e_k. This is just a time function that we could call X_k(t). Different outcomes give us different time functions. If t is fixed, say t = t_1, then X(t_1, e) is a random variable. Its value depends on the outcome e. If both t and e are given then X(t, e) is just a number.

Moments and Averages

X(t_1, e) is a random variable that represents the set of samples across the ensemble at time t_1. If it has a probability density function f_X(x; t_1) then the moments are

m_n(t_1) = E[X^n(t_1)] = \int_{-\infty}^{\infty} x^n f_X(x; t_1) \, dx

The notation f_X(x; t_1) may be necessary because the probability density may depend upon the time the samples are taken. The mean value is \mu_X = m_1, which may be a function of time. The central moments are

E[(X(t_1) - \mu_X(t_1))^n] = \int_{-\infty}^{\infty} (x - \mu_X(t_1))^n f_X(x; t_1) \, dx
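The ensemble view can be made concrete with a toy Python sketch (entirely hypothetical, not from the lecture): take the process X(t, e) = A(e) cos(t), where the random amplitude A has mean 1. Averaging across many ensemble members at a fixed time t_1 estimates the first moment m_1(t_1) = E[X(t_1)] = cos(t_1).

```python
import math, random

random.seed(7)

# Hypothetical random process: X(t, e) = A(e) * cos(t), A ~ N(1, 0.1).
# Each call to make_member fixes one outcome e, yielding one time function.
def make_member():
    A = random.gauss(1.0, 0.1)
    return lambda t: A * math.cos(t)

ensemble = [make_member() for _ in range(20000)]

t1 = 0.5
samples = [X(t1) for X in ensemble]   # samples across the ensemble at t1
m1 = sum(samples) / len(samples)      # estimate of E[X(t1)]

print(abs(m1 - math.cos(t1)) < 0.01)  # E[A] = 1, so E[X(t1)] = cos(t1)
```

Here the mean is a function of time, m_1(t) = cos(t), illustrating why the density is written f_X(x; t_1).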

Pairs of Samples

The numbers X(t_1, e) and X(t_2, e) are samples from the same time function at different times. They are a pair of random variables (X_1, X_2). They have a joint probability density function f(x_1, x_2; t_1, t_2). From the joint density function one can compute the marginal densities, conditional probabilities, and other quantities that may be of interest.