Chapter 5: Joint Probability Distributions
Part 2: Section 5-2, Covariance and Correlation

Consider the joint probability distribution $f_{XY}(x, y)$. Is there a relationship between X and Y? If so, what kind? If you're given information on X, does it give you information on the distribution of Y? (Think of a conditional distribution.) Or are they independent?

Below is a different joint probability distribution for X and Y. Does there seem to be a relationship between X and Y? Are they independent? If you're given information on X, does it give you information on the distribution of Y? How would you describe the relationship? Is it stronger than the relationship on the previous page? Do you know MORE about Y for a given X?

Below is a joint probability distribution for an independent X and Y. This picture is the giveaway that they're independent. Does there seem to be a relationship between X and Y? If you're given information on X, does it give you information on the distribution of Y?

Covariance

When two random variables are being considered simultaneously, it is useful to describe how they relate to each other, or how they vary together. A common measure of the relationship between two random variables is the covariance.

Covariance: The covariance between the random variables X and Y, denoted as cov(X, Y) or $\sigma_{XY}$, is

$$\sigma_{XY} = E[(X - E(X))(Y - E(Y))] = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y) = E(XY) - \mu_X \mu_Y$$

To calculate covariance, we need to find the expected value of a function of X and Y. This is done similarly to how it was done in the univariate case...

For X, Y discrete,
$$E[h(X, Y)] = \sum_x \sum_y h(x, y) f_{XY}(x, y)$$

For X, Y continuous,
$$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) f_{XY}(x, y)\, dx\, dy$$

Covariance (i.e. $\sigma_{XY}$) is an expected value of a function of X and Y over the (X, Y) space, so if X and Y are continuous we can write

$$\sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{XY}(x, y)\, dx\, dy$$

To compute covariance, you'll probably use...

$$\sigma_{XY} = E(XY) - E(X)E(Y)$$
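For instance, here is a minimal Python sketch of the shortcut formula applied to a small discrete joint pmf (the pmf values are made up for illustration, not taken from the text):

```python
# A minimal sketch: sigma_XY = E(XY) - E(X)E(Y) for a made-up discrete joint pmf.
joint = {  # (x, y) -> f_XY(x, y)
    (0, 0): 0.20, (0, 1): 0.15,
    (1, 0): 0.25, (1, 1): 0.40,
}

def expect(h):
    """E[h(X, Y)] = sum over (x, y) of h(x, y) * f_XY(x, y)."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

EX = expect(lambda x, y: x)       # mu_X = 0.65
EY = expect(lambda x, y: y)       # mu_Y = 0.55
EXY = expect(lambda x, y: x * y)  # E(XY) = 0.40
print(EXY - EX * EY)              # sigma_XY = 0.0425
```

The same `expect` helper evaluates $E[h(X, Y)]$ for any function h, mirroring the double sum above.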

When does the covariance have a positive value?

In the integration we're conceptually putting weight on values of $(x - \mu_X)(y - \mu_Y)$. Which regions of (X, Y) space have $(x - \mu_X)(y - \mu_Y) > 0$?

- Both X and Y are above their means.
- Both X and Y are below their means.
- Values along a line of positive slope.

A distribution that puts high probability on these regions will have a positive covariance.

When does the covariance have a negative value?

In the integration we're conceptually putting weight on values of $(x - \mu_X)(y - \mu_Y)$. Which regions of (X, Y) space have $(x - \mu_X)(y - \mu_Y) < 0$?

- X is above its mean, and Y is below its mean.
- Y is above its mean, and X is below its mean.
- Values along a line of negative slope.

A distribution that puts high probability on these regions will have a negative covariance.

Covariance is a measure of the linear relationship between X and Y. If there is a non-linear relationship between X and Y (such as a quadratic relationship), the covariance may not be sensitive to it.

When does the covariance have a zero value? This can happen in a number of situations, but there's one situation that is of particular interest... when X and Y are independent.

When X and Y are independent, $\sigma_{XY} = 0$.

If X and Y are independent, then...

$$\begin{aligned}
\sigma_{XY} &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{XY}(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_X(x) f_Y(y)\, dx\, dy \\
&= \left( \int_{-\infty}^{\infty} (x - \mu_X) f_X(x)\, dx \right) \left( \int_{-\infty}^{\infty} (y - \mu_Y) f_Y(y)\, dy \right) \\
&= \left( \int_{-\infty}^{\infty} x f_X(x)\, dx - \mu_X \right) \left( \int_{-\infty}^{\infty} y f_Y(y)\, dy - \mu_Y \right) \\
&= (E(X) - \mu_X)(E(Y) - \mu_Y) = (\mu_X - \mu_X)(\mu_Y - \mu_Y) = 0
\end{aligned}$$

This does NOT mean... "If covariance = 0, then X and Y are independent." We can find cases to the contrary of that statement, like when there is a strong quadratic relationship between X and Y (so they're not independent), but you can still get $\sigma_{XY} = 0$. Remember that covariance specifically looks for a linear relationship.
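A minimal sketch of such a case, assuming X is uniform on {-1, 0, 1} and Y = X^2 (a made-up pair, not one from the text):

```python
# Zero covariance without independence: Y = X^2 with X uniform on {-1, 0, 1}.
xs = [-1, 0, 1]
px = 1 / 3  # P(X = x) for each x

EX = sum(x * px for x in xs)          # 0
EY = sum(x**2 * px for x in xs)       # E(X^2) = 2/3
EXY = sum(x * x**2 * px for x in xs)  # E(X^3) = 0
print(EXY - EX * EY)  # 0.0: no linear relationship, yet Y is determined by X
```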

When X and Y are independent, $\sigma_{XY} = 0$. For this distribution showing independence, there is equal weight along the positive and negative diagonals.

A couple comments...

You can also define covariance for discrete X and Y:
$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) f_{XY}(x, y)$$

And recall that you can get the expected value of any function of X and Y:
$$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) f_{XY}(x, y)\, dx\, dy \quad \text{or} \quad E[h(X, Y)] = \sum_x \sum_y h(x, y) f_{XY}(x, y)$$

Correlation

Covariance is a measure of the linear relationship between two variables, but perhaps a more common and more easily interpretable measure is correlation.

Correlation: The correlation (or correlation coefficient) between random variables X and Y, denoted as $\rho_{XY}$, is

$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sqrt{V(X) V(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$

Notice that the numerator is the covariance, but it has now been scaled by the standard deviations of X and Y (which are both > 0); we're just rescaling the covariance.

NOTE: Covariance and correlation will have the same sign (positive or negative).

Correlation lies in [-1, 1]; in other words,
$$-1 \le \rho_{XY} \le +1$$

Correlation is a unitless (or dimensionless) quantity.

- If X and Y have a strong positive linear relation, $\rho_{XY}$ is near +1.
- If X and Y have a strong negative linear relation, $\rho_{XY}$ is near -1.
- If X and Y have a non-zero correlation, they are said to be correlated.
- Correlation is a measure of linear relationship.
- If X and Y are independent, $\rho_{XY} = 0$.
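Reusing the made-up pmf from the earlier sketch, the rescaling looks like this in Python:

```python
# rho_XY = sigma_XY / (sigma_X * sigma_Y) for the same made-up joint pmf.
import math

joint = {
    (0, 0): 0.20, (0, 1): 0.15,
    (1, 0): 0.25, (1, 1): 0.40,
}

def expect(h):
    return sum(h(x, y) * p for (x, y), p in joint.items())

EX, EY = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - EX * EY
sd_x = math.sqrt(expect(lambda x, y: x**2) - EX**2)  # sigma_X
sd_y = math.sqrt(expect(lambda x, y: y**2) - EY**2)  # sigma_Y
print(cov / (sd_x * sd_y))  # about 0.179: same sign as the covariance, in [-1, 1]
```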

Example: Recall the particle movement model

An article describes a model for the movement of a particle. Assume that a particle moves within the region A bounded by the x-axis, the line x = 1, and the line y = x. Let (X, Y) denote the position of the particle at a given time. The joint density of X and Y is given by

$$f_{XY}(x, y) = 8xy \quad \text{for } (x, y) \in A$$

a) Find cov(X, Y).

ANS: Earlier, we found $E(X) = \frac{4}{5}$...
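One way to carry out the integrals is with sympy; a minimal sketch, assuming A is the triangle $0 \le y \le x \le 1$ described above:

```python
# Evaluating the double integrals over A = {(x, y): 0 <= y <= x <= 1}
# for the density f(x, y) = 8xy.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = 8 * x * y

def expect(h):
    """E[h(X, Y)]: inner integral over y from 0 to x, outer over x from 0 to 1."""
    return sp.integrate(h * f, (y, 0, x), (x, 0, 1))

EX = expect(x)        # 4/5, matching the value found earlier
EY = expect(y)        # 8/15
EXY = expect(x * y)   # 4/9
print(EXY - EX * EY)  # cov(X, Y) = 4/225
```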


Example: Book problem 5-43, p. 179

The joint probability distribution is

x      -1     0     0     1
y       0    -1     1     0
f_XY  0.25  0.25  0.25  0.25

Show that the correlation between X and Y is zero, but X and Y are not independent.
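Before working it by hand, a quick numeric sanity check is possible; a minimal sketch with the four-point pmf entered as a dictionary:

```python
# Numeric check of the 5-43 claim: cov (and hence rho) is zero,
# yet the joint pmf does not factor into the marginals.
joint = {(-1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25, (1, 0): 0.25}

def expect(h):
    return sum(h(x, y) * p for (x, y), p in joint.items())

EX, EY = expect(lambda x, y: x), expect(lambda x, y: y)  # both 0
cov = expect(lambda x, y: x * y) - EX * EY               # every x*y term is 0
print(cov)  # 0.0, hence rho_XY = 0

# Independence fails: f_XY(0, 0) = 0, but f_X(0) * f_Y(0) = 0.5 * 0.5 = 0.25.
fX0 = sum(p for (x, y), p in joint.items() if x == 0)  # 0.5
fY0 = sum(p for (x, y), p in joint.items() if y == 0)  # 0.5
print(joint.get((0, 0), 0), fX0 * fY0)  # 0 vs 0.25
```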
