3. Covariance and Correlation


Recall that by taking the expected value of various transformations of a random variable, we can measure many interesting characteristics of the distribution of the variable. In this section, we will study an expected value that measures a special type of relationship between two real-valued variables. This relationship is very important both in probability and in statistics.

Basic Theory

Definitions

As usual, our starting point is a random experiment with probability measure P on an underlying sample space. Suppose that X and Y are real-valued random variables for the experiment with means E(X), E(Y) and variances var(X), var(Y), respectively (assumed finite). The covariance of X and Y is defined by

cov(X, Y) = E[(X − E(X)) (Y − E(Y))]

and (assuming the variances are positive) the correlation of X and Y is defined by

cor(X, Y) = cov(X, Y) / (sd(X) sd(Y))

Correlation is a scaled version of covariance; note that the two parameters always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated. Note also that correlation is dimensionless, since the numerator and denominator have the same physical units.

As these terms suggest, covariance and correlation measure a certain kind of dependence between the variables. One of our goals is a deep understanding of this dependence. As a start, note that (E(X), E(Y)) is the center of the joint distribution of (X, Y), and the vertical and horizontal lines through this point separate R² into four quadrants. The function (x, y) ↦ (x − E(X)) (y − E(Y)) is positive on the first and third of these quadrants and negative on the second and fourth.
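These definitions translate directly into simulation. Here is a minimal sketch in Python with NumPy that estimates cov(X, Y) and cor(X, Y) from a large sample; the linear model and all parameter values below are illustrative assumptions, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a pair (X, Y) with a known linear relationship plus noise,
# then estimate cov(X, Y) and cor(X, Y) from the sample.
n = 100_000
X = rng.normal(loc=2.0, scale=1.5, size=n)
Y = 3.0 * X + rng.normal(scale=2.0, size=n)   # Y depends linearly on X

# Sample versions of the defining formulas
cov_xy = np.mean((X - X.mean()) * (Y - Y.mean()))
cor_xy = cov_xy / (X.std() * Y.std())

print(cov_xy)  # about 3 * var(X) = 3 * 1.5**2 = 6.75
print(cor_xy)  # about 6.75 / (1.5 * sqrt(9 * 2.25 + 4)), roughly 0.91
```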

Properties

The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation. Other important properties will be derived below, in the subsection on the best linear predictor.

1. Show that cov(X, Y) = E(X Y) − E(X) E(Y).

By Exercise 1, we see that X and Y are uncorrelated if and only if E(X Y) = E(X) E(Y). In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as the following exercise shows. (Other examples of dependent yet uncorrelated variables occur in the computational exercises.)

2. Suppose that X is uniformly distributed on the interval [−a, a], where a > 0, and Y = X². Show that X and Y are uncorrelated even though Y is a function of X (the strongest form of dependence).

3. Show that cov(X, Y) = cov(Y, X).

4. Show that cov(X, X) = var(X). Thus, covariance subsumes variance.

5. Show that cov(X + a, Y + b) = cov(X, Y).

6. Show that cov(a X + b Y, Z) = a cov(X, Z) + b cov(Y, Z). Thus, covariance is linear in the first argument, with the second argument fixed. By symmetry, covariance is linear in the second argument, with the first argument fixed. Thus, the covariance operator is bi-linear. The general version of this property is given in the following exercise.

7. Suppose that (X_1, X_2, ..., X_n) and (Y_1, Y_2, ..., Y_m) are sequences of real-valued random variables for an experiment. Show that

cov(∑_{i=1}^n a_i X_i, ∑_{j=1}^m b_j Y_j) = ∑_{i=1}^n ∑_{j=1}^m a_i b_j cov(X_i, Y_j)

8. Show that the correlation between X and Y is simply the covariance of the corresponding standard scores:

cor(X, Y) = cov((X − E(X)) / sd(X), (Y − E(Y)) / sd(Y))
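Exercises 1 and 2 are easy to check numerically. The sketch below (with the arbitrary choice a = 2) uses the product formula from Exercise 1 to show that X and Y = X² are essentially uncorrelated, even though Y is a deterministic function of X:

```python
import numpy as np

rng = np.random.default_rng(1)

# X uniform on [-a, a], Y = X^2: dependent but uncorrelated (Exercise 2)
a = 2.0
n = 1_000_000
X = rng.uniform(-a, a, size=n)
Y = X**2

# cov(X, Y) = E(XY) - E(X)E(Y) (Exercise 1); here E(X^3) = 0 by symmetry,
# so the covariance is near 0 despite the perfect dependence.
cov_xy = np.mean(X * Y) - X.mean() * Y.mean()
print(cov_xy)  # approximately 0
```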

The Variance of a Sum

We will now show that the variance of a sum of variables is the sum of the pairwise covariances. This result is very useful since many random variables with common distributions can be written as sums of simpler random variables (see in particular the binomial distribution and hypergeometric distribution below).

9. Suppose that (X_1, X_2, ..., X_n) is a sequence of real-valued random variables. Use Exercise 3, Exercise 4, and Exercise 6 to show that

var(∑_{i=1}^n X_i) = ∑_{i=1}^n ∑_{j=1}^n cov(X_i, X_j) = ∑_{i=1}^n var(X_i) + 2 ∑_{i<j} cov(X_i, X_j)

Note that the variance of a sum can be greater than, smaller than, or equal to the sum of the variances, depending on the pure covariance terms. As a special case of Exercise 9, when n = 2, we have

var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

10. Suppose that (X_1, X_2, ..., X_n) is a sequence of pairwise uncorrelated, real-valued random variables. Show that

var(∑_{i=1}^n X_i) = ∑_{i=1}^n var(X_i)

Note that the result in the previous exercise holds, in particular, if the random variables are mutually independent.

11. Suppose that X and Y are real-valued random variables. Show that var(X + Y) + var(X − Y) = 2 var(X) + 2 var(Y).

12. Suppose that X and Y are real-valued random variables with var(X) = var(Y). Show that X + Y and X − Y are uncorrelated.

Random Samples

In the following exercises, suppose that (X_1, X_2, ...) is a sequence of independent, real-valued random variables with a common distribution that has mean μ and standard deviation σ > 0. (Thus, the variables form a random sample from the common distribution.)

13. Let Y_n = ∑_{i=1}^n X_i. Show that

a. E(Y_n) = n μ
b. var(Y_n) = n σ²

14. Let M_n = Y_n / n = (1/n) ∑_{i=1}^n X_i. Thus, M_n is the sample mean. Show that

a. E(M_n) = μ
b. var(M_n) = E((M_n − μ)²) = σ² / n, so var(M_n) → 0 as n → ∞.
c. P(|M_n − μ| > ε) → 0 as n → ∞ for any ε > 0. (Hint: Use Chebyshev's inequality.)

Part (b) of the last exercise means that M_n → μ as n → ∞ in mean square. Part (c) means that M_n → μ as n → ∞ in probability. These are both versions of the weak law of large numbers, one of the fundamental theorems of probability.

15. Let Z_n = (Y_n − n μ) / (√n σ). Thus, Z_n is the standard score associated with Y_n. Show that

a. Z_n = (M_n − μ) / (σ / √n), so that Z_n is also the standard score associated with M_n.
b. E(Z_n) = 0
c. var(Z_n) = 1

The central limit theorem, the other fundamental theorem of probability, states that the distribution of Z_n converges to the standard normal distribution as n → ∞.
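Here is a small simulation sketch of Exercises 13 and 14, using an exponential distribution (an arbitrary choice, with μ = σ = 1) to watch var(M_n) shrink like σ²/n:

```python
import numpy as np

rng = np.random.default_rng(2)

# Exponential(1) sample: mu = 1, sigma = 1
mu, sigma = 1.0, 1.0
reps = 10_000

for n in (10, 100, 1000):
    # reps independent sample means, each based on n observations
    M = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    print(n, M.var(), sigma**2 / n)  # empirical var(M_n) vs sigma^2 / n
```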

Events

Suppose that A and B are events in a random experiment. The covariance and correlation of A and B are defined to be the covariance and correlation, respectively, of their indicator random variables 1(A) and 1(B).

16. Show that

a. cov(A, B) = P(A ∩ B) − P(A) P(B)
b. cor(A, B) = (P(A ∩ B) − P(A) P(B)) / √(P(A) (1 − P(A)) P(B) (1 − P(B)))

In particular, note that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.

17. Show that

a. cov(A, B^c) = −cov(A, B)
b. cov(A^c, B^c) = cov(A, B)

18. Suppose that A ⊆ B. Show that

a. cov(A, B) = P(A) (1 − P(B))
b. cor(A, B) = √(P(A) (1 − P(B)) / (P(B) (1 − P(A))))
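The indicator formulation in Exercise 16 is easy to simulate. A sketch with two events on a fair die roll (the particular events A and B are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Events on a fair die roll: A = "score is even", B = "score is at most 3"
n = 1_000_000
roll = rng.integers(1, 7, size=n)
IA = (roll % 2 == 0).astype(float)   # indicator variable 1(A)
IB = (roll <= 3).astype(float)       # indicator variable 1(B)

# cov(A, B) = P(A intersect B) - P(A) P(B), computed from the indicators
cov_ab = np.mean(IA * IB) - IA.mean() * IB.mean()
print(cov_ab)   # exact value: 1/6 - (1/2)(1/2) = -1/12, about -0.083
```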

The Best Linear Predictor

What linear function of X is closest to Y in the sense of minimizing mean square error? This question is fundamentally important in the case where the random variable X (the predictor variable) is observable and the random variable Y (the response variable) is not. The linear function can then be used to estimate Y from an observed value of X. Moreover, the solution will show that covariance and correlation measure the linear relationship between X and Y. To avoid trivial cases, let us assume that var(X) > 0 and var(Y) > 0, so that the random variables really are random.

Let MSE(a, b) denote the mean square error when a X + b is used as an estimator of Y (as a function of the parameters a and b):

MSE(a, b) = E[(Y − (a X + b))²]

19. Show that

MSE(a, b) = E(Y²) − 2 a E(X Y) − 2 b E(Y) + a² E(X²) + 2 a b E(X) + b²

20. Use basic calculus to show that MSE(a, b) is minimized when

a = cov(X, Y) / var(X),  b = E(Y) − E(X) cov(X, Y) / var(X)

Thus, the best linear predictor of Y given X is the random variable L(Y | X) given by

L(Y | X) = E(Y) + (cov(X, Y) / var(X)) (X − E(X))

21. Show that the minimum value of the mean square error function MSE is

E[(Y − L(Y | X))²] = var(Y) (1 − cor(X, Y)²)

22. From the last exercise, verify the following important properties:

a. −1 ≤ cor(X, Y) ≤ 1
b. −sd(X) sd(Y) ≤ cov(X, Y) ≤ sd(X) sd(Y)
c. cor(X, Y) = 1 if and only if Y = a X + b with probability 1 for some constants a > 0 and b.
d. cor(X, Y) = −1 if and only if Y = a X + b with probability 1 for some constants a < 0 and b.

These exercises show clearly that cov(X, Y) and cor(X, Y) measure the linear association between X and Y. Recall that the best constant predictor of Y, in the sense of minimizing mean square error, is E(Y), and the minimum value of the mean square error for this predictor is var(Y). Thus, the difference between the variance of Y and the mean square error in Exercise 21 is the reduction in the variance of Y when the linear term in X is added to the predictor.
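A numerical sketch of Exercises 20 and 21, using a deliberately nonlinear model (the quadratic relationship and noise level are arbitrary assumptions) so that the predictor is only the best linear approximation:

```python
import numpy as np

rng = np.random.default_rng(4)

# A nonlinear relationship, so L(Y | X) is only the best *linear* fit
n = 500_000
X = rng.uniform(0.0, 1.0, size=n)
Y = X**2 + rng.normal(scale=0.1, size=n)

# Best linear predictor coefficients from Exercise 20
a = np.cov(X, Y, bias=True)[0, 1] / X.var()
b = Y.mean() - a * X.mean()
print(a, b)    # for this model: a is about 1, b about -1/6

# Exercise 21: the minimum MSE equals var(Y) (1 - cor(X, Y)^2)
mse = np.mean((Y - (a * X + b))**2)
print(mse, Y.var() * (1 - np.corrcoef(X, Y)[0, 1]**2))  # the two agree
```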

23. Show that var(y ) E((Y L(Y X))2 ) = var(y ) cor( X, Y ) 2 The fractio of the reductio is cor( X, Y ) 2, ad hece this quatity is called the (distributio) coefficiet of determiatio. Now let L(Y X = x) = E(Y ) cov( X, Y ) + ( x E( X)), x R var( X) The fuctio x L(Y X = x) is kow as the distributio regressio fuctio for Y give X, ad its graph is kow as the distributio regressio lie. Note that the regressio lie passes through (E( X), E(Y )), the ceter of the joit distributio. 24. Show that E( L(Y X)) = E(Y ). However, the choice of predictor variable ad respose variable is crucial. 25. Show that regressio lie for Y give X ad the regressio lie for X give Y are ot the same lie, except i the trivial case where the variables are perfectly correlated. However, the coefficiet of determiatio is the same, regardless of which variable is the predictor ad which is the respose. 26. Suppose that A ad B are evets i a radom experimet with 0 < P( A) < 1 ad 0 < P( B) < 1. Show that cor( A, B) = 1 if ad oly P( A B) = 0 ad P( B A) = 0 (That is, A ad B are equivalet.) cor( A, B) = 1 if ad oly P( A B c ) = 0 ad P( B c A) = 0 (That is, A ad B c are equivalet.) The cocept of best liear predictor is more powerful tha might first appear, because it ca be applied to trasformatios of the variables. Specifically, suppose that X ad Y are radom variables for our experimet, takig values i geeral spaces S ad T, respectively. Suppose also that g ad h are real-valued fuctios defied o S ad T, respectively. We ca fid L(h(Y ) g( X)), the liear fuctio of g( X) that is closest to h(y ) i the mea square sese. The results of this subsectio apply, of course, with g( X) replacig X ad h(y ) replacig Y. 27. Suppose that Z is aother real-valued radom variable for the experimet ad that c is a costat. Show that L(Y + Z X) = L(Y X) + L( Z X) L(c Y X) = c L(Y X) There are several extesios ad geeralizatios of the ideas i the subsectio: The correspodig statistical problem of estimatig a ad b, whe these distributio parameters are ukow, is cosidered i the sectio o Sample Covariace ad Correlatio. The problem fidig the fuctio of X (usig all reasoable fuctios, ot just liear oes) that is closest to Y i the mea square error sese is cosidered i the sectio o Coditioal Expected Value. The best liear predictio problem whe the predictor ad respose variables are radom vectors is cosidered i the sectio o Expected Value ad Covariace Matrices.

Examples and Applications

Uniform Distributions

28. Suppose that (X, Y) is uniformly distributed on the region S ⊆ R². Find cov(X, Y) and cor(X, Y) and determine whether the variables are independent in each of the following cases:

a. S = [a, b] × [c, d] where a < b and c < d.
b. S = {(x, y) ∈ R² : −a ≤ y ≤ x ≤ a} where a > 0.
c. S = {(x, y) ∈ R² : x² + y² ≤ r²} where r > 0.

29. In the bivariate uniform experiment, select each of the regions below in turn. For each region, run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot. Compare with the results in the last exercise.

a. Square
b. Triangle
c. Circle

30. Suppose that X is uniformly distributed on the interval (0, 1) and that given X = x, Y is uniformly distributed on the interval (0, x).

a. Find cov(X, Y).
b. Find cor(X, Y).
c. Find L(Y | X).
d. Find L(X | Y).
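In the spirit of the applet in Exercise 29, here is a sketch that simulates case (b) of Exercise 28 by rejection sampling (the value a = 1 is an arbitrary choice; the correlation does not depend on it):

```python
import numpy as np

rng = np.random.default_rng(6)

# (X, Y) uniform on the triangle -a <= y <= x <= a, via rejection sampling
a, n = 1.0, 2_000_000
x = rng.uniform(-a, a, size=n)
y = rng.uniform(-a, a, size=n)
keep = y <= x
X, Y = x[keep], y[keep]

cov = np.mean(X * Y) - X.mean() * Y.mean()
cor = cov / (X.std() * Y.std())
print(cov, cor)  # positive correlation; the exact value of cor is 1/2
```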

Dice

Recall that a standard die is a six-sided die. A fair die is one in which the faces are equally likely. An ace-six flat die is a standard die in which faces 1 and 6 have probability 1/4 each, and faces 2, 3, 4, and 5 have probability 1/8 each.

31. A pair of standard, fair dice are thrown and the scores (X_1, X_2) recorded. Let Y = X_1 + X_2 denote the sum of the scores, U = min{X_1, X_2} the minimum score, and V = max{X_1, X_2} the maximum score. Find the covariance and correlation of each of the following pairs of variables:

a. (X_1, X_2)
b. (X_1, Y)
c. (X_1, U)
d. (U, V)
e. (U, Y)

32. Suppose that n fair dice are thrown. Find the mean and variance of each of the following variables:

a. The sum of the scores.
b. The average of the scores.

33. In the dice experiment, select the following random variables. In each case, increase the number of dice and observe the size and location of the density function and the mean-standard deviation bar. With n = 20 dice, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.

a. The sum of the scores.
b. The average of the scores.

34. Repeat Exercise 32 for ace-six flat dice.

35. Repeat Exercise 33 for ace-six flat dice.

36. A pair of fair dice are thrown and the scores (X_1, X_2) recorded. Let Y = X_1 + X_2 denote the sum of the scores, U = min{X_1, X_2} the minimum score, and V = max{X_1, X_2} the maximum score. Find each of the following:

a. L(Y | X_1)
b. L(U | X_1)
c. L(V | X_1)

Bernoulli Trials

Recall that a Bernoulli trials process is a sequence (X_1, X_2, ...) of independent, identically distributed indicator random variables. In the usual language of reliability, X_i denotes the outcome of trial i, where 1 denotes success and 0 denotes failure. The probability of success p = P(X_i = 1) is the basic parameter of the process. The process is named for James Bernoulli. A separate chapter on Bernoulli Trials explores this process in detail.

The number of successes in the first n trials is Y_n = ∑_{i=1}^n X_i. Recall that this random variable has the binomial distribution with parameters n and p, which has probability density function

P(Y_n = k) = C(n, k) p^k (1 − p)^(n−k),  k ∈ {0, 1, ..., n}

where C(n, k) denotes the binomial coefficient.
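Since Y_n is a sum of indicator variables, its moments follow from Exercises 9 and 13. The next sketch builds Y_n from Bernoulli indicators and previews the identities of Exercise 37 (the values n = 20, p = 0.3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# Y_n as a sum of n independent Bernoulli(p) indicators
n_trials, p, reps = 20, 0.3, 500_000
X = rng.random((reps, n_trials)) < p     # indicator variables
Y = X.sum(axis=1)                        # binomial(n, p) by construction

print(Y.mean(), n_trials * p)            # E(Y_n) = n p
print(Y.var(), n_trials * p * (1 - p))   # var(Y_n) = n p (1 - p)
```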

37. Show that

a. E(Y_n) = n p
b. var(Y_n) = n p (1 − p)

38. In the binomial coin experiment, select the number of heads. Vary n and p and note the shape of the density function and the size and location of the mean-standard deviation bar. For selected values of n and p, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

The proportion of successes in the first n trials is M_n = Y_n / n. This random variable is sometimes used as a statistical estimator of the parameter p, when the parameter is unknown.

39. Show that

a. E(M_n) = p
b. var(M_n) = p (1 − p) / n

40. In the binomial coin experiment, select the proportion of heads. Vary n and p and note the shape of the density function and the size and location of the mean-standard deviation bar. For selected values of n and p, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

The Hypergeometric Distribution

Suppose that a population consists of m objects; r of the objects are type 1 and m − r are type 0. A sample of n objects is chosen at random, without replacement. Let X_i denote the type of the i-th object selected. Recall that (X_1, X_2, ..., X_n) is a sequence of identically distributed (but not independent) indicator random variables. In fact the sequence is exchangeable, so that in particular P(X_i = 1) = r / m for each i and P(X_i = 1, X_j = 1) = (r / m) ((r − 1) / (m − 1)) for distinct i and j.

Let Y_n denote the number of type 1 objects in the sample, so that Y_n = ∑_{i=1}^n X_i. Recall that this random variable has the hypergeometric distribution, which has probability density function

P(Y_n = k) = C(r, k) C(m − r, n − k) / C(m, n),  k ∈ {0, 1, ..., n}

41. Show that for distinct i and j,

a. cov(X_i, X_j) = −(r / m) (1 − r / m) / (m − 1)
b. cor(X_i, X_j) = −1 / (m − 1)

Note that the event of a type 1 object on draw i and the event of a type 1 object on draw j are negatively correlated, but the correlation depends only on the population size and not on the number of type 1 objects. Note also that the correlation is perfect if m = 2. Think about these results intuitively.
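A simulation sketch of Exercise 41, drawing without replacement and looking at the first two indicator variables (the population sizes m = 10, r = 4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)

# Population of m objects, r of type 1; sample without replacement
m, r, reps = 10, 4, 200_000
pop = np.array([1.0] * r + [0.0] * (m - r))

idx = rng.random((reps, m)).argsort(axis=1)  # one random permutation per row
draws = pop[idx[:, :2]]                      # types of the first two draws
X1, X2 = draws[:, 0], draws[:, 1]

cov = np.mean(X1 * X2) - X1.mean() * X2.mean()
cor = cov / (X1.std() * X2.std())
print(cov, -(r/m) * (1 - r/m) / (m - 1))   # Exercise 41(a)
print(cor, -1 / (m - 1))                   # Exercise 41(b)
```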

42. Show that

a. E(Y_n) = n r / m
b. var(Y_n) = n (r / m) (1 − r / m) (m − n) / (m − 1)

43. In the ball and urn experiment, select sampling without replacement. Vary m, r, and n and note the shape of the density function and the size and location of the mean-standard deviation bar. For selected values of the parameters, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Miscellaneous Exercises

44. Suppose that X and Y are real-valued random variables with cov(X, Y) = 3. Find cov(2 X − 5, 4 Y + 2).

45. Suppose that X and Y are real-valued random variables with var(X) = 5, var(Y) = 9, and cov(X, Y) = 3. Find var(2 X + 3 Y − 7).

46. Suppose that X and Y are independent, real-valued random variables with var(X) = 6 and var(Y) = 8. Find var(3 X − 4 Y + 5).

47. Suppose that A and B are events in an experiment with P(A) = 1/2, P(B) = 1/3, and P(A ∩ B) = 1/8. Find the covariance and correlation between A and B.
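Exercises 44-46 follow from the bilinearity properties in Exercises 5 and 6. As a numerical cross-check, the sketch below builds a pair with the moments of Exercise 45 from a Cholesky factor (a construction chosen here purely for convenience):

```python
import numpy as np

rng = np.random.default_rng(9)

# Build (X, Y) with var(X) = 5, var(Y) = 9, cov(X, Y) = 3, as in Exercise 45
n = 1_000_000
Sigma = np.array([[5.0, 3.0], [3.0, 9.0]])
X, Y = np.linalg.cholesky(Sigma) @ rng.normal(size=(2, n))

# Exercise 45: var(2X + 3Y - 7) = 4*5 + 9*9 + 2*2*3*3 = 137
print(np.var(2*X + 3*Y - 7))

# Exercise 44: cov(2X - 5, 4Y + 2) = 2*4*cov(X, Y) = 24
print(np.cov(2*X - 5, 4*Y + 2, bias=True)[0, 1])
```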

48. Suppose that (X, Y) has probability density function f(x, y) = x + y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

a. Find cov(X, Y).
b. Find cor(X, Y).
c. Find L(Y | X).
d. Find L(X | Y).

49. Suppose that (X, Y) has probability density function f(x, y) = 2 (x + y), 0 ≤ x ≤ y ≤ 1.

a. Find cov(X, Y).
b. Find cor(X, Y).
c. Find L(Y | X).
d. Find L(X | Y).

50. Suppose again that (X, Y) has probability density function f(x, y) = 2 (x + y), 0 ≤ x ≤ y ≤ 1.

a. Find cov(X², Y).
b. Find cor(X², Y).
c. Find L(Y | X²).
d. Which predictor of Y is better, the one based on X or the one based on X²?

51. Suppose that (X, Y) has probability density function f(x, y) = 6 x² y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

a. Find cov(X, Y).
b. Find cor(X, Y).
c. Find L(Y | X).
d. Find L(X | Y).

52. Suppose that (X, Y) has probability density function f(x, y) = 15 x² y, 0 ≤ x ≤ y ≤ 1.

a. Find cov(X, Y).
b. Find cor(X, Y).
c. Find L(Y | X).
d. Find L(X | Y).

53. Suppose again that (X, Y) has probability density function f(x, y) = 15 x² y, 0 ≤ x ≤ y ≤ 1.

a. Find cov(√X, Y).
b. Find cor(√X, Y).
c. Find L(Y | √X).
d. Which of the predictors of Y is better, the one based on X or the one based on √X?
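Exercises 50(d) and 53(d) ask which transformed predictor does better; by Exercise 21 this comes down to comparing mean square errors. A Monte Carlo sketch for Exercise 50, using rejection sampling from f(x, y) = 2(x + y) on the triangle:

```python
import numpy as np

rng = np.random.default_rng(10)

# Rejection-sample (X, Y) from f(x, y) = 2(x + y) on 0 <= x <= y <= 1
n = 3_000_000
x, y, u = rng.random((3, n))
keep = (x <= y) & (4 * u <= 2 * (x + y))   # f <= 4 on the triangle
X, Y = x[keep], y[keep]

def mse_of_best_linear(predictor, response):
    # MSE of L(response | predictor) = var(response) (1 - cor^2), Exercise 21
    r = np.corrcoef(predictor, response)[0, 1]
    return response.var() * (1 - r**2)

print(mse_of_best_linear(X, Y))      # predictor based on X
print(mse_of_best_linear(X**2, Y))   # predictor based on X^2
```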

Vector Space Concepts

Covariance is closely related to the concept of inner product in the theory of vector spaces. This connection can help illustrate many of the properties of covariance from a different point of view. In this section, our vector space V_2 consists of all real-valued random variables defined on a fixed probability space (Ω, F, P) (that is, relative to the same random experiment) that have finite second moment. Recall that two random variables are equivalent if they are equal with probability 1. As usual, we consider two such random variables as the same vector, so that technically, our vector space consists of equivalence classes under this equivalence relation. The addition operator corresponds to the usual addition of two real-valued random variables, and the operation of scalar multiplication corresponds to the usual multiplication of a real-valued random variable by a real (non-random) number.

Inner Product

If X and Y are random variables in V_2, we define the inner product of X and Y by

⟨X, Y⟩ = E(X Y)

The following exercise gives results that are analogs of the basic properties of covariance given above, and shows that this definition really does give an inner product on the vector space.

54. Show that

a. ⟨X, Y⟩ = ⟨Y, X⟩
b. ⟨X, X⟩ ≥ 0
c. ⟨X, X⟩ = 0 if and only if P(X = 0) = 1 (so that X is equivalent to 0).
d. ⟨a X, Y⟩ = a ⟨X, Y⟩ for any constant a.
e. ⟨X + Y, Z⟩ = ⟨X, Z⟩ + ⟨Y, Z⟩

Covariance and correlation can easily be expressed in terms of this inner product. The covariance of two random variables is the inner product of the corresponding centered variables. The correlation is the inner product of the corresponding standard scores.

55. Show that

a. cov(X, Y) = ⟨X − E(X), Y − E(Y)⟩
b. cor(X, Y) = ⟨(X − E(X)) / sd(X), (Y − E(Y)) / sd(Y)⟩

The norm associated with the inner product is the 2-norm studied in the last section, and corresponds to the root mean square operation on a random variable. This fact is a fundamental reason why the 2-norm plays such a special, honored role; of all the k-norms, only the 2-norm corresponds to an inner product. In turn, this is one of the reasons that root mean square difference is of fundamental importance in probability and statistics.

56. Show that ⟨X, X⟩ = ‖X‖₂² = E(X²).

Projection

Let X and Y be random variables in V_2.

57. Show that the following set is a subspace of V_2. In fact, it is the subspace generated by X and 1:

𝒲 = {a X + b : a ∈ R and b ∈ R}

58. Show that the best linear predictor of Y given X can be characterized as the projection of Y onto the subspace 𝒲. That is, show that L(Y | X) is the only random variable W ∈ 𝒲 with the property that Y − W is perpendicular to 𝒲. Specifically, find the W ∈ 𝒲 that satisfies the following two conditions:

a. ⟨Y − W, X⟩ = 0
b. ⟨Y − W, 1⟩ = 0
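Empirically, a random variable observed at a large number of sample points behaves like a vector in Rⁿ, and ⟨X, Y⟩ = E(X Y) becomes a normalized dot product. A sketch illustrating Exercises 55 and 56 (the joint distribution below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(11)

# A random variable observed at n sample points becomes a vector in R^n,
# and the inner product <X, Y> = E(XY) becomes a normalized dot product.
n = 1_000_000
X = rng.normal(1.0, 2.0, size=n)
Y = 0.7 * X + rng.normal(size=n)

def inner(u, v):
    return np.dot(u, v) / n        # empirical version of <U, V> = E(UV)

print(inner(X - X.mean(), Y - Y.mean()))   # Exercise 55(a): cov(X, Y) = 2.8
print(inner(X, X), np.mean(X**2))          # Exercise 56: <X, X> = E(X^2) = 5
```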

Hölder's Inequality

The next exercise gives Hölder's inequality, named for Otto Hölder.

59. Suppose that j > 1, k > 1, and 1/j + 1/k = 1. Show that ⟨X, Y⟩ ≤ ‖X‖_j ‖Y‖_k using the steps below:

a. Show that S = {(x, y) ∈ R² : x ≥ 0 and y ≥ 0} is a convex set and g(x, y) = x^(1/j) y^(1/k) is concave on S.
b. Use (a) and Jensen's inequality to show that if U and V are nonnegative random variables then E(U^(1/j) V^(1/k)) ≤ E(U)^(1/j) E(V)^(1/k).
c. In (b), let U = |X|^j and V = |Y|^k.

In the context of the last exercise, j and k are called conjugate exponents. If we let j = k = 2 in Hölder's inequality, then we get the Cauchy-Schwarz inequality, named for Augustin Cauchy and Karl Schwarz:

E(X Y) ≤ √(E(X²) E(Y²))

In turn, this is equivalent to the inequalities in Exercise 22.

60. Suppose that (X, Y) has probability density function f(x, y) = x + y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Verify Hölder's inequality in the following cases:

a. j = k = 2
b. j = 3, k = 3/2

61. Suppose that j and k are conjugate exponents.

a. Show that k = j / (j − 1).
b. Show that k → 1 as j → ∞.
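A Monte Carlo sketch of Exercise 60, rejection-sampling from f(x, y) = x + y and checking the bound for both pairs of conjugate exponents:

```python
import numpy as np

rng = np.random.default_rng(12)

# Rejection-sample (X, Y) from f(x, y) = x + y on the unit square, then
# verify Hölder: E(XY) <= E(X^j)^(1/j) * E(Y^k)^(1/k)   (Exercise 60)
n = 2_000_000
x, y, u = rng.random((3, n))
keep = 2 * u <= x + y          # f <= 2 on the square
X, Y = x[keep], y[keep]

for j in (2.0, 3.0):
    k = j / (j - 1)            # conjugate exponent: 1/j + 1/k = 1
    lhs = np.mean(X * Y)
    rhs = np.mean(X**j)**(1/j) * np.mean(Y**k)**(1/k)
    print(j, k, lhs, rhs, lhs <= rhs)   # the bound holds in both cases
```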

Theorems Revisited

The following exercise is an analog of the result in Exercise 11.

62. Prove the parallelogram rule:

‖X + Y‖₂² + ‖X − Y‖₂² = 2 ‖X‖₂² + 2 ‖Y‖₂²

The following exercise is an analog of the result in Exercise 10.

63. Prove the Pythagorean theorem, named for Pythagoras of course: if (X_1, X_2, ..., X_n) is a sequence of real-valued random variables with ⟨X_i, X_j⟩ = 0 for i ≠ j, then

‖∑_{i=1}^n X_i‖₂² = ∑_{i=1}^n ‖X_i‖₂²