7. Sample Covariance and Correlation


1 of 8 7/16/2009 6:06 AM

Virtual Laboratories > 6. Random Samples > 1 2 3 4 5 6 7

The Bivariate Model

Suppose again that we have a basic random experiment, and that X and Y are real-valued random variables for the experiment. Equivalently, (X, Y) is a random vector taking values in R^2. Please recall the basic properties of the means, E(X) and E(Y), the variances, var(X) and var(Y), and the covariance cov(X, Y). In particular, recall that the correlation is

cor(X, Y) = cov(X, Y) / (sd(X) sd(Y))

We will also need a higher order bivariate moment. Let

d(X, Y) = E(((X − E(X)) (Y − E(Y)))^2)

Now suppose that we run the basic experiment n times. This creates a compound experiment with a sequence of independent random vectors ((X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)), each with the same distribution as (X, Y). In statistical terms, this is a random sample of size n from the distribution of (X, Y). As usual, we will let X = (X_1, X_2, ..., X_n) denote the sequence of first coordinates; this is a random sample of size n from the distribution of X. Similarly, we will let Y = (Y_1, Y_2, ..., Y_n) denote the sequence of second coordinates; this is a random sample of size n from the distribution of Y. Recall that the sample mean and the two sample variances for X are defined as follows (and of course analogous definitions hold for Y):

M(X) = (1/n) Σ_{i=1}^n X_i

W^2(X) = (1/n) Σ_{i=1}^n (X_i − E(X))^2

S^2(X) = (1/(n − 1)) Σ_{i=1}^n (X_i − M(X))^2

In this section, we will define and study statistics that are natural estimators of the distribution covariance and correlation. These statistics will be measures of the linear relationship of the sample points in the plane. As usual, the definitions depend on what other parameters are known and unknown.

A Special Sample Covariance

Suppose first that the distribution means E(X) and E(Y) are known. This is usually an unrealistic assumption, of course, but is still a good place to start because the analysis is very simple and the results we obtain will be useful below. A natural estimator of cov(X, Y) in this case is

W(X, Y) = (1/n) Σ_{i=1}^n (X_i − E(X)) (Y_i − E(Y))

1. Show that W(X, Y) is the sample mean for a random sample of size n from the distribution of (X − E(X)) (Y − E(Y)).

2. Use the result of Exercise 1 to show that
a. E(W(X, Y)) = cov(X, Y)
b. var(W(X, Y)) = (1/n) (d(X, Y) − cov^2(X, Y))
c. W(X, Y) → cov(X, Y) as n → ∞ with probability 1.

In particular, W(X, Y) is an unbiased and consistent estimator of cov(X, Y).

Properties

The formula in the following exercise is sometimes better than the definition for computational purposes.

3. With X Y defined to be the sequence (X_1 Y_1, X_2 Y_2, ..., X_n Y_n), show that

W(X, Y) = M(X Y) − M(X) E(Y) − M(Y) E(X) + E(X) E(Y)

The properties established in the following exercises are analogies of properties for the distribution covariance.

4. Show that W(X, X) = W^2(X).

5. Show that W(X, Y) = W(Y, X).

6. Show that if a is a constant then W(a X, Y) = a W(X, Y).

7. Show that W(X + Y, Z) = W(X, Z) + W(Y, Z).

The following exercise gives a formula for the sample variance of a sum. The result extends naturally to larger sums.

8. Show that W^2(X + Y) = W^2(X) + W^2(Y) + 2 W(X, Y).

The Standard Sample Covariance

Consider now the more realistic assumption that the distribution means E(X) and E(Y) are unknown. A natural approach in this case is to average (X_i − M(X)) (Y_i − M(Y)) over i in {1, 2, ..., n}. But rather than dividing by n in our average, we should divide by whatever constant gives an unbiased estimator of cov(X, Y).

9. Interpret the sign of (X_i − M(X)) (Y_i − M(Y)) geometrically, in terms of the scatterplot of points and its center.
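As a concrete check on Exercises 1 and 2, here is a minimal Python sketch of W(X, Y). The generating model (X uniform on [0, 1], Y equal to X plus small noise) is an illustrative assumption, not part of the text; it is chosen so that E(X) = E(Y) = 1/2 and cov(X, Y) = var(X) = 1/12 are known exactly.

```python
import random

def special_sample_cov(xs, ys, mu_x, mu_y):
    """W(X, Y): the average of (X_i - E(X)) (Y_i - E(Y)),
    usable when the distribution means E(X) and E(Y) are known."""
    n = len(xs)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# Illustrative model: X uniform on [0, 1], Y = X + small noise, so that
# E(X) = E(Y) = 1/2 and cov(X, Y) = var(X) = 1/12.
random.seed(3)
xs = [random.random() for _ in range(100_000)]
ys = [x + random.gauss(0.0, 0.1) for x in xs]

w = special_sample_cov(xs, ys, 0.5, 0.5)
print(round(w, 3))  # close to 1/12 ≈ 0.083, illustrating consistency
```

With n = 100,000 points, W(X, Y) lands very close to the true covariance 1/12, which is the consistency claim in Exercise 2(c).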

Derivation

10. Use the bilinearity of the covariance operator to show that cov(M(X), M(Y)) = cov(X, Y) / n.

11. Expand and sum term by term to show that

Σ_{i=1}^n (X_i − M(X)) (Y_i − M(Y)) = Σ_{i=1}^n X_i Y_i − n M(X) M(Y)

12. Use the results of Exercises 10 and 11, and basic properties of expected value, to show that

E(Σ_{i=1}^n (X_i − M(X)) (Y_i − M(Y))) = (n − 1) cov(X, Y)

Therefore, to have an unbiased estimator of cov(X, Y), we should define the sample covariance to be the random variable

S(X, Y) = (1/(n − 1)) Σ_{i=1}^n (X_i − M(X)) (Y_i − M(Y))

As with the sample variance, when the sample size n is large, it makes little difference whether we divide by n or n − 1.

Properties

The formula in the following exercise is sometimes better than the definition for computational purposes.

13. With X Y defined as in Exercise 3, show that

S(X, Y) = (1/(n − 1)) (Σ_{i=1}^n X_i Y_i − n M(X) M(Y)) = (n/(n − 1)) (M(X Y) − M(X) M(Y))

14. Use the result of the previous exercise and the strong law of large numbers to show that S(X, Y) → cov(X, Y) as n → ∞ with probability 1.

The properties established in the following exercises are analogies of properties for the distribution covariance.

15. Show that S(X, X) = S^2(X).

16. Show that S(X, Y) = S(Y, X).

17. Show that if a is a constant then S(a X, Y) = a S(X, Y).

18. Show that S(X + Y, Z) = S(X, Z) + S(Y, Z).

19. Show that

S(X, Y) = (n/(n − 1)) (W(X, Y) − (M(X) − E(X)) (M(Y) − E(Y)))

The following exercise gives a formula for the sample variance of a sum. The result extends naturally to larger sums.

20. Show that S^2(X + Y) = S^2(X) + S^2(Y) + 2 S(X, Y).

Variance

In this subsection we will derive the following formula for the variance of the sample covariance. The derivation was contributed by Rajith Unnikrishnan, and is similar to the derivation of the variance of the sample variance.

var(S(X, Y)) = (1/n) d(X, Y) + (1/(n (n − 1))) var(X) var(Y) − ((n − 2)/(n (n − 1))) cov^2(X, Y)

21. Verify the following result. Hint: Start with the expression on the right. Expand the product (X_i − X_j) (Y_i − Y_j), and take the sums term by term.

S(X, Y) = (1/(2 n (n − 1))) Σ_{i=1}^n Σ_{j=1}^n (X_i − X_j) (Y_i − Y_j)

It follows that var(S(X, Y)) is (1/(2 n (n − 1)))^2 times the sum of all of the pairwise covariances of the terms in the expansion of Exercise 21.

22. Now, derive the formula for var(S(X, Y)) by showing that
a. cov((X_i − X_j) (Y_i − Y_j), (X_k − X_l) (Y_k − Y_l)) = 0 if i = j or k = l or if i, j, k, l are distinct.
b. cov((X_i − X_j) (Y_i − Y_j), (X_i − X_j) (Y_i − Y_j)) = 2 d(X, Y) + 2 var(X) var(Y) if i ≠ j, and there are 2 n (n − 1) such terms in the sum of covariances.
c. cov((X_i − X_j) (Y_i − Y_j), (X_k − X_j) (Y_k − Y_j)) = d(X, Y) − cov^2(X, Y) if i, j, k are distinct, and there are 4 n (n − 1) (n − 2) such terms in the sum of covariances.

23. Show that var(S(X, Y)) > var(W(X, Y)). Does this seem reasonable?

24. Show that var(S(X, Y)) → 0 as n → ∞. Thus, the sample covariance is a consistent estimator of the distribution covariance.

Sample Correlation

By analogy with the distribution correlation, the sample correlation is obtained by dividing the sample covariance by the product of the sample standard deviations:

R(X, Y) = S(X, Y) / (S(X) S(Y))

25. Use the strong law of large numbers to show that R(X, Y) → cor(X, Y) as n → ∞ with probability 1.

26. Click in the interactive scatterplot to define 20 points and try to come as close as possible to the following conditions: sample means 0, sample standard deviations 1, and sample correlation, in turn, 0, 0.5, −0.5, 0.7, −0.7, 0.9, −0.9.

27. Click in the interactive scatterplot to define 20 points and try to come as close as possible to the following conditions: X sample mean 1, Y sample mean 3, X sample standard deviation 2, Y sample standard deviation 1, and sample correlation, in turn, 0, 0.5, −0.5, 0.7, −0.7, 0.9, −0.9.

The Best Linear Predictor

The Distribution Version

Recall that in the section on (distribution) correlation and regression, we showed that the best linear predictor of Y based on X, in the sense of minimizing mean square error, is the random variable

L(Y | X) = E(Y) + (cov(X, Y) / var(X)) (X − E(X))

Moreover, the (minimum) value of the mean square error is

E((Y − L(Y | X))^2) = var(Y) (1 − cor(X, Y)^2)

The distribution regression line is given by

y = L(Y | X = x) = E(Y) + (cov(X, Y) / var(X)) (x − E(X))

The Sample Version

Of course, in real applications, we are unlikely to know the distribution parameters E(X), E(Y), var(X), and cov(X, Y). Thus, in this section, we are interested in the problem of estimating the best linear predictor of Y based on X from our random sample ((X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)). One natural approach is to find the line y = A x + B that fits the sample points best. This is a basic and important problem in many areas of mathematics, not just statistics. The term best means that we want to find the line (that is, find A and B) that minimizes the average of the squared errors between the actual y values in our data and the predicted y values:

MSE(A, B) = (1/n) Σ_{i=1}^n (Y_i − (A X_i + B))^2

Finding A and B that minimize MSE is a standard problem in calculus.
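The sample covariance S(X, Y) and the sample correlation R(X, Y) defined above take only a few lines of Python to compute. In this sketch the data points are made up for illustration, and the R = ±1 check on perfectly linear points anticipates Exercise 30:

```python
import math

def sample_cov(xs, ys):
    """S(X, Y): unbiased sample covariance, dividing by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """R(X, Y) = S(X, Y) / (S(X) S(Y))."""
    sx = math.sqrt(sample_cov(xs, xs))  # S(X, X) = S^2(X), Exercise 15
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)

# Points exactly on a line give R = 1 (positive slope) or R = -1
# (negative slope); made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
assert abs(sample_corr(xs, [2 * x + 1 for x in xs]) - 1.0) < 1e-9
assert abs(sample_corr(xs, [5 - x for x in xs]) + 1.0) < 1e-9
```

Note that the `sample_corr` helper reuses `sample_cov` for the standard deviations, which is exactly the identity S(X, X) = S^2(X) from Exercise 15.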

28. Show that MSE is minimized for

A(X, Y) = S(X, Y) / S^2(X),  B(X, Y) = M(Y) − (S(X, Y) / S^2(X)) M(X)

and thus the sample regression line is

y = M(Y) + (S(X, Y) / S^2(X)) (x − M(X))

29. Show that the minimum mean square error, using the coefficients in the previous exercise, is

MSE(A(X, Y), B(X, Y)) = S^2(Y) (1 − R^2(X, Y))

30. Use the result of the previous exercise to show that
a. −1 ≤ R(X, Y) ≤ 1
b. R(X, Y) = −1 if and only if the sample points lie on a line with negative slope.
c. R(X, Y) = 1 if and only if the sample points lie on a line with positive slope.

Thus, the sample correlation measures the degree of linearity of the sample points. The results in the previous exercise can also be obtained by noting that the sample correlation is simply the correlation of the empirical distribution. Of course, properties (a), (b), and (c) are known for the distribution correlation. The fact that the results in Exercise 28 and Exercise 29 are the sample analogies of the corresponding distribution results is beautiful and reassuring. Note that the sample regression line passes through (M(X), M(Y)), the center of the empirical distribution.

Naturally, the coefficients of the sample regression line can be viewed as estimators of the respective coefficients in the distribution regression line.

31. Assuming that the appropriate higher order moments are finite, use the law of large numbers to show that, with probability 1, the coefficients of the sample regression line converge to the coefficients of the distribution regression line:
a. S(X, Y) / S^2(X) → cov(X, Y) / var(X) as n → ∞
b. M(Y) − (S(X, Y) / S^2(X)) M(X) → E(Y) − (cov(X, Y) / var(X)) E(X) as n → ∞

As with the distribution regression lines, the choice of predictor and response variables is important.

32. Show that the sample regression line for Y based on X and the sample regression line for X based on Y are not the same line, except in the trivial case where the sample points all lie on a line.

Recall that the constant B that minimizes

MSE(B) = (1/n) Σ_{i=1}^n (Y_i − B)^2

is the sample mean M(Y), and the minimum value of the mean square error is the sample variance S^2(Y). Thus, the difference between this value of the mean square error and the one in Exercise 29, namely

S^2(Y) R^2(X, Y)

is the reduction in the variability of the Y data when the linear term in X is added to the predictor. The fractional reduction is R^2(X, Y), and hence this statistic is called the (sample) coefficient of determination.

Simulation Exercises

33. Click in the interactive scatterplot, in various places, and watch how the regression line changes.

34. Click in the interactive scatterplot to define 20 points. Try to generate a scatterplot in which the mean of the x values is 0, the standard deviation of the x values is 1, and in which the regression line has
a. slope 1, intercept 1
b. slope 3, intercept 0
c. slope 2, intercept 1

35. Click in the interactive scatterplot to define 20 points with the following properties: the mean of the x values is 1, the mean of the y values is 1, and the regression line has slope 1 and intercept 2. If you had a difficult time with the previous exercise, it's because the conditions imposed are impossible to satisfy!

36. Run the bivariate uniform experiment 2000 times, with an update frequency of 10, in each of the following cases. Note the apparent convergence of the sample means to the distribution means, the sample standard deviations to the distribution standard deviations, the sample correlation to the distribution correlation, and the sample regression line to the distribution regression line.
a. The uniform distribution on the square.
b. The uniform distribution on the triangle.
c. The uniform distribution on the circle.

37. Run the bivariate normal experiment 2000 times, with an update frequency of 10, in each of the following cases. Note the apparent convergence of the sample means to the distribution means, the sample standard deviations to the distribution standard deviations, the sample correlation to the distribution correlation, and the sample regression line to the distribution regression line.
a. sd(X) = 1, sd(Y) = 2, cor(X, Y) = 0.5
b. sd(X) = 1.5, sd(Y) = 0.5, cor(X, Y) = 0.7

Data Analysis Exercises

38. Compute the correlation between petal length and petal width for the following cases in Fisher's iris data. Comment on the differences.
a. All cases
b. Setosa only
c. Virginica only
d. Versicolor only

39. Compute the correlation between each pair of color count variables in the M&M data.

40. Consider all cases in Fisher's iris data. Compute the least squares regression line with petal length as the predictor variable and petal width as the response variable. Draw the scatterplot and the regression line together. Predict the petal width of an iris with petal length 40.

41. Consider the Setosa cases only in Fisher's iris data. Compute the least squares regression line with sepal length as the predictor variable and sepal width as the response variable. Draw the scatterplot and regression line together. Predict the sepal width of an iris with sepal length 45.
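For readers working without the applets and data sets, the least squares computation behind Exercises 28 and 40-41 can be sketched in plain Python. The (x, y) pairs below are made up for illustration (they are not Fisher's iris measurements), and the prediction step mirrors the form of Exercise 40 under that assumption:

```python
def regression_line(xs, ys):
    """Sample regression coefficients from Exercise 28:
    slope A = S(X, Y) / S^2(X), intercept B = M(Y) - A * M(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx  # the (n - 1) factors cancel in the ratio
    return a, my - a * mx

# Made-up data loosely mimicking a length/width relationship.
xs = [14.0, 15.0, 17.0, 40.0, 45.0, 51.0, 55.0, 60.0]
ys = [ 2.0,  2.0,  4.0, 12.0, 13.0, 16.0, 18.0, 21.0]
a, b = regression_line(xs, ys)

# The regression line passes through the center (M(X), M(Y)).
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
assert abs((a * mx + b) - my) < 1e-9

print(round(a * 40 + b, 1))  # predicted y at x = 40
```

The assertion checks the fact noted after Exercise 30: the sample regression line always passes through the center of the empirical distribution.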