Covariance and correlation


Stat 0: Quantitative Methods for Economists
Class: Correlation and Covariance, Portfolios

The mean and sd help us summarize a bunch of numbers which are measurements of just one thing. A fundamental and totally different question is how one thing relates to another.

Previously, we used a scatterplot to look at two things: the mean and sd of different assets. In this section of the notes we look at scatterplots and how correlation can be used to summarize them.

Example

In general we have n observations $(x_i, y_i)$; the ith observation is a pair of numbers.

Is the number of beers you can drink related to your weight?

[Scatterplot: beer (vertical axis) versus weight (horizontal axis)]

Our data looks like a table with one row per observation i and columns x and y. [Data table: values garbled in the transcript]

The plot enables us to see the relationship between x and y.

In the beer example, it does look like there is a relationship. Even more, the relationship looks linear in that it looks like we could draw a line through the plot to capture the pattern.

Covariance and correlation summarize how strong a linear relationship there is between two variables. In the example weight and beers were the two variables. In general we think of them as x and y.

Covariance

Consider two variables, X and Y. The concept of covariance asks: Is Y larger (or smaller) when X is larger? We measure this using something called covariance, $s_{xy}$.

Covariance > 0: larger X goes with larger Y.
Covariance < 0: larger X goes with smaller Y.

Understanding covariance

Here is the actual formula, but you will never calculate covariance by hand. The sample covariance between x and y is:

$$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

What are the units of covariance?

In this example, we look at the relationship between team payroll and team performance in Major League Baseball using data from the 200 season (for a total of 30 teams).

The variables of interest:
Payroll: team payroll (in millions of dollars)
WinPct: team winning percentage, recorded as the fraction of games won

The Data

[Scatterplot: winpct (vertical axis) versus payroll (horizontal axis)]

Would you say the covariance is positive, negative or zero?

Calculating Covariance in Stata

[Stata output: covariance matrix of WinPct and Payroll; values garbled in the transcript]

The Covariance Matrix

It turns out that Cov(X,X) = Var(X). Weird, I know. So the table Stata prints is called a covariance matrix. Its diagonal entries are the variance of WinPct and the variance of Payroll, and its off-diagonal entry is the covariance of WinPct and Payroll. The variance of Payroll is the standard deviation of Payroll multiplied by itself.
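The sample covariance formula above can be sketched in a few lines of Python. This is only an illustration, assuming the usual n - 1 divisor: the payroll and win-fraction numbers are made up (they are not the baseball data from the slides), and the function name `sample_cov` is hypothetical.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance: sum of (x_i - xbar)(y_i - ybar), divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical data for illustration only (NOT the data from the slides).
payroll = np.array([40.0, 65.0, 80.0, 110.0, 150.0])  # millions of dollars
winpct = np.array([0.42, 0.47, 0.50, 0.55, 0.58])     # fraction of games won

s_xy = sample_cov(payroll, winpct)

# np.cov returns the 2x2 covariance matrix and also divides by n - 1 by default.
cov_matrix = np.cov(payroll, winpct)

print("covariance:", s_xy)
print("covariance matrix:")
print(cov_matrix)
```

The diagonal of `cov_matrix` holds the two variances, which is the Cov(X,X) = Var(X) fact from the slide, and the off-diagonal entry matches `s_xy`.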

Beware of Interpreting Covariance

Covariance depends on the units! Does any particular covariance value imply a strong or weak relationship? Only the sign of covariance matters.

Making Size Matter

Solution: the correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x \, s_y}$$

where $s_{xy}$ is the covariance, $s_x$ is the standard deviation of x, and $s_y$ is the standard deviation of y.

The Correlation

A numerical summary of the strength of a linear relationship between two variables.
Correlations are bound between -1 and 1.
Sign: direction of the relationship (+ or -).
Absolute value: strength of the relationship. A negative correlation describes a stronger relationship than a positive one whenever its absolute value is larger.

Correlation in Stata

[Stata output: correlation matrix of WinPct and Payroll; values garbled in the transcript]

What is the correlation of Payroll with Payroll or WinPct with WinPct?

Rule of Thumb

Magnitude of r, from weakest to strongest: Very weak (.00-.20), Weak to moderate, Medium to substantial, Very Strong, Extremely Strong (up to 1.00). [The remaining cutoffs are garbled in the transcript]

The correlation corresponding to the beer and weight scatterplot we looked at earlier is positive. [Value garbled in the transcript]
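To make the point about units concrete, here is a small Python sketch under stated assumptions: the data are invented for illustration (not the slides' data), and `corr` is a hypothetical helper that applies the formula r_xy = s_xy / (s_x s_y) directly. Rescaling payroll from millions of dollars to dollars changes the covariance by a factor of a million but leaves the correlation untouched.

```python
import numpy as np

# Hypothetical data for illustration only (NOT the data from the slides).
payroll_millions = np.array([40.0, 65.0, 80.0, 110.0, 150.0])
winpct = np.array([0.42, 0.47, 0.50, 0.55, 0.58])

def corr(x, y):
    """Correlation r_xy = s_xy / (s_x * s_y), using the n - 1 divisor throughout."""
    s_xy = np.cov(x, y)[0, 1]
    return s_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Same payrolls expressed in dollars instead of millions of dollars.
payroll_dollars = payroll_millions * 1_000_000

cov_millions = np.cov(payroll_millions, winpct)[0, 1]
cov_dollars = np.cov(payroll_dollars, winpct)[0, 1]
r_millions = corr(payroll_millions, winpct)
r_dollars = corr(payroll_dollars, winpct)

# Correlation of a variable with itself.
r_self = corr(winpct, winpct)

print("covariance (millions):", cov_millions)
print("covariance (dollars): ", cov_dollars)
print("correlation (either unit):", r_millions, r_dollars)
print("correlation of winpct with itself:", r_self)
```

The last line also speaks to the question on the slide: a variable's correlation with itself comes out as 1, because its covariance with itself is its variance.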

Caution: Correlation only measures linear relationships!

These four data sets (Anscombe's quartet) all have correlation r of approximately 0.82.

   x1     y1     x2     y2     x3     y3     x4     y4
10.00   8.04  10.00   9.14  10.00   7.46   8.00   6.58
 8.00   6.95   8.00   8.14   8.00   6.77   8.00   5.76
13.00   7.58  13.00   8.74  13.00  12.74   8.00   7.71
 9.00   8.81   9.00   8.77   9.00   7.11   8.00   8.84
11.00   8.33  11.00   9.26  11.00   7.81   8.00   8.47
14.00   9.96  14.00   8.10  14.00   8.84   8.00   7.04
 6.00   7.24   6.00   6.13   6.00   6.08   8.00   5.25
 4.00   4.26   4.00   3.10   4.00   5.39  19.00  12.50
12.00  10.84  12.00   9.13  12.00   8.15   8.00   5.56
 7.00   4.82   7.00   7.26   7.00   6.42   8.00   7.91
 5.00   5.68   5.00   4.74   5.00   5.73   8.00   6.89

The Pearson correlations of x1 and y1, x2 and y2, x3 and y3, and x4 and y4 are all the same to two decimal places.

But the scatterplots all differ (bigtime!)

[Four scatterplots: y1 versus x1, y2 versus x2, y3 versus x3, y4 versus x4]

Example: The Countries Monthly Returns Data

Which countries go up and down together?

[Scatterplot: canada (vertical axis) versus usa (horizontal axis) monthly returns]

Lots of Data

I have data on 23 countries. That would be a lot of plots!!! To summarize we can compute all pairwise correlations:

[Stata output: matrix of pairwise correlations, lower triangle only]

Why blank here?

Correlation and Causation

There is strong correlation between:
The number of teachers in a school district and the number of failing students.
The number of automobiles in California per year and the number of homicides.
Kids' feet lengths and reading ability.

Correlation does not imply causation.
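The four-data-set comparison can be checked with a short Python sketch. It assumes the table above is Anscombe's quartet and uses the values as commonly published; the variable names are illustrative. The calculation shows four nearly identical Pearson correlations even though the point clouds look nothing alike when plotted.

```python
import numpy as np

# Anscombe's quartet as commonly published (assumed to be the slide's table).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets 1, 2 and 3
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

pairs = {
    "x1, y1": (x123, y1),
    "x2, y2": (x123, y2),
    "x3, y3": (x123, y3),
    "x4, y4": (x4, y4),
}

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is the Pearson r.
correlations = {name: np.corrcoef(x, y)[0, 1] for name, (x, y) in pairs.items()}

for name, r in correlations.items():
    print(f"Pearson correlation of {name}: {r:.2f}")
```

A single summary number cannot distinguish a straight-line pattern, a curve, a line with one outlier, and a vertical stack with one far-off point, which is why the slides insist on looking at the scatterplot as well.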

Caution on Using Correlation

Consider the following scatter plot of IQ scores versus shoe size.

[Scatterplot: IQ score versus shoe size]

Spurious Correlation

A large correlation does not allow us to make a causal statement. As a person ages, their shoe size increases as well as their IQ. Although there is a positive association, there is no causal link between the two variables shoe size and IQ.

Example: in American towns and cities, the correlation between the number of churches and the number of violent crimes is positive. Does it make sense to say that having more churches causes more violent crime?

In 2000, U.S. News and World Report published this table of factors that seem to be good predictors of who will win the presidential election:

[Table not preserved in the transcript]

Watch out for erroneous inferences such as concluding that some attribute observed in sample data exists in the overall population.

Correlation Summary

Scatter diagrams show relationships between variables.
The covariance gives you the direction of a linear relationship between the two variables.
The correlation coefficient measures the strength of a linear relationship.
Correlation ranges between -1 and 1.
Covariance can be any number.
Both covariance and correlation measure association, not causation.
They can be misleading if there are outliers or a nonlinear association.

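Combining Data Sets

Covariance (correlation) shows up when we combine two data sets together. That is, suppose X and Y are two data sets, both of size n. Create Z = aX + bY for constants a and b.

$$\text{Mean}(Z) = \bar{Z} = a\bar{X} + b\bar{Y}$$

$$\text{Var}(Z) = s_Z^2 = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y) = a^2 s_X^2 + b^2 s_Y^2 + 2ab\, s_{XY}$$

For the curious

Suppose you have two data sets, x and y (each of sample size n), and want to create a new data set Z = aX + bY.

$$\bar{Z} = \frac{1}{n}\sum_{i=1}^{n} Z_i = \frac{1}{n}\sum_{i=1}^{n} (aX_i + bY_i) = \frac{a}{n}\sum_{i=1}^{n} X_i + \frac{b}{n}\sum_{i=1}^{n} Y_i = a\bar{X} + b\bar{Y}$$

For the curious (cont.)

$$\begin{aligned}
\text{Var}(Z) &= \frac{1}{n-1}\sum_{i=1}^{n} (Z_i - \bar{Z})^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left[(aX_i + bY_i) - (a\bar{X} + b\bar{Y})\right]^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \left[(aX_i - a\bar{X}) + (bY_i - b\bar{Y})\right]^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \left[(aX_i - a\bar{X})^2 + (bY_i - b\bar{Y})^2 + 2(aX_i - a\bar{X})(bY_i - b\bar{Y})\right] \\
&= \frac{a^2}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 + \frac{b^2}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2 + \frac{2ab}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \\
&= a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)
\end{aligned}$$

Numerical Example

[Numerical example: a specific combination Z = aX + bY, evaluated by plugging the means, variances, and covariance of X and Y into the two formulas above; the numbers are garbled in the transcript]

Huh? These concepts drive a lot of portfolio theory as we will soon see.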
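Since the numbers in the slide's numerical example did not survive, here is a hedged Python sketch that checks the two formulas on simulated data. Everything specific in it is an assumption for illustration: the random data, the seed, and the constants a = 0.3 and b = 0.7 are not taken from the slides.

```python
import numpy as np

# Simulated data; the distributions and seed are arbitrary illustrative choices.
rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=200)
y = 0.5 * x + rng.normal(1.0, 1.5, size=200)   # y is deliberately correlated with x

a, b = 0.3, 0.7   # hypothetical constants, not the slide's values

# Build the combined data set directly.
z = a * x + b * y

# Mean: direct calculation versus the formula a*mean(X) + b*mean(Y).
mean_direct = z.mean()
mean_formula = a * x.mean() + b * y.mean()

# Variance: direct calculation versus a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
var_direct = np.var(z, ddof=1)
var_formula = (
    a**2 * np.var(x, ddof=1)
    + b**2 * np.var(y, ddof=1)
    + 2 * a * b * np.cov(x, y)[0, 1]
)

print("mean:    ", mean_direct, mean_formula)
print("variance:", var_direct, var_formula)
```

The two columns agree up to floating-point rounding, as the algebra predicts. Dropping the covariance term would make the variance formula wrong whenever X and Y move together.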

Things you should know

Covariance and correlation
Correlation is not causation
Mean(aX+bY) = a Mean(X) + b Mean(Y)
Var(aX+bY) = a mess, see the prior slides

Application of Covariance: Portfolios

Modern Portfolio Theory (MPT)

Finance professor Harry Markowitz began a revolution by suggesting that the value of a security (stock) to an investor might best be evaluated by its mean, its standard deviation, and its correlation to other securities in the portfolio. This audacious suggestion amounted to ignoring a lot of information about the firm (its earnings, its dividend policy, its capital structure, its market, its competitors) and calculating a few simple statistics. In this section, we give a basic introduction to what the finance guys refer to as modern portfolio theory.

Std Dev as a measure of risk

The standard deviation is often used by investors to measure the risk of a stock or a stock portfolio. The basic idea is that the standard deviation is a measure of volatility: the more a stock's returns vary from the stock's average return, the more volatile the stock. Consider the following two stocks and their respective returns (in per cent) over the last six months.

[Table: month, starting value, return (%), and final value for Stock A and Stock B, June through November, each starting at $1,000.00; the figures are garbled in the transcript]

Which stock would you say is more volatile?

[Output: Descriptive Statistics for Returns and Returns2 (N, Mean, Median, TrMean, StDev); values garbled in the transcript]

Both portfolios end up increasing in value from $1,000 to the same final value. However, they clearly differ in volatility: the monthly returns of Portfolio A and Portfolio B span very different ranges. This volatility is represented by the large difference between the standard deviations of the two sets of returns.

Forming portfolios

Suppose you have a sum of money to invest. Let $R_A$ be the return on asset A. If you put all your money into asset A, each dollar invested becomes $1 + R_A$ dollars at the end of the period. Let $R_B$ be the return on asset B. If you put all your money into asset B, each dollar invested becomes $1 + R_B$ dollars at the end of the period.

Suppose you put 1/2 your money into A and 1/2 into B. How much will you make?

Portfolio weights

At the end of the period each dollar invested will have become

$$\tfrac{1}{2}(1 + R_A) + \tfrac{1}{2}(1 + R_B) = 1 + \tfrac{1}{2}R_A + \tfrac{1}{2}R_B$$

so the return is $\tfrac{1}{2}R_A + \tfrac{1}{2}R_B$: the average of the two returns.

To generalize, let $w_A$ be the fraction of your wealth you invest in asset A. Let $w_B$ be the fraction of your wealth you invest in asset B. The w's are called the portfolio weights, and we usually require that they sum to 1.

Return on a portfolio

Hence the return on a portfolio is given by

$$R_p = w_A R_A + w_B R_B$$

(a weighted sum of the two different returns). People like to study the mean and variance of portfolio returns, and in the next few slides we give you the formulas to do so.

Example

Let's use our country data and suppose that we had split our money between USA and Hong Kong. [The weights are garbled in the transcript] What would our returns have been?

Example (cont.)

The goal of this section of the notes is to learn what factors control the risk and return of a portfolio. That is, it is easy to calculate in Stata, but what is going on behind the scenes?

Comparing returns

How do the returns on this portfolio compare with those of honkong and usa? It looks like the mean for my portfolio is right in between the means of usa and honkong. What about the sd?

[Plot: Mean (vertical axis) versus StDev (horizontal axis) for usa, port, and honkong]

Now try three stocks

Let's try a portfolio with three stocks. The weights must add up to one, but they can be negative; this is called going short in the asset. Here port combines a negative weight on canada with positive weights on usa and honkong. [The weights are garbled in the transcript]

Clearly, forming portfolios is an interesting thing to do!!

[Plot: Mean (vertical axis) versus StDev (horizontal axis) for usa, canada, port, and honkong]
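The portfolio arithmetic can be sketched in Python. This is an illustration under loudly stated assumptions, not a reproduction of the slides: the starting wealth of 100 and the returns 0.10 and 0.15 are hypothetical, the "usa" and "hongkong" monthly returns are simulated rather than the course data, and the 50/50 weights are chosen only for the example.

```python
import numpy as np

# --- One-period example with hypothetical numbers (not the slide's values) ---
wealth = 100.0
r_a, r_b = 0.10, 0.15          # assumed returns on assets A and B
w_a, w_b = 0.5, 0.5            # half the money in each asset

end_wealth = wealth * w_a * (1 + r_a) + wealth * w_b * (1 + r_b)
r_p = w_a * r_a + w_b * r_b    # portfolio return: weighted sum of the returns

# --- Many periods: simulated monthly returns standing in for the country data ---
rng = np.random.default_rng(1)
n_months = 120
usa = rng.normal(0.010, 0.04, n_months)
hongkong = 0.5 * usa + rng.normal(0.010, 0.06, n_months)

port = w_a * usa + w_b * hongkong      # portfolio return in each month

port_mean = port.mean()
port_sd = np.std(port, ddof=1)

# The same numbers from the means, variances and covariance of the two assets.
w = np.array([w_a, w_b])
asset_means = np.array([usa.mean(), hongkong.mean()])
cov = np.cov(usa, hongkong)            # 2x2 covariance matrix (n - 1 divisor)

mean_formula = w @ asset_means
sd_formula = np.sqrt(w @ cov @ w)      # w_a^2 Var_a + w_b^2 Var_b + 2 w_a w_b Cov

print("one-period return:", r_p, "ending wealth:", end_wealth)
print("portfolio mean:", port_mean, mean_formula)
print("portfolio sd:  ", port_sd, sd_formula)
```

With weights between 0 and 1, the portfolio mean always lands between the two asset means. The portfolio sd depends on the covariance as well, so it can come out lower than a simple average of the two sds.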

What does going short mean?

Why form portfolios?

Maybe the portfolio has a nice mean and variance. There are some basic formulas that relate the mean and standard deviation of the portfolio returns to the means, variances, and covariances of the returns of the assets that make up the portfolios.

Sections Covered from the Book

Section .. (page 22) on correlation