Lesson 17: Pearson's Correlation Coefficient




Outline

Measures of Relationships
Pearson's Correlation Coefficient (r)
  - types of data
  - scatter plots
  - measure of direction
  - measure of strength
Computation
  - covariation of X and Y
  - unique variation in X and Y
  - measuring variability
Example Problem
  - steps in hypothesis testing
  - r²

Note that some of the formulas I use differ from your text. Both sets of formulas are in the homework packet, and you should use the formulas you feel most comfortable using.

Measures of Relationships

Up to this point in the course our statistical tests have focused on demonstrating differences in a dependent variable produced by an independent variable. In this way, we could infer that by changing the independent variable we could have a direct effect on the dependent variable. With the statistics we have learned we can make statements about causality.

Pearson's Correlation Coefficient (r)

Types of data

For the rest of the course we will be focused on demonstrating relationships between variables. Although we will know if there is a relationship between variables when we compute a correlation, we will not be able to say that one variable actually causes changes in another variable. The statistics that reveal relationships between variables are more versatile, but not as definitive as those we have already learned.

Although correlation will only reveal a relationship, and not causality, we will still be using measurement data. Recall that measurement data comes from a measurement we make on some scale. The type of data the statistic uses is one way we will distinguish these types of measures, so keep it in mind for the next statistic we learn (chi-square).

One feature of the data that does differ from prior statistics is that we will have two values from each subject in our sample. So, we will need both an X distribution and a Y distribution to express the two values we measure from the same unit in the population. For example, if I want to examine the relationship between the amount of time spent studying for an exam (X), in hours, and the score that person makes on the exam (Y), we might have:

  X    Y
  2   65
  3   70
  3   75
  4   70
  5   85
  6   85
  7   90

Scatter plots

An easy way to get an idea about the relationship between two variables is to create a scatter plot of the relationship. With a scatter plot we graph our values on an X, Y coordinate plane. For example, say we measure the number of hours a person studies (X) and plot that against their resulting number of correct answers on a trivia test (Y).

X Y 0 0 1 1 1 3 3 5 4 5 5

Plot each X and Y point by drawing an X, Y axis and placing the x-variable on the x-axis and the y-variable on the y-axis. So, when we are at 0 on the X-axis for the first person, we are at 0 on the Y-axis. The next person is at 1 on the X-axis and 1 on the Y-axis. Plot each point this way to form a scatter plot.

[Figure: scatter plot of Number of Correct Answers (Y) against Number of Hours Studying (X)]

In the resulting graph you can see that as we increase values on the x-axis, there is a corresponding increase on the y-axis. For a scatter plot like this one we say that the relationship or correlation is positive. For positive correlations, as values on the x-axis increase, values on the y-axis increase also. So, as the number of hours of study increases, the number of correct answers on the exam increases. The opposite is true as well. If one variable goes down the other goes down as well. Both variables move in the same direction.

Let's look at the opposite type of effect. In this example the X-variable is number of alcoholic drinks consumed, and the Y-variable is number of correct answers on a simple math test.

[Figure: scatter plot of Number of Correct Answers (Y) against Number of Drinks (X)]

This scatter plot represents a negative correlation. As the values on X increase, the values on Y decrease. So, as the number of drinks consumed increases, the number of correct answers decreases. The variables are moving in opposite directions.

Measures of Strength

Scatter plots give us a good idea about the direction of the relationship between two variables. They also give a good idea of how strongly related two variables are to one another. Notice in the above graphs that you could draw a straight line to represent the direction in which the plotted points move.

[Figure: scatter plot of Number of Correct Answers (Y) against Number of Drinks (X)]

The closer the points come to a straight line, the stronger the relationship. We will express the strength of the relationship with a number between 0 and 1. A zero indicates no relationship, and a one indicates a perfect relationship. Most values will be a decimal value in between the two numbers. Note that the number is independent of the direction of the effect. So, a value of -1 indicates a strong correlation because of the number and a negative relationship because of the sign. A value of +.03 would be a weak correlation because the number is small, and it would be a positive relationship because the sign is positive.

Here are some more examples of scatter plots with estimated correlation (r) values.

[Figure: three scatter plots labeled A, B, and C]

Graph A represents a strong positive correlation because the plotted points fall very close to a straight line (perhaps +.85). Graph B represents a weaker positive correlation (+.30). Graph C represents a strong negative correlation (-.90).

Computation

When we compute the correlation it will be the ratio of the covariation in the X and Y variables to the individual variability in X and the individual variability in Y. By covariation we mean the amount that X and Y vary together. So, the correlation looks at how much the two variables vary together relative to the amount they vary individually. If the covariation is large relative to the individual variability of each variable, then the relationship, and the value of r, is strong.

A simple example might be helpful to understand the concept. For this example, X is population density and Y is number of babies born.

Individual variability in X
You can think of a lot of different reasons why population density might vary by itself. People live in more densely populated areas for many reasons, including job opportunities, family reasons, or climate.

Individual variability in Y
You can also think of a lot of reasons why birth rate may vary by itself. People may be influenced to have children because of personal reasons, war, or economic reasons.

Covariation of X and Y
For this example it is easy to see why we would expect X and Y to vary together as well. No matter what the birth rate might happen to be, we would expect that more people would yield more babies being born.

When we compute the correlation coefficient we don't have to think of all the reasons for variables to vary or covary; we simply measure the variability. How do we measure variability in a distribution? I hope you know the answer to that question by now. We measure variability with sums of squares (often expressed as variance). So, when we compute the correlation we will insert the sums of squares for X and Y in the denominator. The numerator is the covariation of X and Y. For this value we could multiply each unit's deviation in X by its deviation in Y and add up the products, but the formula below gives an easier computation.

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}} $$

The only new component here is the sum of the products of X and Y. Since each unit in our sample has both an X and a Y value, you will multiply these two numbers together for each unit in your sample. Then add the values you multiplied together. See the example below as well.

Example Problem

The following example includes the changes we will need to make for hypothesis testing with the correlation coefficient, as well as an example of how to do the computations. Below are the data for six participants giving their number of years in college (X) and their subsequent yearly income (Y). Income here is in thousands of dollars, but this fact does not require any changes in our computations. Test whether there is a relationship with alpha = .05.

  # of Years of College   Income
           X                 Y        X²       Y²       XY
           0                15         0      225        0
           1                15         1      225       15
           3                20         9      400       60
           4                25        16      625      100
           4                30        16      900      120
           6                35        36     1225      210

  ΣX = 18   ΣY = 140   ΣX² = 78   ΣY² = 3600   ΣXY = 505
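As a rough check on the summary values, the sketch below recomputes the five sums from the six (X, Y) pairs in the example. It also compares the "multiply the deviations" version of each quantity with the shortcut used in the computational formula. The variable names (`sp_shortcut`, `ss_x_shortcut`, and so on) are illustrative assumptions and do not come from the lesson or the text.

```python
from math import isclose

# Example data: years of college (X) and income in thousands of dollars (Y)
xs = [0, 1, 3, 4, 4, 6]
ys = [15, 15, 20, 25, 30, 35]
n = len(xs)

# The five summary values used by the computational formula
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Shortcut (computational) forms of the covariation and the sums of squares
sp_shortcut = sum_xy - sum_x * sum_y / n
ss_x_shortcut = sum_x2 - sum_x ** 2 / n
ss_y_shortcut = sum_y2 - sum_y ** 2 / n

# Deviation forms of the same three quantities, for comparison
mean_x = sum_x / n
mean_y = sum_y / n
sp_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x_deviation = sum((x - mean_x) ** 2 for x in xs)
ss_y_deviation = sum((y - mean_y) ** 2 for y in ys)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(isclose(sp_shortcut, sp_deviation),
      isclose(ss_x_shortcut, ss_x_deviation),
      isclose(ss_y_shortcut, ss_y_deviation))
```

Both routes give the same covariation and sums of squares; the shortcut only saves the step of subtracting the mean from every score.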

Notice that I have included the computation for obtaining the summary values for you for completeness. Be sure you know how to obtain all the summed values, as they will not always be given on the exam.

Step 1: State the Hypotheses in Words and Symbols

H0: The correlation between years of education and income is equal to zero in the population.
H1: The correlation between years of education and income is not equal to zero in the population.

As usual the null states that there is no effect or no relationship, and the research hypothesis states that there is an effect. When we write them in symbols we will use the Greek letter rho (ρ) to indicate the correlation in the population. Thus:

H0: ρ = 0
H1: ρ ≠ 0

Step 2: Find the Critical Value

Again, we will use a table to find the critical value in Appendix A of your book. Locate the table, and find the degrees of freedom for the appropriate test to find the critical value. For this test df = n - 2, where n is the number of pairs of scores we have.

df = 6 - 2 = 4
r critical = ±0.811

Step 3: Run the Statistical Test

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}} = \frac{505 - \dfrac{(18)(140)}{6}}{\sqrt{\left(78 - \dfrac{18^2}{6}\right)\left(3600 - \dfrac{140^2}{6}\right)}} $$

$$ r = \frac{505 - \dfrac{2520}{6}}{\sqrt{\left(78 - \dfrac{324}{6}\right)\left(3600 - \dfrac{19600}{6}\right)}} = \frac{505 - 420}{\sqrt{[78 - 54][3600 - 3266.67]}} $$

$$ r = \frac{85}{\sqrt{(24)(333.33)}} = \frac{85}{\sqrt{7999.92}} = \frac{85}{89.44} = .95 $$

Step 4: Make a Decision about the Null

Reject the null. Since the value we computed in Step 3 (.95) is larger than the critical value in Step 2 (.811), we reject the null.

Step 5: Write a Conclusion

There is a relationship between years spent in college and income. The more years of school, the more the subsequent income.

r²

Oftentimes we will square the r-value we compute in order to get a measure of the size of the effect. Just like with eta-square in ANOVA, we will compute the percentage of variability in Y that is accounted for by X. For the current example r² = .90, so 90% of the variability in income is accounted for by education.
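The steps above can be sketched in a few lines of code. This is only an illustration that starts from the summary values in the example and uses the critical value of .811 for df = 4 at alpha = .05; the variable names are assumptions made for this sketch, and the critical value is typed in by hand rather than looked up by the program.

```python
from math import sqrt

# Summary values from the example problem (six pairs of scores)
n = 6
sum_x, sum_y = 18, 140
sum_x2, sum_y2, sum_xy = 78, 3600, 505

# Step 3: computational formula for r
covariation = sum_xy - sum_x * sum_y / n   # numerator
ss_x = sum_x2 - sum_x ** 2 / n             # variability in X
ss_y = sum_y2 - sum_y ** 2 / n             # variability in Y
r = covariation / sqrt(ss_x * ss_y)

# Step 2: degrees of freedom and the tabled critical value (entered by hand)
df = n - 2
r_critical = 0.811

# Step 4: reject the null when r is beyond the critical value in either direction
reject_null = abs(r) > r_critical

# Effect size: proportion of variability in Y accounted for by X
r_squared = r ** 2

print(f"r = {r:.2f}, df = {df}, reject null: {reject_null}, r squared = {r_squared:.2f}")
```

Comparing the absolute value of r with the critical value covers both positive and negative correlations, which matches the ± in the tabled value.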