Lesson 17 Pearson's Correlation Coefficient

Outline

Measures of Relationships
Pearson's Correlation Coefficient (r)
  -types of data
  -scatter plots
  -measure of direction
  -measure of strength
Computation
  -covariation of X and Y
  -unique variation in X and Y
  -measuring variability
Example Problem
  -steps in hypothesis testing
  -r²

Note that some of the formulas I use differ from your text. Both sets of formulas are in the homework packet, and you should use the formulas you feel most comfortable using.

Measures of Relationships

Up to this point in the course our statistical tests have focused on demonstrating differences in a dependent variable produced by an independent variable. In this way, we could infer that by changing the independent variable we could have a direct effect on the dependent variable. With the statistics we have learned we can make statements about causality.

Pearson's Correlation Coefficient (r)

Types of data

For the rest of the course we will be focused on demonstrating relationships between variables. Although we will know if there is a relationship between variables when we compute a correlation, we will not be able to say that one variable actually causes changes in another variable. The statistics that reveal relationships between variables are more versatile, but not as definitive as those we have already learned.

Although correlation will only reveal a relationship, and not causality, we will still be using measurement data. Recall that measurement data comes from a measurement we make on some scale. The type of data the statistic uses is one way we will distinguish these types of measures, so keep it in mind for the next statistic we learn (chi-square).

One feature of the data that does differ from prior statistics is that we will have two values from each subject in our sample. So, we will need both an X distribution and a Y distribution to express the two values we measure from the same unit in the population. For example, if I want to examine the relationship between the amount of time spent studying for an exam (X), in hours, and the score that person makes on the exam (Y), we might have:

  X     Y
  2    65
  3    70
  3    75
  4    70
  5    85
  6    85
  7    90

Scatter plots

An easy way to get an idea about the relationship between two variables is to create a scatter plot of the relationship. With a scatter plot we will graph our values on an X, Y coordinate plane. For example, say we measure the number of hours a person studies (X) and plot that with their resulting correct answers on a trivia test (Y).

  X     Y
  0     0
  1     1
  1     2
  2     3
  3     5
  4     5
  5     6

Plot each X and Y point by drawing an X, Y axis and placing the x-variable on the x-axis, and the y-variable on the y-axis. So, when we are at 0 on the X-axis for the first person, we are at 0 on the Y-axis. The next person is at 1 on the X-axis and 1 on the Y-axis. Plot each point this way to form a scatter plot.
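The plotting steps above can be sketched in a few lines of Python. This is only an illustration, not part of the lesson's required method: the function name `ascii_scatter` and the text-based drawing are my own assumptions, and the pairs are illustrative values for the hours-studied example. It assumes small, non-negative whole-number scores.

```python
def ascii_scatter(pairs):
    """Return a text scatter plot for small non-negative integer (x, y) pairs."""
    points = set(pairs)
    max_x = max(x for x, _ in points)
    max_y = max(y for _, y in points)
    lines = []
    # Draw from the top of the Y-axis down so larger Y values appear higher.
    for y in range(max_y, -1, -1):
        row = " ".join("*" if (x, y) in points else "." for x in range(max_x + 1))
        lines.append(f"{y:>2} | {row}")
    # X-axis tick labels along the bottom (lines up for single-digit X values).
    lines.append("     " + " ".join(str(x) for x in range(max_x + 1)))
    return "\n".join(lines)


# Illustrative pairs: (hours studied, correct answers). Hypothetical example data.
study_pairs = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 5), (4, 5), (5, 6)]
print(ascii_scatter(study_pairs))
```

Each `*` marks one person. Reading the printout from left to right, the marks climb toward the upper right, which is the pattern to look for when judging the direction of a relationship.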

[Figure: scatter plot of Number of Correct Answers (Y-axis) against Number of Hours Studying (X-axis)]

In the resulting graph you can see that as we increase values on the x-axis, it corresponds to an increase on the y-axis. For a scatter plot like this one we say that the relationship or correlation is positive. For positive correlations, as values on the x-axis increase, values on the y-axis increase also. So, as the number of hours of study increases, the number of correct answers on the exam increases. The opposite is true as well: if one variable goes down, the other goes down as well. Both variables move in the same direction.

Let's look at the opposite type of effect. In this example the X-variable is the number of alcoholic drinks consumed, and the Y-variable is the number of correct answers on a simple math test.

[Figure: scatter plot of Number of Correct Answers (Y-axis) against Number of Drinks (X-axis)]

This scatter plot represents a negative correlation. As the values on X increase, the values on Y decrease. So, as the number of drinks consumed increases, the number of correct answers decreases. The variables are moving in opposite directions.

Measures of Strength

Scatter plots gave us a good idea about the direction of the relationship between two variables. They also give a good idea of how strongly related two variables are to one another. Notice in the above graphs that you could draw a straight line to represent the direction the plotted points move.

[Figure: scatter plot of Number of Correct Answers (Y-axis) against Number of Drinks (X-axis)]

The closer the points come to a straight line, the stronger the relationship. We will express the strength of the relationship with a number between 0 and 1. A zero indicates no relationship, and a one indicates a perfect relationship. Most values will be a decimal value in between the two numbers. Note that the number is independent of the direction of the effect. So, a value of -1 indicates a strong correlation because of the number and a negative relationship because of the sign. A value of +.03 would be a weak correlation because the number is small, and it would be a positive relationship because the sign is positive.

Here are some more examples of scatter plots with estimated correlation (r) values.

[Figure: three scatter plots labeled A, B, and C]

Graph A represents a strong positive correlation because the plotted points are very close together (perhaps +.85). Graph B represents a weaker positive correlation (+.30). Graph C represents a strong negative correlation (-.90).

Computation

When we compute the correlation it will be the ratio of the covariation in the X and Y variables to the individual variability in X and the individual variability in Y. By covariation we mean the amount that X and Y vary together. So, the correlation looks at how much the two variables vary together relative to the amount they vary individually. If the covariation is large relative to the individual variability of each variable, then the relationship is strong and the value of r is large.

A simple example might be helpful to understand the concept. For this example, X is population density and Y is the number of babies born.

Individual variability in X

You can think of a lot of different reasons why population density might vary by itself. People live in more densely populated areas for many reasons, including job opportunities, family reasons, or climate.

Individual variability in Y

You can also think of a lot of reasons why birth rate may vary by itself. People may be influenced to have children because of personal reasons, war, or economic reasons.

Covariation of X and Y

For this example it is easy to see why we would expect X and Y to vary together as well. No matter what the birth rate might happen to be, we would expect that more people would yield more babies being born.

When we compute the correlation coefficient we don't have to think of all the reasons for variables to vary or covary, but simply measure the variability. How do we measure variability in a distribution? I hope you know the answer to that question by now. We measure variability with sums of squares (often expressed as variance). So, when we compute the correlation we will insert the sums of squares for X and Y in the denominator. The numerator is the covariation of X and Y. For this value we could multiply the variability in the X-variable times the variability in the Y-variable, but see the formula below for an easier computation.

$$
r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}
$$

The only new component here is the sum of the products of X and Y. Since each unit in our sample has both an X and a Y value, you will multiply these two numbers together for each unit in your sample. Then add the values you multiplied together. See the example below as well.

Example Problem

The following example includes the changes we will need to make for hypothesis testing with the correlation coefficient, as well as an example of how to do the computations. Below are the data for six participants giving their number of years in college (X) and their subsequent yearly income (Y). Income here is in thousands of dollars, but this fact does not require any changes in our computations. Test whether there is a relationship with Alpha = .05.

  # of Years of College   Income
           X                 Y       X²      Y²      XY
           0                15        0     225       0
           1                15        1     225      15
           3                20        9     400      60
           4                25       16     625     100
           4                30       16     900     120
           6                35       36    1225     210

  ΣX = 18    ΣY = 140    ΣX² = 78    ΣY² = 3600    ΣXY = 505
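As a check on the summary row, the sketch below recomputes the five sums from the raw X and Y columns of the example. The function name `summary_sums` and the dictionary keys are my own labels for illustration, not notation from the text.

```python
# Raw data from the example: years of college (X) and income in $1000s (Y).
years = [0, 1, 3, 4, 4, 6]
income = [15, 15, 20, 25, 30, 35]


def summary_sums(xs, ys):
    """Return n and the five sums needed by the computational formula for r."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")
    return {
        "n": len(xs),                                  # number of pairs
        "sum_x": sum(xs),                              # ΣX
        "sum_y": sum(ys),                              # ΣY
        "sum_x2": sum(x * x for x in xs),              # ΣX²: square first, then add
        "sum_y2": sum(y * y for y in ys),              # ΣY²: square first, then add
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),  # ΣXY: multiply each pair, then add
    }


sums = summary_sums(years, income)
print(sums)
```

Note the order of operations: ΣX² squares each score before adding, which is not the same thing as (ΣX)², where the scores are added first and the total is squared.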

Notice that I have included the computations for obtaining the summary values for completeness. Be sure you know how to obtain all the summed values, as they will not always be given on the exam.

Step 1: State the Hypotheses in Words and Symbols

H0: The correlation between years of education and income is equal to zero in the population.
H1: The correlation between years of education and income is not equal to zero in the population.

As usual the null states that there is no effect or no relationship, and the research hypothesis states that there is an effect. When we write them in symbols we will use the Greek letter rho (ρ) to indicate the correlation in the population. Thus:

H0: ρ = 0
H1: ρ ≠ 0

Step 2: Find the Critical Value

Again, we will use a table to find the critical value in Appendix A of your book. Locate the table, and find the degrees of freedom for the appropriate test to find the critical value. For this test df = n - 2, where n is the number of pairs of scores we have.

df = 6 - 2 = 4
r critical = ±0.811

Step 3: Run the Statistical Test

$$
r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}
$$

$$
r = \frac{505 - \frac{(18)(140)}{6}}{\sqrt{\left[78 - \frac{18^2}{6}\right]\left[3600 - \frac{140^2}{6}\right]}}
$$

$$
r = \frac{505 - \frac{2520}{6}}{\sqrt{\left[78 - \frac{324}{6}\right]\left[3600 - \frac{19600}{6}\right]}}
$$

$$
r = \frac{505 - 420}{\sqrt{[78 - 54][3600 - 3266.67]}} = \frac{85}{\sqrt{(24)(333.33)}} = \frac{85}{\sqrt{7999.92}} = \frac{85}{89.44} = .95
$$

Step 4: Make a Decision about the Null

Reject the null. Since the value we computed in Step 3 (.95) is larger than the critical value in Step 2 (.811), we reject the null.

Step 5: Write a Conclusion

There is a relationship between years spent in college and income. The more years of school, the more the subsequent income.

r²

Oftentimes we will square the r-value we compute in order to get a measure of the size of the effect. Just like with eta-square in ANOVA, we will compute the percentage of variability in Y that is accounted for by X. For the current example r² = .90, so 90% of the variability in income is accounted for by education.
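Steps 3 and 4 and the r² calculation can be retraced with a short script. This is a sketch under stated assumptions: the function name `pearson_r` and its parameter names are mine, and the critical value is simply copied from the table lookup in Step 2 rather than computed.

```python
import math


def pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r, built from the summary sums."""
    covariation = sum_xy - (sum_x * sum_y) / n  # numerator: how X and Y vary together
    ss_x = sum_x2 - sum_x ** 2 / n              # sum of squares for X
    ss_y = sum_y2 - sum_y ** 2 / n              # sum of squares for Y
    return covariation / math.sqrt(ss_x * ss_y)


# Summary values from the example problem (six pairs of scores).
r = pearson_r(n=6, sum_x=18, sum_y=140, sum_x2=78, sum_y2=3600, sum_xy=505)

# Critical value taken from the table lookup in Step 2 (df = n - 2 = 4, alpha = .05).
r_critical = 0.811

# Two-tailed decision: reject the null when |r| exceeds the critical value.
reject_null = abs(r) > r_critical

# Proportion of variability in Y accounted for by X.
r_squared = r ** 2

print(round(r, 2), reject_null, round(r_squared, 2))
```

The script carries full precision through the intermediate steps, so its denominator differs very slightly from the hand calculation, which rounds 333.33 along the way. The rounded results agree with the worked answer: r = .95 and r² = .90.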