SIMPLE LINEAR CORRELATION

Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables.

Correlation often is abused. You need to show that one variable actually is affecting another variable.

The parameter being measured is $\rho$ (rho), and it is estimated by the statistic r, the correlation coefficient. r can range from -1 to 1 and is independent of the units of measurement. The strength of the association increases as the absolute value of r approaches 1.0. A value of 0 indicates there is no linear association between the two variables tested.

A better estimate of r usually can be obtained by calculating r on treatment means averaged across replicates.

Correlation does not have to be performed only between independent and dependent variables; it can be done on two dependent variables. The X and Y in the equation used to determine r do not necessarily correspond to an independent and a dependent variable, respectively.

Scatter plots are a useful means of getting a better understanding of your data.

[Figure: three scatter plots showing positive association, negative association, and no association.]

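The formula for r is:

\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}} = \frac{SSCP}{\sqrt{(SS\,X)(SS\,Y)}}
\]

Example

Totals for the paired observations of X and Y:

$\sum X = 276$, $\sum Y = 367$, $\sum XY = 21{,}383$, $\sum X^2 = 16{,}232$, $\sum Y^2 = 28{,}833$, $n = 5$

Step 1. Calculate SSCP

\[
SSCP = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 21{,}383 - \frac{(276)(367)}{5} = 1{,}124.6
\]

Step 2. Calculate SS X

\[
SS\,X = \sum X^2 - \frac{(\sum X)^2}{n} = 16{,}232 - \frac{276^2}{5} = 996.8
\]

Step 3. Calculate SS Y

\[
SS\,Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 28{,}833 - \frac{367^2}{5} = 1{,}895.2
\]

Step 4. Calculate the correlation coefficient r

\[
r = \frac{SSCP}{\sqrt{(SS\,X)(SS\,Y)}} = \frac{1{,}124.6}{\sqrt{(996.8)(1{,}895.2)}} = 0.818
\]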
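The four steps above can be checked with a few lines of Python. This is a minimal sketch that works from the example's totals ($\sum X = 276$, $\sum Y = 367$, $\sum XY = 21{,}383$, $\sum X^2 = 16{,}232$, $\sum Y^2 = 28{,}833$, $n = 5$). The function name `correlation_from_sums` is an illustrative choice and does not come from the notes.

```python
import math

def correlation_from_sums(sum_x, sum_y, sum_xy, sum_x2, sum_y2, n):
    """Return (SSCP, SS X, SS Y, r) computed from raw totals."""
    sscp = sum_xy - (sum_x * sum_y) / n   # Step 1: corrected sum of cross products
    ss_x = sum_x2 - sum_x ** 2 / n        # Step 2: corrected sum of squares for X
    ss_y = sum_y2 - sum_y ** 2 / n        # Step 3: corrected sum of squares for Y
    r = sscp / math.sqrt(ss_x * ss_y)     # Step 4: correlation coefficient
    return sscp, ss_x, ss_y, r

# Totals from the worked example
sscp, ss_x, ss_y, r = correlation_from_sums(276, 367, 21383, 16232, 28833, 5)
print(f"SSCP = {sscp:.1f}, SS X = {ss_x:.1f}, SS Y = {ss_y:.1f}, r = {r:.3f}")
```

Working from totals mirrors the hand calculation. With the raw observations available, the same function can be fed sums computed from the data.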

Testing the Hypothesis That an Association Between X and Y Exists

To determine if an association between two variables exists as determined using correlation, the following hypotheses are tested:

$H_0: \rho = 0$
$H_A: \rho \neq 0$

Notice that this test determines whether r is significantly different from zero, i.e., whether there is an association between the two variables evaluated. You are not testing to determine if there is a SIGNIFICANT CORRELATION. This cannot be tested.

Critical or tabular values of r to test the hypothesis $H_0: \rho = 0$ can be found in the table on the following page. The df are equal to n - 2. The number of independent variables will equal one for all simple linear correlation.

The tabular r value is $r_{0.05,\ 3\ df} = 0.878$.

Because the calculated r (0.818) is less than the table r value (0.878), we fail to reject $H_0: \rho = 0$ at the 95% level of confidence. We cannot conclude that there is an association between X and Y.

In this example, it would appear that the association between X and Y is strong because the r value is fairly high. Yet the test of $H_0: \rho = 0$ does not provide evidence of a linear relationship.

Points to Consider

1. The tabular r values are highly dependent on n, the number of observations.
2. As n increases, the tabular r value decreases.
3. We are more likely to reject $H_0: \rho = 0$ as n increases.
4. As n approaches 100, the r value needed to reject $H_0: \rho = 0$ becomes fairly small.

Too many people abuse correlation by not reporting the r value and stating incorrectly that there is a significant correlation. Rejecting $H_0: \rho = 0$ says nothing about the strength of the association between the two variables measured.
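Tabular r values are equivalent to a t test on r with n - 2 df, using $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The sketch below assumes that standard relationship and hard-codes the two-tailed tabular t value for $\alpha = 0.05$ and 3 df (3.182, from a standard t table). It recovers the tabular r of 0.878 and repeats the decision for the example. The function names are illustrative and not part of the notes.

```python
import math

def critical_r(t_crit, df):
    """Smallest |r| that is significant, given the tabular t for df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def t_from_r(r, n):
    """t statistic equivalent to testing Ho: rho = 0 with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 5
r = 0.818
t_table = 3.182  # two-tailed tabular t, alpha = 0.05, 3 df (standard t table)

r_table = critical_r(t_table, n - 2)
t_calc = t_from_r(r, n)
reject_h0 = abs(r) > r_table  # same decision as comparing |t_calc| with t_table

print(f"tabular r = {r_table:.3f}, calculated t = {t_calc:.3f}, reject Ho: {reject_h0}")
```

Because `critical_r` depends on df, passing the tabular t for a larger n returns a smaller critical r, which is the pattern described in Points to Consider.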


Example 2

The correlation coefficient squared equals the coefficient of determination. Yet you need to be careful if you decide to calculate r by taking the square root of the coefficient of determination: you may not have the correct sign if there is a negative association between the two variables.

Assume X is the independent variable and Y is the dependent variable, n = 150, and the correlation between the two variables is r = 0.30. This value of r is significantly different from zero at the 99% level of confidence.

Calculating $r^2$ using r, $0.30^2 = 0.09$, we find that 9% of the variation in Y can be explained by having X in the model. This indicates that even though the r value is significantly different from zero, the association between X and Y is weak.

Some people feel the coefficient of determination needs to be greater than 0.50 (i.e., r greater than 0.71) before the relationship between X and Y is very meaningful.

Calculating r Combined Across Experiments, Locations, Runs, etc.

This is another area where correlation is abused. When calculating the pooled correlation across experiments, you cannot just put the data into one data set and calculate r directly. The value of r that will be calculated is not a reliable estimate of $\rho$.

A better method of estimating $\rho$ would be to:

1. Calculate a value of r for each environment, and
2. Average the r values across environments.

The proper method of calculating a pooled r value is to test the homogeneity of the correlation coefficients from the different locations. If the r values are homogeneous, a pooled r value can be calculated.
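To see why r computed on a combined data set can be unreliable, consider a small made-up illustration. The numbers below are hypothetical and do not come from any experiment in these notes. Within each of two environments the association is perfectly negative, yet the environments differ so much in their means that the combined data give a strongly positive r.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient from paired observations."""
    n = len(x)
    sscp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return sscp / math.sqrt(ss_x * ss_y)

# Hypothetical data for two environments (illustration only)
x_a, y_a = [1, 2, 3], [3, 2, 1]
x_b, y_b = [11, 12, 13], [13, 12, 11]

r_a = pearson_r(x_a, y_a)                     # within environment A
r_b = pearson_r(x_b, y_b)                     # within environment B
r_combined = pearson_r(x_a + x_b, y_a + y_b)  # all data in one data set

print(f"r_a = {r_a:.3f}, r_b = {r_b:.3f}, combined r = {r_combined:.3f}")
```

The combined r reflects mostly the difference between environment means rather than the association within each environment, so it is better to compute r separately for each environment.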

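Example

The correlation between grain yield and kernel plumpness was 0.43 at Langdon, ND; 0.32 at Prosper, ND; and 0.27 at Carrington, ND. There were 25 cultivars evaluated at each location.

Step 1. Make and complete the following table

Location          n     r      Z       Z - Z_w   (n - 3)(Z - Z_w)^2
Langdon, ND       25    0.43   0.460    0.104    0.238
Prosper, ND       25    0.32   0.332   -0.024    0.013
Carrington, ND    25    0.27   0.277   -0.079    0.137
Total             75           Z_w = 0.356       chi-square = 0.388

Where:

\[
Z = 0.5 \ln\left[\frac{1 + r}{1 - r}\right]
\]

\[
Z_w = \frac{\sum \left[(n_i - 3) Z_i\right]}{\sum (n_i - 3)}
\]

\[
\chi^2 = \sum \left[(n_i - 3)(Z_i - Z_w)^2\right]
\]

df = n - 1 for the $\chi^2$ test, where n here is the number of locations (3 - 1 = 2 df).

Step 2. Look up the tabular $\chi^2$ value at the $\alpha = 0.005$ level.

$\chi^2_{0.005,\ 2\ df} = 10.6$

Step 3. Make conclusions

Because the calculated $\chi^2$ (0.388) is less than the table $\chi^2$ value (10.6), we fail to reject the null hypothesis that the r-values from the three locations are equal.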
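Steps 1 through 3 can be sketched in Python as follows. The sketch assumes r = 0.43, 0.32, and 0.27 with n = 25 at each location, which reproduces the $Z_w$ of 0.356 and the calculated $\chi^2$ of 0.388 reported in the example. The tabular $\chi^2$ is hard-coded from the example rather than computed, and the function names are illustrative.

```python
import math

def fisher_z(r):
    """Z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def homogeneity_chi_square(rs, ns):
    """Return (Z_w, chi-square, df) for testing equality of several r values."""
    zs = [fisher_z(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_w = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    chi_sq = sum(w * (z - z_w) ** 2 for w, z in zip(weights, zs))
    return z_w, chi_sq, len(rs) - 1

rs = [0.43, 0.32, 0.27]  # Langdon, Prosper, Carrington
ns = [25, 25, 25]        # cultivars evaluated at each location
chi_table = 10.6         # tabular chi-square, alpha = 0.005, 2 df (from the example)

z_w, chi_sq, df = homogeneity_chi_square(rs, ns)
print(f"Z_w = {z_w:.3f}, chi-square = {chi_sq:.3f}, df = {df}")
print("r values homogeneous" if chi_sq < chi_table else "r values not homogeneous")
```

Weighting each Z by n - 3 means that locations with more observations contribute more to $Z_w$. With equal n at every location, as here, $Z_w$ is simply the mean of the Z values.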

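Step 4. Calculate the pooled r ($r_p$) value

\[
r_p = \frac{e^{2Z_w} - 1}{e^{2Z_w} + 1}
\]

where e = 2.71828 (the base of natural logarithms).

Therefore

\[
r_p = \frac{e^{2(0.356)} - 1}{e^{2(0.356)} + 1} = 0.341
\]

Step 5. Determine if $r_p$ is significantly different from zero using a confidence interval.

\[
CI = r_p \pm \frac{1.96}{\sqrt{\sum (n_i - 3)}} = 0.341 \pm \frac{1.96}{\sqrt{66}} = 0.341 \pm 0.241
\]

Therefore LCI = 0.100 and UCI = 0.582.

Since the CI does not include zero, we reject the hypothesis that the pooled correlation value is equal to zero.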
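The back-transformation and confidence interval can be sketched in Python as shown below. The sketch assumes $Z_w = 0.356$, three locations with n = 25 each, and a 95% interval of the form $r_p \pm 1.96/\sqrt{\sum (n_i - 3)}$, which matches the half-width of 0.241 in the example. The function names are illustrative.

```python
import math

def pooled_r(z_w):
    """Back-transform the weighted mean Z to a pooled correlation."""
    return (math.exp(2 * z_w) - 1) / (math.exp(2 * z_w) + 1)

def pooled_r_interval(r_p, ns, z_crit=1.96):
    """Interval r_p +/- z_crit / sqrt(sum(n_i - 3)), as applied in the example."""
    half_width = z_crit / math.sqrt(sum(n - 3 for n in ns))
    return r_p - half_width, r_p + half_width

z_w = 0.356
ns = [25, 25, 25]

r_p = pooled_r(z_w)
lower, upper = pooled_r_interval(r_p, ns)
excludes_zero = lower > 0 or upper < 0

print(f"r_p = {r_p:.3f}, CI = ({lower:.3f}, {upper:.3f})")
print("reject Ho" if excludes_zero else "fail to reject Ho")
```

Computed at full precision, the values can differ from the hand calculation in the third decimal place (the example carries $r_p$ as 0.341). The conclusion is unchanged: the interval excludes zero.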

