SIMPLE LINEAR CORRELATION

Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation often is abused. A correlation alone does not show that one variable actually is affecting another variable; that must be shown separately.

The parameter being measured is ρ (rho), and it is estimated by the statistic r, the correlation coefficient. r can range from -1 to 1 and is independent of the units of measurement. The strength of the association increases as the absolute value of r approaches 1.0. A value of 0 indicates there is no linear association between the two variables tested.

A better estimate of r usually can be obtained by calculating r on treatment means averaged across replicates. Correlation does not have to be performed only between independent and dependent variables; it can be done on two dependent variables. The X and Y in the equation to determine r do not necessarily correspond to an independent and a dependent variable, respectively. Scatter plots are a useful means of getting a better understanding of your data.

[Figure: scatter plots illustrating a positive association, a negative association, and no association]
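The formula for r is:

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{SSCP}{\sqrt{(SS_X)(SS_Y)}} $$

Example

  X     Y     XY
  41    52    2,132
  73    95    6,935
  67    72    4,824
  37    52    1,924
  58    96    5,568

ΣX = 276    ΣY = 367    ΣXY = 21,383    ΣX² = 16,232    ΣY² = 28,833    n = 5

Step 1. Calculate SSCP.

$$ SSCP = 21{,}383 - \frac{(276)(367)}{5} = 1{,}124.6 $$

Step 2. Calculate SS_X.

$$ SS_X = 16{,}232 - \frac{(276)^2}{5} = 996.8 $$

Step 3. Calculate SS_Y.

$$ SS_Y = 28{,}833 - \frac{(367)^2}{5} = 1{,}895.2 $$

Step 4. Calculate the correlation coefficient r.

$$ r = \frac{SSCP}{\sqrt{(SS_X)(SS_Y)}} = \frac{1{,}124.6}{\sqrt{(996.8)(1{,}895.2)}} = 0.818 $$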
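The four steps above map directly onto a few lines of code. The sketch below is a minimal illustration, assuming plain Python lists as input; the function name `correlation` is hypothetical. It reproduces the worked example and also checks the claim that r does not depend on the units of measurement by rescaling X by an arbitrary positive constant (2.54 is used here only as an example conversion factor).

```python
import math


def correlation(x, y):
    """Return (SSCP, SS_X, SS_Y, r) using the sums-based formulas in the notes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with n >= 2")
    sum_x, sum_y = sum(x), sum(y)
    # Step 1: corrected sum of cross products
    sscp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    # Steps 2 and 3: corrected sums of squares
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    # Step 4: correlation coefficient
    return sscp, ss_x, ss_y, sscp / math.sqrt(ss_x * ss_y)


x = [41, 73, 67, 37, 58]
y = [52, 95, 72, 52, 96]
sscp, ss_x, ss_y, r = correlation(x, y)
print(f"SSCP = {sscp:.1f}, SSX = {ss_x:.1f}, SSY = {ss_y:.1f}, r = {r:.3f}")
# SSCP = 1124.6, SSX = 996.8, SSY = 1895.2, r = 0.818

# Rescaling X by a positive constant (a change of units) leaves r unchanged.
x_rescaled = [2.54 * v for v in x]
print(f"r after rescaling X = {correlation(x_rescaled, y)[3]:.3f}")  # 0.818
```

Multiplying X by a negative constant would flip the sign of r but not its magnitude.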
Testing the Hypothesis That an Association Between X and Y Exists

To determine if an association between two variables exists as determined using correlation, the following hypotheses are tested:

Ho: ρ = 0
HA: ρ ≠ 0

Notice that this test checks whether r is significantly different from zero, i.e., whether there is an association between the two variables evaluated. You are not testing to determine if there is a SIGNIFICANT CORRELATION. This cannot be tested.

Critical or tabular values of r to test the hypothesis Ho: ρ = 0 can be found in the table on the following page. The df are equal to n - 2. The number of independent variables will equal one for all simple linear correlation.

The tabular r value is r.05, 3 df = 0.878. Because the calculated r (0.818) is less than the table r value (0.878), we fail to reject Ho: ρ = 0 at the 95% level of confidence. We conclude that there is no evidence of an association between X and Y. In this example, it would appear that the association between X and Y is strong because the r value is fairly high. Yet, the test of Ho: ρ = 0 indicates that there is not a significant linear relationship.

Points to Consider
1. The tabular r values are highly dependent on n, the number of observations.
2. As n increases, the tabular r value decreases.
3. We are more likely to reject Ho: ρ = 0 as n increases.
4. As n approaches 100, the r value needed to reject Ho: ρ = 0 becomes fairly small.

Too many people abuse correlation by not reporting the r value and stating incorrectly that there is a significant correlation. Rejecting Ho: ρ = 0 says nothing about the strength of the association between the two variables measured.
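The tabular r values can be reproduced without a printed table. The sketch below is one way to do it, assuming only the Python standard library: under Ho: ρ = 0 the density of r on (-1, 1) is (1 - r²)^((df - 2)/2) / B(1/2, df/2) with df = n - 2, so the two-tailed P value can be obtained by numerical integration (Simpson's rule here) and the critical r by bisection. The function names `r_two_tailed_p` and `critical_r` are hypothetical, and the t value 3.182 is taken from a standard t table. The same test can be written as t = r√(n - 2)/√(1 - r²) with n - 2 df.

```python
import math


def r_two_tailed_p(r, df, steps=2000):
    """Two-tailed P(|R| >= |r|) when rho = 0, with df = n - 2 (requires df >= 2).

    Integrates the null density of r from 0 to |r| with Simpson's rule
    (steps must be even).
    """
    a = abs(r)
    beta = math.exp(math.lgamma(0.5) + math.lgamma(df / 2) - math.lgamma((df + 1) / 2))

    def density(t):
        return (1 - t * t) ** ((df - 2) / 2) / beta

    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    prob_0_to_a = total * h / 3  # P(0 <= R <= |r|)
    return 1 - 2 * prob_0_to_a


def critical_r(df, alpha=0.05):
    """Smallest |r| that rejects Ho: rho = 0 at a two-tailed alpha (bisection)."""
    lo, hi = 0.0, 0.999999
    for _ in range(60):
        mid = (lo + hi) / 2
        if r_two_tailed_p(mid, df) > alpha:
            lo = mid  # not yet significant: the critical value is larger
        else:
            hi = mid
    return hi


# Worked example: r = 0.818 with n = 5, so df = 3
p = r_two_tailed_p(0.818, 3)
print(f"p = {p:.3f}")  # p > 0.05, so fail to reject Ho
print(f"critical r = {critical_r(3):.3f}")  # 0.878

# Equivalent t form of the same test
t = 0.818 * math.sqrt(3) / math.sqrt(1 - 0.818 ** 2)
T_CRIT_3DF = 3.182  # two-tailed t(0.05, 3 df) from a standard t table
print(f"t = {t:.2f} vs tabular t = {T_CRIT_3DF}")

# Points to Consider: the critical r shrinks as n grows
for df in (3, 10, 30, 100):
    print(f"df = {df:3d}: critical r = {critical_r(df):.3f}")
# df = 3: 0.878, df = 10: 0.576, df = 30: 0.349, df = 100: 0.195
```

Integrating from 0 to |r| rather than from |r| to 1 keeps the integrand smooth, which matters for small df where the density has an infinite slope at ±1.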
The correlation coefficient squared equals the coefficient of determination. Yet, you need to be careful if you decide to calculate r by taking the square root of the coefficient of determination. You may not have the correct sign if there is a negative association between the two variables.

Example. Assume X is the independent variable and Y is the dependent variable, n is large (more than 100 observations), and the correlation between the two variables is r = 0.30. This value of r is significantly different from zero at the 99% level of confidence. Calculating r² = (0.30)² = 0.09, we find that 9% of the variation in Y can be explained by having X in the model. This indicates that even though the r value is significantly different from zero, the association between X and Y is weak. Some people feel the coefficient of determination needs to be greater than 0.50 (i.e., |r| ≥ 0.71) before the relationship between X and Y is very meaningful.

Calculating r Combined Across Experiments, Locations, Runs, etc.

This is another area where correlation is abused. When calculating the pooled correlation across experiments, you cannot just put the data into one data set and calculate r directly. The value of r that will be calculated is not a reliable estimate of ρ. A better method of estimating ρ would be to:
1. Calculate a value of r for each environment, and
2. Average the r values across environments.

The proper method of calculating a pooled r value is to test the homogeneity of the correlation coefficients from the different locations. If the r values are homogeneous, a pooled r value can be calculated.
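The sign problem is easy to see in code: a square root is never negative, so the sign of r has to come from the direction of the association, for example the sign of SSCP (equivalently, of the regression slope). This is a minimal sketch; the helper name `r_from_r_squared` and the SSCP value of -42.0 are hypothetical, chosen only to represent a negative association.

```python
import math


def r_from_r_squared(r_squared, sscp):
    """Recover r from the coefficient of determination.

    sqrt(r^2) is always >= 0, so the sign is taken from SSCP,
    which is negative when Y tends to decrease as X increases.
    """
    return math.copysign(math.sqrt(r_squared), sscp)


# The example from the text: r = 0.30
r = 0.30
r_squared = r ** 2
print(f"r^2 = {r_squared:.2f}: {r_squared:.0%} of the variation in Y explained by X")
# r^2 = 0.09: 9% of the variation in Y explained by X

# A negative association (hypothetical SSCP): the naive root has the wrong sign.
naive = math.sqrt(0.09)
correct = r_from_r_squared(0.09, sscp=-42.0)
print(f"naive = {naive:.2f}, with sign from SSCP = {correct:.2f}")

# The 'meaningful' guideline r^2 > 0.50 corresponds to |r| > sqrt(0.50)
print(f"|r| threshold = {math.sqrt(0.50):.2f}")  # 0.71
```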
Example

The correlation between grain yield and kernel plumpness was 0.43 at Langdon, ND; 0.32 at Prosper, ND; and 0.27 at Carrington, ND. There were 25 cultivars evaluated at each location.

Step 1. Make and complete the following table.

  Location          n     r      Z       Z - Z_w    (n - 3)(Z - Z_w)²
  Langdon, ND       25    0.43   0.460    0.104      0.238
  Prosper, ND       25    0.32   0.332   -0.024      0.013
  Carrington, ND    25    0.27   0.277   -0.079      0.137
  Σn = 75                        Z_w = 0.356          χ² = 0.388

Where:

$$ Z = 0.5 \ln\left(\frac{1 + r}{1 - r}\right) $$

$$ Z_w = \frac{\sum (n - 3) Z}{\sum (n - 3)} $$

$$ \chi^2 = \sum (n - 3)(Z - Z_w)^2 $$

df = k - 1 for the χ² test, where k is the number of r values being compared (here, three locations, so df = 2).

Step 2. Look up the tabular χ² value at the α = 0.005 level.

χ²(0.005, 2 df) = 10.6

Step 3. Make conclusions.

Because the calculated χ² (0.388) is less than the table χ² value (10.6), we fail to reject the null hypothesis that the r values from the three locations are equal.
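Steps 1 to 3 can be checked with a short script. This is a sketch using only the numbers given in the example; the variable names are illustrative, and the tabular value 10.6 is copied from Step 2 rather than computed. Fisher's transformation Z = 0.5 ln((1 + r)/(1 - r)) is the same function as `math.atanh`.

```python
import math

# Correlations and sample sizes from the example (25 cultivars per location)
sites = {"Langdon, ND": 0.43, "Prosper, ND": 0.32, "Carrington, ND": 0.27}
n = {name: 25 for name in sites}

# Step 1: Fisher's Z for each location, weighted mean Z_w, and chi-square
z = {name: 0.5 * math.log((1 + r) / (1 - r)) for name, r in sites.items()}
weights = {name: n[name] - 3 for name in sites}
z_w = sum(weights[s] * z[s] for s in sites) / sum(weights.values())
chi_sq = sum(weights[s] * (z[s] - z_w) ** 2 for s in sites)
df = len(sites) - 1  # k - 1

for s in sites:
    print(f"{s:15s} Z = {z[s]:.3f}  Z - Zw = {z[s] - z_w:+.3f}")
print(f"Z_w = {z_w:.3f}, chi-square = {chi_sq:.3f}, df = {df}")
# Z_w = 0.356, chi-square = 0.388, df = 2

# Steps 2 and 3: compare with the tabular value quoted in the notes
CHI_SQ_TABLE = 10.6  # chi-square, alpha = 0.005, 2 df
if chi_sq < CHI_SQ_TABLE:
    print("Fail to reject Ho: the r values are homogeneous; pooling is justified")
else:
    print("Reject Ho: the r values differ; do not pool")
```

The script carries full precision through every step, so its intermediate values may differ from the rounded table entries in the third decimal place.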
Step 4. Calculate the pooled r (r_p) value.

$$ r_p = \frac{e^{2 Z_w} - 1}{e^{2 Z_w} + 1} $$

where e = 2.7182818. Therefore

$$ r_p = \frac{e^{2(0.356)} - 1}{e^{2(0.356)} + 1} = 0.342 $$

Step 5. Determine if r_p is significantly different from zero using a confidence interval.

$$ CI = r_p \pm 1.96\sqrt{\frac{1}{\sum (n - 3)}} = 0.342 \pm 1.96\sqrt{\frac{1}{66}} = 0.342 \pm 0.241 $$

Therefore LCI = 0.101 and UCI = 0.583.

Since the CI does not include zero, we reject the hypothesis that the pooled correlation value is equal to zero.
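Steps 4 and 5 can be sketched the same way. The back-transformation (e^(2Z) - 1)/(e^(2Z) + 1) is the hyperbolic tangent, so `math.tanh` gives the same answer. The interval is computed as written in Step 5, on the r scale. Because Fisher's standard error 1/√Σ(n - 3) strictly applies to Z, the sketch also shows the common alternative of building the interval on the Z scale and back-transforming it; that alternative is an addition for comparison, not the method given in these notes. Both intervals exclude zero here.

```python
import math

rs = [0.43, 0.32, 0.27]  # Langdon, Prosper, Carrington
ns = [25, 25, 25]        # cultivars per location

weights = [n - 3 for n in ns]
z_w = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)

# Step 4: back-transform Z_w; (e^{2Z} - 1)/(e^{2Z} + 1) equals tanh(Z)
r_p = (math.exp(2 * z_w) - 1) / (math.exp(2 * z_w) + 1)
assert math.isclose(r_p, math.tanh(z_w))

# Step 5 as written in the notes: interval on the r scale
half = 1.96 * math.sqrt(1 / sum(weights))
ci_r = (r_p - half, r_p + half)

# Alternative for comparison: interval on the Z scale, then back-transformed
ci_z = (math.tanh(z_w - half), math.tanh(z_w + half))

print(f"r_p = {r_p:.3f}, half-width = {half:.3f}")  # r_p = 0.342, half-width = 0.241
print("r-scale CI:", tuple(round(v, 3) for v in ci_r))
print("Z-scale CI:", tuple(round(v, 3) for v in ci_z))
print("Reject Ho: rho_p = 0" if ci_r[0] > 0 else "Fail to reject Ho: rho_p = 0")
```

The Z-scale interval is not symmetric about r_p, which reflects the bounded range of r.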