SIMPLE LINEAR CORRELATION




Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation often is abused. You need to show that one variable actually is affecting another variable.

The parameter being measured is ρ (rho), and it is estimated by the statistic r, the correlation coefficient. r can range from -1 to 1 and is independent of units of measurement. The strength of the association increases as the absolute value of r approaches 1.0. A value of 0 indicates there is no linear association between the two variables tested. A better estimate of r usually can be obtained by calculating r on treatment means averaged across replicates.

Correlation does not have to be performed only between independent and dependent variables. Correlation can be done on two dependent variables. The X and Y in the equation to determine r do not necessarily correspond to an independent and a dependent variable, respectively.

Scatter plots are a useful means of getting a better understanding of your data.

[Figure: three scatter plots illustrating positive association, negative association, and no association]

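The formula for r is:

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{SSCP}{\sqrt{(SS_X)(SS_Y)}}$$

Example

    X     Y      XY
   41    52   2,132
   73    95   6,935
   67    72   4,824
   37    52   1,924
   58    96   5,568

ΣX = 276, ΣY = 367, ΣXY = 21,383, ΣX² = 16,232, ΣY² = 28,833, n = 5

Step 1. Calculate SSCP

$$SSCP = 21{,}383 - \frac{(276)(367)}{5} = 1{,}124.6$$

Step 2. Calculate SS X

$$SS_X = 16{,}232 - \frac{276^2}{5} = 996.8$$

Step 3. Calculate SS Y

$$SS_Y = 28{,}833 - \frac{367^2}{5} = 1{,}895.2$$

Step 4. Calculate the correlation coefficient r

$$r = \frac{SSCP}{\sqrt{(SS_X)(SS_Y)}} = \frac{1{,}124.6}{\sqrt{(996.8)(1{,}895.2)}} = 0.818$$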
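The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout: the function name `correlation` and the variable names are hypothetical, and the five (X, Y) pairs are the ones from the worked example (they sum to 276 and 367).

```python
import math

# The five (X, Y) pairs from the worked example.
x = [41, 73, 67, 37, 58]
y = [52, 95, 72, 52, 96]


def correlation(x, y):
    """Return (SSCP, SSX, SSY, r) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Step 1: corrected sum of cross products.
    sscp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    # Steps 2 and 3: corrected sums of squares for X and Y.
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    # Step 4: correlation coefficient.
    r = sscp / math.sqrt(ss_x * ss_y)
    return sscp, ss_x, ss_y, r


sscp, ss_x, ss_y, r = correlation(x, y)
print(f"SSCP={sscp:.1f}  SSX={ss_x:.1f}  SSY={ss_y:.1f}  r={r:.3f}")
```

The printed values should agree with Steps 1 through 4 after rounding. Swapping `x` and `y` gives the same r, which reflects the point that neither variable has to be the independent one.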

Testing the Hypothesis That an Association Between X and Y Exists

To determine if an association between two variables exists as determined using correlation, the following hypotheses are tested:

H0: ρ = 0
HA: ρ ≠ 0

Notice that this test determines whether r is significantly different from zero, i.e., whether there is an association between the two variables evaluated. You are not testing to determine if there is a SIGNIFICANT CORRELATION. This cannot be tested.

Critical or tabular values of r to test the hypothesis H0: ρ = 0 can be found in the table on the following page. The df are equal to n - 2. The number of independent variables will equal one for all simple linear correlation.

The tabular r value is r(0.05, 3 df) = 0.878.

Because the calculated r (0.818) is less than the table r value (0.878), we fail to reject H0: ρ = 0 at the 95% level of confidence. We conclude that the data do not show an association between X and Y.

In this example, it would appear that the association between X and Y is strong because the r value is fairly high. Yet, the test of H0: ρ = 0 indicates that there is not a linear relationship.

Points to Consider

1. The tabular r values are highly dependent on n, the number of observations.
2. As n increases, the tabular r value decreases.
3. We are more likely to reject H0: ρ = 0 as n increases.
4. As n approaches 100, the r value needed to reject H0: ρ = 0 becomes fairly small.

Too many people abuse correlation by not reporting the r value and stating incorrectly that there is a significant correlation. The failure to accept H0: ρ = 0 says nothing about the strength of the association between the two variables measured.
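The comparison against the tabular r can be sketched as follows. The handout reads the critical value from a table; the sketch instead derives it from Student's t through the standard identity r_crit = t / sqrt(t² + df), which is an assumption added here and not something the handout states. The constant 3.182 is the two-tailed t value for α = 0.05 and 3 df taken from a standard t table, and the function names are hypothetical.

```python
import math


def critical_r(t_crit, df):
    """Critical |r| implied by a critical t value with df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def t_statistic(r, df):
    """t statistic equivalent to testing H0: rho = 0 with a sample r."""
    return r * math.sqrt(df / (1 - r ** 2))


n = 5
df = n - 2              # df for simple linear correlation
r = 0.818               # calculated r from the worked example
T_CRIT_05_3DF = 3.182   # assumed: two-tailed t table value, alpha = 0.05, 3 df

r_crit = critical_r(T_CRIT_05_3DF, df)
reject_h0 = abs(r) > r_crit

print(f"critical r = {r_crit:.3f}, t = {t_statistic(r, df):.2f}, reject H0: {reject_h0}")
```

Comparing |r| with the critical r and comparing the t statistic with the critical t are the same decision rule, so either form can be used. Raising `n` while holding `r` fixed shows how quickly the critical r shrinks, which is the behavior listed under Points to Consider.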


Example

The correlation coefficient squared equals the coefficient of determination. Yet, you need to be careful if you decide to calculate r by taking the square root of the coefficient of determination. You may not have the correct sign if there is a negative association between the two variables.

Assume X is the independent variable and Y is the dependent variable, n = 150, and the correlation between the two variables is r = 0.30. This value of r is significantly different from zero at the 99% level of confidence. Calculating r² using r, 0.30² = 0.09, we find that 9% of the variation in Y can be explained by having X in the model. This indicates that even though the r value is significantly different from zero, the association between X and Y is weak.

Some people feel the coefficient of determination needs to be greater than 0.50 (i.e., r = 0.71) before the relationship between X and Y is very meaningful.

Calculating r Combined Across Experiments, Locations, Runs, etc.

This is another area where correlation is abused. When calculating the pooled correlation across experiments, you cannot just put the data into one data set and calculate r directly. The value of r that will be calculated is not a reliable estimate of ρ.

A better method of estimating ρ would be to:

1. Calculate a value of r for each environment, and
2. Average the r values across environments.

The proper method of calculating a pooled r value is to test the homogeneity of the correlation coefficients from the different locations. If the r values are homogeneous, a pooled r value can be calculated.

Example

The correlation between grain yield and kernel plumpness was 0.43 at Langdon, ND; 0.32 at Prosper, ND; and 0.27 at Carrington, ND. There were 25 cultivars evaluated at each location.

Step 1. Make and complete the following table

Location          n_i   r_i    Z_i     Z_i - Z_w   (n_i - 3)(Z_i - Z_w)²
Langdon, ND       25    0.43   0.460    0.104      0.238
Prosper, ND       25    0.32   0.332   -0.024      0.013
Carrington, ND    25    0.27   0.277   -0.079      0.137

Σn_i = 75, Z_w = 0.356, χ² = 0.388

Where:

$$Z_i = 0.5 \ln\left(\frac{1 + r_i}{1 - r_i}\right)$$

$$Z_w = \frac{\sum \left[(n_i - 3) Z_i\right]}{\sum (n_i - 3)}$$

$$\chi^2 = \sum \left[(n_i - 3)(Z_i - Z_w)^2\right]$$

df = n - 1 for the χ² test, where n here is the number of correlation coefficients being compared (three locations, so df = 2).

Step 2. Look up the tabular χ² value at the α = 0.005 level.

χ²(0.005, 2 df) = 10.6

Step 3. Make conclusions

Because the calculated χ² (0.388) is less than the table χ² value (10.6), we fail to reject the null hypothesis that the r-values from the three locations are equal.
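The homogeneity test in Steps 1 through 3 can be sketched as below. This is a hedged illustration using the values from the example table as inputs (25 cultivars per location; r of 0.43, 0.32, and 0.27). The names `fisher_z` and `homogeneity_chi_square` are hypothetical, and the tabular value 10.6 is hard-coded from the text instead of being computed.

```python
import math

# (n, r) for each location, taken from the example table.
locations = {
    "Langdon, ND": (25, 0.43),
    "Prosper, ND": (25, 0.32),
    "Carrington, ND": (25, 0.27),
}


def fisher_z(r):
    """Z transformation of r: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def homogeneity_chi_square(samples):
    """Return (Z_w, chi-square, df) for a list of (n, r) pairs."""
    weights = [n - 3 for n, _ in samples]
    z = [fisher_z(r) for _, r in samples]
    # Weighted mean of the Z values, with weights n - 3.
    z_w = sum(w * zi for w, zi in zip(weights, z)) / sum(weights)
    # Weighted squared deviations from Z_w.
    chi2 = sum(w * (zi - z_w) ** 2 for w, zi in zip(weights, z))
    return z_w, chi2, len(samples) - 1


z_w, chi2, df = homogeneity_chi_square(list(locations.values()))
CHI2_TABLE = 10.6  # tabular chi-square quoted in the text (alpha = 0.005, 2 df)
homogeneous = chi2 < CHI2_TABLE

print(f"Z_w = {z_w:.3f}, chi-square = {chi2:.3f}, df = {df}, homogeneous: {homogeneous}")
```

Because every location has the same n, the weighted mean Z_w reduces to a simple average here; the weights matter only when sample sizes differ.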

Step 4. Calculate pooled r (r_p) value

$$r_p = \frac{e^{2 Z_w} - 1}{e^{2 Z_w} + 1}$$

Where e = 2.7182818

Therefore

$$r_p = \frac{e^{2(0.356)} - 1}{e^{2(0.356)} + 1} = 0.341$$

Step 5. Determine if r_p is significantly different from zero using a confidence interval.

$$CI = r_p \pm 1.96 \sqrt{\frac{1}{\sum (n_i - 3)}}$$

$$CI = 0.341 \pm 1.96 \sqrt{\frac{1}{66}} = 0.341 \pm 0.241$$

Therefore LCI = 0.100 and UCI = 0.582.

Since the CI does not include zero, we reject the hypothesis that the pooled correlation value is equal to zero.
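Steps 4 and 5 can be sketched as follows, taking Z_w = 0.356 and Σ(n_i - 3) = 66 from the example as given inputs. The function names are hypothetical. The back-transformation used in the text is algebraically the hyperbolic tangent of Z_w, so `math.tanh` gives the same number.

```python
import math


def pooled_r(z_w):
    """Back-transform the weighted mean Z to a pooled correlation r_p."""
    e2z = math.exp(2 * z_w)
    return (e2z - 1) / (e2z + 1)


def confidence_interval(r_p, total_weight, z_crit=1.96):
    """95% CI as computed in the text: r_p +/- 1.96 * sqrt(1 / sum(n_i - 3))."""
    half_width = z_crit * math.sqrt(1 / total_weight)
    return r_p - half_width, r_p + half_width


z_w = 0.356          # weighted mean Z from the homogeneity table
total_weight = 66    # sum of (n_i - 3) over the three locations

r_p = pooled_r(z_w)
lci, uci = confidence_interval(r_p, total_weight)
includes_zero = lci <= 0 <= uci

print(f"r_p = {r_p:.3f}, CI = ({lci:.3f}, {uci:.3f}), includes zero: {includes_zero}")
```

Carried to full precision, r_p comes out very close to the 0.341 reported in Step 4; any difference in the third decimal place comes from rounding of intermediate values and does not change the conclusion.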