CORRELATION AND SIMPLE REGRESSION ANALYSIS USING SAS IN DAIRY SCIENCE


A. K. Gupta, Vipul Sharma and M. Manoj
NDRI, Karnal

When analyzing farm records, simple descriptive statistics can reveal a great deal of information. However, it is often more important to examine relationships within the data, especially in medical and social sciences. In a livestock farm, data on several interrelated variables (economic traits) are generated regularly. In such situations the farm manager is often interested in understanding the behavior of these variables, exploring the relationships between them and quantifying the degree of association using the concepts of correlation and regression. Through correlation measures and hypothesis testing, these relationships can be studied in depth, limited only by the data available to the researcher. Keeping in view the importance of correlation and regression in the analysis of farm data, an attempt has been made to explain these tools with a statistical background and programming examples.

CORRELATION

Association as measured by correlation exists when two variables have a linear relationship beyond what is expected merely by chance. When examining data in SAS, correlation reveals itself as the relationship between two variables in a dataset. The most common measure of correlation is the Pearson product-moment correlation coefficient. It is important to note that while more than two variables can be analyzed when looking for correlation, the correlation measure only applies to two variables at a time. The symbol for the sample correlation is r and for the entire population is ρ:

r_xy = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²]

It is apparent when examining the definition of correlation that measures from only two variables are included, namely the covariance between the two variables {cov(x,y)} and the standard deviation of each (σx, σy). The result of this calculation is the correlation between the two variables.
The correlation coefficient as a measure of association ranges from −1 to 1. A value of −1 represents a perfect negative correlation, while a value of 1 represents a perfect positive correlation. The closer a correlation measure is to these extremes, the stronger the correlation between the two variables. A value of zero means that no correlation is observed. It is important to note that a correlation measure of zero does not necessarily mean that there is no relationship between the two variables, just that there is no linear relationship present in the data being analyzed. It is also sometimes difficult to judge whether a correlation measure is high or low. There are certain situations where a correlation measure of 0.3, for example, may be considered negligible. In other circumstances, such as in the social sciences, a 0.3 correlation measure may suggest that further examination is needed. As with all data analysis, the context of the data must be understood in order to evaluate any result.
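As a language-agnostic cross-check of the definitional formula above (not part of the original SAS workflow), here is a minimal Python sketch using hypothetical data:

```python
# Pearson product-moment correlation computed directly from the
# definitional formula: r = S_xy / sqrt(S_xx * S_yy)
import math

def pearson_r(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # cross-product sum (covariance numerator) and the two sums of squares
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# perfectly linear data -> perfect positive correlation
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

For perfectly decreasing data the same function returns −1, illustrating the two extremes of the coefficient's range.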
Spearman's Rank Correlation Coefficient: This method is based on the ranks of the items rather than their actual values. The advantage of this method over the others is that it can be used even when the actual values of items are unknown, e.g., to determine the degree of agreement between the judgments of two judges. The formula is:

R = 1 − (6 Σ dᵢ²) / [n(n² − 1)]

where R = rank correlation coefficient, dᵢ = difference between the ranks of two items, and n = the number of observations.

It is important to note that a strong (or even perfect) correlation does not imply causation, as other variables may be affecting the relationship between the two variables of interest.

CORRELATION: SYNTAX

In order to measure correlation in SAS, the proc corr procedure can be used. This procedure provides correlation measures for multiple variables, in a cross-tabular format. The syntax for the procedure is as follows:

proc corr data=dataset;
  by byvars;
  freq freqvar;
  partial parvar;
  var varlist;
  weight weightvar;
  with withvars;
run;

where:
dataset is the name of the dataset to be analyzed, either temporary or permanent.
the by statement produces separate correlation analyses for each BY group.
the freq statement identifies a variable whose values represent the frequency of each observation.
the partial statement identifies controlling variables used to compute Pearson, Spearman, or Kendall partial correlation coefficients.
the var statement identifies the variables to correlate and their order in the correlation matrix.
the weight statement identifies a variable whose values weight each observation to compute the weighted Pearson product-moment correlation.
the with statement computes correlations for specific combinations of variables.

Note that there are more options that can be used with this procedure for less common (but still useful) correlation measurements.

CORRELATION: EXAMPLE

This analysis uses the Body_weight SAS data set of 40 observations, which consists of the variables Anim_no, BW, WT_6M, WT_12M, WT_18M, WT_24M, WT_30M, WT_FC (Weight at First Calving), AFC (Age at First Calving), FLY (Milk Yield at First Lactation), FLL (First Lactation Length), FCI (First Calving Interval) and FSP (First Service Period).

CORRELATION: OUTPUT

The output produced by proc corr reveals a great deal of useful information. The first information displayed is a list of the variables included in the analysis. This is especially useful when no variables were included in the var statement, so that all numeric variables were included. Next is a list of simple statistics for each variable: the number of observations, mean, standard deviation, sum, minimum and maximum. After this section, each variable in the analysis and its label is listed. Finally, the correlation measures are presented. Unless a different correlation measure is requested, this section is labeled Pearson Correlation Coefficients. Results are provided in a cross-tabular format, with values of one on the diagonal (a variable always has a perfect positive correlation with itself). Along with the correlation coefficients, p-values are listed, as are the numbers of observations (if they differ).
SAS procedure for the correlation coefficient:

proc corr data=body_wts;
  var FLY;
  with AFC FLL FSP FCI;
run;

SAS Output

The CORR Procedure
4 With Variables: AFC FLL FSP FCI
1 Variables: FLY

Simple Statistics
Variable  N  Mean  Std Dev  Sum  Minimum  Maximum  Label
(statistics for AFC, FLL, FSP, FCI and FLY appear here)

Pearson Correlation Coefficients, N = 40
Prob > |r| under H0: Rho=0
(coefficients of FLY with AFC, FLL, FSP and FCI appear here; the AFC p-value is <.0001)
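The Spearman rank formula given earlier can also be sketched outside SAS. This is a minimal Python illustration with hypothetical data, assuming no ties in the ranks (as the dᵢ formula requires):

```python
# Spearman's rank correlation via R = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)),
# where d_i is the difference between the ranks of the paired items.
def spearman_r(x, y):
    def ranks(v):
        # 1-based ranks; assumes no tied values
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# monotone but nonlinear relationship -> perfect rank correlation
print(spearman_r([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```

Note how Pearson's r on the same cubed data would be below 1, while the rank coefficient is exactly 1 because only order matters. In SAS itself the spearman option of proc corr produces this coefficient.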
REGRESSION ANALYSIS

Correlation gives us an idea of the magnitude and direction of association between correlated variables. Regression is a statistical procedure concerned with causation in a relationship among variables: it assesses the contribution of one or more causing variables (independent variables) to the variable being caused (the dependent variable). When there is only one independent variable, the relationship is expressed by a straight line; this procedure is called simple linear regression. Regression analysis allows multiple variables to be examined simultaneously. The most widely used method of regression analysis is Ordinary Least Squares (OLS). OLS works by fitting a best-fit trend line through all of the available data points. First, the variables to be included in the analysis must be chosen and incorporated into the appropriate model (in this case, a linear model):

Y = β₀ + β₁x₁ + ε

where: Y is the dependent variable, x₁ is the independent variable, β₀ is the intercept, β₁ is the regression coefficient, and ε is the error.

Next, a testable hypothesis must be developed:

H₀: β₁ = 0
H₁: β₁ ≠ 0

Therefore, if the analysis finds that the null hypothesis can be rejected (i.e., that the coefficient of interest does not in fact equal zero), then that variable has a significant effect on the dependent variable (Y).

REGRESSION ANALYSIS: SYNTAX

In order to perform regression analysis in SAS, the proc reg procedure can be used. This procedure provides OLS regression results for the specified variables. The syntax for the procedure is as follows:

proc reg data=dataset;
  by byvars;
  model depvar=indepvars;
  freq freqvar;
  weight weightvar;
run;
quit;
where:
dataset is the name of the dataset to be analyzed, either temporary or permanent.
byvars is a list of all variables used to create BY groups for processing. This option is common among most procedures.
depvar is the name of the dependent variable to be used in the analysis (Y above).
indepvars is a list of all independent variables to be used in the analysis (x₁ … xₙ above).
freqvar is the numeric variable which contains the number of times an observation is to be counted in the analysis. Similar to weightvar.
weightvar is the numeric variable which contains the weight for each observation. Similar to freqvar.

Note that there are more options that can be used with this procedure for less common (but still useful) analyses.

SAS procedure for regression:

proc reg data=body_wts;
  model AFC = WT_30M;
run;

SAS Output

The REG Procedure
Model: MODEL1
Dependent Variable: AFC

Number of Observations Read  40
Number of Observations Used  40

Analysis of Variance
Source  DF  Sum of Squares  Mean Square  F Value  Pr > F
(values for Model, Error and Corrected Total appear here)

Root MSE, R-Square, Dependent Mean, Adj R-Sq and Coeff Var (values appear here)

Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
(estimates for Intercept and WT_30M appear here; the Intercept p-value is <.0001)
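The slope and intercept that proc reg estimates for a single predictor have a simple closed form, β̂₁ = S_xy/S_xx and β̂₀ = Ȳ − β̂₁X̄. A minimal Python sketch with hypothetical data:

```python
# Closed-form ordinary least squares for the model Y = b0 + b1*x + e
def ols_fit(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx       # slope (regression coefficient)
    b0 = my - b1 * mx    # intercept
    return b0, b1

# data lying exactly on the line y = 1 + 2x
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

With noisy data the same formulas give the best-fit line, and the t-test reported by proc reg asks whether the estimated β₁ differs significantly from zero.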
REGRESSION ANALYSIS: ASSUMPTIONS

In earlier topics, we learned how to do ordinary linear regression with SAS, concluding with methods for examining the distribution of variables to check for non-normally distributed variables as a first look at checking assumptions in regression. Without verifying that the data have met the regression assumptions, the results may be misleading. This section explores how SAS can be used to test whether the data meet the assumptions of linear regression. In particular, we consider the following assumptions:

Linearity - the relationships between the predictors and the outcome variable should be linear.
Normality - the errors should be normally distributed. Technically, normality is necessary only for the t-tests to be valid; estimation of the coefficients only requires that the errors be identically and independently distributed.
Homogeneity of Variance (Homoscedasticity) - the error variance should be constant.
Independence - the errors associated with one observation are not correlated with the errors of any other observation.
Errors in Variables - predictor variables are measured without error.
Model Specification - the model should be properly specified (including all relevant variables, and excluding irrelevant variables).

Additionally, there are issues that can arise during the analysis that, while strictly speaking not assumptions of regression, are nonetheless of great concern to regression analysts:

Influence - individual observations that exert undue influence on the coefficients.
Collinearity - predictors that are highly collinear, i.e., linearly related, can cause problems in estimating the regression coefficients.

Many graphical methods and numerical tests have been developed over the years for regression diagnostics. In this chapter, we explore these methods and show how to verify regression assumptions and detect potential problems using SAS.
UNUSUAL AND INFLUENTIAL DATA

A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis. If a single observation (or a small group of observations) substantially changes your results, you would want to know about this and investigate further. There are three ways that an observation can be unusual.

Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or another problem.

Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an observation deviates from the mean of that variable. These leverage points can have an effect on the estimates of the regression coefficients.

Influence: An observation is said to be influential if removing it substantially changes the estimates of the coefficients. Influence can be thought of as the product of leverage and outlierness.
HOW TO USE RESIDUALS FOR DIAGNOSTICS?

Residual analysis is usually done graphically. We may look at:
1. Quantile plots: to assess normality.
2. Scatter plots: to assess model assumptions, such as constant variance and linearity, and to identify potential outliers.
3. Histograms, stem-and-leaf diagrams and box plots.

Other regression diagnostics for identifying outliers:

1. Studentized residuals (SR): The SR values are obtained by dividing the residuals by their standard errors. The suggested cutoffs are |SR| > 2 for data sets with a small number of observations and |SR| > 3 for data sets with a large number of observations.
2. RStudent residuals: These are similar to SR except that they are calculated after deleting the i-th observation, i.e., from the difference between the observed Y and the value of Y predicted after excluding this observation from the analysis.
3. Cook's D: A measure of the simultaneous change in the parameter estimates when an observation is deleted from the analysis; Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression, so points with a large Cook's distance are considered to merit closer examination. A suggested cutoff is Dᵢ > 4/n.
4. DFFITS: Defined as the change ("DFFIT") in the predicted value for a point, obtained when that point is left out of the regression, "studentized" by dividing by the estimated standard deviation of the fit at that point:

DFFITSᵢ = (ŷᵢ − ŷ₍ᵢ₎) / (s₍ᵢ₎ √hᵢᵢ)

where ŷᵢ and ŷ₍ᵢ₎ are the predictions for point i with and without point i included in the regression, s₍ᵢ₎ is the standard error estimated without the point in question, and hᵢᵢ is the leverage for the point.
5. DFBETAS: The DFBETAS statistics are scaled measures of the change in each parameter estimate, calculated by deleting the i-th observation.
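To make the leverage, studentized-residual and Cook's D quantities above concrete, here is a minimal Python sketch for simple linear regression (p = 2 parameters). It uses the textbook formulas hᵢᵢ = 1/n + (xᵢ − x̄)²/S_xx and Dᵢ = SRᵢ²·hᵢᵢ/[p(1 − hᵢᵢ)]; the data are hypothetical:

```python
# Leverage, internally studentized residuals and Cook's D
# for a simple linear regression (intercept + slope, so p = 2).
import math

def diagnostics(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - 2)   # residual mean square
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx        # leverage h_ii
        sr = e / math.sqrt(mse * (1 - h))       # studentized residual
        d = sr * sr * h / (2 * (1 - h))         # Cook's D (p = 2)
        out.append((h, sr, d))
    return out

# last point is both high-leverage and a large-residual outlier,
# so it should dominate Cook's D (compare each D_i against 4/n)
for h, sr, d in diagnostics([1, 2, 3, 4, 5], [2, 4, 6, 8, 20]):
    print(round(h, 3), round(sr, 3), round(d, 3))
```

A useful sanity check on the arithmetic: the leverages always sum to p (here 2), since they are the diagonal of the hat matrix.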
SAS procedure for regression with graphics:

ods graphics / imagemap=on;
ods html gpath=&outpath path=&outpath file='reg.html';
proc reg data=body_wts
    plots(only)=(RSTUDENTBYPREDICTED(LABEL) COOKSD(LABEL)
                 DFFITS(LABEL) DFBETAS(LABEL));
  model FLY = AFC;
run;
ods html close;
SAS Output

The REG Procedure
Model: MODEL1
Dependent Variable: FLY

Number of Observations Read  40
Number of Observations Used  40

Analysis of Variance
Source  DF  Sum of Squares  Mean Square  F Value  Pr > F
(values for Model, Error and Corrected Total appear here)

Root MSE, R-Square, Dependent Mean, Adj R-Sq and Coeff Var (values appear here)

Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
(estimates for Intercept and AFC appear here)
Source: Statistics I: Introduction to ANOVA, Regression and Logistic Regression Course Notes. SAS Institute Inc., Cary, NC, USA.
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationSimple Linear Regression in SPSS STAT 314
Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,
More informationMulticollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015
Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,
More informationCorrelation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2
Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables
More informationStatistics 112 Regression Cheatsheet Section 1B  Ryan Rosario
Statistics 112 Regression Cheatsheet Section 1B  Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything
More informationTRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics
UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002
More informationThe importance of graphing the data: Anscombe s regression examples
The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 3031, 2008 B. Weaver, NHRC 2008 1 The Objective
More informationRegression III: Advanced Methods
Lecture 5: Linear leastsquares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 16233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 42 A Note on NonLinear Relationships 44 Multiple Linear Regression 45 Removal of Variables 48 Independent Samples
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationCanonical Correlation
Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationLecture 5 Hypothesis Testing in Multiple Linear Regression
Lecture 5 Hypothesis Testing in Multiple Linear Regression BIOST 515 January 20, 2004 Types of tests 1 Overall test Test for addition of a single variable Test for addition of a group of variables Overall
More informationRegression, least squares
Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationNotes on Applied Linear Regression
Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 4448935 email:
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationSPSS: Descriptive and Inferential Statistics. For Windows
For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 ChiSquare Test... 10 2.2 T tests... 11 2.3 Correlation...
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationTesting for serial correlation in linear paneldata models
The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear paneldata models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear paneldata
More informationApplying Statistics Recommended by Regulatory Documents
Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301325 32531293129 About the Speaker Mr. Steven
More informationBivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2
Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS ttest X 2 X 2 AOVA (Ftest) ttest AOVA
More informationPearson s correlation
Pearson s correlation Introduction Often several quantitative variables are measured on each member of a sample. If we consider a pair of such variables, it is frequently of interest to establish if there
More informationChapter 14: Analyzing Relationships Between Variables
Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationPoint Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the productmoment correlation calculated between a continuous random variable
More informationLectures 8, 9 & 10. Multiple Regression Analysis
Lectures 8, 9 & 0. Multiple Regression Analysis In which you learn how to apply the principles and tests outlined in earlier lectures to more realistic models involving more than explanatory variable and
More informationPart 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
More information