Correlation and simple linear regression S6


1 Basic Medical Statistics Course: Correlation and simple linear regression (S6). Patrycja Gradowska, December 3.

2 Introduction. So far we have looked at the association between: two categorical variables (chi-square test); a numerical variable and a categorical variable (independent-samples t-test and ANOVA). We will now look at the association between two numerical (continuous) variables, say x and y.

3 Introduction. Example 1: Mortality from malignant melanoma of the skin versus latitude of residency among white males in the United States (van Belle et al., 2004). [Data table: one row per state (Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, ..., Wisconsin, Wyoming) giving its latitude (degrees North) and its mortality rate (deaths per 10 million); the numeric values are not reproduced here.] How do we investigate the association between these two variables?

4 Scatter plot. There is a roughly linear association.

5 Relationship between two numerical variables. If a linear relationship between x and y appears to be reasonable from the scatter plot, we can take the next step and: 1. Calculate Pearson's product moment correlation coefficient between x and y, which measures how closely the data points on the scatter plot resemble a straight line. 2. Perform a simple linear regression analysis, which finds the equation of the line that best describes the relationship between the variables seen in the scatter plot.

6 Correlation. The sample Pearson's product moment correlation coefficient, or correlation coefficient, between variables x and y is calculated as
$$ r(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i}, $$
where {(x_i, y_i) : i = 1, ..., n} is a random sample of n observations on x and y, x̄ and ȳ are the sample means of x and y, s_x and s_y are the corresponding sample standard deviations, and z_{x_i} and z_{y_i} are the z-scores of x and y for the i-th observation.
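As an editorial illustration (not part of the original slides), a minimal Python sketch of this calculation, assuming numpy and scipy are available; the latitude and mortality values are invented purely to make the snippet runnable, and the hand-computed r is cross-checked against scipy.stats.pearsonr.

```python
import numpy as np
from scipy import stats

# Hypothetical latitude (x, degrees North) and mortality (y, deaths per 10 million) values
x = np.array([33.0, 34.5, 35.0, 37.5, 39.0, 41.5, 43.0, 44.5])
y = np.array([219.0, 170.0, 182.0, 160.0, 149.0, 128.0, 116.0, 110.0])

n = len(x)
zx = (x - x.mean()) / x.std(ddof=1)    # z-scores of x (sample SD, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)    # z-scores of y
r_manual = np.sum(zx * zy) / (n - 1)   # r = 1/(n-1) * sum of z_xi * z_yi

r_scipy, p_value = stats.pearsonr(x, y)
print(round(r_manual, 3), round(r_scipy, 3))   # both routes give the same r
```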

7 Correlation. Properties of r: r estimates the true population correlation coefficient (ρ). r takes on any value between −1 and 1, i.e. −1 ≤ r ≤ 1. The magnitude of r indicates the strength of the linear relationship between x and y: r = 1 or −1 means a perfect linear association; r = 0 indicates no linear association (but there can be, e.g., a non-linear one); the closer r is to −1 or 1, the stronger the linear association (e.g. r = −0.1, weak association, vs r = 0.85, strong association). The sign of r indicates the direction of the association: r > 0 implies a positive relationship, i.e. the two variables tend to move in the same direction; r < 0 implies a negative relationship, i.e. the two variables tend to move in opposite directions.

8 Correlation. Properties of r (cont.): r(ax + b, cy + d) = r(x, y), where a > 0, c > 0, and b and d are constants. r(x, y) = r(y, x). r ≠ 0 does not imply causation! Just because two variables are correlated does not necessarily mean that one causes the other. r² is called the coefficient of determination: r² is a number between 0 and 1 and represents the proportion of total variation in one variable that is explained by the other. For example, a coefficient of determination between body weight and age of 0.60 means that 60% of the total variation in body weight is explained by age alone and the remaining 40% is explained by other factors.

9 Correlation. [Figure: example scatter plots illustrating r = −1, r = 1, r = 0.8, r = −0.8, r = 0 (two panels, one with a non-linear pattern), 0 < r < 1, and −1 < r < 0.] Don't interpret r without looking at the scatter plot!

10 Correlation. Hypothesis test for the population correlation coefficient ρ: H_0: ρ = 0 versus H_1: ρ ≠ 0. Under H_0, the test statistic
$$ T = r \sqrt{\frac{n-2}{1-r^2}} $$
follows a Student-t distribution with n − 2 degrees of freedom. Note: this test assumes that the variables are normally distributed.
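The test statistic translates directly into code. The following sketch (an editorial addition, assuming numpy and scipy) computes T and its two-sided p-value from the t distribution with n − 2 degrees of freedom; scipy.stats.pearsonr already reports the same p-value, so the function mainly makes the formula explicit.

```python
import numpy as np
from scipy import stats

def correlation_t_test(x, y):
    """Test H0: rho = 0 with T = r * sqrt((n - 2) / (1 - r^2)), t distribution, n - 2 df."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r, _ = stats.pearsonr(x, y)
    t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value
    return r, t_stat, p_value
```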

11 Correlation. Example 1 revisited: skin cancer mortality vs latitude. [Scatter plot of mortality (deaths per 10 million, vertical axis) against latitude (degrees North, horizontal axis).] What are the magnitude and sign of the correlation coefficient between latitude and skin cancer mortality?

12 Correlation. Example 1 revisited: skin cancer mortality vs latitude, SPSS output. The Correlations table reports a Pearson correlation between Mortality and Latitude of r = −0.825, with two-tailed significance (p-value) shown as .000 and flagged as significant at the 0.01 level; N is the number of observations (states).

13 Pearson's product moment correlation coefficient measures the strength and direction of the linear association between x and y. But often we are also interested in predicting the value of one variable given the value of the other. This requires finding an equation (or mathematical model) that describes or summarizes the relationship between the variables. If a scatter plot of our data shows an approximately linear relationship between x and y, we can use simple linear regression to estimate the equation of this line. Regression, unlike correlation, requires that we have: a dependent variable (or outcome or response variable), i.e. the variable being predicted (always on the vertical or y-axis); and an independent variable (or explanatory or predictor variable), i.e. the variable used for prediction (always on the horizontal or x-axis). Let's assume that x and y are the independent variable and the dependent variable, respectively.

14 Simple linear regression postulates that in the population y = α + βx + ε, where: y is the dependent variable; x is the independent variable; α and β are parameters called population regression coefficients; and ε is a random error term.

15 [Figure: scatter plot of the observed data points, y versus x.]

16 [Figure: the same scatter plot with the point E(y|x_i) marked.] E(y|x_i) is the mean value of y when x = x_i.

17 [Figure: the scatter plot with the line through the means E(y|x_i) drawn.] E(y|x) = α + βx is the population regression function.

18 [Figure: the population regression line E(y|x) = α + βx, with the intercept α and slope β indicated.] α is the y-intercept of the population regression function, i.e. the mean value of y when x equals 0. β is the slope of the population regression function, i.e. the mean (or expected) change in y associated with a 1-unit increase in the value of x; cβ is the mean change in y for a c-unit increase in the value of x. α and β are estimated from the sample data, usually using the method of least squares.

19 [Figure: fitted line ŷ = a + bx, with a residual e_i = y_i − ŷ_i indicated at x_i.] The least squares method chooses a and b (the estimates for α and β) to minimize the sum of the squares of the residuals:
$$ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 . $$

20 The least squares estimates of α and β are:
$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}, $$
where x̄ and ȳ are the respective sample means of x and y. Note that b = r(x, y) · s_y / s_x, where r(x, y) is the sample product moment correlation between x and y, and s_x and s_y are the sample standard deviations of x and y.
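As an editorial sketch (not from the slides, assuming numpy and scipy), both the least squares formulas and the equivalent slope-from-correlation relation b = r·s_y/s_x can be coded directly; scipy.stats.linregress returns the same slope and intercept and serves as a cross-check.

```python
import numpy as np
from scipy import stats

def least_squares_fit(x, y):
    """Return (a, b): intercept and slope minimising the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

def slope_from_r(x, y):
    """Equivalent slope via b = r * s_y / s_x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r, _ = stats.pearsonr(x, y)
    return r * y.std(ddof=1) / x.std(ddof=1)
```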

21 Relationship between the slope b and the correlation coefficient r: r ≠ b unless s_x = s_y; r measures the strength of the linear association between x and y, while b measures the size of the change in the mean value of y due to a unit change in x; r does not distinguish between x and y, while b does; r is scale-free, while b is not. But: r and b have the same sign; neither r nor b implies causation; both r and b can be affected by outliers; and r = 0 if and only if b = 0, so the test of β = 0 is equivalent to the test of ρ = 0 (i.e. no linear relationship).

22 Test of H_0: β = 0 versus H_1: β ≠ 0.
1. t-test: test statistic T = b / SE(b), where SE(b) is the standard error of b calculated from the data. Under H_0, T follows a Student-t distribution with n − 2 degrees of freedom.
2. F-test: test statistic F = (b / SE(b))² = T², where SE(b) and T are as above. Under H_0, F follows an F distribution with 1 and n − 2 degrees of freedom.
The t-test and the F-test lead to the same outcome. Note: the test of zero intercept α is of less interest, unless x = 0 is meaningful.
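For readers working outside SPSS, a hedged sketch (assuming numpy and statsmodels; the weight and blood pressure numbers below are invented) reproduces the slope t-test and F-test and shows that F = T² for a single predictor.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical body weight (kg) and blood pressure (mmHg) values, for illustration only
weight = np.array([85.0, 94.0, 78.0, 90.0, 105.0, 72.0, 99.0, 88.0])
bp = np.array([120.0, 126.0, 112.0, 121.0, 135.0, 108.0, 130.0, 122.0])

X = sm.add_constant(weight)       # design matrix with an intercept column
fit = sm.OLS(bp, X).fit()

b, se_b = fit.params[1], fit.bse[1]
t_stat = b / se_b                 # T = b / SE(b), t distribution with n - 2 df
print(t_stat, fit.pvalues[1])     # slope t-test and its p-value
print(fit.fvalue, fit.f_pvalue)   # F-test; fit.fvalue equals t_stat ** 2 here
```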

23 Example 2: blood pressure (mmHg) versus body weight (kg) in 20 patients with hypertension (Daniel & Cross, 2013). [Data table with columns BP and Weight; the 20 observations are not reproduced here.]

24 SPSS output. The Coefficients table (dependent variable BP, predictor Weight) gives the unstandardized coefficients B with their standard errors, the standardized coefficient Beta, and the t and Sig. values for the constant and for Weight; from these, the regression equation is BP = a + b·Weight (the numeric coefficients were not fully recovered in this transcription). The ANOVA table partitions the total sum of squares (about 560) into a regression sum of squares (about 505, 1 df) and a residual sum of squares (about 54, mean square 3.029); the F-test is significant (Sig. = .000). Dependent variable: BP; predictors: (Constant), Weight.

25 Standardized coefficients. Obtained by standardizing both y and x (i.e. converting them into z-scores) and re-running the regression. After standardization, the intercept is equal to zero and the slope for x is equal to the sample correlation coefficient. These are of greater concern in multiple linear regression (next lecture), where the predictors are expressed in different units. Standardization removes the dependence of the regression coefficients on the units of measurement of y and the x's, so they can be meaningfully compared. The larger the standardized coefficient (in absolute value), the greater the contribution of the respective variable to the prediction of y. Standardized and unstandardized coefficients have the same sign and their significance tests are equivalent.
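A small editorial sketch (assuming numpy and statsmodels) that standardizes both variables and refits, so the resulting slope can be compared with the sample correlation coefficient:

```python
import numpy as np
import statsmodels.api as sm

def standardized_coefficients(x, y):
    """Refit the regression on z-scores; the intercept is ~0 and the slope equals r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    fit = sm.OLS(zy, sm.add_constant(zx)).fit()
    return fit.params   # [intercept (close to 0), standardized slope (= sample r)]
```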

26 Simple linear regression is only appropriate when the following assumptions are satisfied: 1. Independence: the observations are independent, i.e. there is only one pair of observations per subject. 2. Linearity: the relationship between x and y is linear. 3. Constant variance: the variance of y is constant for all values of x. 4. Normality: y has a Normal distribution.

27 Simple linear regression. Checking the linearity assumption: 1. Make a scatter plot of y versus x. If the assumption of linearity is met, the points in this plot should generally form a straight line. 2. Plot the residuals against the explanatory variable x. If the assumption of linearity is met, we should see a random scatter of points around zero rather than any systematic pattern. [Figure: two residual plots, one labelled "Linearity" (random scatter around 0) and one labelled "Lack of linearity" (systematic pattern).]

28 Simple linear regression. Checking the constant variance assumption: make a residual plot, i.e. plot the residuals against the fitted values of y (ŷ_i = a + b x_i). If the assumption is met, we expect to observe a random scatter of points. If the scatter of the residuals increases or decreases as ŷ increases, then this assumption is not satisfied. [Figure: two residual plots, one labelled "Constant variance" (random scatter) and one labelled "Non-constant variance" (spread changing with ŷ).]
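The two diagnostic plots described on this and the previous slide take only a few lines of code. The sketch below (an editorial addition, assuming numpy, matplotlib, and statsmodels) plots residuals against x for the linearity check and against the fitted values for the constant-variance check.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

def residual_plots(x, y):
    """Residuals vs x (linearity check) and residuals vs fitted values (constant-variance check)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    resid, fitted = fit.resid, fit.fittedvalues

    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    axes[0].scatter(x, resid)
    axes[0].axhline(0, linestyle="--")
    axes[0].set(xlabel="x", ylabel="residual", title="Residuals vs x")
    axes[1].scatter(fitted, resid)
    axes[1].axhline(0, linestyle="--")
    axes[1].set(xlabel="fitted value", ylabel="residual", title="Residuals vs fitted")
    plt.tight_layout()
    plt.show()
```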

29 Example 2 revisited: blood pressure vs body weight. [Residual plot.]

30 Checking the normality assumption: 1. Draw a histogram of the residuals and eyeball the result. 2. Make a normal probability plot (P–P plot) of the residuals, i.e. plot the expected cumulative probability of a normal distribution versus the observed cumulative probability at each value of the residual. If the assumption of normality is met, the points in this plot should form a straight diagonal line.
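A hand-rolled P–P plot is straightforward. This editorial sketch (assuming numpy, scipy, and matplotlib) plots the expected normal cumulative probabilities against the observed cumulative probabilities of the residuals, with a reference diagonal.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def pp_plot(residuals):
    """Normal P-P plot of residuals: points near the diagonal support the normality assumption."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    observed = (np.arange(1, n + 1) - 0.5) / n                  # empirical cumulative probabilities
    expected = stats.norm.cdf((r - r.mean()) / r.std(ddof=1))   # normal CDF of standardized residuals
    plt.scatter(observed, expected)
    plt.plot([0, 1], [0, 1], linestyle="--")                    # reference diagonal
    plt.xlabel("Observed cumulative probability")
    plt.ylabel("Expected cumulative probability")
    plt.show()
```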

31 Example 2 revisited: blood pressure vs body weight. [P–P plot.]

32 Outliers. An outlier is a data point that stands apart from the overall pattern seen in the scatter plot (i.e. an unusual or unexpected observation). It can be detected by looking at a scatter plot or residual plot. We should always search for an explanation for any outliers. Common sources of outliers include human and measurement errors during data collection and entry, sampling error, and chance. Some outliers can be corrected or removed, but some cannot. In general, outliers that cannot be corrected should not be removed. Outliers may influence the estimates of the model parameters and thus the study conclusions. In order to determine this influence, fit the line with and without the questionable points and see what happens.
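The refit-with-and-without check in the last sentence can be scripted. A sketch (editorial, assuming numpy and statsmodels; `suspect_idx` is a hypothetical list of row indices flagged as questionable):

```python
import numpy as np
import statsmodels.api as sm

def slope_with_and_without(x, y, suspect_idx):
    """Compare the fitted slope with and without the flagged observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = sm.OLS(y, sm.add_constant(x)).fit()
    keep = np.ones(len(x), dtype=bool)
    keep[list(suspect_idx)] = False                   # drop the questionable points
    reduced = sm.OLS(y[keep], sm.add_constant(x[keep])).fit()
    return full.params[1], reduced.params[1]          # slopes with and without the outliers
```

A large change between the two slopes signals that the flagged points are influential.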

33 Simple linear regression. Assessing goodness of fit: the estimated regression line is the best one available (in the least-squares sense), yet it can still be a very poor fit to the observed data. [Figure: two scatter plots with fitted lines, one labelled "Good fit" and one "Bad fit".]

34 To assess the goodness of fit of a regression line (i.e. how well the line fits the data) we can: 1. Calculate the correlation coefficient between the predicted and observed values of y, R. A higher absolute value of R indicates a better fit (the predicted and observed values of y are closer to each other). 2. Calculate R² (R Square in SPSS): 0 ≤ R² ≤ 1; a higher value of R² indicates a better fit; R² = 1 indicates a perfect fit (i.e. ŷ_i = y_i for each i); R² = 0 indicates a very poor fit.

35 Alternatively, R² can be calculated as
$$ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\text{variation in } y \text{ explained by } x}{\text{total variation in } y}. $$
We interpret R² as the proportion of the total variability in y that can be explained by the explanatory variable x. An R² of 1 means that x explains all variability in y; an R² of 0 indicates that x does not explain any variability in y. R² is usually expressed as a percentage; for example, R² = 0.93 indicates that 93% of the total variation in y can be explained by x. In SPSS, R² can be found in the Model Summary table or calculated from the ANOVA table; both tables are produced when running linear regression.
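The ratio of explained to total variation can be computed directly. A short editorial sketch (assuming numpy and statsmodels) that should reproduce the R Square value reported by SPSS:

```python
import numpy as np
import statsmodels.api as sm

def r_squared(x, y):
    """R^2 = sum((yhat_i - ybar)^2) / sum((y_i - ybar)^2): explained share of total variation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    y_hat = fit.fittedvalues
    explained = np.sum((y_hat - y.mean()) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return explained / total        # equals fit.rsquared
```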

36 Example 2 revisited: blood pressure vs body weight. SPSS Model Summary: R = .950, R Square = .903, Adjusted R Square = .897, Std. Error of the Estimate = 1.74050. Predictors: (Constant), Weight.

37 Prediction: interpolation versus extrapolation. [Figure: fitted line over the range of the actual data, with several possible patterns of additional data diverging outside that range.] Extrapolation beyond the range of the data is risky!

38 Categorical explanatory variable. So far we have assumed that the predictor variable x is numerical. But what if we want to study an association between y and a categorical x, e.g. between blood pressure and gender or between skin cancer mortality and race/ethnicity? Categorical variables can be incorporated into a regression model through one or more indicator or dummy variables that take on the values 0 and 1. In general, to include a variable with p categories/levels, p − 1 dummy variables are required.

39 Categorical explanatory variable. Example: a variable x with 4 categories, e.g. blood group (A, B, AB, 0). Basic steps: 1. Create dummy variables for all categories: x_A = 1 if blood group is A, 0 otherwise; x_B = 1 if blood group is B, 0 otherwise; x_AB = 1 if blood group is AB, 0 otherwise; x_0 = 1 if blood group is 0, 0 otherwise.

40 Categorical explanatory variable. In a dataset, each subject gets a row with Subject ID, Blood group, and the four dummies x_A, x_B, x_AB, x_0 (for example, a subject with blood group A has x_A = 1 and x_B = x_AB = x_0 = 0). 2. Select one blood group as a reference category: a category that results in useful comparisons (e.g. exposed versus non-exposed, experimental versus standard treatment) or a category with a large number of subjects. 3. Include in the model all dummies except the one corresponding to the reference category.

41 Categorical explanatory variable. Taking blood group 0 as the reference category, the model becomes y = α + β_A x_A + β_B x_B + β_AB x_AB + ε, and its estimated counterpart is ŷ = a + b_A x_A + b_B x_B + b_AB x_AB. Estimation of the model parameters requires running multiple linear regression (next lecture), unless the explanatory variable has only two categories (e.g. gender). Given that y represents IQ score, the estimated coefficients are interpreted as follows: a is the mean IQ for subjects with blood group 0, i.e. the reference category; each b represents the mean difference in IQ between subjects with the blood group represented by the respective dummy variable and subjects with blood group 0 (the reference category).
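As a preview sketch of this dummy-variable model (editorial; the blood-group and IQ values below are invented, and the multiple regression itself is the topic of the next lecture), assuming pandas and statsmodels:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical blood group and IQ data, for illustration only
df = pd.DataFrame({
    "blood_group": ["A", "B", "AB", "0", "B", "A", "0", "AB", "0", "A"],
    "iq": [102.0, 98.0, 110.0, 100.0, 95.0, 105.0, 99.0, 108.0, 101.0, 103.0],
})

# One dummy per category, then drop the reference category "0"
dummies = pd.get_dummies(df["blood_group"]).drop(columns="0").astype(float)
X = sm.add_constant(dummies)
fit = sm.OLS(df["iq"], X).fit()

print(fit.params["const"])            # a: mean IQ in the reference group (blood group 0)
print(fit.params[["A", "B", "AB"]])   # b_A, b_B, b_AB: mean IQ differences vs blood group 0
```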

42 Categorical explanatory variable. Specifically: b_A is the difference between the mean IQ in subjects with blood group A and the mean IQ in subjects with blood group 0, i.e. b_A = ŷ(x_A = 1, x_B = 0, x_AB = 0) − a; b_B is the difference between the mean IQ in subjects with blood group B and the mean IQ in subjects with blood group 0, i.e. b_B = ŷ(x_A = 0, x_B = 1, x_AB = 0) − a; b_AB is the difference between the mean IQ in subjects with blood group AB and the mean IQ in subjects with blood group 0, i.e. b_AB = ŷ(x_A = 0, x_B = 0, x_AB = 1) − a. Note: a test for the significance of a categorical explanatory variable with p levels involves the hypothesis that the coefficients of all p − 1 dummy variables are zero. For that purpose, we need to use an overall F-test (next lecture) and not a t-test. The t-test can be used only when the variable is binary.

43 References. Gerald van Belle, Lloyd D. Fisher, Patrick J. Heagerty, Thomas Lumley. Biostatistics: A Methodology for the Health Sciences, 2nd edition. John Wiley & Sons, 2004. Wayne W. Daniel, Chad L. Cross. Biostatistics: A Foundation for Analysis in the Health Sciences, 10th edition. John Wiley & Sons, 2013.
