Bivariate Analysis

Choice of test by the level of measurement of the two variables:

                  Variable 2:
Variable 1        2 levels    >2 levels        Continuous
2 levels          X2          X2               t-test
>2 levels         X2          X2               ANOVA (F-test)
Continuous        t-test      ANOVA (F-test)   - Correlation
                                               - Simple linear regression

Correlation

Used when you measure two continuous variables.
Examples:
- Association between ___ and ___.
- Association between ___ and blood pressure.

[Slide: table of Weight (kg) and Height (cm) for a sample of students, with a scatter plot of Height (cm) against Weight (kg).]

Correlation is measured by Pearson's correlation coefficient: a measure of the linear association between two variables that have been measured on a continuous scale. Pearson's correlation coefficient is denoted by r. A correlation coefficient is a number that ranges between -1 and +1.
- If r = +1: perfect positive linear relationship between the two variables.
- If r = -1: perfect negative linear relationship between the two variables.
- If r = 0: no linear relationship between the two variables.

[Slide: example scatter plots for r = +1, r = -1, and r = 0; see http://noppa5.pc.helsinki.fi/koe/corr/cor7.html]
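As a numerical sketch of the definition above, the following computes Pearson's r for a small set of hypothetical height/weight pairs (illustrative values only, not the lecture's data):

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) pairs for 8 students
# (illustrative values only, not the lecture's actual data).
height = np.array([150, 155, 160, 165, 170, 175, 180, 185], dtype=float)
weight = np.array([50.0, 54.0, 57.0, 63.0, 66.0, 72.0, 75.0, 80.0])

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations (the sample-size terms cancel)."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return (x_c @ y_c) / np.sqrt((x_c @ x_c) * (y_c @ y_c))

r = pearson_r(height, weight)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

Since taller students in this made-up sample are uniformly heavier, r comes out near +1; reversing the order of one array would flip its sign.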
Strength of the correlation:

[Slide: scale from -1 to +1. Values near the extremes (beyond roughly ±0.8) indicate a strong linear relationship, intermediate values (roughly ±0.2 to ±0.5) a moderate one, and values near 0 a weak one.]

Example 1:
Research question: Is there a linear relationship between the ___ and ___ of students?
- H0: there is no linear relationship between ___ and ___ of students in the population (ρ = 0)
- Ha: there is a linear relationship between ___ and ___ of students in the population (ρ ≠ 0)
Statistical test: Pearson correlation coefficient (r)

Example 1: SPSS output
r = 0.65**, N = 954; correlation is significant at the 0.01 level.
- Value of the statistical test: 0.65
- P-value: 0.000 as reported by SPSS (i.e., p < 0.001)
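The significance test behind this SPSS output can be sketched as follows. Under H0: ρ = 0, the statistic t = r·√(n−2)/√(1−r²) follows a t distribution with n−2 degrees of freedom; for a sample this large, |t| > 1.96 corresponds to p < 0.05. The values r = 0.65 and N = 954 are read off the output above.

```python
import math

r = 0.65   # Pearson coefficient from the SPSS output
n = 954    # sample size (N) from the SPSS output

# Test statistic for H0: rho = 0 (large-sample sketch).
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(t, 1))    # a very large t statistic
print(abs(t) > 1.96)  # True -> p < 0.05, reject H0
```

This is why SPSS prints a p-value of .000 here: with n in the hundreds, even a moderate r yields an enormous t.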
Example 1: Conclusion
At a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the ___ and ___ of students.

Example 2:
Research question: Is there a linear relationship between the ___ and ___ of students?
- H0: ρ = 0; no linear relationship between ___ and ___ in the population
- Ha: ρ ≠ 0; there is a linear relationship between ___ and ___ in the population

Example 2: SPSS output
r = 0.55**, N = 814; correlation is significant at the 0.01 level.
- Value of the statistical test: 0.55
- P-value: 0.000 (p < 0.001)
Example 2: Conclusion
At a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the ___ and ___ of students.

Example 3:
Research question: Is there a linear relationship between the ___ and ___ of students?
- H0: ρ = 0; no linear relationship between ___ and ___ in the population
- Ha: ρ ≠ 0; there is a linear relationship between ___ and ___ in the population

Example 3: SPSS output
r = 0.084**, N = 812; correlation is significant at the 0.01 level.
- Value of the statistical test: 0.084
- P-value: < 0.01
Example 3: Conclusion
At a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the ___ and ___ of students. Note, however, that r = 0.084 indicates a very weak linear relationship: with a large sample, even a weak correlation can be statistically significant.

SPSS command for r
Analyze → Correlate → Bivariate → select ___ and ___ and put them in the Variables box.

T (True) or F (False)?
- In studying whether there is an association between gender and ___, the investigator found that r = 0.90 and p-value < 0.001, and concludes that there is a strong significant correlation between gender and ___.
- The correlation between obesity and number of cigarettes smoked was r = 0.02 and the p-value = 0.856. Based on these results we conclude that there isn't any association between obesity and the number of cigarettes smoked.
Simple Linear Regression

Used to explain observed variation in the data. For example, we measure blood pressure (BP) in a sample of patients and observe:

i = Pt#:   1    2    3    4    5    6    7
Y = BP:    85   105  90   85   110  70   115

In order to explain why the BP values of individual patients differ, we try to associate the differences in BP with differences in other relevant patient characteristics (variables).

Example: Can variation in blood pressure be explained by ___?

Questions:
1) What is the most appropriate mathematical model to use? A straight line, a parabola, etc.
2) Given a specific model, how do we determine the best-fitting model?

Mathematical properties of a straight line
Y = B0 + B1·X
- Y = dependent variable
- X = independent variable
- B0 = Y intercept
- B1 = slope
The intercept B0 is the value of Y when X = 0. The slope B1 is the amount of change in Y for each 1-unit change in X.
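These two properties can be sketched in a few lines, using hypothetical values B0 = 10 and B1 = 2:

```python
def predict(b0, b1, x):
    """Predicted Y from the straight-line model Y = B0 + B1*X."""
    return b0 + b1 * x

b0, b1 = 10.0, 2.0  # hypothetical intercept and slope

# Intercept: the value of Y when X = 0.
print(predict(b0, b1, 0))                       # 10.0
# Slope: the change in Y for each 1-unit change in X.
print(predict(b0, b1, 5) - predict(b0, b1, 4))  # 2.0
```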
Estimation of a Simple Linear Regression

The optimal regression line: Ŷ = B0 + B1·X

Example 1:
Research question: Does Height help to predict Weight using a straight-line model? Is there a linear relationship between Height and Weight? Does Height explain a significant portion of the variation in the observed values of Weight?

Model: Weight = B0 + B1·Height

SPSS output (Example 1):
- Variables Entered/Removed: all requested variables entered (method: Enter); dependent variable: Weight.
- Model Summary: R = .65, R² = .424, adjusted R² = .423, SE of the estimate = 10.878. Predictors: (Constant), Height.
- ANOVA: Regression SS = 169820.3 (df = 1, MS = 169820.297), Residual SS = 230982.0 (df = 952), Total SS = 400802.3 (df = 953); F = 1435.30, Sig. < .05.
- Coefficients (dependent variable: Weight):
  (Constant): B = -95.246, SE = 4.226, t = -22.539
  Height:     B = .940,    SE = .025,  Beta = .65, t = 37.883
Interpretation (Example 1):
- R² = 0.424: Height explains 42.4% of the variation seen in Weight.
- Weight = B0 + B1·Height, with B0 = -95.246 and B1 = 0.940:
  Weight = -95.246 + 0.94·Height
  Increasing Height by 1 unit (1 cm) increases Weight by 0.94 kg.
- Hypothesis test for the slope: H0: B1 = 0 vs. Ha: B1 ≠ 0. Because the p-value for B1 is < 0.05, we reject H0 and conclude that Height provides significant information for predicting Weight.

Question 1:
In a simple linear regression model, the predicted straight line was as follows:
Weight (kg) = 3.5 − 1.32·(weekly hours of PA)
R² = 0.22; p-value for the slope = 0.04

What is the dependent/independent variable?
- Dependent variable: Weight
- Independent variable: weekly hours of PA
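The least-squares estimation that SPSS performs can be sketched with NumPy. The data below are hypothetical (the lecture's actual estimates were B0 = -95.246 and B1 = 0.940); the point is how the slope, intercept, and R² are obtained:

```python
import numpy as np

# Hypothetical sample: height (cm) and weight (kg) for 8 students.
height = np.array([150, 155, 160, 165, 170, 175, 180, 185], dtype=float)
weight = np.array([48.0, 55.0, 58.0, 62.0, 67.0, 71.0, 74.0, 81.0])

# Least-squares fit of Weight = B0 + B1 * Height.
b1, b0 = np.polyfit(height, weight, 1)  # returns slope first, then intercept

# R^2: proportion of the variation in weight explained by the model.
pred = b0 + b1 * height
ss_res = np.sum((weight - pred) ** 2)           # residual sum of squares
ss_tot = np.sum((weight - weight.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(round(b1, 2))         # kg of weight gained per extra cm of height
print(round(r_squared, 2))  # share of variation in weight explained by height
```

Note that in simple linear regression R² equals the square of Pearson's r between the two variables, which is why the SPSS Model Summary lists both.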
Question 1 (continued):
Weight (kg) = 3.5 − 1.32·(weekly hours of PA); R² = 0.22; p-value for the slope = 0.04

- Interpret the value of R²: the number of weekly hours of PA explains 22% of the variation observed in Weight.
- What is the null hypothesis? The alternative?
  H0: B(weekly hours of PA) = 0
  Ha: B(weekly hours of PA) ≠ 0
- Is the association between Weight and weekly hours of PA positive or negative? Negative.
- What is the magnitude of this association? 1.32: a one-hour increase of PA in a week decreases Weight by 1.32 kg.
Question 1 (continued):
- Is the association significant at a level of 0.05? Because the p-value for the slope (0.04) is < 0.05, we reject H0 and conclude that weekly hours of PA provides significant information for predicting Weight.

Question 2:
SPSS output:
- Model Summary: R = .407, R² = .166, adjusted R² = .164, SE of the estimate = 10.396. Predictors: (Constant), ISS - injury severity measure.
- Coefficients (dependent variable: Length of hospital stay):
  (Constant): B = .443, SE = .747, t = .593, Sig. = .554
  ISS:        B = .66,  SE = .066, Beta = .407, t = 9.945

- What is the dependent/independent variable?
  Dependent variable: length of hospital stay. Independent variable: ISS (injury severity score).
- Interpret the value of R²: ISS explains 16.6% of the variation observed in length of hospital stay (R² = .166; note that .407 is R, not R²).
Question 2 (continued):
- What is the null hypothesis? The alternative?
  H0: B(ISS) = 0
  Ha: B(ISS) ≠ 0
- Is there a significant association between the dependent and the independent variable? Because the p-value for B(ISS) is < 0.05, we reject H0 and conclude that ISS provides significant information for predicting length of hospital stay.
- What is the magnitude of this association? 0.66: increasing ISS by 1 unit increases length of hospital stay by 0.66 days.

Biases

Bias is an error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome. Three broad types:
- Selection bias
- Information bias
- Confounding bias
Confounding Bias: Definition
Confounding bias is present when the association between an exposure and an outcome is distorted by an extraneous third variable (referred to as a confounding variable).

Confounding Bias: Example
Example: study of the association between coffee drinking and lung cancer (LC).

              LC: Yes   LC: No
Coffee: Yes     80        15
Coffee: No      20        85

OR = (80 × 85) / (15 × 20) ≈ 22.7

What would you conclude?

Confounding Bias: Minimizing the bias
- Research design: use of a randomized clinical trial; restriction
- Data analysis: multivariate statistical techniques
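The odds ratio above is the cross-product ratio of the 2×2 table. A minimal sketch (the count 15 is reconstructed from the garbled slide so that the stated OR of about 22 is reproduced):

```python
# Odds ratio (OR) from the 2x2 table in the coffee / lung cancer example.
a = 80  # coffee drinkers with lung cancer
b = 15  # coffee drinkers without lung cancer
c = 20  # non-drinkers with lung cancer
d = 85  # non-drinkers without lung cancer

odds_ratio = (a * d) / (b * c)  # cross-product ratio
print(round(odds_ratio, 1))     # about 22.7
```

An OR this far above 1 looks like a strong association, but a confounding variable could be distorting it, which is the point of the slide's closing question.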
Bivariate Analysis (summary)

                  Variable 2:
Variable 1        2 levels    >2 levels        Continuous
2 levels          X2          X2               t-test
>2 levels         X2          X2               ANOVA (F-test)
Continuous        t-test      ANOVA (F-test)   - Correlation
                                               - Simple linear regression

Multivariate Analysis

Multivariate analyses:
- Logistic regression (if the outcome has 2 levels)
- Multiple linear regression (if the outcome is continuous)

Why?
- To investigate the effect of more than one independent variable.
- To predict the outcome using various independent variables.
- To adjust for confounding variables.
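A sketch of how multiple linear regression adjusts for a confounder (all data are simulated and the variable names are hypothetical). Because the exposure and the confounder enter the model together, the coefficient on the exposure estimates its effect with the confounder held fixed:

```python
import numpy as np

# Simulated data: the exposure is correlated with a confounder, and the
# outcome depends on both (true effects: 2.0 for exposure, 3.0 for confounder).
rng = np.random.default_rng(0)
n = 200
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(size=n)
outcome = 2.0 * exposure + 3.0 * confounder + rng.normal(size=n)

# Multiple linear regression by ordinary least squares:
#   outcome = b0 + b1*exposure + b2*confounder
X = np.column_stack([np.ones(n), exposure, confounder])
coefs, _, _, _ = np.linalg.lstsq(X, outcome, rcond=None)

# coefs[1] is the exposure effect adjusted for the confounder (close to 2.0).
print(coefs[1])
```

Regressing the outcome on the exposure alone here would give a slope well above 2, because part of the confounder's effect would be attributed to the exposure; that inflation is exactly the confounding bias that the multivariate model removes.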