Basic Statistics and Data Analysis for Health Researchers from Foreign Countries


1 Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma The Research Unit for General Practice in Copenhagen Dias 1
2 Content Quantifying association between continuous variables. In particular: correlation and (simple) regression. Dias 2
3 Example: Newly diagnosed Type 2 Diabetes
A data set with 729 newly diagnosed Type 2 diabetes patients. The variables (columns) are:
pt: Patient ID
glucose: Diagnostic plasma glucose (mmol/l)
bmi: Body Mass Index (kg/m2)
sex: Sex (1=male, 0=female)
age: Age (years)
Dias 3
4 Research question Do fat people have more severe diabetes when the diabetes is discovered? Or, in more statistical language: Is diagnostic plasma glucose (positively) associated with the body mass index at the time of diagnosis? Dias 4
5 Scatterplot When investigating a potential association between only two variables (like diagnostic plasma glucose and BMI), a scatterplot is an important part of the analysis. It gives insight into the nature of the association. It shows problems in the data, e.g. outliers and strange or impossible values. Dias 5
6 Scatterplot Dias 6
7 Scatterplot There is no apparent tendency, and specifically not one that would support our research question. If we have to point out a tendency, it would be that high BMI is associated with lower diagnostic glucose (why is this not so strange if we think about how diabetes is diagnosed?). There seem to be some very large values, especially for diagnostic plasma glucose. These are valid measurements. Maybe a log transformation of glucose would make associations more apparent? Dias 7
8 Scatterplot: R code
plot(diabetes$bmi, diabetes$glucose, frame.plot=TRUE, main=NULL, xlab="BMI (kg/m2)", ylab="Glucose (mmol/l)", col="green", pch=19)
Dias 8
9 Scatterplot log transformation Dias 9
10 Measures of association We want to capture the association between two variables in a single number: a correlation coefficient, a measure of association. Suppose that Y_i is the diagnostic plasma glucose of patient i and X_i the BMI of the same patient. Then we want our measure of association to have the following characteristics: A positive association indicates that if X_i is large (relative to the rest of the sample) then Y_i is likely to be large as well. A negative association indicates that if X_i is large then Y_i is likely to be small. Dias 10
11 Measures of association take values between -1 and 1.
0: no association
1: perfect positive association
-1: perfect negative association
Dias 11
12 Measures of association for the diabetes data r = ρ = τ = Dias 12
13 Measures of association for the diabetes data with log transformed glucose r = ρ = τ = Only the first one (Pearson's r) changes! Dias 13
14 Pearson's correlation coefficient Pearson's correlation coefficient is computed from the data set (X_i, Y_i), i = 1, ..., N as:
\[ r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1)\,\mathrm{SD}_x\,\mathrm{SD}_y} \]
where \bar{X} and \bar{Y} are the respective means and SD_x and SD_y the respective standard deviations. Dias 14
15 Characteristics of Pearson's correlation coefficient Pearson's correlation coefficient has the following properties: It measures the degree of linear association. It is invariant to linear changes of scale of the variables. It is not robust to outliers. Coefficient values that are comparable between different data sets, and moreover a valid confidence interval and p-value, require that both X_i and Y_i are Normally distributed. Dias 15
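The definition and the scale-invariance property can be checked numerically. The following is only a minimal sketch on simulated data: the vectors x and y are hypothetical stand-ins for BMI and glucose, not the diabetes data set used in the slides.

```r
# Hypothetical simulated data standing in for BMI (x) and glucose (y)
set.seed(1)
x <- rnorm(50, mean = 29, sd = 5)
y <- 12 - 0.05 * x + rnorm(50, sd = 4)
n <- length(x)

# Pearson's r from its definition: sum of cross-products of deviations
# from the means, divided by (N - 1) * SD_x * SD_y
r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))

# The same coefficient from R's built-in function
r_builtin <- cor(x, y)

# A linear change of scale (e.g. other measurement units) leaves r unchanged
r_rescaled <- cor(2.5 * x + 10, y)

print(c(manual = r_manual, builtin = r_builtin, rescaled = r_rescaled))
```

All three printed numbers should agree up to rounding error, which illustrates why r does not depend on the units in which the two variables are measured.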
16 Pearson's correlation coefficient: R code
> cor(diabetes$bmi, diabetes$glucose, use="complete.obs")
[1]
Gives only the correlation coefficient.
> cor.test(diabetes$bmi, diabetes$glucose)
Pearson's product-moment correlation
data: diabetes$bmi and diabetes$glucose
t = , df = 723, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
sample estimates: cor
Also performs a statistical test to see whether the coefficient is different from zero.
Dias 16
17 Normally distributed? BMI Glucose A Normal distribution for comparison. Dias 17
18 Normally distributed? BMI Log(Glucose) Dias 18
19 Normally distributed? Dias 19
20 Normally distributed? Dias 20
21 R code
A histogram of BMI:
hist(diabetes$bmi, main="BMI", xlab="BMI (kg/m2)", col="green")
A Normal QQ plot of BMI:
qqnorm(diabetes$bmi, main="BMI", col="green")
qqline(diabetes$bmi, col="red")
And how do we get all these works of art in some decent format?
# use forward slashes in the path: a single backslash starts an escape sequence in an R string
jpeg(file="D:/mydirectory/mypicture.jpg", width=500, height=500)
# put here the code that generates the picture
dev.off()
Dias 21
22 Rank correlation: Spearman's ρ If the data do not appear to be Normally distributed, or when there are outliers, one may instead compute the correlation between the ranks of the X_i values and the ranks of the Y_i values. This gives a nonparametric correlation coefficient called Spearman's ρ. It measures monotone association. It is invariant to monotone transformations (like a log transformation). It is robust to outliers. It has an odd interpretation. Dias 22
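That Spearman's ρ is simply Pearson's correlation applied to ranks, and that it is unchanged by a log transformation, can be illustrated with a short sketch. The data below are simulated and purely hypothetical; they are not the diabetes data.

```r
# Hypothetical skewed, strictly positive data (not the diabetes data)
set.seed(2)
x <- rexp(40) + 1
y <- x^2 + rnorm(40, sd = 0.5)

# Spearman's rho as Pearson's correlation between the ranks
rho_ranks <- cor(rank(x), rank(y))

# The built-in version
rho_builtin <- cor(x, y, method = "spearman")

# A monotone transformation (log) does not change the ranks, so rho is unchanged
rho_log <- cor(log(x), y, method = "spearman")

# Pearson's r, by contrast, generally does change under the same transformation
r_raw <- cor(x, y)
r_log <- cor(log(x), y)

print(c(rho_ranks = rho_ranks, rho_builtin = rho_builtin, rho_log = rho_log))
print(c(r_raw = r_raw, r_log = r_log))
```

The three ρ values coincide, while the two Pearson coefficients will typically differ, matching the remark that only the first coefficient changes after log transformation.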
23 Spearman's rank correlation coefficient: R code
> cor.test(diabetes$bmi, diabetes$glucose, method="spearman")
Spearman's rank correlation rho
data: diabetes$bmi and diabetes$glucose
S = , p-value =
alternative hypothesis: true rho is not equal to 0
sample estimates: rho
Warning message: In cor.test.default(diabetes$bmi, diabetes$glucose, method = "spearman") : Cannot compute exact p-values with ties
Dias 23
24 Rank correlation: Kendall's τ A measure of monotone association with a more intuitive interpretation than Spearman's ρ is Kendall's τ. The observations from a pair of subjects i, j are
concordant if X_i < X_j and Y_i < Y_j, or X_i > X_j and Y_i > Y_j;
discordant if X_i < X_j and Y_i > Y_j, or X_i > X_j and Y_i < Y_j.
Kendall's τ is the difference between the probability of a concordant pair and the probability of a discordant pair. There are various versions of Kendall's τ depending on how ties are treated. Dias 24
25 Characteristics of Kendall's tau It measures monotone association. It is invariant to monotone transformations (like a log transformation). It is robust to outliers. It has a more straightforward interpretation than Spearman's rho. Dias 25
26 Kendall's rank correlation coefficient: R code
> cor.test(diabetes$bmi, diabetes$glucose, method="kendall")
Kendall's rank correlation tau
data: diabetes$bmi and diabetes$glucose
z = , p-value =
alternative hypothesis: true tau is not equal to 0
sample estimates: tau
Dias 26
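The concordant/discordant definition of Kendall's τ can be sketched by counting pairs directly. This is an illustration on hypothetical simulated data without ties; with ties, the versions of τ differ and this simple count would not necessarily match R's result.

```r
# Hypothetical simulated data without ties (not the diabetes data)
set.seed(3)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)
n <- length(x)

concordant <- 0
discordant <- 0
# Visit every pair of subjects (i, j) with i < j exactly once
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
    if (s > 0) concordant <- concordant + 1   # same ordering in x and y
    if (s < 0) discordant <- discordant + 1   # opposite ordering
  }
}

n_pairs <- n * (n - 1) / 2
# Proportion of concordant pairs minus proportion of discordant pairs
tau_manual <- (concordant - discordant) / n_pairs
tau_builtin <- cor(x, y, method = "kendall")

print(c(manual = tau_manual, builtin = tau_builtin))
```

Read this way, a τ of 0.2 means that a randomly chosen pair of subjects is 20 percentage points more likely to be concordant than discordant.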
27 Correlation in the diabetes data r = (p = 0.110) ρ = (p = 0.180) τ = (p = 0.169) Dias 27
28 Correlation in the diabetes data and log transformed r = (p = 0.154) ρ = (p = 0.180) τ = (p = 0.169) Dias 28
29 Limitations of correlation coefficients While it is (relatively) clear what a correlation coefficient of 0 means, and also 1 or -1, it is often unclear what a highly significant correlation of, say, 0.5 means. Correlation rarely answers the research question to a sufficient extent, because it is not easily interpretable. Coefficients of correlation depend on the sample selection, and therefore we cannot compare values of the coefficients found in different data sets. Dias 29
30 Dias 30 Department of Biostatistics
31 Regression analysis An (intuitively interpretable) way to describe a (linear) association between two continuous variables. It models a response Y (the dependent variable, the endogenous variable, the output) as a function of a predictor X (the independent variable, the exogenous variable, the explanatory variable, the covariate) and a term representing other, random influences (error, noise). Dias 31
32 Regression model formulation We say: to regress Y on X, or: to regress glucose on BMI. Mathematically:
\[ Y_i = \alpha + \beta X_i + \varepsilon_i \]
where the ε_i are independent, Normally distributed noise terms with mean 0 and standard deviation σ. Dias 32
33 Regression model The mean of Y is modelled with a linear function of X; a line in the XY plane. For each X, Y is a random variable Normally distributed around the modelled mean of Y, with standard deviation σ Dias 33
34 Scatterplot with regression line Dias 34
35 Interpretation of the parameters We have variation due to a systematic part, the explanatory variable, and a random part, the noise. The systematic part of the model is defined by the regression line.
α = the intercept: mean level of Y_i when X_i = 0
β = the slope: mean increase in Y_i when X_i is increased by 1 unit
Dias 35
36 Research question Do fat people have more severe diabetes when the diabetes is discovered? Or, in more statistical language: Is diagnostic plasma glucose (positively) associated with the body mass index at the time of diagnosis? In a (simple) linear regression analysis: is the slope β different from 0 (or, more pertinently, larger than 0)? Dias 36
37 How does the model answer the research question? Interest may focus on making a simple hypothesis about the two parameters: Null hypothesis : β = 0 Null hypothesis : α = 0 The second hypothesis often has no (clinical) meaning. Dias 37
38 Linear regression: R code
> mymodel <- lm(diabetes$glucose ~ diabetes$bmi)
> summary(mymodel)
Call: lm(formula = diabetes$glucose ~ diabetes$bmi)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
diabetes$bmi
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 723 degrees of freedom (4 observations deleted due to missingness)
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 723 DF, p-value:
Annotations: the Coefficients table holds the parameter estimates. In the diabetes$bmi row, the Estimate column is the estimate of the slope and the Pr(>|t|) column is the p-value of the test for the null hypothesis β = 0.
Dias 38
39 Plot of regression line: R code
The lm() function can be used to plot the regression line in the scatterplot:
> plot(diabetes$bmi, diabetes$glucose)
> mymodel <- lm(diabetes$glucose ~ diabetes$bmi)
> abline(mymodel)
Dias 39
40 Scatterplot with regression line: log transformed glucose
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
diabetes$bmi
Dias 40
41 How are the parameters estimated? The estimated parameters of the linear model define the line (found among all possible lines) that minimizes the sum of squared vertical distances between the data points and the line in the scatterplot. The estimation method is called ordinary least squares (under the Normal noise model, maximum likelihood gives the same answer). Dias 41
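For a single predictor the least squares line has a simple closed form, which can be compared with the output of lm(). The sketch below uses hypothetical simulated data; the names alpha_hat, beta_hat and rss are introduced here for illustration only.

```r
# Hypothetical simulated data (not the diabetes data)
set.seed(4)
x <- rnorm(100, mean = 29, sd = 5)
y <- 10 + 0.1 * x + rnorm(100, sd = 3)

# Closed-form least squares estimates for one predictor
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same estimates from lm()
fit <- lm(y ~ x)

# Residual sum of squares for an arbitrary line with intercept a and slope b
rss <- function(a, b) sum((y - a - b * x)^2)

print(rbind(closed_form = c(alpha_hat, beta_hat), lm = unname(coef(fit))))

# Any other line gives a larger sum of squared vertical distances
print(c(best = rss(alpha_hat, beta_hat),
        shifted = rss(alpha_hat + 0.5, beta_hat),
        tilted = rss(alpha_hat, beta_hat + 0.05)))
```

The two rows of estimates should be identical, and both perturbed lines should show a larger residual sum of squares than the fitted line.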
42 Least squares fit Dias 42
43 Does the model fit the data? Dias 43
44 Diagnostic plots Dias 44
45 Diagnostic plots R produces some diagnostic plots (of varying usefulness). The residuals (the error or noise) were assumed to be Normally distributed; this can be studied in the QQ plot (top right). More importantly, the residuals should have a single standard deviation, i.e. the variance should not increase with, for example, BMI. This can be studied in the residuals vs. fitted plot (top left).
> mymodel <- lm(diabetes$glucose ~ diabetes$bmi)
> opar <- par(mfrow = c(2,2), oma = c(0,0,1.1,0))
> plot(mymodel)
> par(opar)
Dias 45
46 Data transformations If the residuals are not Normal, or (and this is more serious, because the central limit theorem deals with much of the non-Normality issue) if the variance seems to increase with level, it may be a good idea to transform one or both variables. This is the real reason to investigate log(glucose) instead of glucose. Dias 46
47 Data transformations log transform Dias 47
48 The influence of one outlier Dias 48
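49 Simpson's paradox Florida death penalty verdicts for homicide relative to defendant's race (percentage sentenced to death; the counts in brackets are death sentences / other verdicts, which is the reading that reproduces the percentages):
Defendant white: 11% (53 / 430)
Defendant black: 8% (15 / 176)
Dias 49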
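50 Simpson's paradox The same verdicts split by the victim's race (percentage sentenced to death; the counts in brackets are death sentences / other verdicts):
Victim white, defendant white: 11% (53 / 414)
Victim white, defendant black: 23% (11 / 37)
Victim black, defendant white: 0% (0 / 16)
Victim black, defendant black: 3% (4 / 139)
Blacks tend to murder blacks and whites tend to murder whites, and the murder of a white person has a higher probability of a death penalty. Within each victim group, a black defendant is more likely to get the death penalty (about 2 times more likely when the victim is white). Dias 50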
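The reversal can be reproduced from the counts on the two slides. This sketch assumes that each bracketed pair is (death sentences / other verdicts), since that is the reading under which the printed percentages come out; the matrix names yes and no are introduced here for illustration.

```r
# Counts from the slides, read as death sentences (yes) and other verdicts (no).
# This reading is an assumption; it reproduces the percentages on the slides.
races <- c("White", "Black")
yes <- matrix(c(53, 11, 0, 4), nrow = 2,
              dimnames = list(defendant = races, victim = races))
no  <- matrix(c(414, 37, 16, 139), nrow = 2,
              dimnames = list(defendant = races, victim = races))

# Death penalty rate by defendant's race within each victim group
rate_by_victim <- yes / (yes + no)

# Death penalty rate by defendant's race, ignoring the victim's race
rate_overall <- rowSums(yes) / rowSums(yes + no)

print(round(100 * rate_by_victim))
print(round(100 * rate_overall))
```

Overall, white defendants have the higher rate, yet within each victim group black defendants have the higher rate: the direction of the association reverses once the victim's race is taken into account.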
51 Confounding We are interested in the green highlighted association between the defendant's race and the death penalty, but the victim's race is correlated with both the defendant's race and the outcome of the trial. [Diagram nodes: Victim's race, Defendant's race, Death penalty] Dias 51
52 Confounding A confounder influences both exposure and outcome. When confounding is present, we cannot interpret the green highlighted association between exposure and outcome as causal. [Diagram nodes: Confounder, Exposure, Outcome] Dias 52
53 Randomization Often there are many factors that may influence both exposure and outcome; some of them may not be observed or are unknown. If exposure is randomised, then there is no confounding, and the green highlighted association can be interpreted causally. [Diagram nodes: Exposure (randomised), Confounder, Outcome] Dias 53
54 Two regressions The blue points denote patients with systolic blood pressure (SBP) > 140 mmHg; the blue line is the corresponding regression line. The red points denote patients with SBP < 140 mmHg; the red line is the corresponding regression line. The black line is the general regression line. The slopes from the stratified analyses are less steep than the slope of the general line. Dias 54
55 Multiple regression
> mymodel <- lm(log(diabetes$glucose) ~ diabetes$bmi + diabetes$sbp)
> summary(mymodel)
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
diabetes$bmi
diabetes$sbp *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The adjusted slope (association) of bmi is less pronounced than before. SBP is related to both glucose and bmi and is a confounder.
Dias 55
56 Multiple regression Adjusting a statistical analysis means including other predictor variables in the model formula. Intuitively, a slope for BMI is determined for each level of the SBP variable separately and these are then averaged. Including SBP in the analysis removes the confounding effect of SBP from the relationship between log(glucose) and BMI. Dias 56
57 Take home message Association between two continuous variables may be measured by correlation coefficients or in a (simple) linear regression analysis. The latter arguably provides the most interpretable results. Moreover, it is straightforwardly extended to deal with confounding, and more. Dias 57
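The effect of adjustment can be illustrated with a small simulation. Everything below is hypothetical: the variables sbp, bmi and log_glucose are generated so that SBP influences both BMI and log(glucose), and the coefficient values are arbitrary choices, not estimates from the diabetes data.

```r
# Hypothetical simulation in which SBP confounds the BMI-glucose association
set.seed(5)
n <- 500
sbp <- rnorm(n, mean = 140, sd = 20)                   # confounder
bmi <- 20 + 0.06 * sbp + rnorm(n, sd = 3)              # exposure depends on SBP
log_glucose <- 1.5 + 0.005 * sbp + 0.01 * bmi +        # outcome depends on both
  rnorm(n, sd = 0.3)

# Crude analysis: BMI only
crude <- lm(log_glucose ~ bmi)

# Adjusted analysis: SBP added to the model formula
adjusted <- lm(log_glucose ~ bmi + sbp)

print(c(crude_slope = unname(coef(crude)["bmi"]),
        adjusted_slope = unname(coef(adjusted)["bmi"])))
```

In this setup the crude slope for BMI absorbs part of the SBP effect and is therefore steeper than the adjusted slope, which mirrors the pattern described for the diabetes data.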
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationEDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 510 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 16233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationWeek 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More information, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (
Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we
More informationYiming Peng, Department of Statistics. February 12, 2013
Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationWe extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
More informationUsing R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
More informationRegression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear
More informationQuantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression
Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationChapter 5: Linear regression
Chapter 5: Linear regression Last lecture: Ch 4............................................................ 2 Next: Ch 5................................................................. 3 Simple linear
More informationTesting for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
More information7. Tests of association and Linear Regression
7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationE(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
More informationRegression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationStatistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!
Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationAMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationPart II. Multiple Linear Regression
Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a yvariable relates to two or more xvariables (or transformations
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationModule 5 Hypotheses Tests: Comparing Two Groups
Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this
More informationExercise Page 1 of 32
Exercise 10.1 (a) Plot wages versus LOS. Describe the relationship. There is one woman with relatively high wages for her length of service. Circle this point and do not use it in the rest of this exercise.
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationwhere b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationComparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
More informationCategorical Data Analysis
Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods
More informationSpearman s correlation
Spearman s correlation Introduction Before learning about Spearman s correllation it is important to understand Pearson s correlation which is a statistical measure of the strength of a linear relationship
More informationDeterministic and Stochastic Modeling of Insulin Sensitivity
Deterministic and Stochastic Modeling of Insulin Sensitivity Master s Thesis in Engineering Mathematics and Computational Science ELÍN ÖSP VILHJÁLMSDÓTTIR Department of Mathematical Science Chalmers University
More informationBivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2
Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS ttest X 2 X 2 AOVA (Ftest) ttest AOVA
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationStat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015
Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationPerform hypothesis testing