Linear Mixed Models PGRM 15. Outline. Linear regression. Correlated measurements (eg repeated)
|
|
- Jessica Monica Jacobs
- 7 years ago
- Views:
Transcription
1 Lear ixed odels PGR 15 Outle Lear regression Correlated measurements (eg repeated) Random effects leadg to different components of variance & correlated measurements Different Correlation Structures Simple Analysis of Clustered Data Split Plot Analysis Repeated easures Analysis 1
2 Relationship between variables Yield and fertilizer level Temperature and bacterial growth Light and respiration rate Age and alcohol consumption Can the relationship be represented by a simple curve? Is there necessarily a causal relationship? Simple lear regression 8 6 Y X 2
3 Simple lear regression model Data: (x i, y i ) Probabilistic model: yi = 0 1 i β + β x + ε Random term with variance σ 2 Population Characteristic β 0 β 1 Sample Statistic $ β 0 $ β 1 To predict from the model: yˆ i =β ˆ ˆ 0+ β1x i σ 2 s 2 Simple lear regression Estimated equation of the le: ^ y = x 8 6 Y 4 2 Intercept = β 0 = 2 ^ Slope = β 1 = 0.75 ^ X 3
4 Simple lear regression Estimated equation of the le: ^ y = x 8 6 Y 4 2 Variance = s 2 = X Assumptions for a Simple Lear Regression model Note: If you are fittg a simple lear regression model to your own data, there are assumptions that must be satisfied. d details of how to test the assumptions for your fitted model any basic statistics text book. 4
5 Regression example Distance from fire ire damage station x (miles) y ($1000) Damage caused by fire ($1000) vs Distance from fire station (miles) 50 damage ($1000) Distance from fire station (miles) 5
6 Damage caused by fire ($1000) vs Distance from fire station (miles) damage ($1000) y ^ = x Distance from fire station (miles) Damage caused by fire ($1000) vs Distance from fire station (miles) damage ($1000) Distance from fire station (miles) 6
7 Correlated easurements Two values are correlated if knowledge of one provides formation about the other eg if knowg one value is high (relative to the appropriate mean) means the other is also likely to be high (+ve correlation), or low (-ve correlation) Correlation can be visualized by trends plots Independent values are uncorrelated Positive correlation often arises between values resultg from shared fluences When modellg, shared fluences must be taken to account (eg by cludg random effects, or specifyg a correlation structure) Pearson s correlation coefficient ρ (rho) easures the strength and direction (+ve or ve) of lear relationship between two variables measured on the same experimental unit Does not depend on the units of measurement Lies between 1 and +1 In fittg a regression le y = a + bx the estimate of ρ, r, gives: 100r 2 = % of variation y explaed by the lear regression Hence r = 0.5 dicates a negative relationship that explas 25% of the variation y. r = 0.8 explas approximately 2/3 of the variation. 7
8 Interpretg r A high value of r corresponds to pots closely scattered around a le Examples of correlated responses Clustered data easurements on piglets from several litters responses from piglets with a litter are related, sharg parental and environmental fluences Experiment at several sites responses with a site are related Split-plot designs (with ma and sub-plots see later) Responses from the same ma plot will be related, sharg the ma plot characteristics Repeated easures from experimental units responses from the same unit are related, sharg the fluential characteristics of the unit if repeated over time, responses from the same unit close time may be more closely related than responses further apart 8
9 Identifyg Correlation & Different Sources of Variation Components of variation & correlation So far background noise has been accounted for by a sgle source of variation the variation between experimental units sharg the same values of explanatory variables. With clustered data, some of the variation will be due to differences between clusters (eg litters, sites). Individuals with a litter are related genetically. Plots with a site are related as they share common environmental elements. Clusters may form a source of random variation (any particular litter could have been allocated to a different treatment group but all piglets the litter would then have the same treatment and would not be dependent. Variation between sites may contribute to the variation of the estimate of a treatment effect based on a number of sites. The treatment effect may also vary over sites. Observations with the same value of a random effect will be positively correlated 9
10 ore general correlation structures Introducg a random effect for clusters of measurements assumes a very restricted structure of correlation ore generally we identify clusters whose members are correlated specify the correlation structure Example of clustered data analysis 27 sites with 30 plots per site = 810 plots our species 2 grasses (G) & 2 legumes (L) Simplex design at two densities 10
11 Location of Sites 16 id European (E) 4 Northern European (NE) 3 oist editerranean () 2 Dry editerranean (D) 2 Other A typical E site Species - G 1 Lolium perenne - G 2 Dactylis glomerata - L 1 Trifolium pratense - L 2 Trifolium repens irst complete year s data was used. 11
12 Diversity effect is positive and significant at most sites Country Type Av mono (t ha -1 ) δˆ (t ha -1 ) Germany E Ireland E Lithuania E Lithuania E Lithuania E Netherlands E Norway E Norway E Norway E Poland E Spa E Sweden E Sweden E Switzerland E Wales E Wales E Average Site fixed vs site random Site ixed Proc glm data=yield; Class site monomix density; odel tot_yld=site density monomix ; Lsmeans monomix/stderr; Run; Diff 2.39 SE LSmeans ean SE ono ix Site Random Proc mixed data=yield; Class site monomix density; odel tot_yld= density monomix; Lsmeans monomix/stderr; Random site Run; Diff 2.39 SE LSmeans ean SE ono ix
13 onomix tested agast random site*monomix Estimate of diversity effect is the same (2.39) SE of diversity effect is now greater (0.254) Loss of precision is compensated for by a wider range of ference about the diversity effect Any new site predict a diversity effect of 2.39 but use the se = for settg bounds for the prediction. Unbalanced mixed model analysis The example data was balanced each treatment (combation of a level of V with a level of N) appeared the same number of times (once!) each block or unbalanced data every comparison has a different SE. It s important to use the ddfm = kenwardroger option on the model statement to drop teraction terms if non-significant See PGR
14 Cost data revisited meta analysis Consistency of treatment effects across multiple sites (meta analysis) If an experiment is repeated across a number of sites the question arises as to how well the average treatment effect estimated across all sites represents the difference between treatments. Is there variation the onomix effect across sites? If sites are regarded as random (drawn at random from some population of sites), how well does the average onomix effect over sites represent the typical monomix effect at a site drawn from the same population.. To use the average monomix effect this way its should conta variation due to how the effect varies from site to site (i.e. formation on the site x monomix teraction. This is achieved by cludg the SAS program the le random site site*monomix; Diversity effect is positive and significant at most sites eta analysis Country Type Av mono (t ha -1 ) δˆ (t ha -1 ) Germany E Ireland E Lithuania E Lithuania E Lithuania E Netherlands E Norway E Norway E Norway E Poland E Spa E Sweden E Sweden E Switzerland E Wales E Wales E Average
15 SAS programs Site random proc mixed data=yield; class site monomix density; model tot_yld = density monomix /ddfm=kenwardroger ; run; random site; lsmeans monomix; estimate 'diversity effect' monomix -1 1; SAS programs Site random proc mixed data=yield; class site monomix density; model tot_yld = density monomix /ddfm=kenwardroger ; run; random site; random site*monomix; lsmeans monomix; estimate 'diversity effect' monomix -1 1; 15
16 Site ixed Proc glm data=yield; Site fixed vs site random and Site*monomix cluded Class site monomix density; odel tot_yld=site density monomix site*monomix; Lsmeans monomix/stderr; Run; Diff 2.39 SE LSmeans ean SE ono ix Site Random Proc mixed data=yield; Class site monomix density; odel tot_yld= density monomix; Lsmeans monomix/stderr; Random site Random site*monomix Run; Diff 2.39 SE LSmeans ean SE ono ix Comparison of models Random effects SITE SITE SITE*ONOIX Variance Estimates Site Site * onomix 0.70 Residual log Likelihood ono SE ixture SE Diversity effect SE P value <0.001 <
17 Split Plot Experimental Design Split Plot: effect of fertiliser on 3 varieties of oats In each of 6 homogeneous blocks: Varieties allocated to ma plots at random In each sub plot varyg levels of fertiliser N (kg/ha) are allocated at random 17
18 Analysis of split plot design Interest is on effect of V (variety) and N (nitrogen level) on Y (yield) possible sources of background variation Initially regard N level as categorical odel terms would clude B (block), V, N and the teraction V*N In addition model must reflect the split plot structure: yields from the same ma plot will share similar (randomly allocated) characteristics, and consequently be correlated a plots are identified by specifyg B and V, ie B*V. SAS/IXED procedure caters for the random effects of B*V. SAS/IXED code proc mixed data = oats; class b v n; model y = b n v n*v / htype = 1; random b*v; estimate 'Variety 1v2' v 1-1 0; estimate 'N 1v2' n ; run; b*v each level of this corresponds to a ma plot which contributes its characteristics to the response 18
19 SAS/IXED OUTPUT Type 1 Tests of ixed Effects Effect Num D Den D Value Pr > b n <.0001 v v*n V*N teraction is far from significant (so the model could be re-run without this term) 2. N has a strong effect 3. Varieties do not appear to differ SAS/IXED OUTPUT Covariance Parameter Estimates Cov Parm Estimate b*v Residual There are 2 sources of background variation: 1. with ma plots (component of variance: ) 2. between ma plots (component of variance: ) Different comparisons will use appropriate combations of these 19
20 LSmeans & SEDs Code lsmeans v n v*n / diff; a effect means (ie for v (SED = 7.079), n (SED = 4.436) and SEDs are appropriate if v*n is non-significant Comparison of the 12 v*n means volves 2 different SEDs When comparg 2 levels of N with a particular variety (SED = 7.683) When makg comparisons where the variety changes (SED = 9.715) lsmeans / diff; OUTPUT fferences of Least Squares eans Effect Variety Nitrogen Variety Nitrogen Estimate Standard Error D t Value Pr > t v Golden Ra arvellous v Golden Ra Victory v arvellous Victory n <.0001 n <.0001 n <.0001 n n <.0001 n v*n Golden Ra 0 Golden Ra v*n Golden Ra 0 Golden Ra <.0001 v*n Golden Ra 0 Golden Ra <.0001 v*n v*n Golden Ra Golden Ra 0 0 arvellous arvellous
21 Is there a simple description of the N effect? Usg SAS/EANS procedure calculate the mean yield for each treatment (V, N combation) Plot ean Yield v Nitrogen Level for each variety it a le for each variety. Split plot design 21
22 SAS/IXED: Split Plot Analysis N seems to have equal effect on yield for the 3 varieties Plot suggests effect of N is lear Add variable nval = n to data set oats it & test for separate slopes and LO: proc mixed data = oats; class b v n; model y = b nval n v nval*v n*v / htype = 1; random b*v; run; n n*v terms measure LO nval*v checks equal slopes SAS OUTPUT Type 1 Tests of ixed Effects Num Den Effect D D Value Pr > b nval <.0001 n v nval*v v*n LO terms not significant (p =.27,.93) 2. Lear effect of N same for the 3 varieties (p =.63) 3. Lear effect of N highly significant (p <.0001) 4. Refit model without LO terms and teraction nval*v to get coefficient of nval: 0.59 (SE: ) 22
23 Practical exercise SAS/IXED for Split Plot Analysis Lab Session 6 Exercise 6.2 (a) (e) Simple Analyses of Correlated Data 23
24 A simple analysis combe correlated values Seedlgs are planted 20 each of 10 contaers and the dry mass of each plant measured after 60 days. Responses from plants the same contaer will be correlated. Use as response variable the total dry mass of all 20 plants the same contaer. This gives 10 dependent response values Responses from piglets the same litter could be combed, though adjustments would have to be made for varyg litter size. Averages from large litters could give more precise formation than from small, but may also be lower. A simple analysis combe correlated values easurg plant height every 14 days for 10 weeks gives 6 measurements for each plant (after 0, 2, 4, 6, 8, and 10 weeks) If daily growth is of terest a sgle value from each plant could be calculated by (fal height itial height)/140 the slope of the fitted le to the data from the plant 24
25 Repeated easures Analysis utiple measures on the same units at the same time Repeated measurements on the same units over time Example The weights of 27 male and female children were measured at ages 8, 10, 12 and 14 years. Child Gender To keep SAS happy replace 8, 10, etc with w_age8, w_age10, etc Child Gender
26 Issues Repeated easures Analysis 1. What questions are to be addressed by the analysis? Can these be answered by reducg the measurements to a sgle value? eg fal weight weight ga over a particular period (from 8 to 10 or 10 to 14 yr) the slope of a straight le Issues Repeated easures Analysis 2. Are the measurements taken at equally spaced time pots? When times are not equally spaced, and even more so when patterns of timg differ from unit to unit, the analysis is more difficult 26
27 Issues Repeated easures Analysis 3. What is a reasonable error/correlation structure for the data? Are measurements equally related, irrespective of whether they are close together time or otherwise? Does the correlation decle as time pots are farther apart? Can this decle be described simply? Child Weights Example simple analysis usg 2 possible sgle values data c; set childweights; wga1 = y4 - y1; run; proc glm data = c; class gender; model wga1 = gender; lsmeans gender; estimate 'diff -' gender -1 1; quit; 27
28 SAS OUTPUT Ga 8 10 yr Parameter Estimate Standard Error t Value Pr > t diff Ga yr Parameter Estimate Standard Error t Value Pr > t diff Differences between genders depend on age ie there is a gender year teraction The complete data set needs to be analysed Gettg the picture! Are the slopes equal? 28
29 Repeated easures Analysis of Weight Restructure data for SAS Obs Child Gender weight age Centre age usg cage = age - 11 This means that gender comparisons will be made at age 11 yr. Repeated easures Analysis of Weight Observations from the same child will be correlated A simple way to clude this is to troduce a random effect for the child However: this assumes that any two observations on the same child are equally correlated, dependent of whether far apart or not (COPOUND SYETRY) proc mixed data = cw; class gender cage child; model weight = gender cage cage*gender / ddfm = kenwardroger htype = 1; random child; run; Note: it s best to use model option: ddfm = kenwardroger 29
30 IXED selected OUTPUT Least Squares eans Effect Gender Estimate Standard Error D t Value Pr > t Gender <.0001 Gender <.0001 Effect Gender cage Type 1 Tests of ixed Effects Num D Den D Value Pr > <.0001 Covariance Parameter Estimates Cov Parm Child Residual Estimate cage*gender The common correlation between observations on the same subject (child) is /( ) = 0.63 IXED selected OUTPUT Gender Difference Differences of Least Squares eans Effect Gender Gender Estimate Standard Error D t Value Pr > t Gender LSmeans and difference refer to cage = 0 30
31 IXED selected OUTPUT Age Effect Solution for ixed Effects Effect Gender Estimate Standard Error D t Value Pr > t Intercept <.0001 Gender Gender cage <.0001 cage*gender cage*gender Slopes : = : These differ significantly (p = ) IXED selected OUTPUT rcorr option gives correlation between the 4 measurements on each subject (child) Estimated R Correlation atrix for Child 1 Row Col1 Col2 Col3 Col Note the CS structure and value of the correlation 31
32 Was compound symmetry a good choice for Child Weights example? What are the alternatives? Alternatives to Compound Symmetry Alternatively we can specify that measurements on the same subject are correlated with compound symmetry or another structure as follows: IXED syntax: replace random child; with Choice of correlation structure: cs ar(1) unstructured repeated / type = cs subject = child rcorr; With balanced data these are equivalent, but repeated permits more complex structures: AR(1) correlation gets less for measurement father apart beg ρ, ρ 2, ρ 3, for measurements 1, 2, 3, time units apart (eg.5,.25,.125, ) UNSTRUCTURED no particular structure assumed 32
33 Gettg the correlation structure correct Unstructured correlation Estimated R Correlation atrix for Child 1 Row Col1 Col2 Col3 Col Not very different from CS 2. Other results similar 3. Correlation does not decrease with separation don t use AR(1) correlation Gettg the correlation structure correct Tests for correlation structure Chi-squared tests can compare CS and AR(1) with unstructured (UN) (they are cluded UN) If both CS and AR(1) are not rejected when compared to UN there isn t a formal test; be guided by appropriateness and the prciple of maximisg likelihood or the formal test we need to note the number of estimated parameters the structures beg compared the difference is the df for the relevant chisquared test 33
34 Chi-squared test Need to re-run IXED with different structures. The default method used by IXED is REL which is best for determg random effects IXED OUTPUT (usg REL) cludes a table of it cludg -2 Res Log Likelihood The number of covariance parameters is OUTPUT the Dimensions Testg whether a simplified covariance structure will do the chi-squared df is the difference covariance parameters where p = 1 - cdf( chisq, d, df); d = change (-2 Res Log Likelihood) Child Weights Example Structure df -2 Res Log Likelihood Change from UN p UN CS (8 df) 0.32 AR(1) (8 df) 0.01 data _null_; p = 1-cdf( chisq, 9.3, 8); put p = ; run; Result - SAS LOG: p= CS ok AR(1) rejected 34
35 ore correlation structures See SAS Help! or example: suppose some subjects are more variable than others, but there is still a common correlation between different measurements on the same subject. Covariance Parameter Estimates Cov Parm Subject Estimate Var(1) Child Var(2) Child Var(3) Child Use Heterogeneous CS: Var(4) CSH Child Child type = csh Structure CSH CS df Res Log Likelihood Change from CSH 1.8 (3 df) p 0.61 CS not rejected when compared to CSH SAS/IXED code The Idea Summarised Values that are correlated must be identified - each set of correlated values beg a level of some factor Introduce a variable identifyg correlated values Include it the class statement Use a repeated statement & specify the correlation structure Alternatively: use a random statement if the correlation is simply due to units sharg a common fluences 35
36 Practical exercise SAS/IXED for Repeated easures Analysis Lab Session 6 Exercise 6.3 (a) (d) Random coefficients model Previous odel assumes a sgle regression le, with subjects (children) deviatg from this each year with correlated deviations. ore generally we could allow each subject to have its own regression le; this has the advantage beg possible when subjects are measured at different times Sce subjects are randomly allocated to treatments, the dividual tercepts and slopes are random variables We fit a fixed le (tercept and slope beg the estimated mean for all subjects) Each subject has its own le resultg from addg/subtractg from the mean tercept and slope. 36
37 Random coefficients model 6 (of 11) females 6 (of 16) males Plots show a le fit to each child s data (for the first 6 males and 6females) SAS/IXED: random coefficients model SAS/IXED code proc mixed data = cw; class gender child; model weight = gender cage gender*cage / ddfm = kenwardroger htype = 1 solution; random tercept cage / type = un subject = child gcorr; run; 37
38 IXED: selected OUTPUT Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) Child UN(2,1) Child UN(2,2) Child Residual Estimated G Correlation atrix Row Effect Child Col1 Col2 1 Intercept age Type 1 Tests of ixed Effects Effect Num D Den D Value Pr > Gender cage <.0001 cage*gender Correlation between tercept and slope: Variances: tercept 5.79, slope 0.033, residual Conclusions: the pattern of growth differs between males and females (p = ), as with earlier analysis. Average slopes unchanged. The End but hopefully just the begng! 38
SAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
More informationIndividual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA
Paper P-702 Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Individual growth models are designed for exploring longitudinal data on individuals
More informationThis can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.
One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.
More informationFamily economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.
Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1
More information861 Example SPLH. 5 page 1. prefer to have. New data in. SPSS Syntax FILE HANDLE. VARSTOCASESS /MAKE rt. COMPUTE mean=2. COMPUTE sal=2. END IF.
SPLH 861 Example 5 page 1 Multivariate Models for Repeated Measures Response Times in Older and Younger Adults These data were collected as part of my masters thesis, and are unpublished in this form (to
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationRandom effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationIntroduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationAssignments Analysis of Longitudinal data: a multilevel approach
Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans
More informationLinear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares
Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationBinary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationExample: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationFactors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More information4. Multiple Regression in Practice
30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More informationGeostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt
More informationChapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.
Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,
More informationTRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics
UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationElementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationExercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationAn analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationModel Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.
Paper 264-26 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationCourse Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.
SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationcontaining Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.
Getting Correlations Using PROC CORR Correlation analysis provides a method to measure the strength of a linear relationship between two numeric variables. PROC CORR can be used to compute Pearson product-moment
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationRegression and Correlation
Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis
More informationUsing Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More informationSIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.
SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation
More informationPearson's Correlation Tests
Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationTechnical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationFinancial Risk Management Exam Sample Questions/Answers
Financial Risk Management Exam Sample Questions/Answers Prepared by Daniel HERLEMONT 1 2 3 4 5 6 Chapter 3 Fundamentals of Statistics FRM-99, Question 4 Random walk assumes that returns from one time period
More informationElectronic Thesis and Dissertations UCLA
Electronic Thesis and Dissertations UCLA Peer Reviewed Title: A Multilevel Longitudinal Analysis of Teaching Effectiveness Across Five Years Author: Wang, Kairong Acceptance Date: 2013 Series: UCLA Electronic
More informationSo, using the new notation, P X,Y (0,1) =.08 This is the value which the joint probability function for X and Y takes when X=0 and Y=1.
Joint probabilit is the probabilit that the RVs & Y take values &. like the PDF of the two events, and. We will denote a joint probabilit function as P,Y (,) = P(= Y=) Marginal probabilit of is the probabilit
More informationADVANCED FORECASTING MODELS USING SAS SOFTWARE
ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 gjha_eco@iari.res.in 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationHow To Model A Series With Sas
Chapter 7 Chapter Table of Contents OVERVIEW...193 GETTING STARTED...194 TheThreeStagesofARIMAModeling...194 IdentificationStage...194 Estimation and Diagnostic Checking Stage...... 200 Forecasting Stage...205
More informationName: Date: Use the following to answer questions 2-3:
Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student
More informationCORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there
CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the
More informationCorrelation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2
Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables
More informationChapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
More informationUsing SAS Proc Mixed for the Analysis of Clustered Longitudinal Data
Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Kathy Welch Center for Statistical Consultation and Research The University of Michigan 1 Background ProcMixed can be used to fit Linear
More informationLecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationThe Big Picture. Correlation. Scatter Plots. Data
The Big Picture Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, We have just completed a length series of lectures on ANOVA where we considered
More informationCorrelational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots
Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship
More informationPITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU
PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard
More informationThe Chi-Square Test. STAT E-50 Introduction to Statistics
STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed
More informationBiostatistics Short Course Introduction to Longitudinal Studies
Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal
More informationData Mining and Predictive Modeling in Institutional Advancement: How Ten Schools Found Success
Technical report Data Mg and Predictive Modelg Institutional Advancement: How Ten Schools Found Success Dan Luperchio, Campaign Admistrator Zanvyl Krieger School of Arts and Sciences The Johns Hopks University
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
More informationWhat s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More information10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationMultivariate Analysis of Variance (MANOVA)
Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various
More informationExamining a Fitted Logistic Model
STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic
More information