Introduction to Growth Curves Using Stata


Multilevel/Mixed Models and Longitudinal Analysis Using Stata
Alan C. Acock
University Distinguished Professor of Family Studies & Knudson Chair for Family Research & Policy
Oregon State University
College of Health and Human Sciences Summer Workshop Series, July 2010
Introduction to Growth Curves Using Stata

What's in a name?

What's in a name: Cross-sectional
When measured at one time, the repeated measures are on a case.
The case might be a family.
Repeated measures might be Dad's happiness, Mom's happiness, the oldest kid's happiness, the next-oldest kid's happiness, etc.
The idea is that the measurements are nested (repeated) in the case.
We have 2+ measurements in each family.

What's in a name: Longitudinal
When measured longitudinally in a panel, the case might be an individual.
Repeated measures might be his/her happiness at wave 1, wave 2, wave 3, wave 4, etc.
The idea is that the measurements are nested (repeated) in the case.
We have 2+ measurements on each individual.

What's this about levels: Cross-sectional?
Cross-sectional data have individuals nested in families.
Level 1 is the individual's score (mom, dad, kid).
Level 2 is the family.
Level 1 scores within a family are more homogeneous than scores for random individuals.
Level 3 might be the neighborhood.

What's this about levels: Variables?
We can have different predictor variables at each level.
Level 1 variables might be personality, IQ, attitude.
Level 2 variables might be household income, days/week the family eats dinner together.
Level 3 variables might be neighborhood % white, median home value.
Key: all this is interdependent because the levels are nested.

What's this about levels: Longitudinal?
Longitudinal models have scores at each wave nested in individuals.
Level 1 is the score at wave 1, wave 2, etc.
Level 2 is the individual.
Level 1 scores of an individual at each wave are more homogeneous than scores for random individuals.

Graphing the Interdependence
Source: Sophia Rabe-Hesketh & Anders Skrondal, Multilevel and Longitudinal Modeling Using Stata. College Station, TX: Stata Press.
I change the labels of variables from what they use.

Graphing the Interdependence
Generate a mean for the husband, then graph both measures by couple:
    twoway (scatter husband1 couple, msymbol(circle)) ///
        (scatter husband2 couple, sort msymbol(circle_hollow)), ///
        xtitle(couple) ytitle(husband's measure Stability) ///
        legend(order(1 "Time 1" 2 "Time 2"))

Graphing the Interdependence
[Figure: scatterplot of husband's measure (Stability) against Couple, with separate markers for Time 1 and Time 2]

Wide to Long Format

Wide to Long Format (continued)

Variance Components: Intraclass Correlation
We run a regression with no predictors and tell Stata which variable is the id variable.
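The wide-to-long step and the no-predictor regression can be sketched on a tiny made-up dataset. This is only an illustration: the three couples and their scores are invented, and the variable names husband1, husband2, and couple are borrowed from the workshop do-file.

```stata
* Hypothetical toy data (values invented for illustration only)
clear
input couple husband1 husband2
1 410 425
2 520 505
3 300 330
end

* Wide to long: one row per couple per occasion
reshape long husband, i(couple) j(occasion)
list, sepby(couple)

* Regression with no predictors; couple is the id variable
xtreg husband, i(couple) mle
```

After the reshape, husband holds both measures and occasion records which one each row is. With only three couples the estimates themselves are not meaningful; the point is the data layout that xtreg expects.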

The command: xtreg
Stata has many commands for multilevel models; they start with xt.
    xtreg husband, i(couple) mle
Just enter the level 1 variable (the repeated variable) in the variable list.
In our data, each of the 17 husbands is identified by the variable couple.
The i() option means that whatever variable is in the parentheses is the identification variable. This might be called id, case, etc. Here it happens to be called couple.

The command: xtreg (continued)
The mle option asks for a maximum likelihood estimator.
Without mle, xtreg uses its GLS random-effects estimator; xtmixed, used later, defaults to restricted maximum likelihood, reml.
But reml makes it harder to compare models.
This command requires the data to be in the long format.

The xtreg Result
We have 34 level 1 observations: two measures for each of our 17 level 2 cases (called groups, since the level 1 values are grouped within the 17 level 2 husbands).
We have no missing values: min, avg, and max all = 2.
Stata automatically uses all available data; e.g., with families made up of mom, dad, and kids, some families (level 2) might have 1 kid, some might have 2 kids, etc.
The chi-square test with no predictors is meaningless (df = 0).
The maximized log likelihood value is reported in the output header.

The xtreg Result (continued)
The _cons (constant/intercept), with no predictors, is the overall mean (the best guess in the absence of predictors).
/sigma_u is the standard deviation between husbands (other programs report the variance, which is an option in Stata). We expect this to be large.
/sigma_e is the standard deviation within husbands. We expect this to be small.
Rho (ρ) is the intraclass correlation (ICC):
\[ \mathrm{ICC} = \frac{\mathrm{Var(Between)}}{\mathrm{Var(Between)} + \mathrm{Var(Within)}} = \frac{\mathrm{SD(Between)}^2}{\mathrm{SD(Between)}^2 + \mathrm{SD(Within)}^2} = .967 \]
Below the table, chi-square(1) = 46.27, p < .001, is the significance test of the ICC.

The Intraclass Correlation
\[ \mathrm{ICC} = \frac{\mathrm{Var(Between)}}{\mathrm{Var(Between)} + \mathrm{Var(Within)}} = \frac{\mathrm{SD(Between)}^2}{\mathrm{SD(Between)}^2 + \mathrm{SD(Within)}^2} = \frac{107.05^2}{107.05^2 + 19.91^2} = .967 \]
Using the standard deviations is more easily interpretable than using the variances.
About 95% of the husbands will be within 2*(107.05) of the overall mean, that is, the mean plus or minus about 214: roughly between 250 and 650.
About 95% of the two measures for each husband will be within 2*19.91 of the husband's mean, roughly his mean plus or minus 40.
Husbands are relatively stable. Most variance is between husbands rather than within husbands.

The xtmixed command
The xtmixed command is much more general.
xtmixed does not report the ICC.
    xtmixed husband || couple:, mle
After the two vertical bars we have the identification variable followed by a colon.
After the comma we ask for a maximum likelihood estimator.

The xtmixed result
This has all of the same numbers as the xtreg result.
The variance components are shown in the bottom table, labeled Random-effects Parameters.
The standard deviation between individuals is the standard deviation around the overall mean. It appears as sd(_cons) and is 107.05.
The standard deviation within each husband, across his repeated measures, is sd(residual) and is 19.91.
The ICC is computed using the simple formula shown before.
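Because xtmixed does not print the ICC, it has to be computed by hand from the two standard deviations. A minimal sketch, using the between-husband and within-husband values quoted in this workshop (107.05 and 19.91); the scalar names are hypothetical:

```stata
* Standard deviations as reported in the workshop output
scalar sd_between = 107.05   // sd(_cons), i.e. /sigma_u
scalar sd_within  = 19.91    // sd(residual), i.e. /sigma_e

* ICC = between variance / (between variance + within variance)
scalar icc = sd_between^2 / (sd_between^2 + sd_within^2)
display "ICC = " %5.3f icc
```

This should reproduce the ICC of about .967 that xtreg reports as rho. Stata 12 and later document an estat icc postestimation command for this; that postdates the 2010 workshop, so treat its availability as an assumption about your installation.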

The xtmixed result: Confidence Intervals
Both results show standard errors for the estimated standard deviations and 95% confidence intervals.
These are somewhat problematic: the parameter space for a variance or standard deviation has a lower limit of zero.
A similar problem occurs when putting a confidence interval around a correlation coefficient, since it is bounded by -1 and 1.
Stata adjusts for this by reporting an asymmetric confidence interval.
A symmetric C.I. for sd(residual) would be the estimate plus or minus 1.96 standard errors.

Graphic Representation of Variance Components
[Figure: husband 1's two scores, H1 and H2, lie ε11 and ε21 from his mean (true score); his mean lies ζ1 from the overall mean M]
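The decomposition in the diagram can be written as a one-line model. In this sketch, y_{ij} for husband j's score at occasion i is my notation (the slide labels the scores H1 and H2), and independence of ζ_j and ε_{ij} is the usual assumption rather than something stated on the slide:

```latex
y_{ij} = M + \zeta_j + \varepsilon_{ij}, \qquad
\mathrm{Var}(y_{ij}) = \mathrm{Var}(\zeta_j) + \mathrm{Var}(\varepsilon_{ij}), \qquad
\mathrm{ICC} = \frac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(\zeta_j) + \mathrm{Var}(\varepsilon_{ij})}
```

Here Var(ζ_j) is the between variance and Var(ε_{ij}) is the within variance, so the last expression is the same ICC formula used for the xtreg output.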

Graphic Representation of Variance Components
Husband j's mean is ζj above the overall mean (a happy guy).
At time 1, his score is ε1j points away from his average score.
At time 2, his score is ε2j points away from his average score.
The variance of his mean around the overall mean (ζj) is the between variance (should be big).
The variance of his two scores around his own mean (εij) is the within variance (should be small).

Applications of Variance Components
Often this is just a first step: get the ICC to show that the data are not independent and that a multilevel analysis is needed.
If the ICC is small, some say you do not need to run a multilevel analysis.
Counterargument: if the design is multilevel, then you need to run a multilevel analysis.

Applications of Variance Components
You don't change the test you planned to do in order to get a significant result.
If you set up a nonparametric test and it was not significant, but then you noticed a few outliers, what would you do?
    Change to a t-test that is sensitive to outliers and might be significant?
    Stay the course with your research design?
The FDA expects drug companies to indicate what tests they will run before they collect the data and does not allow them to try different tests until they find one that is significant.
If you set up a test using a two-tail assumption, can you change it to one-tail after seeing the result?
This is equivalent to not running a multilevel analysis after you see the ICC is small.

Applications of Variance Components
You can use the ICC and a graph to see who is most similar.
Are wives more consistent than husbands?
Are identical twins more similar than other twins?
Are students in all-female math classes more similar than students in mixed math classes?
Just compare the ICCs and possibly do a graph.

How many 2nd level groups are needed?
Here these are husbands.
They could be families, organizations, classrooms, etc.
In a very real sense, these are your cases.
30 to 50 seems reasonable.
It is possible to do a power analysis.
If you had 5 classes, it would be like having 5 observations: a pretty small sample size.

How many level 1 scores are needed?
Here we only had 2; more would be very helpful.
These could be scores on members of a group: students in a class (25-30), members of a family (3-6).
The issue is getting a mean of these values to represent some sense of a true score.
A husband's mean is his reference point.
The mean of 25 students in a class is the class's reference point.

Do-file
    * intraclass.do
    clear
    cd "/Volumes/acock/1flash/1presentations/OSU 2010 Workshop/data"
    use intraclass.dta
    list couple hus*
    egen husband_mean = rowmean(husband1 husband2)
    summarize husband_mean
    * Use menu system to generate this graph
    twoway (scatter husband1 couple, msymbol(circle)) ///
        (scatter husband2 couple, sort msymbol(circle_hollow)), ///
        xtitle(couple) ytitle(husband's measure Stability) ///
        legend(order(1 "Time 1" 2 "Time 2"))
    list
    * Reshaping the data from wide to long
    reshape long wife husband, i(couple) j(occassion)
    list couple occassion husband husband_mean if couple < 5
    * Variance Components models
    xtreg husband, i(couple) mle
    xtmixed husband || couple:, mle
    xtmixed husband || couple:, mle nolog

Do-file (continued)
    * Comparison table
    quietly xtreg wife, i(couple) mle
    estimates store her
    quietly xtreg husband, i(couple) mle
    estimates store him
    estimates table her him
    list in 1/10
    gen id = _n
    list in 1/10
    rename wife pw
    rename husband ph
    list in 1/10
    reshape long p, i(id) j(partner) string
    list in 1/10
    encode partner, gen(spouse)
    list in 1/10, nolabel
    recode spouse 2 = 0
    list in 1/10, nolabel

Do-file (continued)
    xtmixed p || couple:, mle
    estimates store model1
    xtmixed p spouse || couple:, mle
    estimates store model2
    twoway (scatter p couple if spouse==0, msymbol(circle)) ///
        (scatter p couple if spouse==1, msymbol(circle_hollow)), ///
        xtitle(couple) ytitle(marital Satisfaction) ///
        legend(order(1 "Wife" 2 "Husband")) xlabel(1/17)
    * Three Way, measures nested in spouses who are nested in couples
    xtmixed p spouse || couple: || spouse:, mle
    estimates store model3
    lrtest model2 model3

Sometimes a Simple Example Helps
Farmer Brown has 48 brand new pigs, and his daughter, Emma, weighs each pig once a week for 9 weeks.
Farmer Brown wants to know what the weight trajectory looks like.
Stata uses these data, but I've added a catch.
Emma is not reliable. In fact, she only records 294 of the 432 (9*48) possible weights, so about 32% of the values are missing.
This means only 3 pigs got weighed all 9 weeks (listwise).
The result for the first 2 pigs (in long format) appears on the next slide.

Data for first two pigs

Graph for 10 pigs
    twoway connected weight week if id<=10, connect(line)
[Figure: weight plotted against week, one connected line per pig]
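The listing and the graph can be tried on the complete pig dataset that ships with Stata's documentation. This is a sketch under assumptions: webuse pig needs an internet connection and loads the full data (variables id, week, weight) without the missing weights added for this workshop, and connect(L) is my substitution to keep one pig's line from joining the next.

```stata
* Public version of the pig data; NOT the workshop file with missing weights
webuse pig, clear

* Long format: one row per pig per week
list id week weight if id <= 2, sepby(id)

* connect(L) joins points only while week is ascending,
* so the line breaks when the next pig starts at week 1
sort id week
twoway connected weight week if id <= 10, connect(L)
```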

How about a fixed effects model?
Brown really doesn't care much about individual differences and really just wants to see how fast the pigs are growing overall.
To adjust for the lack of independence (9 weights nested in each pig), Brown fits a fixed effects model using xtreg.

Fixed Effects Model
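The slide with the fixed effects command was an image and is not in this transcript, so the following is a guess at the kind of call involved rather than the author's exact command. It again assumes the public pig data from webuse.

```stata
* Assumption: public pig data, variables id, week, weight
webuse pig, clear

* Fixed-effects (within) estimator: pig-specific intercepts are swept out,
* leaving one common weekly growth slope
xtreg weight week, i(id) fe
```

The coefficient on week is the average weekly gain, estimated only from change within each pig.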

Making a graph of the fixed effect
    predict weightfe
    twoway (line weightfe week)
[Figure: linear prediction plotted against week]

Random Intercept model
There are now two error terms, one for the variance around the intercept and one for the rest of the unexplained variance:
\[ \mathit{weight}_{ij} = \beta_0 + \beta_1 \mathit{week}_{ij} + \mu_i + \varepsilon_{ij} \]
Pig i at week j now has µi.
This error will be positive if the pig weighs more than the average initially.
It will be negative if the pig weighs less than average initially.
The intercept will be β0 + µi.
There is also an error, εij, for each pig at each wave. A pig might have been sick one week and lost weight that week.

Estimating the random intercept model
    . xtmixed weight week || id:, mle
    . estimates store weightri
weight week: the response variable weight has a fixed portion that depends on week.
|| id: specifies a random effect by the grouping variable id. This gives us the random intercept.
mle uses a maximum likelihood estimator.
estimates store weightri stores the results under the name weightri.

Random Intercept Model: Results

Random Intercept Model: Interpretation
We have 294 cases where we have a weight for a pig (not 3, as would be the case with listwise deletion, and not 432).
The first estimation table reports the fixed effects.
We estimate β0 (the _cons row) and β1 = 6.21.
Weight = β0 + 6.21(week) + error is our fixed effect part.
The second table is the variance components. The 3.89 is the standard deviation of the constant/intercept, and its standard error, 0.41, is quite small.
The sd(residual) = 2.10 is the standard deviation of the error.
The significant chi-square(1) test tells us we needed to use a multilevel model.

A Random Slope
Now let's try a random coefficient/slope:
\[ \mathit{weight}_{ij} = \beta_0 + \beta_1 \mathit{week}_{ij} + \mu_{0i} + \mu_{1i} \mathit{week}_{ij} + \varepsilon_{ij} \]
The µ0i captures the variation around the intercept.
The µ1i·week_ij captures the weekly variation around the slope.
Random intercept: (β0 + µ0i)
Random slope: (β1 + µ1i)week_ij

Covariance of Intercept & Slope
We need to decide on the covariance of the intercept and the slope.
The default, cov(independent), assumes the random intercept and the random slope are uncorrelated.

A Random Slope: cov(unstruct)
Now let's try a random slope:
\[ \mathit{weight}_{ij} = \beta_0 + \beta_1 \mathit{week}_{ij} + \mu_{0i} + \mu_{1i} \mathit{week}_{ij} + \varepsilon_{ij} \]
The µ0i captures the variation around the intercept.
The µ1i·week_ij captures the weekly variation around the slope.
The unstructured covariance allows the random intercept and the random slope to be correlated.

Random Coefficients Model
    . xtmixed weight week || id: week, nolog mle cov(unstruct) var
    . estimates store weightrc
    . lrtest weightri weightrc
The || id: is the part of the command that gives us the random intercept.
Any variable after the colon will have a random coefficient.
The variable week is allowed to have a different slope for each pig, since some grow faster than others.
The cov(unstruct) option allows the random intercept and random slope to be correlated.
Notice that var at the end means the output reports variances rather than standard deviations.
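From Stata 13 onward, xtmixed was renamed mixed, with maximum likelihood as the default. A sketch of the same random-intercept versus random-coefficient comparison in the newer syntax, assuming the public pig data from webuse (so the estimates will differ from the workshop's version with missing weights):

```stata
* Assumption: Stata 13+ and the public pig data (id, week, weight)
webuse pig, clear

* Random intercept only
mixed weight week || id:, mle
estimates store weightri

* Random intercept and random slope on week, allowed to correlate
mixed weight week || id: week, mle covariance(unstructured)
estimates store weightrc

* Do the random slope and the covariance improve the fit?
lrtest weightri weightrc
```

On older releases the same three steps run with xtmixed in place of mixed.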

School Engagement Example
Data from Day and others on children and their parents from Seattle.
They have 3 waves. Kids were 10, 11, 12, or 13 at the first wave; 11, 12, 13, or 14 at the second wave; and 12, 13, 14, or 15 at the third wave.
Reorganized the data by birth year (MCAR):
[Table: number of observations in wave1 to wave6 by birthyr, with totals; counts not preserved]

Correlation of Intercept and Slope
We can see if the intercept and slope are correlated.
We need to do 494 separate regressions of school engagement on year, one for each child, and save the 494 intercepts and slopes:
    statsby inter=_b[_cons] slope=_b[yr], ///
        by(id) saving(ols): regress sch yr

Correlation of Intercept and Slope
We merge the saved dataset with our active dataset.
Then we do the graph using
    twoway (scatter slope inter) (lfit slope inter), ///
        xtitle(intercept) ytitle(slope)

Intercept and slope are correlated
[Figure: scatterplot of Slope against Intercept with fitted line; the value of r is not preserved]
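The per-case regressions, the merge, and the plot can be sketched end to end. To keep it runnable, this sketch swaps in the public pig data (weight on week by id) for the school engagement data, which are not available here. It uses the merge m:1 syntax of Stata 11 and later instead of the older merge id using form in the do-file, and the replace suboption in saving() is my addition so the block can be rerun.

```stata
* Assumption: public pig data standing in for the school engagement data
webuse pig, clear

* One regression per pig; intercepts and slopes are saved in ols.dta
statsby inter=_b[_cons] slope=_b[week], by(id) saving(ols, replace): ///
    regress weight week

* Attach each pig's intercept and slope to all of its rows
merge m:1 id using ols, nogenerate

* Every pig appears once per week here, so points are overplotted
twoway (scatter slope inter) (lfit slope inter), ///
    xtitle(intercept) ytitle(slope)
corr inter slope
```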

How do the means fit?
We expect there to be a steady decline in school engagement.

Using xtreg to estimate the ICC

Compare random intercept & random coefficient models
    . xtmixed sch female mom_ed nev_mar ///
        div_sep other yr || id: yr, mle ///
        cov(unstructured)
    . estimates store rc
    . xtmixed sch female mom_ed nev_mar ///
        div_sep other yr || id:, mle
    . estimates store ri
    . lrtest ri rc

Telling a story
We will run the model using random slopes (even though in this case they were not needed).
We will create a graph comparing a male whose mother has low education and has never married to a female whose mother has a college degree and is married.
We think of these as ideal types.
    . xtmixed sch female mom_ed nev_mar div_sep ///
        other yr || id: yr, mle cov(unstructured)
    . predict sch_score
    . twoway (connected sch_score yr if female==0 ///
        & mom_ed==2 & nev_mar==1, sort) (connected ///
        sch_score yr if female==1 & mom_ed==4 & ///
        mom_ed <. & nev_mar==0 & div_sep==0 & other==0)

Telling a Story
[Figure: linear prediction (fixed portion) against yr, with one line for "Male, Mom never married, low ed" and one for "Female, Mom married, B.A."]

    * mkdaygrow
    clear
    cd "/Volumes/acock/1daygrow/data"
    use "wave1-3_final_combinedsite_8.dta"
    destring family_id, gen(id)
    keep if site == 1
    fre p1_21b_1 p1_21c_1
    gen birthyr = p1_21b_1
    tab birthyr p1_21b_1
    replace birthyr = 1995 if birthyr ==
    drop if birthyr == 1993 | birthyr == 1998
    tab birthyr p1_21b_1
    gen age1 = 13 if birthyr == 1994
    gen age2 = 14 if birthyr == 1994
    gen age3 = 15 if birthyr == 1994
    replace age1 = 12 if birthyr == 1995
    replace age2 = 13 if birthyr == 1995
    replace age3 = 14 if birthyr == 1995
    replace age1 = 11 if birthyr == 1996
    replace age2 = 12 if birthyr == 1996
    replace age3 = 13 if birthyr == 1996
    replace age1 = 10 if birthyr == 1997
    replace age2 = 11 if birthyr == 1997
    replace age3 = 12 if birthyr == 1997
    gen wave1 = 0 if age1 == 10
    gen wave2 = 1 if age2 == 11
    replace wave2 = 1 if age1 == 11
    gen wave3 = 2 if age3 == 12
    replace wave3 = 2 if age2 == 12
    replace wave3 = 2 if age1 == 12

    factor c_scheng1_1-c_scheng3_1 c_scheng5_1 c_scheng7_1-c_scheng8_1 ///
        c_scheng15_1, pcf
    factor c_scheng1_2-c_scheng3_2 c_scheng5_2 c_scheng7_2-c_scheng9_2, pcf
    factor c_scheng1_3-c_scheng3_3 c_scheng5_3 c_scheng7_3-c_scheng9_3, pcf
    alpha c_scheng1_1-c_scheng3_1 c_scheng5_1 c_scheng7_1-c_scheng8_1 ///
        c_scheng15_1, asis item
    alpha c_scheng1_2-c_scheng3_2 c_scheng5_2 c_scheng7_2-c_scheng9_2, ///
        asis item
    alpha c_scheng1_3-c_scheng3_3 c_scheng5_3 c_scheng7_3-c_scheng9_3, ///
        asis item
    egen schengage1 = rowmean(c_scheng1_1-c_scheng3_1 c_scheng5_1 ///
        c_scheng7_1-c_scheng8_1 c_scheng15_1)
    egen schengage2 = rowmean(c_scheng1_2-c_scheng3_2 c_scheng5_2 ///
        c_scheng7_2-c_scheng9_2)
    egen schengage3 = rowmean(c_scheng1_3-c_scheng3_3 c_scheng5_3 ///
        c_scheng7_3-c_scheng9_3)
    pwcorr schengage1-schengage3, obs
    /* make six waves for school engagement */
    gen sch1 = schengage1 if birthyr == 1997
    gen sch2 = schengage2 if birthyr == 1997
    gen sch3 = schengage3 if birthyr == 1997
    replace sch2 = schengage1 if birthyr == 1996
    replace sch3 = schengage2 if birthyr == 1996
    gen sch4 = schengage3 if birthyr == 1996
    replace sch3 = schengage1 if birthyr == 1995
    replace sch4 = schengage2 if birthyr == 1995
    gen sch5 = schengage3 if birthyr == 1995
    replace sch4 = schengage1 if birthyr == 1994
    replace sch5 = schengage2 if birthyr == 1994
    gen sch6 = schengage3 if birthyr == 1994
    list id sch* birthyr in 1/50
    tabstat sch1-sch6, statistics(count mean) by(birthyr) columns(variables)
    gen wave4 = 3 if age3 == 13
    replace wave4 = 3 if age2 == 13
    replace wave4 = 3 if age1 == 13
    gen wave5 = 4 if age3 == 14
    replace wave5 = 4 if age2 == 14
    gen wave6 = 5 if age3 == 15
    tabstat wave*, statistics(count) by(birthyr) columns(variables)
    /*
    Summary statistics: N
    by categories of: birthyr
    birthyr   wave1   wave2   wave3   wave4   wave5   wave6
    (counts not preserved)
    */
    /* School Engagement
    Waves 2 and 3 had 9 items, wave 1 had 15.
    Dropped items 4 and 6 as negatively worded. Kept 7 items that are in common:
    c_scheng1_1 - c_scheng3_1 c_scheng5_1 c_scheng7_1 - c_scheng9_1 c_scheng15_1
    c_scheng1_2 - c_scheng3_2 c_scheng5_2 c_scheng7_2 - c_scheng9_2
    c_scheng1_3 - c_scheng3_3 c_scheng5_3 c_scheng7_3 - c_scheng9_3
    alphas are .80, .83, and .83 for waves 1, 2, and 3.
    */

    /* Generating covariates
    gender
    mom's education
    marital status===redo
    */
    gen nev_mar = 1 if famstruct2_1 == 4
    replace nev_mar = 0 if famstruct2_1 ~= 4 & famstruct2_1 <.
    gen div_sep = 1 if famstruct2_1 == 1 | famstruct2_1 == 5
    replace div_sep = 0 if famstruct2_1 ~= 1 & famstruct2_1 ~= 5 & famstruct2_1 <.
    gen married = 1 if famstruct2_1 == 2
    replace married = 0 if famstruct2_1 ~= 2 & famstruct2_1 <.
    gen other = 1 if famstruct2_1 == 3 | famstruct2_1 == 6
    replace other = 0 if famstruct2_1 ~= 3 & famstruct2_1 ~= 6 & famstruct2_1 <.
    fre famstruct2_1 nev_mar div_sep married other
    gen female = p1_21a_1 - 1
    fre female
    clonevar mom_ed = p1_4_1
    reshape long sch, i(id) j(w)
    keep id sch w female mom_ed nev_mar div_sep married other
    list id sch w in 1/30
    gen yr = w - 1
    /*
    We want to know if the means for school engagement go down/up in a linear
    fashion. We can make a table of the mean for each of the six years, year
    0 to year 5
    */
    tabstat sch, statistics(mean count) by(yr) columns(variables)
    xtreg sch, i(id) mle
    xtmixed sch yr female mom_ed nev_mar div_sep other || id:
    xtmixed sch yr female mom_ed || id: yr
    regress sch yr if id ==

    /*
    Correlation of Intercept and Slope
    This section calculates the intercept and the slope when you regress sch on
    yr for each case, then it creates a graph showing the link of school and year.
    */
    statsby inter=_b[_cons] slope=_b[yr], by(id) saving(ols): regress sch yr
    sort id
    merge id using ols
    drop _merge
    twoway (scatter slope inter) (lfit slope inter), xtitle(intercept) ytitle(slope)
    corr inter slope
    corr inter slope, cov
    xtdescribe if yr <., i(id) t(yr)
    xtsum sch female mom_ed nev_mar div_sep married other yr, i(id)
    regress sch female mom_ed nev_mar div_sep married other yr
    predict res, residuals
    /* Correlation of residuals */
    preserve
    keep id res yr
    reshape wide res, i(id) j(yr)
    tabstat res*, statistics(count variance)
    pwcorr res*, obs
    restore
    /* Fixed effects model
    These effects are the within-subject estimates of the effects of the
    time-varying covariates. We have none. The time-invariant covariates have
    no within-subject variance and hence cannot be estimated (are dropped).
    The estimates for time-varying covariates are not biased because of
    omitted time-invariant covariates. Each subject serves as his/her own
    control. We could add time-varying family processes, for example.
    */
    xtreg sch female mom_ed nev_mar div_sep other yr, i(id) fe
    /* Random Intercept Model */
    xtmixed sch yr || id:, mle cov(unstructured)
    estimates store riyronly
    xtmixed sch female mom_ed nev_mar div_sep other yr || id:, mle
    estimates store ri
    /*
    Mixed-effects ML regression    Number of obs = 1386
    Coefficient table for sch: female, mom_ed, nev_mar, div_sep, other, yr, _cons
    Random-effects Parameters: id: Identity, sd(_cons); sd(residual)
    (estimates not preserved)
    LR test vs. linear regression: chibar2(01)
    ICC = .443^2/(.443^2 + sd(residual)^2) = .528
    */

    /* Random Coefficients Model */
    xtmixed sch yr || id: yr, mle cov(unstructured)
    estimates store rcyronly
    xtmixed sch female mom_ed nev_mar div_sep other yr || id: yr, mle cov(unstructured)
    estimates store rc
    /*
    Mixed-effects ML regression        Number of obs      = 1386
    Group variable: id                 Number of groups   = 483
                                       Obs per group: min = 1
                                                      avg = 2.9
                                                      max = 3
    Coefficient table for sch: female, mom_ed, nev_mar, div_sep, other, yr, _cons
    Random-effects Parameters: id: Unstructured, sd(yr), sd(_cons), corr(yr,_cons); sd(residual)
    (Wald chi2(6), log likelihood, and estimates not preserved)
    LR test vs. linear regression: chi2(3)
    */
    lrtest riyronly rcyronly
    lrtest ri rc
    /*
    . estimates store riyronly
    . lrtest riyronly rcyronly
    Likelihood-ratio test                        LR chi2(2)  = 2.62
    (Assumption: riyronly nested in rcyronly)    Prob > chi2 =
    Note: The reported degrees of freedom assumes the null hypothesis is not on the
    boundary of the parameter space. If this is not true, then the reported test is
    conservative.
    . lrtest ri rc
    Likelihood-ratio test                        LR chi2(2)  = 2.35
    (Assumption: ri nested in rc)                Prob > chi2 =
    DIVIDE THE P VALUE BY TWO BECAUSE THIS IS INHERENTLY A ONE TAIL TEST --
    A VARIANCE CAN'T BE NEGATIVE
    */
    xtmixed sch female mom_ed nev_mar div_sep other yr || id: yr, mle cov(unstructured)
    predict sch_score
    twoway (connected sch_score yr if female==0 & mom_ed==2 & nev_mar==1, sort) ///
        (connected sch_score yr if female==1 & mom_ed==4 & mom_ed <. & nev_mar==0 ///
        & div_sep==0 & other==0)
    gen yrxfemale = yr * female
    gen yrxmom_ed = yr * mom_ed
    gen yrxnev_mar = yr * nev_mar
    gen yrxdiv_sep = yr * div_sep
    xtmixed sch female mom_ed nev_mar div_sep other yr yrxfemale ///
        || id: yr, mle cov(unstructured)
    xtmixed sch female mom_ed nev_mar div_sep other yr yrxmom_ed ///
        || id: yr, mle cov(unstructured)
    xtmixed sch female mom_ed nev_mar div_sep other yr yrxnev_mar ///
        || id: yr, mle cov(unstructured)
    xtmixed sch female mom_ed nev_mar div_sep other yr yrxdiv_sep ///
        || id: yr, mle cov(unstructured)
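The advice in the do-file to divide the p-value by two can be checked by hand. This sketch takes the LR chi2(2) = 2.62 reported for riyronly versus rcyronly and applies the do-file's simple halving rule; the scalar names are hypothetical, and halving is the author's rule of thumb rather than an exact boundary correction.

```stata
* LR statistic reported by lrtest riyronly rcyronly in the do-file
scalar lr = 2.62

* Upper-tail chi-square probability with 2 degrees of freedom
scalar p_naive = chi2tail(2, lr)

* Halving rule from the do-file comment
scalar p_half = p_naive / 2

display "naive p = " %6.4f p_naive "   halved p = " %6.4f p_half
```

The naive p-value comes out near .27 and the halved one near .135, so either way the random slope is not needed here, which matches the remark on the "Telling a story" slide.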


Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

From the help desk: Swamy s random-coefficients model

From the help desk: Swamy s random-coefficients model The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

More information

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t

More information

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10

More information

How to set the main menu of STATA to default factory settings standards

How to set the main menu of STATA to default factory settings standards University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors. Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

More information

xtmixed & denominator degrees of freedom: myth or magic

xtmixed & denominator degrees of freedom: myth or magic xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or

More information

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

More information

10 Dichotomous or binary responses

10 Dichotomous or binary responses 10 Dichotomous or binary responses 10.1 Introduction Dichotomous or binary responses are widespread. Examples include being dead or alive, agreeing or disagreeing with a statement, and succeeding or failing

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

More information

Binary Logistic Regression

Binary Logistic Regression Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

More information

Introduction to Hierarchical Linear Modeling with R

Introduction to Hierarchical Linear Modeling with R Introduction to Hierarchical Linear Modeling with R 5 10 15 20 25 5 10 15 20 25 13 14 15 16 40 30 20 10 0 40 30 20 10 9 10 11 12-10 SCIENCE 0-10 5 6 7 8 40 30 20 10 0-10 40 1 2 3 4 30 20 10 0-10 5 10 15

More information

Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Panel Data Analysis in Stata

Panel Data Analysis in Stata Panel Data Analysis in Stata Anton Parlow Lab session Econ710 UWM Econ Department??/??/2010 or in a S-Bahn in Berlin, you never know.. Our plan Introduction to Panel data Fixed vs. Random effects Testing

More information

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables. SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data. Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

Longitudinal Data Analysis: Stata Tutorial

Longitudinal Data Analysis: Stata Tutorial Part A: Overview of Stata I. Reading Data: Longitudinal Data Analysis: Stata Tutorial use Read data that have been saved in Stata format. infile Read raw data and dictionary files. insheet Read spreadsheets

More information

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data

Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Kathy Welch Center for Statistical Consultation and Research The University of Michigan 1 Background ProcMixed can be used to fit Linear

More information

Introduction to Data Analysis in Hierarchical Linear Models

Introduction to Data Analysis in Hierarchical Linear Models Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Assignments Analysis of Longitudinal data: a multilevel approach

Assignments Analysis of Longitudinal data: a multilevel approach Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Competing-risks regression

Competing-risks regression Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

25 Working with categorical data and factor variables

25 Working with categorical data and factor variables 25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

More information

Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

More information

Multilevel Analysis and Complex Surveys. Alan Hubbard UC Berkeley - Division of Biostatistics

Multilevel Analysis and Complex Surveys. Alan Hubbard UC Berkeley - Division of Biostatistics Multilevel Analysis and Complex Surveys Alan Hubbard UC Berkeley - Division of Biostatistics 1 Outline Multilevel data analysis Estimating specific parameters of the datagenerating distribution (GEE) Estimating

More information

Addressing Alternative. Multiple Regression. 17.871 Spring 2012

Addressing Alternative. Multiple Regression. 17.871 Spring 2012 Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate

More information

Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

More information

Main Effects and Interactions

Main Effects and Interactions Main Effects & Interactions page 1 Main Effects and Interactions So far, we ve talked about studies in which there is just one independent variable, such as violence of television program. You might randomly

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

SAS Syntax and Output for Data Manipulation:

SAS Syntax and Output for Data Manipulation: Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining

More information

Data analysis process

Data analysis process Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

More information

Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005

Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005 Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005 This is an introduction to panel data analysis on an applied level using Stata. The focus will be on showing the "mechanics" of these

More information