Multilevel Analysis and Complex Surveys. Alan Hubbard UC Berkeley - Division of Biostatistics
2 Outline
Multilevel data analysis
  Estimating specific parameters of the data-generating distribution (GEE)
  Estimating the whole (latent variable) distribution (multilevel mixed models and MLE)
Complex surveys (estimation and inference)
Estimating multilevel mixed models with complex survey data
3 Schedule
Beginning Time   Ending Time   Topic
8:00             9:15          Introduction/Overview, GEE
9:15             9:45          GEE Exercise
9:45             11:30         Multilevel Models
11:30                          MLM Exercise
1:00             2:00          Complex Survey
2:00             2:30          Survey Exercise
2:30             3:30          Combined
3:30             4:00          Combined Exercise
4:15             5:00          Causality Issues (Michael Oakes, Ecological Effects, etc.)
4 Multilevel Analysis and Complex Surveys Part 1: Parameters and inference from mixed models (MLE) and estimating equation (GEE) approaches Alan Hubbard UC Berkeley - Division of Biostatistics 4
5 Models For Multilevel Data: References
Diggle, Liang and Zeger. Analysis of Longitudinal Data.
Fitzmaurice, Laird and Ware. Applied Longitudinal Analysis.
Skrondal, A. and Rabe-Hesketh, S. Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models.
To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health (with commentary and reply). Epidemiology, 21 (2010).
6 Generalized Estimation Equation (GEE) Approach to Clustered Data Alan Hubbard UC Berkeley - Division of Biostatistics 6
7 Clustered Data Regressions
Ignore clustering: ordinary regressions, assuming that outcomes are conditionally independent.
Multilevel (mixed effects) models: explicit model of the sources of random variability at the cluster level, $E(Y_{ijk} \mid X_{ijk}, \alpha_i, \alpha_{ij})$, $\alpha_i \sim N(0, \sigma^2_\alpha)$, ...
Generalized Estimating Equation (GEE) approach: only specify relatively simple parameters (e.g., $E(Y_{ijk} \mid X_{ijk})$).
8 Issues with Clustered Data (Estimation) Covariates of Interest and identifiability: higher level (ecological) vs. individual level covariates. Targeting contributions of both. Defining effect of interest (e.g., direct effect of ecological covariates apart from individual level covariates). Causal inference challenges with clustered data (can one ever measure impact of composite variables vs. contextual variables?). Much work on mechanical implementation, less on what are the appropriate parameters of interest and necessary (but sometimes dubious) identifiability assumptions (Oakes). 8
9 Issues with Clustered Data (Correlation)
Dealing with correlated data: general repeated measures issues.
Model-based inference: inference based on the proposed data-generating distribution.
Empirical inference: use the form of the estimating equation to get a simple, robust empirical variance of the sampling distribution:
$$\hat{\theta} = \theta + \frac{1}{n}\sum_{i=1}^{n} IC(O_i; \theta, \gamma) + o_p\!\left(\frac{1}{\sqrt{n}}\right), \qquad \mathrm{var}(\hat{\theta}) \approx \frac{\mathrm{var}(IC)}{n}$$
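To make the influence-curve variance concrete, here is a minimal sketch for the simplest possible parameter: the mean of cluster-level averages. It assumes each cluster is one independent unit O_i, so the influence curve is just the cluster mean minus the estimate. The function name and the toy data are hypothetical and not from the lecture.

```python
import math

def ic_variance_of_mean(clusters):
    """Estimate the mean of cluster means and its influence-curve-based SE.

    clusters: list of lists of outcomes, one inner list per independent unit.
    """
    n = len(clusters)
    unit_means = [sum(c) / len(c) for c in clusters]
    theta_hat = sum(unit_means) / n
    # Influence curve of a sample mean, evaluated at each independent unit.
    ic = [m - theta_hat for m in unit_means]
    var_ic = sum(v * v for v in ic) / n
    # var(theta_hat) is approximated by var(IC) / n.
    se = math.sqrt(var_ic / n)
    return theta_hat, se

# Hypothetical clustered outcomes (four clusters of unequal size).
clusters = [[1.0, 3.0], [2.0, 2.0, 2.0], [5.0], [4.0, 6.0]]
theta_hat, se = ic_variance_of_mean(clusters)
print(theta_hat, se)
```

Because the variance is computed across clusters rather than across individual observations, no model for the within-cluster correlation is needed.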
10 Example: observations within subjects: The Effect of Drug and Alcohol Use on Teenage Sexual Activity Minnis & Padian (2001) conducted a longitudinal study of teenagers in San Rafael, California to investigate the association between drug and alcohol use and sexual activity on the same day. Participants were asked to keep track of their activities over approximately one month and binary indicator variables were created to show whether drug/alcohol use and/or sexual activity were reported for each 24 hour period. 10
11 Example of Binary Outcome: Sex, Drugs and Teenagers
A longitudinal study of the effects of drug use on sexual activity.
Let $X_{ij}$, the only explanatory variable of interest for now, indicate whether or not subject $i$ reported drug use (1=yes, 0=no) on day $j$.
Let $Y_{ij}$ denote whether subject $i$ had sex on day $j$ (1=yes, 0=no), i.e., $Y_{ij}$ is a binary outcome and thus its expectation can be modeled via the logit transform.
12 Data eid today drgalcoh sx24hrs Jun 98 yes no Jun 98 no no Jun 98 no no Jun 98 yes no Jun 98 no no Jun 98 no no Jun 98 no no Jun 98 no no Jun 98 yes no Jun 98 no no Jun 98 no no Jun 98 no yes Jun 98 no no Jun 98 no no Jun 98 no no Jun 98 no no Jun 98 no yes Jun 98 no no Jun 98 no yes Jul 98 no yes Jul 98 no no Jul 98 no no Jul 98 no no Jul 98 no no Jun 98 no no Jun 98 no no Jun 98 no no 12
13 Sexual Activity and drug/alcohol use among teenagers revisited
Main variables:
sx24hrs - sex in last 24 hrs. (0=no, 1=yes)
drgalcoh - drug or alcohol use in last 24 hrs.
tues-sun - dummy variables designating day of week
14 Random Effects Models
Uses a random effect to model the relative similarity of observations made on the same statistical unit (e.g., person).
Assumes $Y_{ij}$ and $Y_{ik}$, $j \neq k$, are independent given some realized value of a random effect ($\beta_{0i}$) and the covariates: $Y_{ij} \perp Y_{ik} \mid X_{ij}, \beta_{0i}$.
The model assumes these random effects are randomly drawn from a distribution of known form (e.g., normal).
15 Random Effects Model for Teenage Sex and Drug-Use
$$\mathrm{logit}[P(Y_{ij}=1 \mid \beta_{0i}, X_{ij}=x_{ij})] = \log\frac{P(Y_{ij}=1 \mid \beta_{0i}, X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid \beta_{0i}, X_{ij}=x_{ij})} = \beta_0^{RE} + \beta_{0i} + \beta_1^{RE} x_{ij}$$
Assume that the repeated observations for the $i$th teenager are independent of one another given $\beta_{0i}$ and $X_{ij}$.
Must assume a parametric distribution for the $\beta_{0i}$, usually $\beta_{0i} \sim N(0, \tau^2)$.
$\exp(\beta_1^{RE})$ is the odds ratio for having sex when subject $i$ reports drug use relative to when the same subject does not report drug use.
16 Motivation for This Approach Natural for modeling heterogeneity across individuals in their regression coefficients. This heterogeneity can be represented by a probability distribution Most useful when object is to make inferences about individuals rather than population averages. 16
17 Motivation for This Approach Also useful to estimate the contributions to variability from different sources (e.g., within and among individuals). Can be extended to hierarchy of units (multilevel modeling), such as repeated longitudinal measures of a person, within a household, within a community... 17
18 Some available software for random effects models
Linear models:
  Proc Mixed in SAS
  xtreg in STATA (only simple random effects models)
  xtmixed in STATA 10
  lme in R
Logistic and Poisson models:
  xtlogit and xtpoisson in STATA for simple random effects
  xtmelogit and xtmepoisson for general mixed models in STATA version 10
  gllamm, a STATA add-on for general mixed models
19 Random effects using xtlogit in STATA
. xtlogit sx24hrs drgalcoh, or i(eid) re

Random-effects logit                Number of obs      = 1708
Group variable (i): eid             Number of groups   = 109
Random effects u_i ~ Gaussian       Obs per group: min = 1
                                                   avg = 15.7
                                                   max = 33
                                    Wald chi2(1)       = 5.48
Log likelihood =                    Prob > chi2        =

sx24hrs  | OR   Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh |   <- exp(β_1^RE)
/lnsig2u |
sigma_u  |   <- τ
rho      |
Likelihood ratio test of rho=0: chibar2(01) =      Prob >= chibar2 =
20 Estimation of Marginal Models (GEE)
Estimate the marginal mean model. A marginal model is a population, not individual, model. The marginal $E[Y_{ij} \mid X_{ij} = x_{ij}]$ is defined as the mean value of an observation $Y_{ij}$ in the theoretical experiment where one randomly draws an observation from a population where everyone has $X_{ij} = x_{ij}$.
21 Marginal Models (GEE)
For instance, let $Y_{ij}$ be cholesterol and $X_{ij}$ = yes if one smokes, no otherwise. In a marginal model, $E[Y_{ij} \mid X_{ij} = \text{yes}]$ is the mean of a randomly drawn $Y_{ij}$ from the subpopulation where everyone smokes.
22 Parameter Interpretation in a marginal model
Parameters in an equivalent random effects and GEE model have subtly different interpretations.
Coefficients in a random effects model represent expected differences (odds ratios, relative risks, etc.) within an individual, given a change in their X from one value to another.
Coefficients in a marginal model represent expected differences (odds ratios, relative risks, etc.) within a population, given a change in everyone's X from one value to another.
23 Parameter Interpretation in a GEE model, cont.
In linear and log-linear models, the random effects and marginal regression parameters are the same.
In logistic regression, they are different (more later).
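The difference in the logistic case can be checked numerically. The sketch below assumes a random-intercept logistic model with a normally distributed intercept, averages the subject-specific probabilities over that distribution by simple trapezoid integration, and compares the resulting marginal odds ratio with the conditional one. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(linear_predictor, tau, n_grid=2001, width=8.0):
    """P(Y=1) averaged over b ~ N(0, tau^2), using trapezoid integration."""
    if tau == 0:
        return expit(linear_predictor)
    lo, hi = -width * tau, width * tau
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * step
        density = math.exp(-0.5 * (b / tau) ** 2) / (tau * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * expit(linear_predictor + b) * density * step
    return total

def marginal_odds_ratio(beta0, beta1, tau):
    """Population-averaged odds ratio implied by the random-intercept model."""
    p1 = marginal_prob(beta0 + beta1, tau)
    p0 = marginal_prob(beta0, tau)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical conditional (subject-specific) parameters.
beta0, beta1, tau = -2.0, 1.0, 2.0
conditional_or = math.exp(beta1)
marginal_or = marginal_odds_ratio(beta0, beta1, tau)
print(conditional_or, marginal_or)
```

With these values the marginal odds ratio is closer to 1 than the conditional odds ratio, and the two coincide when tau is zero, which is the usual attenuation of population-averaged logistic coefficients.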
25 Marginal Models (GEE) GEE software typically allows several different working correlation models (e.g., exchangeable, auto-regressive, unstructured, etc.). These correlation models are used to build weight matrices, which are used in a weighted regression. When deriving inferences for the coefficients, though, it calculates robust standard errors. 25
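26 Examples of Correlation Models
$$V = \sigma^2 \begin{pmatrix} R_0 & 0 & \cdots & 0 \\ 0 & R_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_0 \end{pmatrix}$$
Each individual is independent of all others.
Correlation within individuals across longitudinal observations has the same structure.
27 Structure for $R_0$
General structure:
$$R_0 = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \rho_{23} & \cdots & \rho_{2n} \\ \rho_{13} & \rho_{23} & 1 & \cdots & \rho_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \rho_{3n} & \cdots & 1 \end{pmatrix}$$
A lot of unknown parameters.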
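28 Correlation Models (contd): Uniform correlation (compound symmetry or exchangeable)
$$R_0 = \begin{pmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \rho & \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{pmatrix}$$
Arises from the random effects model $Y_{ij} = \alpha + \alpha_i + \beta x_{ij} + e_{ij}$, with errors $e_{ij}$ uncorrelated and independent of $\alpha_i$ and $x_{ij}$, so that
$$\rho = \frac{\mathrm{Var}(\alpha_i)}{\mathrm{Var}(\alpha_i) + \mathrm{Var}(e_{ij})}$$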
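As a small illustration of the formula for ρ, the sketch below computes the exchangeable correlation from the two variance components of a random-intercept model and builds the corresponding working correlation matrix. The function name and the variance values are hypothetical.

```python
def exchangeable_corr(var_between, var_within, n):
    """Return rho and the n x n exchangeable correlation matrix.

    var_between: variance of the random intercept, Var(alpha_i)
    var_within:  residual variance, Var(e_ij)
    """
    rho = var_between / (var_between + var_within)
    matrix = [[1.0 if j == k else rho for k in range(n)] for j in range(n)]
    return rho, matrix

# Hypothetical variance components.
rho, R0 = exchangeable_corr(var_between=4.0, var_within=12.0, n=3)
for row in R0:
    print(row)
```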
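29 Correlation Models (contd): Time-decaying Correlations (Auto-regressive)
$$R_0 = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}$$
Auto-regressive: $e_{ij} = \rho e_{i,j-1} + \eta_{ij}$.
Not great for unequally spaced longitudinal data.
The exponential correlation model generalizes this to $\mathrm{corr}(y_{ij}, y_{ik}) = \rho^{|t_j - t_k|}$ rather than $\rho^{|j-k|}$.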
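The contrast between the two time-decaying structures can be sketched in a few lines. The AR(1) matrix depends only on the order of the observations, while the exponential model uses the actual measurement times; with equally spaced unit times the two agree. Function names and the example times are hypothetical.

```python
def ar1_corr(rho, n):
    """AR(1) correlation: rho ** |j - k| for observation indices j, k."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def exponential_corr(rho, times):
    """Exponential correlation: rho ** |t_j - t_k| for measurement times t."""
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]

equal_spacing = exponential_corr(0.5, [0, 1, 2])
unequal_spacing = exponential_corr(0.5, [0, 1, 4])
print(ar1_corr(0.5, 3))
print(unequal_spacing)
```

With times 0, 1 and 4, the second and third observations are three time units apart, so their correlation is 0.5 cubed rather than 0.5.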
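30 Examples of var-cov. models
Description (abbreviation) and variance-covariance matrix, shown for four observations:
Compound Symmetry (CS):
$$\begin{pmatrix} \sigma^2+\sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma^2+\sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma_0^2 & \sigma^2+\sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma^2+\sigma_0^2 \end{pmatrix}$$
Unstructured (UN):
$$\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{pmatrix}$$
Autoregressive (AR(1)):
$$\begin{pmatrix} \sigma^2 & \rho\sigma^2 & \rho^2\sigma^2 & \rho^3\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 & \rho^2\sigma^2 \\ \rho^2\sigma^2 & \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\ \rho^3\sigma^2 & \rho^2\sigma^2 & \rho\sigma^2 & \sigma^2 \end{pmatrix}$$
Banded Diagonal (UN(1)):
$$\begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$$
Spatial Power (SP(POW)(c)):
$$\begin{pmatrix} \sigma^2 & \sigma^2\rho^{d_{12}} & \sigma^2\rho^{d_{13}} & \sigma^2\rho^{d_{14}} \\ \sigma^2\rho^{d_{12}} & \sigma^2 & \sigma^2\rho^{d_{23}} & \sigma^2\rho^{d_{24}} \\ \sigma^2\rho^{d_{13}} & \sigma^2\rho^{d_{23}} & \sigma^2 & \sigma^2\rho^{d_{34}} \\ \sigma^2\rho^{d_{14}} & \sigma^2\rho^{d_{24}} & \sigma^2\rho^{d_{34}} & \sigma^2 \end{pmatrix}$$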
31 The GEE Algorithm
The algorithm is similar to the one used for non-repeated-measures problems (e.g., OLS for continuous data, logistic regression for binary data and Poisson regression for counts).
Let $R(\alpha)$ be an $n_i \times n_i$ "working" correlation matrix that is fully characterized by a vector of parameters, $\alpha$.
$V_i$ is again the variance-covariance of the observations, which will be a function of the mean ($E(Y_i \mid X_i)$), a scale parameter $\phi$, and $R(\alpha)$.
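For reference, the slide's verbal description corresponds to the standard Liang and Zeger form, sketched here. The symbols $\mu_i$, $A_i$, $D_i$ and $v(\cdot)$ are notation introduced for this sketch and do not appear on the slide: $\mu_i(\beta) = E(Y_i \mid X_i)$, $A_i$ is the diagonal matrix of variance functions $v(\mu_{ij})$, and $D_i = \partial \mu_i / \partial \beta$.

$$V_i = \phi\, A_i^{1/2}\, R(\alpha)\, A_i^{1/2}, \qquad \sum_{i=1}^{n} D_i^T V_i^{-1} \left( Y_i - \mu_i(\beta) \right) = 0$$

With $R(\alpha)$ equal to the identity, the second equation reduces to the usual score equation of the corresponding non-repeated-measures regression.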
32 Standard Errors of Coefficients GEE will normally return two estimates of the variance of the coefficient estimates, 1) naive and 2) robust. Naive assumes that the chosen model for R(α), such as compound symmetry, is correct. Robust is a more nonparametric estimate that does not assume your guess for R(α) is correct. However, its variance estimates can be more variable. 32
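In the usual GEE notation (introduced here as an assumption, with $\mu_i = E(Y_i \mid X_i)$, $D_i = \partial \mu_i / \partial \beta$, and $V_i$ the working variance-covariance matrix), the two variance estimates are commonly written as:

$$\widehat{\mathrm{var}}_{\text{naive}}(\hat\beta) = B^{-1}, \qquad \widehat{\mathrm{var}}_{\text{robust}}(\hat\beta) = B^{-1} M B^{-1},$$

$$B = \sum_{i=1}^{n} D_i^T V_i^{-1} D_i, \qquad M = \sum_{i=1}^{n} D_i^T V_i^{-1} (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^T V_i^{-1} D_i$$

If the working model is correct, $M$ estimates the same quantity as $B$ and the two estimates agree in large samples; otherwise only the robust ("sandwich") form remains valid.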
33 GEE Marginal Model for Teenage Sex and Drug-Use
$$\mathrm{logit}[P(Y_{ij}=1 \mid X_{ij}=x_{ij})] = \log\frac{\mu_{ij}}{1-\mu_{ij}} = \log\frac{P(Y_{ij}=1 \mid X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid X_{ij}=x_{ij})} = \beta_0^M + \beta_1^M x_{ij}$$
$\mathrm{var}(Y_{ij}) = \mu_{ij}(1-\mu_{ij})$*, $\mathrm{corr}(Y_{ij}, Y_{ik}) = \rho$ (i.e., assume compound symmetry).
$\exp(\beta_1^M)$ is a ratio of population frequencies, i.e., it is a population-averaged parameter. It is the odds ratio comparing the probabilities (proportions) of teenagers who would engage in sexual activity in populations reporting drug use vs. populations not reporting drug use.
* Semi-robust inference: can you tell why?
34 Sexual Activity and drug/alcohol use among teenagers revisited
Main variables:
sx24hrs - sex in last 24 hrs. (0=no, 1=yes)
drgalcoh - drug or alcohol use in last 24 hrs.
tues-sun - dummy variables designating day of week
35 Results using xtgee in STATA
Robust SE:
. xtgee sx24hrs drgalcoh, eform i(id) family(binomial) cor(ind) robust

GEE population-averaged model       Number of obs      = 1708
Group variable: id                  Number of groups   = 109
Link: logit                         Obs per group: min = 1
Family: binomial                                   avg = 15.7
Correlation: independent                           max = 33
(standard errors adjusted for clustering on id)

sx24hrs  | Odds Ratio   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh |   <- exp(β_1^M)

Non-robust (naive) SE:
. xtgee sx24hrs drgalcoh, eform i(eid) family(binomial) cor(ind)

sx24hrs  | Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh |
36 xtgee Options
family(?), link(?) -- identify the outcome distribution and link, e.g., that we wish linear regression with a continuous outcome (as compared to, say, binary outcomes; more later)
corr(ind) -- identifies that we will assume independence for our working correlation structure (other possibilities include exchangeable and autoregressive structures)
i(?) -- identifies which variable identifies the individual (or cluster)
ro -- identifies that we wish robust estimates of variability
37 Model 2: same marginal model, different working correlation
$$\mathrm{logit}[P(Y_{ij}=1 \mid X_{ij}=x_{ij})] = \log\frac{\mu_{ij}}{1-\mu_{ij}} = \log\frac{P(Y_{ij}=1 \mid X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid X_{ij}=x_{ij})} = \beta_0^M + \beta_1^M x_{ij}$$
$x_{ij}$ = 0 if drug/alcohol use is no, 1 if yes
$y_{ij}$ = 0 if no sex in last 24 hours, 1 if yes
$\mathrm{corr}(Y_{ij}, Y_{ij'}) = \rho$ (compound symmetry or exchangeable correlation structure)
38 Results of Model 2 using STATA
Robust SE:
. xtgee sx24hrs drgalcoh, eform i(id) family(binomial) cor(exc) robust

GEE population-averaged model       Number of obs      = 1708
Group variable: id                  Number of groups   = 109
Link: logit                         Obs per group: min = 1
Family: binomial                                   avg = 15.7
Correlation: exchangeable                          max = 33
(standard errors adjusted for clustering on id)

sx24hrs  | Odds Ratio   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh |

Non-robust (naive) SE:
. xtgee sx24hrs drgalcoh, eform i(eid) family(binomial) cor(exc)

sx24hrs  | Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh |
39 Estimated Working Correlation
. xtcorr
Estimated within-subject correlation matrix (columns c1-c9, rows r1, r2, ...)
40 Model 3: adjusting for day of week
$$\mathrm{logit}[P(Y_{ij}=1 \mid x_{ij}, \mathrm{day}_{ij})] = \beta_0 + \beta_1 x_{ij} + \gamma_1 z_{1ij} + \gamma_2 z_{2ij} + \cdots + \gamma_6 z_{6ij}$$
$x_{ij}$ = 1 if drug/alcohol use is yes, 0 if no
$z_{1ij}$ = 1 if interview day is Tuesday, 0 if not
$z_{2ij}$ = 1 if interview day is Wed., 0 if not
...
$z_{6ij}$ = 1 if interview day is Sunday, 0 if not
$y_{ij}$ = 1 if sex in last 24 hours, 0 if no
$\mathrm{corr}(Y_{ij}, Y_{ij'}) = \rho$ (compound symmetry or exchangeable correlation structure)
41 Results of Model 3 using STATA
. xtgee sx24hrs drgalcoh tues wed thur fri sat sun, eform i(id) family(binomial) cor(exc) robust

GEE population-averaged model       Number of obs      = 1708
Group variable: id                  Number of groups   = 109
Link: logit                         Obs per group: min = 1
Family: binomial                                   avg = 15.7
Correlation: exchangeable                          max = 33
                                    Wald chi2(7)       =
Scale parameter: 1                  Prob > chi2        =
(standard errors adjusted for clustering on id)

sx24hrs | Odds Ratio   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: drgalcoh, tues, wed, thur, fri, sat, sun
42 Model for drug/alcohol use vs. day of week
$$\mathrm{logit}[P(X_{ij}=1 \mid \mathrm{day}_{ij})] = \gamma_0^* + \gamma_1^* z_{1ij} + \gamma_2^* z_{2ij} + \cdots + \gamma_6^* z_{6ij}$$
$X_{ij}$ = 1 if drug/alcohol use is yes, 0 if no
$z_{1ij}$ = 1 if interview day is Tuesday, 0 if not
$z_{2ij}$ = 1 if interview day is Wed., 0 if not
...
$z_{6ij}$ = 1 if interview day is Sunday, 0 if not
$\mathrm{corr}(X_{ij}, X_{ij'}) = \rho$ (compound symmetry or exchangeable correlation structure)
43 Results of drug/alcohol use Model using STATA
. xtgee drgalcoh tues wed thur fri sat sun, eform i(id) family(binomial) cor(exc) robust

GEE population-averaged model       Number of obs      = 1708
Group variable: id                  Number of groups   = 109
Link: logit                         Obs per group: min = 1
Family: binomial                                   avg = 15.7
Correlation: exchangeable                          max = 33
                                    Wald chi2(6)       =
Scale parameter: 1                  Prob > chi2        =
(standard errors adjusted for clustering on id)

drgalcoh | Odds Ratio   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: tues, wed, thur, fri, sat, sun
44 Covariate and Cluster size issues We examine a simple example to look at how estimation and inference with clustered data are impacted by various changes in the data distribution. Cluster constant (e.g., county level) versus cluster varying (e.g., individual-level) covariates. Balanced versus unbalanced data (number of subunits within clusters). 44
45 Longitudinal Data on HIV+ patients
Deeks, et al. (1999) report the results from a longitudinal study of HIV-infected adults undergoing Highly Active Anti-Retroviral Therapy (HAART) at San Francisco General Hospital (SFGH).
Patients were included in this analysis if they received at least 16 weeks of continuous therapy with an anti-retroviral regimen.
The following data were obtained during the initial review: date of birth, sex and length of previous exposure to each individual anti-retroviral agent.
46 Once patients were identified, their medical records were reviewed every 3-4 months until November.
Plasma HIV RNA assays were performed using a branched DNA (bDNA) assay.
Repeated and irregular measurements of CD4 and viral load (time-structured repeated measures).
Data not always matched in time.
Goal is to find how CD4 varies with viral load and how this pattern varies in the population.
47 Sample of HIV+ Data 47
48 Figure: CD4 versus time (etime) for 30 evenly spaced subjects, ranked by slope (CD4 vs. T)
49 HIV+ (CD4 Count) Data some simple analyses using only 2 observations per person Purpose is to illustrate the effects on estimates and inference of both different working correlation matrices and robust vs. naive inference: Consider two scenarios: baseline (time-independent) covariate, time-dependent covariate. 49
50 Association of Baseline Covariate (Age) on CD4 count
Binary age ($X_{ij}$) = 0 (<40) or 1 (>40).
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_i] = \beta_0 + \beta_1 x_i$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D
51 Association of Baseline Covariate (Age) on CD4 count
Model A:
. xtgee cd4 binage, i(id) cor(ind)
cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: binage, _cons
Model B:
. xtgee cd4 binage, i(id) cor(ind) robust
(standard errors adjusted for clustering on id)
cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: binage, _cons
52 Association of Baseline Covariate (Age) on CD4 count
Model C:
. xtgee cd4 binage, i(id) cor(exc)
cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: binage, _cons
Model D:
. xtgee cd4 binage, i(id) cor(exc) robust
(standard errors adjusted for clustering on id)
cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: binage, _cons
53 Summary of Results of Association of Baseline Covariate (Age) on CD4 count
β_0 (SE)         Naive    Robust
Unweighted OLS   (9.9)    (12.6)
Weighted LS      (13.3)   (12.6)
β_1 (SE)         Naive    Robust
Unweighted OLS   (14.2)   (19.3)
Weighted LS      (19.2)   (19.3)
54 Association of Time (within cluster) Varying Covariate (Viral Load) on CD4 count
Binary VL: $X_{ij}$ = 0 (<2000) or 1 (>2000); all subjects included have one low and one high VL.
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{ij}$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D
55 Association of Within-Cluster-Varying Covariate (VL) on CD4 count
Model A:
. xtgee cd4 medvl, i(id) cor(ind)
cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: medvl, _cons
Model B:
. xtgee cd4 medvl, i(id) cor(ind) robust
(standard errors adjusted for clustering on id)
cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: medvl, _cons
56 Association of Within-Cluster-Varying Covariate (VL) on CD4 count
Model C:
. xtgee cd4 medvl, i(id) cor(exc)
cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: medvl, _cons
Model D:
. xtgee cd4 medvl, i(id) cor(exc) robust
(standard errors adjusted for clustering on id)
cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: medvl, _cons
57 Robust Equivalent to Paired T-test
Paired t-test:
. keep id cd4 medvl etime
. sort cd4 medvl
. reshape wide cd4 etime, i(id) j(medvl)
. ttest cd40 = cd41

Paired t test
Variable | Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
Rows: cd40, cd41, diff
Ho: mean(cd40 - cd41) = mean(diff) = 0
Ha: mean(diff) < 0      Ha: mean(diff) != 0      Ha: mean(diff) > 0
58 Summary of Results of Association of Time Varying Covariate (VL) on CD4 count
β_0 (SE)              Naive          Robust
Unweighted OLS        377.4 (21.4)   377.4 (22.9)
Weighted LS           377.4 (21.4)   377.4 (22.9)
β_1 (SE)              Naive          Robust
Unweighted OLS        -98.3 (30.3)   -98.3 (16.5)
Weighted LS           -98.3 (16.4)   -98.3 (16.5)
t-test (difference)   -98.3 (16.5)
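The agreement between the robust SE and the paired t-test can be reproduced on toy data. The sketch below assumes exactly two observations per subject (one at low and one at high viral load), fits the two-group OLS slope in closed form, and computes a naive SE, a cluster-robust SE without any finite-sample correction, and the paired t-test SE. All numbers and function names are hypothetical, not the study data.

```python
import math

def paired_t_se(low, high):
    """Mean and standard error of within-subject differences (high - low)."""
    n = len(low)
    d = [h - l for l, h in zip(low, high)]
    dbar = sum(d) / n
    s2 = sum((x - dbar) ** 2 for x in d) / (n - 1)
    return dbar, math.sqrt(s2 / n)

def ols_slope_with_ses(low, high):
    """OLS slope for a binary covariate, with naive and cluster-robust SEs."""
    n = len(low)
    mean0, mean1 = sum(low) / n, sum(high) / n
    slope = mean1 - mean0
    r0 = [y - mean0 for y in low]
    r1 = [y - mean1 for y in high]
    # Naive: pooled residual variance, treating all 2n observations as independent.
    sigma2 = (sum(r * r for r in r0) + sum(r * r for r in r1)) / (2 * n - 2)
    naive_se = math.sqrt(sigma2 * (1 / n + 1 / n))
    # Robust: each subject contributes (r1 - r0) / n to the slope estimate.
    robust_se = math.sqrt(sum((b - a) ** 2 for a, b in zip(r0, r1))) / n
    return slope, naive_se, robust_se

# Hypothetical CD4 counts for five subjects.
cd4_low_vl = [400.0, 350.0, 500.0, 300.0, 450.0]
cd4_high_vl = [320.0, 260.0, 390.0, 210.0, 330.0]
n = len(cd4_low_vl)

dbar, paired_se = paired_t_se(cd4_low_vl, cd4_high_vl)
slope, naive_se, robust_se = ols_slope_with_ses(cd4_low_vl, cd4_high_vl)
print(slope, naive_se, robust_se, paired_se)
```

In this sketch the robust SE matches the paired t-test SE up to the factor sqrt(n/(n-1)), which vanishes as the number of subjects grows, while the naive SE is much larger because it ignores the positive within-subject correlation.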
59 Multiple and varying observations per person
CD4 (Y) vs. continuous (log) viral load (X):
$$E[Y_{ij} \mid X_{i1} = x_{i1}, X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{i1} + \beta_2 (x_{ij} - x_{i1})$$
$\beta_2$ represents the expected change in Y given a change in $X_{ij}$ relative to the baseline value ($X_{i1}$): the longitudinal effect.
$\beta_1$ represents the expected difference in average Y across two sub-populations that differ by their baseline values, $X_{i1}$: the cross-sectional effect.
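The two covariates in this model can be built directly from long-format data. The sketch below assumes records of the form (id, time, log viral load) and takes the earliest measurement per subject as the baseline; the output columns correspond to what the Stata commands call logvlbase and logvlchange. The function name and the sample records are hypothetical.

```python
def split_baseline_and_change(records):
    """Return (id, time, baseline_logvl, change_from_baseline) per record.

    records: list of (id, time, logvl) tuples in long format.
    """
    baseline = {}
    # The first (earliest) measurement for each subject defines the baseline.
    for pid, time, logvl in sorted(records, key=lambda r: (r[0], r[1])):
        baseline.setdefault(pid, logvl)
    return [(pid, time, baseline[pid], logvl - baseline[pid])
            for pid, time, logvl in records]

# Hypothetical long-format data for two subjects.
records = [(1, 0, 4.0), (1, 3, 3.5), (2, 0, 5.0), (2, 2, 5.5)]
rows = split_baseline_and_change(records)
for row in rows:
    print(row)
```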
60 Association of Within-Cluster-Varying Covariate (VL) on CD4 count: multiple observations per person
Model A:
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind)

GEE population-averaged model       Number of obs      = 7053
Group variable: id                  Number of groups   = 406
Link: identity                      Obs per group: min = 1
Family: Gaussian                                   avg = 17.4
Correlation: independent                           max =

cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons
61 Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
Model B:
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind) robust

GEE population-averaged model       Number of obs      = 7053
Group variable: id                  Number of groups   = 406
Link: identity                      Obs per group: min = 1
Family: Gaussian                                   avg = 17.4
Correlation: independent                           max = 58
                                    Wald chi2(2)       =
(standard errors adjusted for clustering on id)

cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons
62 Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
Model C:
. xtgee cd4 logvlbase logvlchange, i(id) cor(exc)
cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons
Model D:
(standard errors adjusted for clustering on id)
cd4 | Coef.   Semi-robust Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons
63 Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
β_0 (SE)         Naive          Robust
Unweighted OLS   618.9 (11.6)   618.9 (35.2)
Weighted LS      509.1 (31.2)   509.1 (32.9)
β_1 (SE)         Naive          Robust
Unweighted OLS   -83.7 (3.0)    -83.7 (8.3)
Weighted LS      -52.7 (7.4)    -52.7 (7.7)
β_2 (SE)         Naive          Robust
Unweighted OLS   -99.2 (2.4)    -99.2 (6.8)
Weighted LS      -54.7 (2.2)    -54.7 (3.2)
64 MultiLevel Models Alan Hubbard UC Berkeley - Division of Biostatistics 64
65 Many Names for Variants of the Same Statistical Model
Hierarchical Linear Models (HLMs)
Random Coefficient Models
Mixed Models (most general)
Multilevel Models (MLMs)
Nested modeling
66 Typical Data Structures for MLMs
Distinctive feature is the hierarchical nature of statistical units, e.g.:
  Neighborhoods -> people -> measurements made over time on the people
  Classrooms -> students -> measurements made over time on the students
Different sources of variation:
  Between classrooms
  Between students within classrooms
  Within students
67 Motivation for using MLMs (mixed models)
Procedure estimates the fixed effects of interest.
Dissects the sources of variation.
Accounts for residual correlation among statistically dependent units when deriving inference.
Permits one to specify a rich set of correlation models and allows for heteroskedasticity.
68 Motivation for using MLMs (mixed models)
It allows different subjects to have different responses to a treatment, risk variable, etc., and thus has intuitive appeal.
Rarely interesting, but it can also provide post-estimation estimates of the random effects.
You get the entire data-generating distribution.
  Use the virtues of having a likelihood.
  Can simulate data from the resulting parameter estimates.
  In contrast with other approaches that only target a specific aspect of the data-generating distribution.
69 What's being Mixed?
A mixed model has two types of effects, fixed and random.
A fixed effect means that all levels of the variable are contained in the data and the effect is universal to all in the target population.
A random effect means that the levels (effects) of the variable comprise random samples of the levels (effects) in the target population.
Consider a risk factor effect. Fixed, Random, Both?
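70 The Simplest Example
The model: $Y_{ij} = \mu + \alpha_i + e_{ij}$
$E(\alpha_i)=0$, $E(e_{ij})=0$, $E[\alpha_i e_{ij}]=0$. $\mathrm{Var}(\alpha_i)=\sigma^2_\alpha$. $\mathrm{Var}(e_{ij})=\sigma^2_e$.
More specifically, $\alpha_i \sim N(0, \sigma^2_\alpha)$, $e_{ij} \sim N(0, \sigma^2_e)$.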
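71 Likelihood
Writing $\phi(\cdot\,; \sigma^2)$ for the $N(0, \sigma^2)$ density, given $\alpha_i$:
$$f(Y_{ij} \mid \alpha_i) = \phi(Y_{ij} - \alpha_i - \mu;\ \sigma_e^2), \qquad f(\mathbf{Y}_i \mid \alpha_i) = \prod_{j=1}^{n_i} \phi(Y_{ij} - \alpha_i - \mu;\ \sigma_e^2)$$
Likelihood of observed data (for one unit) is:
$$f(\mathbf{Y}_i) = \int f(\mathbf{Y}_i \mid \alpha)\, f(\alpha)\, d\alpha = \int \prod_{j=1}^{n_i} \phi(Y_{ij} - \alpha - \mu;\ \sigma_e^2)\ \phi(\alpha;\ \sigma_\alpha^2)\, d\alpha$$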
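For this normal-normal model the integral has a closed form, which is a standard result worth recording alongside the slide. With $\mathbf{1}$ the vector of ones and $I$ the identity matrix of dimension $n_i$ (notation introduced here), the observed data for unit $i$ are multivariate normal:

$$\mathbf{Y}_i \sim N\!\left(\mu \mathbf{1},\ \sigma_e^2 I + \sigma_\alpha^2 \mathbf{1}\mathbf{1}^T\right)$$

so that $\mathrm{Var}(Y_{ij}) = \sigma_\alpha^2 + \sigma_e^2$, $\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_\alpha^2$ for $j \neq k$, and $\mathrm{corr}(Y_{ij}, Y_{ik}) = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_e^2)$, i.e., the random intercept induces an exchangeable correlation structure.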
72 Estimation of fixed effects using mixed models
Random effects models imply certain variance-covariance structures.
For instance, a simple random effects model results in equal correlation (exchangeable or compound symmetry) among all observations measured on the same subject.
We know that if the variance-covariance matrix ($V$) is known, then the most efficient estimate of the coefficients is weighted least squares:
$$\hat{\beta} = (X^T W X)^{-1} X^T W Y, \qquad \text{where } W = V^{-1}.$$
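The effect of the weights is easiest to see in an intercept-only model with exchangeable correlation, where the weighted least squares estimate reduces to a weighted average of cluster means with weights n_i / (1 + (n_i - 1)ρ). The sketch below is that special case only, not a general solver, and the data and function name are hypothetical.

```python
def gls_mean_exchangeable(clusters, rho):
    """Weighted least squares estimate of a common mean.

    Assumes an exchangeable correlation rho within each cluster and
    independence between clusters; the common variance cancels out.
    """
    weights = [len(c) / (1 + (len(c) - 1) * rho) for c in clusters]
    means = [sum(c) / len(c) for c in clusters]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Hypothetical unbalanced clusters with means 11, 20 and 20.
clusters = [[10.0, 12.0, 11.0, 11.0, 10.0, 12.0], [20.0], [19.0, 21.0]]

unweighted = gls_mean_exchangeable(clusters, 0.0)   # mean of all observations
moderate = gls_mean_exchangeable(clusters, 0.5)
cluster_avg = gls_mean_exchangeable(clusters, 1.0)  # mean of cluster means
print(unweighted, moderate, cluster_avg)
```

With unbalanced clusters the estimate moves from the mean of all observations (ρ = 0) toward the mean of the cluster means as ρ grows, which is one reason weighted and unweighted estimates can differ when cluster sizes vary.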
73 Estimation of coefficients using mixed models, cont.
The mixed model procedure works by:
  converting the random effects model into its implied variance-covariance matrix, $V$;
  starting with the independence model (OLS), getting residuals and then estimating $V$ based on this model;
  creating the weight matrix as $W = \hat{V}^{-1}$;
  doing weighted least squares and getting residuals;
  repeating until convergence.
The SEs the procedure returns come from:
$$\widehat{\mathrm{var}}(\hat{\beta}) = (X^T W X)^{-1} = (X^T \hat{V}^{-1} X)^{-1}$$
74 Model Based Inference
When deriving the inference on coefficients, the estimating procedure assumes that the variance-covariance model of the outcome implied by the model IS CORRECT (i.e., its $SE(\hat{\beta})$ is always naive, not robust).
75 Virtues of MultiLevel Models Diez-Roux 75
76 Provides Road Map for Accounting for Systematic and random variation at various levels (individuals, counties, states,.) Diez-Roux 76
77 Stage 2 Diez-Roux 77
78 Put it together, just a mixed model Diez-Roux 78
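79 Random Intercepts and Random Associations
The model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) x_{ij} + e_{ij}$
$E(\beta_{0i})=0$, $E(\beta_{1i})=0$, $E(e_{ij})=0$.
$\mathrm{Var}(\beta_{0i})=\sigma^2_0$, $\mathrm{Var}(\beta_{1i})=\sigma^2_1$, $\mathrm{Var}(e_{ij})=\sigma^2$.
$\mathrm{cov}(\beta_{0i}, \beta_{1i})=\sigma_{12}$, $\mathrm{cov}(\beta_{0i}, e_{ij})=0$, $\mathrm{cov}(\beta_{1i}, e_{ij})=0$.
What are the fixed and random effects in this model?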
80 Simple Example (Individual is Cluster) Orthodontic study (Potthoff and Roy; 1964) 16 boys and 11 girls between the ages of 8 and 14 years Response variable is the distance (in millimeters) between the pituitary and the pterygomaxillary fissure. 80
81 Dental Data obsno child age distance gender
82 Dental Data distance age(yrs) 82
83 Mixed Model I for Dental Data
Model: $Y_{ij} = (\beta_0 + \beta_{0i}) + \beta_1 x_{ij} + e_{ij}$
where $x_{ij}$ is the $j$th age of the $i$th child and $Y_{ij}$ is the distance.
$\beta_{0i}$ i.i.d. $N(0, \sigma^2_0)$, $e_{ij}$ i.i.d. $N(0, \sigma^2)$
84 Why is this model called MultiLevel?
Can write the model in two steps (as two levels):
$$Y_{ij} = \beta_{0i}^* + \beta_1 x_{ij} + e_{ij}$$
then a model for the individual coefficients:
$$\beta_{0i}^* = \beta_0 + \beta_{0i}$$
This implies that each individual has their own random intercept, $\beta_{0i}^*$, drawn from a population with mean intercept $\beta_0$, but all subjects have the same slope, $\beta_1$.
The intercepts can also be functions of, say, baseline covariates $z$: $\beta_{0i}^*(z) = \beta_0 + \beta_{0i} + \alpha_1 z_1 + \alpha_2 z_2$
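85 Mixed (MultiLevel) Model II for Dental Data
The model (called a random coefficients model): $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) x_{ij} + e_{ij}$
$E(\beta_{0i})=0$, $E(\beta_{1i})=0$, $E(e_{ij})=0$.
$\mathrm{Var}(\beta_{0i})=\sigma^2_0$, $\mathrm{Var}(\beta_{1i})=\sigma^2_1$, $\mathrm{Var}(e_{ij})=\sigma^2$.
$\mathrm{cov}(\beta_{0i}, \beta_{1i})=\sigma_{12}$, $\mathrm{cov}(\beta_{0i}, e_{ij})=\mathrm{cov}(\beta_{1i}, e_{ij})=0$.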
86 STATA for Model II: xtmixed
. xtmixed distance age || child: age, cov(uns)

Mixed-effects REML regression       Number of obs      = 108
Group variable: child               Number of groups   = 27
                                    Obs per group: min = 4
                                                   avg = 4.0
                                                   max = 4
                                    Wald chi2(1)       =
Log restricted-likelihood =         Prob > chi2        =

distance | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: age, _cons
87 STATA for Model II: xtmixed, Variance Components Estimates
Random-effects Parameters | Estimate   Std. Err.   [95% Conf. Interval]
child: Unstructured
  sd(age)           |   <- σ_1
  sd(_cons)         |   <- σ_0
  corr(age,_cons)   |   <- σ_12
  sd(residual)      |
LR test vs. linear regression: chi2(3) =      Prob > chi2 =
Note: LR test is conservative and provided only for reference
88 STATA for Model II: xtmixed, random coefficient estimates
. predict b*, reffects
. list b*
Columns: b1 (predicted β_1i), b2 (predicted β_0i)
89 Summary of Results of Association of Baseline Covariate (Age) on CD4 count
β_0 (SE)         Naive    Robust
Unweighted OLS   (9.9)    (12.6)
Weighted LS      (13.3)   (12.6)
β_1 (SE)         Naive    Robust
Unweighted OLS   (14.2)   (19.3)
Weighted LS      (19.2)   (19.3)
90 Re-visit Model of CD4 vs. Baseline Age
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{ij} + e_{ij}$
. xtreg cd4 binage, i(id) re

Random-effects GLS regression       Number of obs      = 594
Group variable (i): id              Number of groups   = 297
R-sq: within  =                     Obs per group: min = 2
      between =                                    avg = 2.0
      overall =                                    max = 2
Random effects u_i ~ Gaussian       Wald chi2(1)       = 1.59
corr(u_i, X) = 0 (assumed)          Prob > chi2        =

cd4     | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
binage  |   <- estimate of β_1
_cons   |   <- estimate of β_0
sigma_u |   <- estimate of σ_0 (square root of σ_0^2)
sigma_e |   <- estimate of σ (square root of σ^2)
rho     |   (fraction of variance due to u_i)
91 Association of Time-Varying Covariate (Viral Load) on CD4 count
Binary VL: $X_{ij}$ = 0 (<2000) or 1 (>2000); all subjects included have one low and one high VL.
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{ij}$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D
92 Summary of Results of Association of Time Varying Covariate (VL) on CD4 count (note: different data than last lecture)
β_0 (SE)              Naive          Robust
Unweighted OLS        355.1 (21.7)   355.1 (23.6)
Weighted LS           355.1 (21.7)   377.4 (22.9)
β_1 (SE)              Naive          Robust
Unweighted OLS        -79.3 (30.7)   -79.3 (17.1)
Weighted LS           -79.3 (17.0)   -79.3 (17.1)
t-test (difference)   -79.3 (17.1)
93 Random Effects Model of CD4 vs. log 10 (viral load)
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{ij} + e_{ij}$
. xtreg cd4 medvl, i(id) re

Random-effects GLS regression       Number of obs      = 174
Group variable (i): id              Number of groups   = 87
R-sq: within  =                     Obs per group: min = 2
      between =                                    avg = 2.0
      overall =                                    max = 2
Random effects u_i ~ Gaussian       Wald chi2(1)       =
corr(u_i, X) = 0 (assumed)          Prob > chi2        =

cd4     | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
medvl   |
_cons   |
sigma_u |   <- estimate of σ_0 (square root of σ_0^2)
sigma_e |   <- estimate of σ (square root of σ^2)
rho     |   (fraction of variance due to u_i)
94 Multiple and varying observations per person
CD4 (Y) vs. continuous (log) viral load (X):
$$E[Y_{ij} \mid X_{i1} = x_{i1}, X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{i1} + \beta_2 (x_{ij} - x_{i1})$$
$\beta_2$ represents the expected change in Y given a change in $X_{ij}$ relative to the baseline value ($X_{i1}$): the longitudinal effect.
$\beta_1$ represents the expected difference in average Y across two sub-populations that differ by their baseline values, $X_{i1}$: the cross-sectional effect.
95 Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
β_0 (SE)         Naive          Robust
Unweighted OLS   618.9 (11.6)   618.9 (35.2)
Weighted LS      509.1 (31.2)   509.1 (32.9)
β_1 (SE)         Naive          Robust
Unweighted OLS   -83.7 (3.0)    -83.7 (8.3)
Weighted LS      -52.7 (7.4)    -52.7 (7.7)
β_2 (SE)         Naive          Robust
Unweighted OLS   -99.2 (2.4)    -99.2 (6.8)
Weighted LS      -54.7 (2.2)    -54.7 (3.2)
96 Random Effects Model of CD4 vs. log 10 (viral load)
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$

Random-effects GLS regression       Number of obs      = 7053
Group variable (i): id              Number of groups   = 406
R-sq: within  =                     Obs per group: min = 1
      between =                                    avg = 17.4
      overall =                                    max = 58
Random effects u_i ~ Gaussian       Wald chi2(2)       =
corr(u_i, X) = 0 (assumed)          Prob > chi2        =

cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons, sigma_u, sigma_e, rho (fraction of variance due to u_i)
97 Random Coefficients Model of CD4 vs. log 10 (viral load)
Fit random coef. model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$
Fixed Effects (Coefficient) Estimates
. xtmixed cd4 logvlbase logvlchange || id: logvlbase

Mixed-effects REML regression       Number of obs      = 7053
Group variable: id                  Number of groups   = 406
                                    Obs per group: min = 1
                                                   avg = 17.4
                                                   max = 58
                                    Wald chi2(2)       =
Log restricted-likelihood =         Prob > chi2        =

cd4 | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
Rows: logvlbase, logvlchange, _cons
98 Random Coefficients Model of CD4 vs. log10(viral load)
Fit a random coefficients model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$
Variance Components Estimates
Random-effects Parameters (Estimate, Std. Err., 95% Conf. Interval), id: Independent
sd(logvlb~e): corresponds to var($\beta_{1i}$)
sd(_cons): corresponds to var($\beta_{0i}$)
sd(residual): corresponds to var($e_{ij}$)
LR test vs. linear regression: chi2(2), Prob > chi2 [numeric values not preserved in the transcription]
99 Random Effects Model for Teenage Sex and Drug-Use
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij})} = \beta^*_{0i} + \beta_1 X_{ij}, \quad \beta^*_{0i} = \beta_0 + \beta_{0i}, \quad \beta_{0i} \sim N(0, \tau^2)$
Assume that the repeated observations for the ith teenager are independent of one another given $\beta_{0i}$ and $X_{ij}$.
Must assume a parametric distribution for the $\beta_{0i}$, usually $\beta_{0i} \sim N(0, \tau^2)$.
$\exp(\beta_1)$ is the odds ratio for having sex when subject i reports drug use relative to when the same subject does not report drug use.
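Fitting this model by maximum likelihood requires integrating the subject-specific probability over the normal random intercept. A minimal sketch of that integral using Gauss-Hermite quadrature is given here; the parameter values (beta0 = -1, beta1 = 0.5, tau = 2) are hypothetical and chosen only to make the effect visible, and this is not a description of how any particular Stata command implements the integral.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def marginal_prob(beta0, beta1, x, tau, n_points=20):
    """P(Y = 1 | X = x), averaging over beta_0i ~ N(0, tau^2).

    Gauss-Hermite nodes/weights integrate against exp(-t^2), so the
    random intercept is written as b = sqrt(2) * tau * t and the sum
    is divided by sqrt(pi).
    """
    nodes, weights = hermgauss(n_points)
    b = np.sqrt(2.0) * tau * nodes
    return float(np.sum(weights * expit(beta0 + b + beta1 * x)) / np.sqrt(np.pi))

# Hypothetical parameter values for illustration only.
beta0, beta1, tau = -1.0, 0.5, 2.0
p0 = marginal_prob(beta0, beta1, 0, tau)
p1 = marginal_prob(beta0, beta1, 1, tau)
marginal_log_or = logit(p1) - logit(p0)
print(f"subject-specific log OR = {beta1}, population-average log OR = {marginal_log_or:.3f}")
```

With a non-zero tau the population-average log odds ratio comes out smaller in magnitude than the subject-specific beta1, which is why the two kinds of model answer different questions in the logistic case.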
100 Random effects for teenage sex vs. drug use
. xtlogit sx24hrs drgalcoh, or i(eid) re
Random-effects logit                   Number of obs    = 1708
Group variable (i): eid                Number of groups = 109
Random effects u_i ~ Gaussian          Obs per group: min = 1, avg = 15.7, max = 33
Wald chi2(1) = 5.48, Prob > chi2, Log likelihood
Table (OR, Std. Err., z, P>|z|, 95% Conf. Interval) [numeric values not preserved in the transcription]:
drgalcoh: odds ratio, $\exp(\beta_1)$
/lnsig2u
sigma_u: estimate of $\tau$
rho
Likelihood ratio test of rho=0: chibar2(01), Prob >= chibar2
101 Random Effects Model for Diarrhea Study in Children
$\log \frac{P(Y_{ijk} = 1)}{1 - P(Y_{ijk} = 1)} = \beta_0 + \beta_{0i} + \beta_{0ij}$
Measurements made on children (k) within households (j) within villages (i).
Want to know the greatest source of variation: households, var($\beta_{0ij}$), or villages, var($\beta_{0i}$).
Assumes children in the same household have the same probability of diarrhea.
Use gllamm in Stata (also xtmelogit).
102 Random Effects Model for Diarrhea Study in Children
. gllamm diarrhea, i(hhid vilid) nip(5) family(binomial)
number of level 1 units = 4736
number of level 2 units = 18
number of level 3 units = 4
Condition Number, gllamm model log likelihood
Coefficient table (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) with row _cons [numeric values not preserved in the transcription]
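103 Random Effects Model for Diarrhea Study in Children
Variances and covariances of random effects
***level 2 (hhid): var(1) is var($\beta_{0ij}$)
***level 3 (vilid): var(1) is var($\beta_{0i}$)
[estimates and standard errors not preserved in the transcription]
Cluster correlation coefficient (based on latent response model, where the level-1 logistic residual has variance $\pi^2/3$):
$\rho_{house} = \frac{\mathrm{var}(\beta_{0ij})}{\mathrm{var}(\beta_{0ij}) + \mathrm{var}(\beta_{0i}) + \pi^2/3}$
$\rho_{village} = \frac{\mathrm{var}(\beta_{0i})}{\mathrm{var}(\beta_{0ij}) + \mathrm{var}(\beta_{0i}) + \pi^2/3}$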
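The latent-response calculation can be sketched as a small Python helper. The function name and the input variances below are hypothetical (the fitted values from the slide are not available here); the only fixed ingredient is the standard logistic residual variance of pi^2/3.

```python
import math

def latent_variance_shares(var_household, var_village):
    """Share of latent-response variance at the household and village levels.

    Assumes a logistic link, so the level-1 residual variance on the
    latent scale is pi^2 / 3.
    """
    resid = math.pi ** 2 / 3.0
    total = var_household + var_village + resid
    return {
        "house": var_household / total,
        "village": var_village / total,
    }

# Hypothetical variance estimates, for illustration only.
shares = latent_variance_shares(var_household=0.8, var_village=0.2)
print(shares)
```

Comparing the two shares is one way to read off whether households or villages are the larger source of variation.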
104-110 Bangladesh Fertility Study
111-113 Tower of London Task Study
114 References
Slides 105-114 are from: Roberto G. Gutierrez, Director of Statistics, StataCorp LP, North American Stata Users Group Meeting, Boston
115 Using conditional logistic regression for estimating the within-unit OR in logistic regression models
116 Treat Individual as a Stratification Variable for Teen Sex and Drugs
For the teen sex and drugs example, we can represent the data on each individual, i, as a simple 2x2 table:
               Sex: yes    Sex: no
Drugs: yes       a_i         b_i
Drugs: no        c_i         d_i
Total number of observations: n_i
Can get the OR for every subject: $\widehat{OR}_i = \frac{a_i d_i}{b_i c_i}$
Because our model,
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij} = x_{ij})} = \beta^*_0 + \beta_{0i} + \beta^*_1 x_{ij}$,
assumes every person has the same OR, we can average the estimated ORs across subjects to get the estimate.
117 Mantel-Haenszel Average of Stratified ORs
Then the MH estimate is:
$\widehat{OR}_{MH} = \exp(\hat{\beta}^*_1) = \frac{\sum_{i=1}^m w_i \widehat{OR}_i}{\sum_{i=1}^m w_i} = \frac{\sum_{i=1}^m (a_i d_i)/n_i}{\sum_{i=1}^m (b_i c_i)/n_i}$, where the second equality holds with weights $w_i = b_i c_i / n_i$.
Note that for any subject who has identical exposure (drug use) or identical outcomes (sex) for all observations, the OR is undefined and that person does not contribute to the estimate (their 2x2 table is dropped).
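As a small illustration of the Mantel-Haenszel formula, the Python sketch below takes one (a, b, c, d) tuple per subject, laid out as in the 2x2 table (drugs yes/no by sex yes/no). The function name and the example counts are hypothetical.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio.

    tables: iterable of (a, b, c, d) counts, one 2x2 table per subject.
    Returns sum(a*d/n) / sum(b*c/n).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue                      # subject with no observations
        num += a * d / n
        den += b * c / n
    if den == 0.0:
        raise ValueError("no informative tables: sum of b*c/n is zero")
    return num / den

# Hypothetical counts: the second subject never reports drug use (a = b = 0),
# so both a*d and b*c are zero and the table contributes nothing.
tables = [(10, 5, 4, 8), (0, 0, 3, 4)]
print(mantel_haenszel_or(tables))
```

Subjects with no variation in exposure or in outcome add zero to both sums, which is the sense in which their tables drop out of the estimate.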
118 Conditional Logistic Regression
To illustrate, use the teenage sex and drugs example and assume just two observations for a person: one had the outcome ($Y_{i1} = 1$) with drugs ($X_{i1} = 1$), and the other had neither ($Y_{i2} = 0$, $X_{i2} = 0$). Then the conditional likelihood contribution for this person is:
$CondLik_i = \frac{P(Y_{i1} = 1 \mid X_{i1} = 1) P(Y_{i2} = 0 \mid X_{i2} = 0)}{P(Y_{i1} = 1 \mid X_{i1} = 1) P(Y_{i2} = 0 \mid X_{i2} = 0) + P(Y_{i1} = 1 \mid X_{i1} = 0) P(Y_{i2} = 0 \mid X_{i2} = 1)}$
After plugging in the model for $Y_{ij}$,
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij} = x_{ij})} = \beta^*_0 + \beta_{0i} + \beta^*_1 x_{ij}$,
and doing some algebra, one gets:
$CondLik_i = \frac{1}{1 + \exp(\beta^*_1 (X_{i2} - X_{i1}))}$
Notice that the individual-level intercept (whether random or not) drops out.
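The cancellation of the intercept can be checked numerically. In the sketch below, `alpha_i` stands for the whole subject-level intercept (beta*_0 + beta_0i); the function names and the value beta1 = 0.7 are hypothetical, chosen only for the check.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def cond_lik_full(alpha_i, beta1, x1, x2):
    """Conditional likelihood for a pair with Y_i1 = 1 and Y_i2 = 0,
    computed from the full logistic probabilities (intercept included)."""
    def p_event(x):
        return expit(alpha_i + beta1 * x)
    observed = p_event(x1) * (1.0 - p_event(x2))   # covariates as observed
    swapped = p_event(x2) * (1.0 - p_event(x1))    # covariates exchanged
    return observed / (observed + swapped)

def cond_lik_closed(beta1, x1, x2):
    """Closed form after the algebra: no intercept appears."""
    return 1.0 / (1.0 + math.exp(beta1 * (x2 - x1)))

# The two versions agree for any subject intercept (hypothetical beta1).
for alpha_i in (-3.0, 0.0, 2.5):
    print(alpha_i, cond_lik_full(alpha_i, 0.7, 1, 0), cond_lik_closed(0.7, 1, 0))
```

Whatever value is used for the subject intercept, the full expression returns the same number as the closed form, so no distributional assumption on the intercept is needed.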
119 Conditional Logistic Regression
What this means is that the estimate of the within-subject OR no longer depends on assumptions about the distribution of the random effect.
Can only use this to estimate the association of time-varying covariates.
Subjects with identical outcomes across all observations are dropped from the analysis.
A covariate that does not change within a subject does not contribute to estimation of the OR for that covariate.
120 Conditional Logistic Regression
More generally, you might want to estimate the within-subject OR for several variables simultaneously and/or the OR for a unit change in a continuous variable.
Can still do so by using the conditional likelihood, a method used to estimate ORs for matched case-control studies.
The conditional likelihood (in the example of a cohort) is the probability of observing that the cases have the covariates they have and the controls have their observed covariates, given the distribution of covariates observed over all the repeated measurements.
To define the likelihood, one normalizes the probability of observing the outcomes conditional on the covariates by the summed probabilities over all possible combinations of covariates and outcomes.
121 Teenage Sex and Drug-Use Using M-H Summary OR
. cs sx24hrs drgalcoh, by(eid) or
Table columns: eid, OR, [95% Conf. Interval], M-H Weight. One row per subject with a Cornfield interval, followed by the M-H combined row [numeric values not preserved in the transcription]
122 Conditional Logistic Estimate
. clogit sx24hrs drgalcoh, or group(eid)
note: multiple positive outcomes within groups encountered.
note: 23 groups (161 obs) dropped due to all positive or all negative outcomes.
Iterations 0 to 2: log likelihood [values not preserved in the transcription]
Conditional (fixed-effects) logistic regression     Number of obs = 1547
                                                    LR chi2(1)    = 2.93
Prob > chi2, Log likelihood, Pseudo R2
Table (Odds Ratio, Std. Err., z, P>|z|, 95% Conf. Interval) with row drgalcoh [numeric values not preserved in the transcription]
123 Pitfalls of Latent Variable Models in General (including MultiLevel Mixed Models)
124 General Contrast of Mixed and GEE Models: General Mixed Effects Model; Specific Mixed Model (logistic)
125 Mixed Models: General Likelihood of Observed Data based on Mixed Model; Specific from example
126 Latent Variable Models: Nonparametric Nonidentifiability (Point 1 for Mixed Models)
127 Parameter returned by GEE: Population Average Models; Parameter-specific estimating function
128 Special Case: MultiLevel model coefficients have interpretations in both latent variable and observed data-generating worlds
129 Usually, different interpretations: Logistic Case
130 Often the difference in interpretations (Pop. Ave. vs. Mixed) is (near) meaningless
131 It gets worse: What if the Model is mis-specified? Parameter of Interest as Projection
132 Interpretation of coefficients in mis-specified multilevel mixed models can get wacky: True model
133 Summary of MultiLevel Mixed vs. GEE (Population Ave.) Models
134 New Golden Rules of Estimation with Latent Variable Models?
135 Should avoid relying solely on unverifiable assumptions for inferences
136 Multilevel Analysis and Complex Surveys Part 2: Estimation and Inference from Complex Surveys. Alan Hubbard, UC Berkeley - Division of Biostatistics
137 Foundation
Finite population $U = \{1, \ldots, N\}$.
Sample s, a subset of U.
$V = (Y, X)$: observations of outcome and covariates on each unit.
Values in the fixed finite population are $v_U = (v_1, \ldots, v_N)$, and the process by which one draws these will be called the observation process.
Parameters can be defined with regard to a finite population or a superpopulation; that is, there is some data-generating model of interest, $P_V$, and we want to estimate parameters of it, $\theta(P_V)$.
138 Sampling Mechanism
$\delta = (\delta_t, t = 1, \ldots, N)$ with $\delta_t = 1$ if $t \in s$, 0 otherwise.
$g(\delta \mid V = v, Z)$ is the sampling mechanism, where Z are so-called design variables, so $z_U = (z_1, \ldots, z_N)$.
Thus, the observed data O are generated by a combination of the mechanisms $P_{V \mid Z}$ and g: $O = (\delta * v_U * z_U)$ defines the joint distribution of the observed data, $P_0(O)$, with $O = (V, Z \mid \delta = 1)$.
Special cases:
- $g(\delta \mid V = v, Z) = g(\delta \mid Z)$ (noninformative sampling).
- Parameters of interest from the distribution of $V \mid Z = z$ (disaggregated analysis).
- Parameters of interest from the distribution $P_V$ (aggregated analysis): $P_V(v; \theta) = \sum_z P(V = v \mid Z = z)\, p(Z = z)$
139 Types of Inference
- Design-based inference
- Model-based inference
140 Full Likelihood
Consider one more source of missingness, e.g., nonresponse (say R = 1 for a respondent).
The full likelihood is then: $f(r \mid \gamma, z, v)\, f(\delta \mid z, v)\, f(v \mid z)\, f(z)$
Missingness is caused by the mechanisms for both $\delta$ and r.
Different assumptions imply different conditional independences.
141 Pseudo-Likelihood Estimating Equation Approaches
Most practical applications of survey data do not contain enough information to define the entire likelihood of the joint missingness/sampling mechanisms and the distribution of interest.
In addition, the parameter of interest can often be identified from just some of the design elements, without the entire joint likelihood being identifiable.
Thus, most survey analyses rely on pseudo-likelihood estimation.
142 Simple Example
Assume an exponential family: $f(y; \theta) = \theta e^{-\theta y}$, $E(Y) = 1/\theta$.
If the entire population were a simple random draw from this distribution (and you observed everyone's value), then the score equation based on the likelihood would be:
$s(\theta; V) = \sum_{t=1}^N \left( Y_t - \frac{1}{\theta} \right)$
with obvious solution $1/\hat{\theta} = \bar{Y} \equiv \mathrm{ave}(Y)$.
143 Pseudo-Likelihood, Continued
However, let's now assume all we have is the usual subset of the population U defined by s, and the probability that an observation was sampled given its observed values (for now, no Z): $\pi_t = P(\delta_t = 1 \mid Y_t)$.
An unbiased estimating function for the population average (and thus the parameter of $f(y; \theta)$) is:
$s(\theta; O, \pi) = \sum_{t \in s} \frac{1}{\pi_t} \left( Y_t - \frac{1}{\theta} \right) = \sum_{t=1}^N \frac{\delta_t}{\pi_t} \left( Y_t - \frac{1}{\theta} \right)$
Thus, just treat this as a general missingness problem and use inverse weighting.
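144 It works (in this case)!
Consistent estimating equation:
$E_Y \left[ E\left( \frac{\delta}{\pi} \,\Big|\, Y \right) \left( Y - \frac{1}{\theta} \right) \right] = E_Y \left[ \left( Y - \frac{1}{\theta} \right) E\left( \frac{\delta}{\pi} \,\Big|\, Y \right) \right]$
Note $\pi = E(\delta \mid Y)$, so we get $E_Y \left( Y - \frac{1}{\theta} \right) = 0$,
which of course results in the estimator, when solved:
$\hat{\mu}_s = \frac{1}{\hat{\theta}_s} = \frac{\sum_{t \in s} \pi_t^{-1} Y_t}{\sum_{t \in s} \pi_t^{-1}}$
So, a re-weighted score equation provides a consistent, pseudo-likelihood estimate.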
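A short simulation can make the re-weighting argument concrete. Everything specific in this Python sketch is an assumption for illustration: the population size, theta = 2, and in particular the sampling probability `pi`, which is made to increase with Y so that the unweighted estimate is visibly biased.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                      # true rate, so E(Y) = 1/theta = 0.5
N = 200_000                      # finite population size (hypothetical)
y = rng.exponential(scale=1.0 / theta, size=N)

# Informative sampling: larger Y is more likely to be sampled.
# pi_t = P(delta_t = 1 | Y_t), kept inside (0.05, 0.95).
pi = 0.05 + 0.9 * (1.0 - np.exp(-y))
delta = rng.random(N) < pi

y_s = y[delta]                   # sampled outcomes
w_s = 1.0 / pi[delta]            # inverse-probability weights

theta_naive = 1.0 / y_s.mean()               # ignores the sampling mechanism
theta_ipw = w_s.sum() / (w_s * y_s).sum()    # solves the re-weighted score equation

print(f"naive: {theta_naive:.3f}  weighted: {theta_ipw:.3f}  truth: {theta}")
```

Under this sampling scheme the naive estimate falls well below 2 because large outcomes are over-represented, while the inverse-weighted estimate lands close to the true value.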
145 Inference From Estimating Equation
Design-based approach: parameter estimation uncertainty comes from repeated samplings (of the type done) from a fixed target population ($Y_U$ fixed).
Model-based: uncertainty comes from repeated draws from the underlying data-generating distribution ($Y_U$ random).
Both: variance comes from both the underlying data-generating mechanism and the sampling mechanism; if the finite population is large, the model-based portion contributes almost nothing.
$\mathrm{var}(\hat{\theta}_s) = \mathrm{var}(E(\hat{\theta}_s \mid U)) + E(\mathrm{var}(\hat{\theta}_s \mid U))$
The first term is the model source; the second term is the design source.
146 Design-based Inference, cont.
All uncertainty comes from the sampling mechanism.
Need a simple empirical estimate derived from the estimating equation, often called the sandwich estimator.
Can be generally derived as the variance-covariance of the influence curve of the estimator:
$\hat{\theta} \approx \theta + \frac{1}{n} \sum_{t \in s} IC(O_t; \gamma, \theta)$, so $\mathrm{var}(\hat{\theta}) \approx \frac{\mathrm{var}(IC(O_t; \gamma, \theta))}{n}$
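For the inverse-weighted mean, an influence-curve standard error can be sketched in Python as follows. The function name is hypothetical, and the sketch treats the sampled units as independent: it makes no adjustment for stratification, clustering, or a finite population correction.

```python
import numpy as np

def weighted_mean_with_ic_se(y, pi):
    """Inverse-weighted mean and an influence-curve-based standard error.

    The estimating function is w_t * (Y_t - mu) with w_t = 1 / pi_t, so the
    estimated influence curve is IC_t = w_t * (Y_t - mu_hat) / mean(w),
    and var(mu_hat) is approximated by mean(IC^2) / n.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)
    n = y.size
    mu_hat = np.sum(w * y) / np.sum(w)
    ic = n * w * (y - mu_hat) / np.sum(w)
    se = np.sqrt(np.mean(ic ** 2) / n)
    return mu_hat, se

# With equal sampling probabilities this reduces to the ordinary mean
# and the usual (divide-by-n) standard error of the mean.
mu_hat, se = weighted_mean_with_ic_se([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
print(mu_hat, se)
```

The equal-weight case is a useful sanity check; with unequal weights, observations with small sampling probabilities carry larger influence-curve values and inflate the standard error accordingly.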
147 Design-based Inference
In this general framework, the things to account for in inference are:
- stratified design (not a simple random sample)
- finite population correction (sometimes samples are not from an infinite population)
- clustered (correlated) data
148 Multilevel Analysis and Complex Surveys Part 3: Putting MultiLevel Models and Complex Survey Data Together. Alan Hubbard, UC Berkeley - Division of Biostatistics. Slides are from Sophia Rabe-Hesketh, UC Berkeley, School of Education and Division of Biostatistics
E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation
Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers
UNIVERSITY OF WAIKATO. Hamilton New Zealand
UNIVERSITY OF WAIKATO Hamilton New Zealand Can We Trust Cluster-Corrected Standard Errors? An Application of Spatial Autocorrelation with Exact Locations Known John Gibson University of Waikato Bonggeun
Two Tools for the Analysis of Longitudinal Data: Motivations, Applications and Issues
Two Tools for the Analysis of Longitudinal Data: Motivations, Applications and Issues Vern Farewell Medical Research Council Biostatistics Unit, UK Flexible Models for Longitudinal and Survival Data Warwick,
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Lecture 18: Logistic Regression Continued
Lecture 18: Logistic Regression Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Chapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
Introduction to Hierarchical Linear Modeling with R
Introduction to Hierarchical Linear Modeling with R 5 10 15 20 25 5 10 15 20 25 13 14 15 16 40 30 20 10 0 40 30 20 10 9 10 11 12-10 SCIENCE 0-10 5 6 7 8 40 30 20 10 0-10 40 1 2 3 4 30 20 10 0-10 5 10 15
Clustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
data visualization and regression
data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species
Introduction to Data Analysis in Hierarchical Linear Models
Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
Lecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2)
Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2) Oscar Torres-Reyna [email protected] December 2007 http://dss.princeton.edu/training/ Intro Panel data (also known as longitudinal
Power and sample size in multilevel modeling
Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570 1573. Chicester (etc.): Wiley,
