Sample Size Calculation for Longitudinal Studies




Phil Schumm
Department of Health Studies
University of Chicago
August 23, 2004
(Supported by National Institute on Aging grant P01 AG18911-01A1)

Introduction
- Motivation
- Special considerations
Description of Approach
- Underlying model
- Use of -xtgee-
Examples
- Example 1: Time-averaged difference
- Example 2: Difference in rate of change
Possible Extensions

Motivation
- Make sample size calculations more accessible
- Permit researchers to experiment easily with many different scenarios
- Emphasize the relationship between sample size calculation and actual analysis of the data

Special considerations
Longitudinal studies involve special considerations:
- Correlation between data points for a given individual
- Sample attrition (drop-out)
- Effect(s) of interest often have a complicated form (e.g., nonlinear across time)
- Effects of other covariates

Underlying model
Statistical model

Let $Y_i = (y_{i1}, \ldots, y_{in_i})^T$ be an $n_i \times 1$ vector of outcome values for the $i$th subject ($i = 1, \ldots, m$), and $X_i = (x_{i1}, \ldots, x_{in_i})^T$ be a corresponding $n_i \times p$ matrix of covariate values.

$$Y_i = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim (0, V)$$

Note:
- Within-subject correlation $R = V / \sigma^2$
- $\varepsilon_i$ and $\varepsilon_j$ are independent for all $i \neq j$

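Underlying model
Generalized least squares estimator

$$\hat\beta_{GLS} = \left( \sum_{i=1}^{m} X_i^T V^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{m} X_i^T V^{-1} Y_i \right)$$

$$\operatorname{Var}(\hat\beta_{GLS}) = \left( \sum_{i=1}^{m} X_i^T V^{-1} X_i \right)^{-1} = f(X, R, \sigma^2)$$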

Use of -xtgee-
Steps in using -xtgee- to calculate variance
1. Create a pseudo-dataset containing X and Y for m subjects
2. Enter the correlation matrix R
3. Use the -xtgee- command with fixed correlation R and dispersion parameter $\sigma^2$
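
The variance that -xtgee- reports in step 3 can also be computed directly from the GLS formula. The following is a minimal Python sketch, not part of the original slides: the helper names `invert` and `gls_variance` are hypothetical, and the example design (two groups of 224 subjects, three time points, exchangeable correlation 0.5, $\sigma^2 = 1$) is chosen to mirror the setup described in Example 1.

```python
from math import sqrt

def invert(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gls_variance(designs, R, sigma2=1.0):
    """Return (sum_i X_i' V^{-1} X_i)^{-1} with V = sigma2 * R.

    designs: list of (X_i, count) pairs, where X_i is an n x p list of rows
    and count is the number of subjects sharing that design matrix.
    """
    n = len(R)
    V_inv = invert([[sigma2 * R[j][k] for k in range(n)] for j in range(n)])
    p = len(designs[0][0][0])
    info = [[0.0] * p for _ in range(p)]
    for X, count in designs:
        for a in range(p):
            for b in range(p):
                s = sum(X[j][a] * V_inv[j][k] * X[k][b]
                        for j in range(n) for k in range(n))
                info[a][b] += count * s
    return invert(info)

# Assumed design: columns are (constant, group indicator), 3 time points.
R = [[1.0 if j == k else 0.5 for k in range(3)] for j in range(3)]
X_g0 = [[1, 0]] * 3   # group 0 subject
X_g1 = [[1, 1]] * 3   # group 1 subject
cov = gls_variance([(X_g0, 224), (X_g1, 224)], R, sigma2=1.0)
se_group = sqrt(cov[1][1])
print(round(se_group, 7))
```

Under these assumptions the standard error of the group effect comes out at about 0.0772, in line with the -xtgee- output in Example 1. Because the calculation never touches Y, it illustrates why the outcome in the pseudo-dataset can be arbitrary.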

Example 1: Time-averaged difference
Time-averaged difference between two groups

Suppose:
- 2 groups of equal size ($m/2$ subjects each)
- All subjects measured at $n = 3$ time points (no drop-out)
- Constant within-subject correlation $\rho = 0.5$
- $\sigma^2 = 1$

We want 90% power to detect a difference $d$ of 0.25 at the two-sided 0.05 level.

$$m = \frac{4\sigma^2}{n d^2} \bigl(1 + (n - 1)\rho\bigr) (z_{\alpha/2} + z_\beta)^2 = 448 \quad \text{(224 per group)}$$
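
As a quick check of the closed-form expression, the sketch below evaluates it in Python with the standard library. The function name `sample_size_time_averaged` and its parameter names are illustrative, not from the slides; $z_{\alpha/2}$ and $z_\beta$ are taken as upper quantiles of the standard normal distribution.

```python
from statistics import NormalDist

def sample_size_time_averaged(d, n, rho, sigma2=1.0, alpha=0.05, power=0.90):
    """Total m across two equal groups for a time-averaged difference d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # upper alpha/2 quantile
    z_beta = z(power)            # upper beta quantile, beta = 1 - power
    return (4 * sigma2 / (n * d ** 2)) * (1 + (n - 1) * rho) * (z_alpha + z_beta) ** 2

m = sample_size_time_averaged(d=0.25, n=3, rho=0.5)
print(round(m, 1))
```

The unrounded value lies slightly above 448, which the slide reports as 448 (224 per group); a stricter convention would round up to the next even number.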

Example 1: Time-averaged difference
Create pseudo-dataset and enter correlation matrix

    . input g t

                 g          t
      1. 0 1
      2. 0 2
      3. 0 3
      4. 1 1
      5. 1 2
      6. 1 3
      7. end

    . expand 224
    (1338 observations created)

    . bys t (g): gen id = _n

    . gen y = uniform()

    . mat R = J(3,3,0.5) + I(3)*(1-0.5)

    . mat list R

    symmetric R[3,3]
        c1  c2  c3
    r1   1
    r2  .5   1
    r3  .5  .5   1

Example 1: Time-averaged difference
Use -xtgee- to calculate variance

    . xtgee y g, i(id) t(t) corr(fixed R) scale(1)

    Iteration 1: tolerance = 2.308e-16

    GEE population-averaged model                   Number of obs      =      1344
    Group and time vars:              id t          Number of groups   =       448
    Link:                         identity          Obs per group: min =         3
    Family:                       Gaussian                         avg =       3.0
    Correlation:         fixed (specified)                         max =         3
                                                    Wald chi2(1)       =      0.01
    Scale parameter:                     1          Prob > chi2        =    0.9266

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               g |  -.0071089   .0771517    -0.09   0.927    -.1583235    .1441056
           _cons |   .5041997   .0545545     9.24   0.000     .3972749    .6111245
    ------------------------------------------------------------------------------

Note: y is random noise, so only the Std. Err. column is meaningful; it depends on X, R, and the scale parameter alone.

Example 1: Time-averaged difference
Use -sampsi- to calculate power

    . sampsi 0 0.25, alpha(0.05) n1(1) sd1(0.0771517) onesample

    Estimated power for one-sample comparison of mean
      to hypothesized value

    Test Ho: m = 0, where m is the mean in the population

    Assumptions:

             alpha =   0.0500  (two-sided)
     alternative m =      .25
                sd =  .077152
     sample size n =        1

    Estimated power:

             power =   0.8998
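
The power figure can be cross-checked with a normal approximation: treat the estimate as a single observation whose standard deviation is the standard error from -xtgee-. The sketch below is a hedged Python illustration; `power_from_se` is a hypothetical helper, and it is not claimed to be the exact formula -sampsi- uses internally.

```python
from statistics import NormalDist

def power_from_se(effect, se, alpha=0.05):
    """Two-sided power of a z-test for an estimate with standard error se."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    # Probability of rejecting in either tail under the alternative.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

power_ex1 = power_from_se(0.25, 0.0771517)
print(round(power_ex1, 4))
```

For the inputs of Example 1 this gives a power just under 0.90, which is consistent with a sample size that was rounded down slightly from the closed-form value.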

Example 2: Difference in rate of change
Difference in rate of change between two groups
- 10% drop-out per time point
- Constant within-subject correlation $\rho = 0.8$
- Linear change in each group over time

We want to compute power for detecting a 0.25 difference in the rate of change over the entire study.

Example 2: Difference in rate of change
Create pseudo-dataset and enter correlation matrix

    . xtdes, i(id) t(t)

          id:  1, 2, ..., 448                                    n =        448
           t:  1, 2, ..., 3                                      T =          3
               Delta(t) = 1; (3-1)+1 = 3
               (id*t uniquely identifies each observation)

    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       3         3         3       3       3

         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------
          364     81.25   81.25 |  111
           44      9.82   91.07 |  1..
           40      8.93  100.00 |  11.
     ---------------------------+---------
          448    100.00         |  XXX

    . mat R = J(3,3,0.8) + I(3)*(1-0.8)
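
The slides do not show the commands that produced the drop-out pattern. One way to arrive at the same counts, offered here only as an assumption, is to remove 10% of the subjects still enrolled in each group after each of the first two visits, rounding down to whole subjects. The Python sketch below (with the hypothetical helper `dropout_pattern`) illustrates that arithmetic.

```python
def dropout_pattern(per_group, n_times, rate):
    """Count subjects by number of completed visits under monotone drop-out.

    Assumes a fraction `rate` of those still enrolled leaves after each
    visit except the last, rounded down to whole subjects.
    """
    counts = {}
    remaining = per_group
    for visits in range(1, n_times):
        leaving = int(remaining * rate)   # rounding down is an assumption
        counts[visits] = leaving
        remaining -= leaving
    counts[n_times] = remaining
    return counts

per_group = dropout_pattern(per_group=224, n_times=3, rate=0.10)
total = {visits: 2 * count for visits, count in per_group.items()}
n_obs = sum(visits * count for visits, count in total.items())
print(per_group, total, n_obs)
```

Under this assumption the totals by pattern (44 with one visit, 40 with two, 364 with three) and the overall number of observations agree with the -xtdes- summary.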

Example 2: Difference in rate of change
Use -xtgee- to calculate variance

    . xi: xtgee y i.g*t, i(id) t(t) corr(fixed R) scale(1)
    i.g               _Ig_0-1             (naturally coded; _Ig_0 omitted)
    i.g*t             _IgXt_#             (coded as above)

    Iteration 1: tolerance = .00579064
    Iteration 2: tolerance = 4.388e-16

    GEE population-averaged model                   Number of obs      =      1216
    Group and time vars:              id t          Number of groups   =       448
    Link:                         identity          Obs per group: min =         1
    Family:                       Gaussian                         avg =       2.7
    Correlation:         fixed (specified)                         max =         3
                                                    Wald chi2(3)       =      0.18
    Scale parameter:                     1          Prob > chi2        =    0.9808

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _Ig_1 |  -.0121577   .1075186    -0.11   0.910    -.2228903    .1985749
               t |    .007212   .0229825     0.31   0.754    -.0378328    .0522568
         _IgXt_1 |   -.002222   .0325021    -0.07   0.945     -.065925     .061481
           _cons |   .5004684   .0760271     6.58   0.000      .351458    .6494789
    ------------------------------------------------------------------------------
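
The standard error of the interaction term can be reproduced from the GLS variance formula without running -xtgee-. The Python sketch below is an illustration under two stated assumptions that are not in the slides: the drop-out pattern is split identically between the groups (182, 20, and 22 subjects with three, two, and one visits per group), and the helper names `cs_quadratic` and `slope_variance` are hypothetical. It uses the closed-form inverse of an exchangeable correlation matrix, and the fact that with separate intercepts and slopes per group the interaction variance is the sum of the two within-group slope variances.

```python
from math import sqrt

def cs_quadratic(u, v, rho):
    """u' R^{-1} v for an exchangeable R of size len(u).

    Uses R^{-1} = (I - rho / (1 + (n - 1) * rho) * J) / (1 - rho).
    """
    n = len(u)
    c = rho / (1 + (n - 1) * rho)
    dot = sum(a * b for a, b in zip(u, v))
    return (dot - c * sum(u) * sum(v)) / (1 - rho)

def slope_variance(pattern_counts, rho, sigma2=1.0):
    """Variance of the GLS slope on t within one group.

    pattern_counts maps the number of completed visits k (times 1..k
    observed) to the number of subjects with that pattern.
    """
    a = b = c = 0.0   # entries of sum_i X_i' R^{-1} X_i for X_i = [1, t]
    for k, count in pattern_counts.items():
        ones = [1.0] * k
        t = [float(j) for j in range(1, k + 1)]
        a += count * cs_quadratic(ones, ones, rho)
        b += count * cs_quadratic(ones, t, rho)
        c += count * cs_quadratic(t, t, rho)
    # Lower-right element of the inverse of [[a, b], [b, c]], times sigma2.
    return sigma2 * a / (a * c - b * b)

# Assumed per-group split of the reported patterns (identical in both groups).
per_group = {3: 182, 2: 20, 1: 22}
var_slope = slope_variance(per_group, rho=0.8)
se_slope = sqrt(var_slope)
se_interaction = sqrt(2 * var_slope)
print(round(se_slope, 7), round(se_interaction, 7))
```

With this split the two values agree with the standard errors reported for `t` and `_IgXt_1` to the displayed precision, which suggests the pseudo-dataset was balanced across groups.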

Example 2: Difference in rate of change
Use -sampsi- to calculate power

    . loc d = 0.25/3

    . sampsi 0 `d', alpha(0.05) n1(1) sd1(0.0325021) onesample

    Estimated power for one-sample comparison of mean
      to hypothesized value

    Test Ho: m = 0, where m is the mean in the population

    Assumptions:

             alpha =   0.0500  (two-sided)
     alternative m =  .083333
                sd =  .032502
     sample size n =        1

    Estimated power:

             power =   0.7271

Extensions of the basic method
- Generalized linear models
- Effects of other covariates

$$\operatorname{Var}(\hat\beta) = f(X, \beta, R, \phi)$$

Note: For random X, may need to increase the size of the pseudo-dataset and then scale up the variance estimate accordingly.
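
To make the dependence on $\beta$ concrete, the standard model-based GEE variance can be written out. This is a sketch of the usual textbook form, not something stated on the slides; the symbols $\mu_i$, $D_i$, $A_i$, and the variance function $v(\cdot)$ are introduced here for illustration.

```latex
\operatorname{Var}(\hat\beta) \approx
  \phi \left( \sum_{i=1}^{m} D_i^T
  \bigl( A_i^{1/2} R \, A_i^{1/2} \bigr)^{-1} D_i \right)^{-1},
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^T},
\qquad
A_i = \operatorname{diag}\bigl( v(\mu_{i1}), \ldots, v(\mu_{in_i}) \bigr)
```

With an identity link and Gaussian family, $D_i = X_i$, $A_i = I$, and $\phi = \sigma^2$, so the expression reduces to the GLS variance $\sigma^2 (\sum_i X_i^T R^{-1} X_i)^{-1}$ used in the examples. For other links, $D_i$ and $A_i$ depend on $\mu_i$ and hence on $\beta$, which is why assumed coefficient values must be supplied.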