Two-phase designs in epidemiology
|
|
|
- Arnold Trevor Green
- 10 years ago
- Views:
Transcription
1 Two-phase designs in epidemiology Thomas Lumley November 20, 2013 This document explains how to analyse case cohort and two-phase case control studies with the survey package, using examples from html. Some of the examples were published by Breslow & Chatterjee (1999). The data are relapse rates from the National Wilm s Tumor Study (NWTS). Wilm s Tumour is a rare cancer of the kidney in children. Intensive treatment cures the majority of cases, but prognosis is poor when the disease is advanced at diagnosis and for some histological subtypes. The histological characterisation of the tumour is difficult, and histological group as determined by the NWTS central pathologist predicts much better than determinations by local institution pathologists. In fact, local institution histology can be regarded statistically as a pure surrogate for the central lab histology. In these examples we will pretend that the (binary) local institution histology determination (instit) is avavailable for all children in the study and that the central lab histology (histol) is obtained for a probability sample of specimens in a two-phase design. We treat the initial sampling of the study as simple random sampling from an infinite superpopulation. We also have data on disease stage, a four-level variable; on relapse; and on time to relapse. Case control designs Breslow & Chatterjee (1999) use the NWTS data to illustrate two-phase case control designs. The data are available at in compressed form; we first expand to one record per patient. > library(survey) > load(system.file("doc","nwts.rda",package="survey")) > nwtsnb<-nwts > nwtsnb$case<-nwts$case-nwtsb$case > nwtsnb$control<-nwts$control-nwtsb$control > a<-rbind(nwtsb,nwtsnb) > a$in.ccs<-rep(c(true,false),each=16) > b<-rbind(a,a) > b$rel<-rep(c(1,0),each=32) > b$n<-ifelse(b$rel,b$case,b$control) > index<-rep(1:64,b$n) > nwt.exp<-b[index,c(1:3,6,7)] > nwt.exp$id<-1:4088 As we actually do know histol for all patients we can fit the logistic regression model with full sampling to compare with the two-phase analyses > glm(rel~factor(stage)*factor(histol), family=binomial, data=nwt.exp) 1
2 glm(formula = rel ~ factor(stage) * factor(histol), family = binomial, data = nwt.exp) (Intercept) factor(stage) factor(stage)3 factor(stage) factor(histol)2 factor(stage)2:factor(histol) factor(stage)3:factor(histol)2 factor(stage)4:factor(histol) Degrees of Freedom: 4087 Total (i.e. Null); Null Deviance: 3306 Residual Deviance: 2943 AIC: Residual The second phase sample consists of all patients with unfavorable histology as determined by local institution pathologists, all cases, and a 20% sample of the remainder. Phase two is thus a stratified random sample without replacement, with strata defined by the interaction of instit and rel. > dccs2<-twophase(id=list(~id,~id),subset=~in.ccs, + strata=list(null,~interaction(instit,rel)),data=nwt.exp) > summary(svyglm(rel~factor(stage)*factor(histol),family=binomial,design=dccs2)) svyglm(formula = rel ~ factor(stage) * factor(histol), family = binomial, design = dccs2) Survey design: twophase2(id = id, strata = strata, probs = probs, fpc = fpc, subset = subset, data = data) Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** factor(stage) ** factor(stage) * factor(stage) *** factor(histol) e-05 *** factor(stage)2:factor(histol) factor(stage)3:factor(histol) factor(stage)4:factor(histol) Signif. codes: 0 ^aăÿ***^aăź ^aăÿ**^aăź 0.01 ^aăÿ*^aăź 0.05 ^aăÿ.^aăź 0.1 ^aăÿ ^aăź 1 (Dispersion parameter for binomial family taken to be ) Number of Fisher Scoring iterations: 4 2
3 Disease stage at the time of surgery is also recorded. It could be used to further stratify the sampling, or, as in this example, to post-stratify. We can analyze the data either pretending that the sampling was stratified or using calibrate to post-stratify the design. > dccs8<-twophase(id=list(~id,~id),subset=~in.ccs, + strata=list(null,~interaction(instit,stage,rel)),data=nwt.exp) > gccs8<-calibrate(dccs2,phase=2,formula=~interaction(instit,stage,rel)) > summary(svyglm(rel~factor(stage)*factor(histol),family=binomial,design=dccs8)) svyglm(formula = rel ~ factor(stage) * factor(histol), family = binomial, design = dccs8) Survey design: twophase2(id = id, strata = strata, probs = probs, fpc = fpc, subset = subset, data = data) Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** factor(stage) e-07 *** factor(stage) e-07 *** factor(stage) e-09 *** factor(histol) e-06 *** factor(stage)2:factor(histol) factor(stage)3:factor(histol) factor(stage)4:factor(histol) Signif. codes: 0 ^aăÿ***^aăź ^aăÿ**^aăź 0.01 ^aăÿ*^aăź 0.05 ^aăÿ.^aăź 0.1 ^aăÿ ^aăź 1 (Dispersion parameter for binomial family taken to be ) Number of Fisher Scoring iterations: 4 > summary(svyglm(rel~factor(stage)*factor(histol),family=binomial,design=gccs8)) svyglm(formula = rel ~ factor(stage) * factor(histol), family = binomial, design = gccs8) Survey design: calibrate(dccs2, phase = 2, formula = ~interaction(instit, stage, rel)) Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** factor(stage) e-07 *** factor(stage) e-07 *** factor(stage) e-09 *** factor(histol) e-06 *** 3
4 factor(stage)2:factor(histol) factor(stage)3:factor(histol) factor(stage)4:factor(histol) Signif. codes: 0 ^aăÿ***^aăź ^aăÿ**^aăź 0.01 ^aăÿ*^aăź 0.05 ^aăÿ.^aăź 0.1 ^aăÿ ^aăź 1 (Dispersion parameter for binomial family taken to be ) Number of Fisher Scoring iterations: 4 Case cohort designs In the case cohort design for survival analysis, a P % sample of a cohort is taken at recruitment for the second phase, and all participants who experience the event (cases) are later added to the phase-two sample. Viewing the sampling design as progressing through time in this way, as originally proposed, gives a double sampling design at phase two. It is simpler to view the process sub specie aeternitatis, and to note that cases are sampled with probability 1, and controls with probability P/100. The subcohort will often be determined retrospectively rather than at recruitment, giving stratified random sampling without replacement, stratified on case status. If the subcohort is determined prospectively we can use the same analysis, post-stratifying rather than stratifying. There have been many analyses proposed for the case cohort design (Therneau & Li, 1999). We consider only those that can be expressed as a Horvitz Thompson estimator for the Cox model. First we load the data and the necessary packages. The version of the NWTS data that includes survival times is not identical to the data set used for case control analyses above. > library(survey) > library(survival) > data(nwtco) > ntwco<-subset(nwtco,!is.na(edrel)) Again, we fit a model that uses histol for all patients, to compare with the two-phase design > coxph(surv(edrel, rel)~factor(stage)+factor(histol)+i(age/12),data=nwtco) coxph(formula = Surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), data = nwtco) factor(stage) e-08 factor(stage) e-11 factor(stage) e+00 factor(histol) e+00 I(age/12) e-06 Likelihood ratio test=395 on 5 df, p=0 n= 4028, number of events= 571 We define a two-phase survey design using simple random superpopulation sampling for the first phase, and sampling without replacement stratified on rel for the second phase. The subset 4
5 argument specifies that observations are in the phase-two sample if they are in the subcohort or are cases. As before, the data structure is rectangular, but variables measured at phase two may be NA for participants not included at phase two. We compare the result to that given by survival::cch for Lin & Ying s (1993) approach to the case cohort design. > (dcch<-twophase(id=list(~seqno,~seqno), strata=list(null,~rel), + subset=~i(in.subcohort rel), data=nwtco)) Two-phase sparse-matrix design: twophase2(id = id, strata = strata, probs = probs, fpc = fpc, subset = subset, data = data) Phase 1: Independent Sampling design (with replacement) svydesign(ids = ~seqno) Phase 2: Stratified Independent Sampling design svydesign(ids = ~seqno, strata = ~rel, fpc = `*phase1*`) > svycoxph(surv(edrel,rel)~factor(stage)+factor(histol)+i(age/12), + design=dcch) svycoxph(formula = Surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), design = dcch) factor(stage) e-05 factor(stage) e-04 factor(stage) e-12 factor(histol) e+00 I(age/12) e-02 Likelihood ratio test= on 5 df, p= n= 1154, number of events= 571 > subcoh <- nwtco$in.subcohort > selccoh <- with(nwtco, rel==1 subcoh==1) > ccoh.data <- nwtco[selccoh,] > ccoh.data$subcohort <- subcoh[selccoh] > cch(surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), + data =ccoh.data, subcoh = ~subcohort, id=~seqno, + cohort.size=4028, method="linying") Case-cohort analysis,x$method, LinYing with subcohort of 668 from cohort of 4028 cch(formula = Surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), data = ccoh.data, subcoh = ~subcohort, id = ~seqno, cohort.size = 4028, method = "LinYing") 5
6 Value SE Z p factor(stage) e-05 factor(stage) e-04 factor(stage) e-12 factor(histol) e+00 I(age/12) e-02 Barlow (1994) proposes an analysis that ignores the finite population correction at the second phase. This simplifies the standard error estimation, as the design can be expressed as one-phase stratified superpopulation sampling. The standard errors will be somewhat conservative. More data preparation is needed for this analysis as the weights change over time. > nwtco$eventrec<-rep(0,nrow(nwtco)) > nwtco.extra<-subset(nwtco, rel==1) > nwtco.extra$eventrec<-1 > nwtco.expd<-rbind(subset(nwtco,in.subcohort==1),nwtco.extra) > nwtco.expd$stop<-with(nwtco.expd, + ifelse(rel &!eventrec, edrel-0.001,edrel)) > nwtco.expd$start<-with(nwtco.expd, + ifelse(rel & eventrec, edrel-0.001, 0)) > nwtco.expd$event<-with(nwtco.expd, + ifelse(rel & eventrec, 1, 0)) > nwtco.expd$pwts<-ifelse(nwtco.expd$event, 1, 1/with(nwtco,mean(in.subcohort rel))) The analysis corresponds to a cluster-sampled design in which individuals are sampled stratified by subcohort membership and then time periods are sampled stratified by event status. Having individual as the primary sampling unit is necessary for correct standard error calculation. > (dbarlow<-svydesign(id=~seqno+eventrec, strata=~in.subcohort+rel, + data=nwtco.expd, weight=~pwts)) Stratified 2 - level Cluster Sampling design (with replacement) With (1154, 1239) clusters. svydesign(id = ~seqno + eventrec, strata = ~in.subcohort + rel, data = nwtco.expd, weight = ~pwts) > svycoxph(surv(start,stop,event)~factor(stage)+factor(histol)+i(age/12), + design=dbarlow) svycoxph(formula = Surv(start, stop, event) ~ factor(stage) + factor(histol) + I(age/12), design = dbarlow) factor(stage) e-05 factor(stage) e-04 factor(stage) e-11 factor(histol) e+00 I(age/12) e-02 Likelihood ratio test= on 5 df, p= n= 1239, number of events= 571 6
7 In fact, as the finite population correction is not being used the second stage of the cluster sampling could be ignored. We can also produce the stratified bootstrap standard errors of Wacholder et al (1989), using a replicate weights analysis > (dwacholder <- as.svrepdesign(dbarlow,type="bootstrap",replicates=500)) as.svrepdesign(dbarlow, type = "bootstrap", replicates = 500) Survey bootstrap with 500 replicates. > svycoxph(surv(start,stop,event)~factor(stage)+factor(histol)+i(age/12), + design=dwacholder) svycoxph.svyrep.design(formula = Surv(start, stop, event) ~ factor(stage) + factor(histol) + I(age/12), design = dwacholder) factor(stage) e-05 factor(stage) e-03 factor(stage) e-10 factor(histol) e+00 I(age/12) e-02 Likelihood ratio test=na on 5 df, p=na n= 1239, number of events= 571 Exposure-stratified designs Borgan et al (2000) propose designs stratified or post-stratified on phase-one variables. The examples at use a different subcohort sample for this stratified design, so we load the new subcohort variable > load(system.file("doc","nwtco-subcohort.rda",package="survey")) > nwtco$subcohort<-subcohort > d_borganii <- twophase(id=list(~seqno,~seqno), + strata=list(null,~interaction(instit,rel)), + data=nwtco, subset=~i(rel subcohort)) > (b2<-svycoxph(surv(edrel,rel)~factor(stage)+factor(histol)+i(age/12), + design=d_borganii)) svycoxph(formula = Surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), design = d_borganii) factor(stage) e-02 factor(stage) e-03 factor(stage) e-07 factor(histol) e+00 I(age/12) e-01 Likelihood ratio test= on 5 df, p= n= 1062, number of events= 571 7
8 We can further post-stratify the design on disease stage and age with calibrate > d_borganiips <- calibrate(d_borganii, phase=2, formula=~age+interaction(instit,rel,stage)) > svycoxph(surv(edrel,rel)~factor(stage)+factor(histol)+i(age/12), + design=d_borganiips) svycoxph(formula = Surv(edrel, rel) ~ factor(stage) + factor(histol) + I(age/12), design = d_borganiips) factor(stage) e-06 factor(stage) e-08 factor(stage) e-16 factor(histol) e+00 I(age/12) e-01 Likelihood ratio test= on 5 df, p= n= 1062, number of events= 571 References Barlow WE (1994). Robust variance estimation for the case-cohort design. Biometrics 50: Borgan Ø, Langholz B, Samuelson SO, Goldstein L and Pogoda J (2000). Exposure stratified case-cohort designs, Lifetime Data Analysis 6:39-58 Breslow NW and Chatterjee N. (1999) Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis. Applied Statistics 48: Lin DY, and Ying Z (1993). Cox regression with incomplete covariate measurements. Journal of the American Statistical Association 88: Therneau TM and Li H., Computing the Cox model for case-cohort designs. Lifetime Data Analysis 5:99-112, 1999 Wacholder S, Gail MH, Pee D, and Brookmeyer R (1989) Alternate variance and efficiency calculations for the case-cohort design Biometrika, 76,
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
A survey analysis example
A survey analysis example Thomas Lumley November 20, 2013 This document provides a simple example analysis of a survey data set, a subsample from the California Academic Performance Index, an annual set
Régression logistique : introduction
Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
Lab 13: Logistic Regression
Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account
INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES
INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION A sample survey is a process for collecting
13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
Journal of Statistical Software
JSS Journal of Statistical Software January 2011, Volume 38, Issue 5. http://www.jstatsoft.org/ Lexis: An R Class for Epidemiological Studies with Long-Term Follow-Up Martyn Plummer International Agency
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
Logistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
Analysis of complex survey samples.
Analysis of complex survey samples. Thomas Lumley Department of Biostatistics University of Washington April 15, 2004 Abstract I present software for analysing complex survey samples in R. The sampling
Outline. Dispersion Bush lupine survival Quasi-Binomial family
Outline 1 Three-way interactions 2 Overdispersion in logistic regression Dispersion Bush lupine survival Quasi-Binomial family 3 Simulation for inference Why simulations Testing model fit: simulating the
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
A Simple Method for Estimating Relative Risk using Logistic Regression. Fredi Alexander Diaz-Quijano
1 A Simple Method for Estimating Relative Risk using Logistic Regression. Fredi Alexander Diaz-Quijano Grupo Latinoamericano de Investigaciones Epidemiológicas, Organización Latinoamericana para el Fomento
Logistic regression (with R)
Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
Regression 3: Logistic Regression
Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
VI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
Simple example of collinearity in logistic regression
1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially
Lecture 25. December 19, 2007. Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
Handling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
Statistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
A Bayesian hierarchical surrogate outcome model for multiple sclerosis
A Bayesian hierarchical surrogate outcome model for multiple sclerosis 3 rd Annual ASA New Jersey Chapter / Bayer Statistics Workshop David Ohlssen (Novartis), Luca Pozzi and Heinz Schmidli (Novartis)
Chapter 19 Statistical analysis of survey data. Abstract
Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Statistics 305: Introduction to Biostatistical Methods for Health Sciences
Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser
CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
Examining a Fitted Logistic Model
STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic
Lecture 19: Conditional Logistic Regression
Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:
Lecture 8: Gamma regression
Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing
Case-control studies. Alfredo Morabia
Case-control studies Alfredo Morabia Division d épidémiologie Clinique, Département de médecine communautaire, HUG [email protected] www.epidemiologie.ch Outline Case-control study Relation to cohort
Module 223 Major A: Concepts, methods and design in Epidemiology
Module 223 Major A: Concepts, methods and design in Epidemiology Module : 223 UE coordinator Concepts, methods and design in Epidemiology Dates December 15 th to 19 th, 2014 Credits/ECTS UE description
Department/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program
Department/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program Department of Mathematics and Statistics Degree Level Expectations, Learning Outcomes, Indicators of
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.
MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
Missing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
Personalized Predictive Medicine and Genomic Clinical Trials
Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Biostatistics Short Course Introduction to Longitudinal Studies
Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal
Basic Study Designs in Analytical Epidemiology For Observational Studies
Basic Study Designs in Analytical Epidemiology For Observational Studies Cohort Case Control Hybrid design (case-cohort, nested case control) Cross-Sectional Ecologic OBSERVATIONAL STUDIES (Non-Experimental)
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions
BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared
Case Control Study, Nested
Case Control Study, Nested BRYAN LANGHOLZ Volume 1, pp. 646 655 In Encyclopedia of Biostatistics Second Edition (ISBN 0-470-84907-X) Edited by Peter Armitage & Theodore Colton John Wiley & Sons, Ltd, Chichester,
Random effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
SPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
Lecture 6: Poisson regression
Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression
Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University
Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple
PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Elizabeth Comino Centre fo Primary Health Care and Equity 12-Aug-2015
PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
Dealing with Missing Data
Dealing with Missing Data Roch Giorgi email: [email protected] UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January
Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done?
272 Occup Environ Med 1998;55:272 277 Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Mary Lou Thompson, J E Myers, D Kriebel Department of Biostatistics,
L3: Statistical Modeling with Hadoop
L3: Statistical Modeling with Hadoop Feng Li [email protected] School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...
HLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
Probability Calculator
Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that
Checking proportionality for Cox s regression model
Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and
Early mortality rate (EMR) in Acute Myeloid Leukemia (AML)
Early mortality rate (EMR) in Acute Myeloid Leukemia (AML) George Yaghmour, MD Hematology Oncology Fellow PGY5 UTHSC/West cancer Center, Memphis, TN May,1st,2015 Off-Label Use Disclosure(s) I do not intend
Advanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090
Advanced Statistical Analysis of Mortality Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc 160 University Avenue Westwood, MA 02090 001-(781)-751-6356 fax 001-(781)-329-3379 [email protected] Abstract
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
Introduction. Survival Analysis. Censoring. Plan of Talk
Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.
Package survey. February 20, 2015
Title analysis of complex survey samples Package survey February 20, 2015 Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models,
A General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
Inferential Problems with Nonprobability Samples
Inferential Problems with Nonprobability Samples Richard Valliant University of Michigan & University of Maryland 9 Sep 2015 (UMich & UMD) WSS seminar 1 / 18 Types of samples Not all nonprobability samples
Lecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
Approaches for Analyzing Survey Data: a Discussion
Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata
Survival Analysis, Software
Survival Analysis, Software As used here, survival analysis refers to the analysis of data where the response variable is the time until the occurrence of some event (e.g. death), where some of the observations
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Methods for Meta-analysis in Medical Research
Methods for Meta-analysis in Medical Research Alex J. Sutton University of Leicester, UK Keith R. Abrams University of Leicester, UK David R. Jones University of Leicester, UK Trevor A. Sheldon University
1 Logistic Regression
Generalized Linear Models 1 May 29, 2012 Generalized Linear Models (GLM) In the previous chapter on regression, we focused primarily on the classic setting where the response y is continuous and typically
Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.
Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1
Prediction for Multilevel Models
Prediction for Multilevel Models Sophia Rabe-Hesketh Graduate School of Education & Graduate Group in Biostatistics University of California, Berkeley Institute of Education, University of London Joint
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
Basic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
School of Public Health and Health Services Department of Epidemiology and Biostatistics
School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated
ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA
m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Electronic Thesis and Dissertations UCLA
Electronic Thesis and Dissertations UCLA Peer Reviewed Title: A Multilevel Longitudinal Analysis of Teaching Effectiveness Across Five Years Author: Wang, Kairong Acceptance Date: 2013 Series: UCLA Electronic
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Internet Appendix to CAPM for estimating cost of equity capital: Interpreting the empirical evidence
Internet Appendix to CAPM for estimating cost of equity capital: Interpreting the empirical evidence This document contains supplementary material to the paper titled CAPM for estimating cost of equity
Goodness of Fit Tests for Categorical Data: Comparing Stata, R and SAS
Goodness of Fit Tests for Categorical Data: Comparing Stata, R and SAS Rino Bellocco 1,2, Sc.D. Sara Algeri 1, MS 1 University of Milano-Bicocca, Milan, Italy & 2 Karolinska Institutet, Stockholm, Sweden
Examples of Using R for Modeling Ordinal Data
Examples of Using R for Modeling Ordinal Data Alan Agresti Department of Statistics, University of Florida Supplement for the book Analysis of Ordinal Categorical Data, 2nd ed., 2010 (Wiley), abbreviated
Multinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
Overview of Methods for Analyzing Cluster-Correlated Data. Garrett M. Fitzmaurice
Overview of Methods for Analyzing Cluster-Correlated Data Garrett M. Fitzmaurice Laboratory for Psychiatric Biostatistics, McLean Hospital Department of Biostatistics, Harvard School of Public Health Outline
UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH
QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS
