Survival/Failure Time Analysis in Clinical Research


Vanderbilt Clinical Research Center Research Skills Workshop
Survival/Failure Time Analysis in Clinical Research
Zhiguo (Alex) Zhao
Division of Cancer Biostatistics, Department of Biostatistics
Vanderbilt University School of Medicine
October 29, 2010
Zhiguo (Alex) Zhao (VU), Survival Analysis, Vanderbilt CRC, Oct. 29, 2010

Outline
1. Introduction
   - What is survival analysis?
   - Why do we need survival analysis?
2. Terms You Want to Know
   - Censoring
   - Survival & hazard
3. Methods Widely Used
   - Estimate and interpret survival characteristics
   - Compare survival in different groups
   - Assess the relationship between explanatory variables and survival time
4. A Case Study
   - Study introduction
   - KM method
   - Log-rank test
   - Cox proportional hazard regression
5. Advanced Topics
6. Questions?

Introduction: What is survival analysis?
Survival analysis is generally defined as a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest. It is also called time-to-event analysis. Examples:
- time to cardiovascular death after some treatment intervention
- time until a response (10% decrease in SBP)
- time until tumor recurrence
- time until AIDS for HIV patients
- time until infection
- time until pregnancy

Introduction: Why do we need survival analysis?
Why not use linear regression to model the survival time as a function of a set of predictor variables?
- Time to event is restricted to be positive and has a skewed distribution.
- The quantity of interest is different (the probability of surviving past a certain point in time).
- Linear regression cannot effectively handle censoring of the observations (incomplete observations of the survival time, which carry only partial information).

Introduction: Why do we need survival analysis? (continued)
How about logistic regression?
- It changes the quantity of interest to the status at a certain time point.
- It has lower power.
- It still has the censoring problem.

Introduction: Why do we need survival analysis? (example)
Example: We want to predict the 2-year cancer recurrence rate using patient characteristics such as patient demographics, tumor histology, and gene profile.
Logistic regression is adequate if the only interest is the status at the end of the 2-year follow-up and that information is available for all subjects.
Questions:
- What if the result from another study is a 1-year recurrence rate, and you want to compare the 2-year study to it?
- What if some subjects drop out?
- If subject A has a recurrence at 2.1 years and subject B has a recurrence at 5 years, should the two subjects be treated the same in your analysis?

Terms You Want to Know: Censoring
What is censoring?
In statistics, engineering, and medical research, censoring occurs when the value of an observation is only partially known.
- Censoring is different from missing data.
- Types of censoring: left, right, interval; fixed type I, random type I, type II.
- Censoring is assumed to be non-informative about the event.
- Censoring results from:
  - loss to follow-up
  - drop-out
  - an ALL patient dying in an automobile accident before relapsing
  - a bone marrow transplant patient dying of an opportunistic infection before engraftment
  - termination of the study (follow-up ends before the event occurs)

Terms You Want to Know: Censoring
What is censoring?
[Figure: Understand censoring]

Terms You Want to Know: Survival & hazard
Survival and hazard functions
The survival function is $S(t) = \Pr(T > t)$:
- the probability that a subject will survive past time $t$
- non-increasing
- smooth in theory; in practice, we see step functions.
The hazard function, $h(t)$, is the instantaneous rate at which events occur, given no previous events:
$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{S(t)}$$
where $f(t)$ is the probability density of $T$.
The cumulative hazard function, $H(t)$, is the accumulated risk up to time $t$.

Terms You Want to Know: Survival & hazard
Survival and hazard functions
If we know any one of these three functions, we can derive the other two:
$$h(t) = -\frac{\partial \log S(t)}{\partial t}, \qquad H(t) = -\log S(t), \qquad S(t) = \exp(-H(t))$$
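
The three identities above can be checked numerically. The following is a minimal sketch in plain Python, assuming a constant hazard (an exponential model); the rate 0.1 and the helper names are hypothetical and chosen only for illustration.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Return S(t) = exp(-H(t)), where H(t) integrates hazard(u) over [0, t]."""
    dt = t / steps
    # Midpoint rule for the cumulative hazard H(t).
    cumulative_hazard = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative_hazard)

def hazard_from_survival(survival, t, eps=1e-6):
    """Return h(t) = -d/dt log S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

rate = 0.1  # hypothetical constant hazard
s_at_5 = survival_from_hazard(lambda u: rate, 5.0)                  # S(5) from h
h_at_5 = hazard_from_survival(lambda u: math.exp(-rate * u), 5.0)   # h(5) from S
print(s_at_5, h_at_5)
```

Going from the hazard to the survival function and back recovers the same constant rate, which is the sense in which any one of the three functions determines the other two.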

Methods Widely Used
Goals and methods
- Estimate and interpret survival characteristics
  - Kaplan-Meier plots
  - Parametric survival functions
  - Median survival time
  - 5-year survival rate
  - Confidence intervals (CI)
- Compare survival in different groups
  - Log-rank test
- Assess the relationship between explanatory variables and survival time
  - Proportional hazards models
  - Accelerated failure time models

Methods Widely Used: Estimate and interpret survival characteristics
Kaplan-Meier estimator
- Also called the product-limit estimator
- Non-parametric estimate of $S(t)$
- A step function, not smooth, with left-closed steps
- Every jump occurs at an observed failure time
[Figure: Kaplan-Meier estimator calculation]
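
Since the calculation figure is not reproduced here, the product-limit computation can be sketched in a few lines of plain Python. This is an illustrative sketch, not the R `survfit` implementation; the function name and the six-subject toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the time is censored
    Returns a list of (event_time, n_at_risk, n_events, survival) tuples.
    """
    survival = 1.0
    table = []
    # The curve only jumps at times where at least one event was observed.
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - n_events / n_at_risk
        table.append((t, n_at_risk, n_events, survival))
    return table

# Hypothetical toy data: six subjects, two of them censored (event = 0).
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]
km_table = kaplan_meier(toy_times, toy_events)
for row in km_table:
    print(row)
```

Censored subjects never produce a jump, but they stay in the risk set until their censoring time, which is how the estimator uses their partial information.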

Methods Widely Used: Estimate and interpret survival characteristics
Kaplan-Meier estimator
The median survival time, the 1-year survival rate, and confidence intervals can be estimated.
[Figure: Kaplan-Meier curve; y-axis: estimated S(t), x-axis: Time (days)]
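
Reading a median or a fixed-time survival rate off a Kaplan-Meier curve can be sketched as follows. The convention used here (the median is the first event time at which the estimate drops to 0.5 or below) is one common choice and an assumption of this sketch; the step values are hypothetical.

```python
def km_median(km_steps):
    """First event time where the KM estimate is <= 0.5, or None if never reached."""
    for t, s in km_steps:
        if s <= 0.5:
            return t
    return None

def km_survival_at(km_steps, t):
    """Value of the KM step function at time t (1.0 before the first event)."""
    s_at_t = 1.0
    for step_time, s in km_steps:
        if step_time <= t:
            s_at_t = s
    return s_at_t

# Hypothetical KM steps as (event_time, estimated survival) pairs.
toy_steps = [(2, 5 / 6), (3, 4 / 6), (5, 4 / 9), (8, 0.0)]
print(km_median(toy_steps))          # first time the curve is at or below 0.5
print(km_survival_at(toy_steps, 4))  # survival estimate at t = 4
```

When a curve never falls to 0.5 within the follow-up period, the median is not estimable, which is why software may report it as `NA`.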

Methods Widely Used: Estimate and interpret survival characteristics
Parametric survival functions
With more assumptions, we may model the data in more detail:
- Easily compute selected quantiles of the distribution
- Estimate the expected event time
- Estimate the survival function more precisely than KM
Popular distributions for estimating survival curves:
- Exponential
- Weibull
- Log-normal
- Gamma
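
As an illustration of the simplest parametric choice, the sketch below fits an exponential model to right-censored data using the closed-form maximum-likelihood rate (observed events divided by total follow-up time) and derives the quantities listed above. The toy data and names are hypothetical.

```python
import math

def fit_exponential(times, events):
    """Maximum-likelihood rate of an exponential model with right censoring:
    number of observed events divided by total follow-up time."""
    return sum(events) / sum(times)

# Hypothetical toy data: six subjects, two of them censored (event = 0).
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]

rate_hat = fit_exponential(toy_times, toy_events)
median_hat = math.log(2) / rate_hat      # a selected quantile of the fitted model
mean_hat = 1.0 / rate_hat                # expected event time
surv_at_5 = math.exp(-rate_hat * 5.0)    # fitted S(5)
print(rate_hat, median_hat, mean_hat, surv_at_5)
```

The gain in precision over KM holds only if the assumed distribution is reasonable, so a fitted curve like this is usually compared against the KM curve before it is trusted.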

Methods Widely Used: Estimate and interpret survival characteristics
Exponential survival curve
[Figure: Kaplan-Meier and exponential survival curves; y-axis: estimated S(t), x-axis: Time (days)]

Methods Widely Used: Compare survival in different groups
Log-rank test
Idea: If survival is independent of group, then at each time point roughly the same proportion in each group will have an event.
Two-sample log-rank test:
- Group 1: survival function $S_1(t)$
- Group 2: survival function $S_2(t)$
Statistical hypothesis:
$$H_0: S_1(t) = S_2(t) \qquad H_A: S_1(t) \ne S_2(t)$$
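
The idea of comparing observed and expected events at each event time can be sketched directly. This is a plain-Python illustration of the standard two-sample log-rank statistic, not the R `survdiff` code; the function name and toy data are hypothetical.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test; returns (chi-square on 1 df, p-value)."""
    pooled = [(t, e, 1) for t, e in zip(times1, events1)]
    pooled += [(t, e, 2) for t, e in zip(times2, events2)]
    observed1 = expected1 = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e == 1}):
        n1 = sum(1 for u, _, g in pooled if u >= t and g == 1)  # at risk, group 1
        n2 = sum(1 for u, _, g in pooled if u >= t and g == 2)  # at risk, group 2
        d1 = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        d2 = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 2)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n  # events expected in group 1 under H0
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical toy data: every event in group 1 precedes every event in group 2.
chi2_diff, p_diff = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
# Two identical groups: observed equals expected at every event time.
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
print(chi2_diff, p_diff, chi2_same, p_same)
```

The sketch assumes both groups contribute to at least one risk set with more than one subject; otherwise the variance is zero and the statistic is undefined.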

Methods Widely Used: Compare survival in different groups
Log-rank test
[Figure: Kaplan-Meier curves for the mercaptopurine (6-MP) and placebo groups; y-axis: Proportion in Remission, x-axis: Time since Enrollment (weeks)]

Methods Widely Used: Compare survival in different groups
Log-rank test
- Median survival time is 22.5 weeks for the 6-MP group and 8 weeks for the placebo group.
- The KM curve for the 6-MP group lies above that for the placebo group, so the 6-MP group does better.
- The gap seems to widen as time progresses.

Methods Widely Used: Compare survival in different groups
Log-rank test
Log-rank test in R:
Call:
survdiff(formula = Surv(WeeksinRemission, status) ~ treatment, data = leuk)
                    N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=6-mp      …
treatment=placebo   …
 Chisq= 16.8  on 1 degrees of freedom, p= 4.17e-05
The p-value of the test is p < 0.001, which implies a statistically significant difference in the survival of the two groups.

Methods Widely Used: Compare survival in different groups
Log-rank test
The method falls short in the following situations:
- It does not work with continuous variables.
- It cannot handle multiple factors.
- It cannot quantify the differences.
- It performs poorly when the two survival curves cross.

Methods Widely Used: Assess the relationship between explanatory variables and survival time
Accelerated failure time models
The AFT model assumes that the effect of a covariate is to multiply the predicted event time by some constant. AFT models can therefore be framed as linear models for the logarithm of the survival time:
$$S(t \mid X) = \psi\left(\frac{\log(t) - X\beta}{\sigma}\right)$$
where $\psi$ is the survival function of the standardized error distribution.
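
To make the "multiply the event time by a constant" statement concrete, here is a minimal sketch assuming a log-normal AFT model with a single covariate, so that $\psi(u) = 1 - \Phi(u)$. The coefficient 0.5 and scale 1.0 are hypothetical values.

```python
import math

def lognormal_aft_survival(t, x, beta, sigma):
    """S(t | x) = psi((log(t) - x*beta) / sigma) with psi(u) = 1 - Phi(u)."""
    u = (math.log(t) - x * beta) / sigma
    return 0.5 * math.erfc(u / math.sqrt(2.0))  # equals 1 - Phi(u)

beta, sigma = 0.5, 1.0  # hypothetical coefficient and scale
# Survival at t = 10 for a subject with x = 1 ...
s_x1 = lognormal_aft_survival(10.0, 1, beta, sigma)
# ... equals survival at the earlier time 10 / exp(beta) for a subject with x = 0.
s_x0_rescaled = lognormal_aft_survival(10.0 / math.exp(beta), 0, beta, sigma)
print(s_x1, s_x0_rescaled)
```

In this sketch a one-unit increase in `x` stretches the whole time axis by `exp(beta)`, so every quantile of the survival time, including the median, is multiplied by that factor.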

Methods Widely Used: Assess the relationship between explanatory variables and survival time
Proportional hazards models
Modeling: $h(t \mid X) = h(t)\exp(X\beta)$
- The most widely used survival regression
- Predictors act on a subject's hazard
- $h(t)$ is the underlying hazard function
- $\exp(X\beta)$ is called the relative hazard function
- The effect of the predictors is the same for all values of $t$.
- Any parametric hazard function can be used for $h(t)$
- $h(t)$ can be left completely unspecified
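
One consequence of the proportional hazards form is worth writing out. The derivation below uses the notation of the model above, with $H(t)$ and $S(t)$ denoting the underlying cumulative hazard and survival functions, and assumes that the predictors $X$ do not change over time.

```latex
\begin{aligned}
H(t \mid X) &= \int_0^t h(u \mid X)\,du = \exp(X\beta)\int_0^t h(u)\,du = \exp(X\beta)\,H(t),\\
S(t \mid X) &= \exp\bigl(-H(t \mid X)\bigr) = \exp\bigl(-H(t)\bigr)^{\exp(X\beta)} = S(t)^{\exp(X\beta)}.
\end{aligned}
```

A relative hazard above 1 therefore lowers the survival curve at every time point, and the curves of two subjects with different predictor values cannot cross under this model.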

Methods Widely Used: Assess the relationship between explanatory variables and survival time
Proportional hazards models
[Figure: Proportional hazards]

Methods Widely Used: Assess the relationship between explanatory variables and survival time
Proportional hazards models
- Interpretation of coefficients: The regression coefficient for $X_j$ is the increase in log hazard at any time point if $X_j$ is increased by one unit and all other predictors are held constant.
- Interpretation of $\exp(\beta)$: The effect of increasing $X_j$ by 1 unit is to multiply the hazard of the event by a factor of $\exp(\beta)$ at all points in time.
- What if $X_j$ increases from $X_j^{(1)}$ to $X_j^{(2)}$? Hazard ratio: the ratio of the hazard for a subject with predictor value $X_j^{(2)}$ to that for a subject with predictor value $X_j^{(1)}$ is $\exp((X_j^{(2)} - X_j^{(1)})\beta)$.
- What if $X_j$ is a binary predictor? Let $X_j = 1$ if the subject is male and $X_j = 0$ if the subject is female. The hazard of the event for males is $\exp(\beta)$ times that for females (with females as the reference group).
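
The hazard ratio arithmetic above is simple enough to sketch in code. The coefficients used here (0.05 per year of age, 0.7 for a binary predictor) are hypothetical values for illustration, not estimates from any data in these slides.

```python
import math

def hazard_ratio(beta, x_from, x_to):
    """Hazard ratio exp((x_to - x_from) * beta) under a proportional hazards model."""
    return math.exp((x_to - x_from) * beta)

beta_age = 0.05  # hypothetical log hazard ratio per year of age
hr_one_year = hazard_ratio(beta_age, 50, 51)   # one-unit increase: exp(beta)
hr_ten_years = hazard_ratio(beta_age, 50, 60)  # ten-unit increase: exp(10 * beta)
hr_binary = hazard_ratio(0.7, 0, 1)            # binary predictor coded 1 vs 0
print(hr_one_year, hr_ten_years, hr_binary)
```

Hazard ratios multiply: a ten-unit increase corresponds to the one-unit hazard ratio raised to the tenth power, not ten times the one-unit ratio.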

Methods Widely Used: Assess the relationship between explanatory variables and survival time
Cox proportional hazards model
- A semiparametric model
- Makes no assumptions about the underlying survival function
- Assumes a parametric form for the effect of the predictors on the hazard
- More interested in the parameter estimates than in the shape of the hazard
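
How the coefficients can be estimated without specifying the underlying hazard can be sketched with the partial likelihood, in which each observed event contributes the relative hazard of the subject who failed divided by the sum of relative hazards over everyone still at risk. The sketch below handles one covariate, uses the Breslow form for tied times, and maximizes by a crude grid search as a stand-in for the iterative optimizers that real software uses. All names and data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate (Breslow form for tied times)."""
    loglik = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i == 1:
            # Risk set: subjects whose follow-up time is at least t_i.
            risk_sum = sum(math.exp(beta * x_j)
                           for t_j, x_j in zip(times, x) if t_j >= t_i)
            loglik += beta * x_i - math.log(risk_sum)
    return loglik

# Hypothetical toy data: six subjects, two censored, one binary covariate.
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]
toy_x = [1, 0, 1, 1, 0, 0]

# Crude grid search over beta in [-3, 3]; for illustration only.
grid = [i / 100 for i in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, toy_times, toy_events, toy_x))
loglik_at_zero = cox_partial_loglik(0.0, toy_times, toy_events, toy_x)
print(beta_hat, loglik_at_zero)
```

The underlying hazard never appears in the function because it cancels from the numerator and denominator of every term, which is what makes the model semiparametric.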

A Case Study: Study introduction
An Eastern Cooperative Oncology Group study
A randomized trial comparing two treatments for ovarian cancer.
Data dictionary:
Variable   Explanation                   Coding
futime     survival or censoring time    in days
fustat     censoring status              1=death, 0=censoring
age        age in years
resid.ds   residual disease present      1=No, 2=Yes
rx         treatment group
ecog.ps    ECOG performance status       1 is better

A Case Study: KM method
Life table
> library(survival)
> fit1 = survfit(Surv(futime, fustat) ~ 1, data = ovarian)
> summary(fit1)
Call: survfit(formula = Surv(futime, fustat) ~ 1, data = ovarian)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 …

A Case Study: KM method
Overall KM curve
[Figure: overall Kaplan-Meier curve; y-axis: Proportion of survival, x-axis: Time (days)]

A Case Study: KM method
KM curve by treatment group
> fit2 = survfit(Surv(futime, fustat) ~ rx, data = ovarian)
> plot(fit2, lty = 1:2, lwd = 2, ylim = c(0.3, 1.0), xlab = "Time (days)", ylab = "Proportion of survival", col = 1:2)
> legend("topright", legend = c("Treatment 1", "Treatment 2"), lty = 1:2, col = 1:2)
[Figure: Kaplan-Meier survival curves by treatment group; y-axis: Proportion of survival, x-axis: Time (days); legend: Treatment 1, Treatment 2]

A Case Study: KM method
KM curve by treatment group
> fit2
Call: survfit(formula = Surv(futime, fustat) ~ rx, data = ovarian)
      records n.max n.start events median 0.95LCL 0.95UCL
rx=1  …                                             NA
rx=2  …                            NA     475       NA

A Case Study: Log-rank test
Use the log-rank test to compare two survival curves
> survdiff(Surv(futime, fustat) ~ rx, data = ovarian)
Call:
survdiff(formula = Surv(futime, fustat) ~ rx, data = ovarian)
      N Observed Expected (O-E)^2/E (O-E)^2/V
rx=1  …
rx=2  …
 Chisq= 1.1  on 1 degrees of freedom, p= …

A Case Study: Cox proportional hazard regression
Assess effect of age on survival
> summary(fit3 <- coxph(Surv(futime, fustat) ~ age, data = ovarian))
Call:
coxph(formula = Surv(futime, fustat) ~ age, data = ovarian)
  n= 26
     coef exp(coef) se(coef) z Pr(>|z|)
age  …                                   **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     exp(coef) exp(-coef) lower.95 upper.95
age  …
Rsquare= …   (max possible= … )
Likelihood ratio test= …  on 1 df,   p= …
Wald test            = …  on 1 df,   p= …
Score (logrank) test = …  on 1 df,   p= …

A Case Study: Cox proportional hazard regression
Assess effect of treatment while adjusting for age
> summary(fit4 <- coxph(Surv(futime, fustat) ~ rx + age, data = ovarian))
Call:
coxph(formula = Surv(futime, fustat) ~ rx + age, data = ovarian)
  n= 26
     coef exp(coef) se(coef) z Pr(>|z|)
rx   …
age  …                                   **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     exp(coef) exp(-coef) lower.95 upper.95
rx   …
age  …
Rsquare= …   (max possible= … )
Likelihood ratio test= …  on 2 df,   p= …
Wald test            = …  on 2 df,   p= …

A Case Study: Cox proportional hazard regression
Checking PH assumption
> cox.zph(fit4)
        rho chisq p
rx      …
age     …
GLOBAL  NA  …

Advanced Topics
Advanced topics that need further discussion:
- Left truncation: Selective sampling, i.e. a patient is included in the sample only if a specific condition (e.g. $T > t_0$) is satisfied.
- Interval censoring: Information about the survival time is in the form $t_1 < T < t_2$.
- Competing risks: Multiple causes of failure are involved.
- Time-dependent covariates: Covariates change with time.
- Dependent survival times: Bivariate survival models (e.g. correlated frailty models) can be used to analyze survival data on twins and relatives.

Questions?
The slides will be available at [link not preserved in this transcription]. If you have any questions, feel free to send me an email at [address not preserved in this transcription].
Thank you!


More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Parametric Survival Models

Parametric Survival Models Parametric Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005, Summer 2010 We consider briefly the analysis of survival data when one is willing to assume a parametric

More information

Introduction to Survival Analysis

Introduction to Survival Analysis John Fox Lecture Notes Introduction to Survival Analysis Copyright 2014 by John Fox Introduction to Survival Analysis 1 1. Introduction I Survival analysis encompasses a wide variety of methods for analyzing

More information

Life Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD

Life Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

Time varying (or time-dependent) covariates

Time varying (or time-dependent) covariates Chapter 9 Time varying (or time-dependent) covariates References: Allison (*) p.138-153 Hosmer & Lemeshow Chapter 7, Section 3 Kalbfleisch & Prentice Section 5.3 Collett Chapter 7 Kleinbaum Chapter 6 Cox

More information

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, 2011. Hull Univ. Business School

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, 2011. Hull Univ. Business School Duration Analysis Econometric Analysis Dr. Keshab Bhattarai Hull Univ. Business School April 4, 2011 Dr. Bhattarai (Hull Univ. Business School) Duration April 4, 2011 1 / 27 What is Duration Analysis?

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Early mortality rate (EMR) in Acute Myeloid Leukemia (AML)

Early mortality rate (EMR) in Acute Myeloid Leukemia (AML) Early mortality rate (EMR) in Acute Myeloid Leukemia (AML) George Yaghmour, MD Hematology Oncology Fellow PGY5 UTHSC/West cancer Center, Memphis, TN May,1st,2015 Off-Label Use Disclosure(s) I do not intend

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

More information

The first three steps in a logistic regression analysis with examples in IBM SPSS. Steve Simon P.Mean Consulting www.pmean.com

The first three steps in a logistic regression analysis with examples in IBM SPSS. Steve Simon P.Mean Consulting www.pmean.com The first three steps in a logistic regression analysis with examples in IBM SPSS. Steve Simon P.Mean Consulting www.pmean.com 2. Why do I offer this webinar for free? I offer free statistics webinars

More information

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS Tailiang Xie, Ping Zhao and Joel Waksman, Wyeth Consumer Healthcare Five Giralda Farms, Madison, NJ 794 KEY WORDS: Safety Data, Adverse

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Lesson 14 14 Outline Outline

Lesson 14 14 Outline Outline Lesson 14 Confidence Intervals of Odds Ratio and Relative Risk Lesson 14 Outline Lesson 14 covers Confidence Interval of an Odds Ratio Review of Odds Ratio Sampling distribution of OR on natural log scale

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

Life Data Analysis using the Weibull distribution

Life Data Analysis using the Weibull distribution RELIABILITY ENGINEERING Life Data Analysis using the Weibull distribution PLOT Seminar October 2008 Ing. Ronald Schop Weibull: Reliability Engineering www.weibull.nl Content Why Reliability Weibull Statistics

More information

Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics

Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics Paper SD-004 Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics ABSTRACT The credit crisis of 2008 has changed the climate in the investment and finance industry.

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

Study Design and Statistical Analysis

Study Design and Statistical Analysis Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Distribution (Weibull) Fitting

Distribution (Weibull) Fitting Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

More information

List of Examples. Examples 319

List of Examples. Examples 319 Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Examining a Fitted Logistic Model

Examining a Fitted Logistic Model STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

More information

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy

More information

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing

More information

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech. MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information