Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER



Objectives
- Introduce event history analysis
- Describe some common survival (hazard) distributions
- Introduce some useful Stata and SAS commands
- Discuss practical issues worth keeping in mind


What is event history analysis?
- A set of statistical techniques for analyzing the time until an event occurs within a specified observation interval
- Also called survival analysis (demography, biostatistics), reliability analysis (engineering), or duration analysis (economics)
- The basic logic behind these methods comes from the life table
- Types of events: mortality, marriage, fertility, recidivism, graduation, retirement, etc.

Basic Concepts in Event History Analysis
- Events
  - Repeatable vs. non-repeatable
  - Single vs. multiple
- Exposure to risk (i.e., measuring time)
  - Risk set
  - Discrete vs. continuous time
  - Censoring
- Hazard and survival functions
  - Non-parametric vs. semi-parametric vs. parametric
  - Distributional assumptions (proportionality)

Basic Concepts: Different Types of Events
- Event (the outcome): a discrete transition between two states
- Non-repeatable events
  - Transition can occur only once (absorbing state)
  - Examples: Alive → Dead, Nulliparous → First Birth
- Repeatable events
  - Transition can occur more than once (non-absorbing state)
  - Examples: Married → Divorced, Healthy → Disabled
- Single events
  - e.g., Alive → Dead
- Multiple events
  - e.g., Alive → Cancer Deaths vs. CVD Deaths
  - Methods for competing risks are an extension of those for single events

Basic Concepts: Time (Exposure, Duration)
- Time is the core component of event history analysis
- Risk set: individuals [1] at risk of experiencing some event
- Risk exposure occurs in an observation interval (study time)
  - The observation interval is when the clock begins and ends
- One of two outcomes is possible in the observation interval
  - Failure: the event occurs in the interval (e.g., death)
  - Censoring: the event does not occur in the interval (e.g., survival)
- Time usually is measured in discrete units (e.g., years, months)
- Time theoretically can be measured in (quasi) continuous units (e.g., hours, minutes, seconds)

[1] Note that the unit of analysis does not necessarily have to be individuals.
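The risk set and censoring ideas above can be sketched in a few lines of Python. This is only an illustration on made-up toy records: the tuple layout `(id, duration, event)` and the helper names `risk_set` and `total_exposure` are assumptions, not anything from the slides.

```python
# Toy records (hypothetical): (id, duration in years, event flag).
# event = 1 means failure was observed; event = 0 means the case was censored.
records = [
    ("a", 5, 1),
    ("b", 8, 0),
    ("c", 6, 1),
    ("d", 3, 0),
]


def risk_set(records, t):
    """IDs still under observation (no failure, not censored) at time t."""
    return [rid for rid, duration, _ in records if duration >= t]


def total_exposure(records):
    """Total time at risk contributed by all cases."""
    return sum(duration for _, duration, _ in records)


failures = sum(event for _, _, event in records)
censored = len(records) - failures

print(risk_set(records, 5))     # cases at risk at t = 5
print(total_exposure(records))  # person-years of exposure
print(failures, censored)
```

Censored cases still contribute exposure up to the point they leave observation, which is why they stay in the risk set until then.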

Basic Concepts: Time (Exposure, Duration)
[Figure. Source: Blossfeld & Rohwer. 2002. Techniques of Event History Modeling, 2nd Ed. (p. 40)]


Basic Concepts: Hazard & Survival Functions
Hazard Function
- Instantaneous probability [2] that an event will occur at time t, conditional on the event not having already occurred

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t + \Delta t > T \ge t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$

- Also called the Hazard Rate or the Force of Mortality
- With some additional math, you can get the Survival Function

$$S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$$

  which reduces to $S(t) = \exp(-\lambda t)$ when the hazard is constant

[2] Note that strictly speaking the hazard rate is a probability only in discrete-time models.
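As a rough numerical check of the two relationships above (hazard as f(t)/S(t), and survival as the exponential of the negative cumulative hazard), the sketch below uses a Weibull distribution. The Weibull choice, the parameter values, and all function names are illustrative assumptions.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t, shape, scale):
    """f(t) for a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)


def hazard(t, shape, scale):
    """Hazard rate computed as f(t) / S(t)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)


def survival_from_hazard(t, shape, scale, steps=10_000):
    """S(t) recovered by integrating the hazard with a midpoint rule."""
    width = t / steps
    cumulative_hazard = sum(
        hazard((i + 0.5) * width, shape, scale) * width for i in range(steps)
    )
    return math.exp(-cumulative_hazard)


direct = weibull_survival(2.0, shape=1.5, scale=3.0)
via_hazard = survival_from_hazard(2.0, shape=1.5, scale=3.0)
print(direct, via_hazard)  # the two values should agree closely

# With shape = 1 the Weibull is the exponential model: the hazard is constant.
print(hazard(1.0, 1.0, 2.0), hazard(7.0, 1.0, 2.0))
```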

Basic Concepts: Hazard & Survival Functions
- Models impose different distributional assumptions on the hazard
- Three basic types of hazard (survival) functions are common
  - Each one imposes a different amount of structure on the data
- The ultimate decision to use one approach over another should be driven by:
  - Your specific research question
  - How well the model fits the actual data
  - Practical concerns, e.g., difficulty estimating with available software, interpretability, typical approach in previous research

Basic Concepts: Hazard & Survival Functions
Non-Parametric Models
- No assumptions about the baseline hazard distribution
- Pros: Imposes the least structure, easy to estimate and interpret
- Cons: Difficult to incorporate predictors (mostly descriptive)
- Examples: Kaplan-Meier, Nelson-Aalen, classic life table
Parametric Models
- Baseline hazard assumed to vary in a specific manner with time
- Pros: Easy to incorporate covariates, gives the baseline hazard needed to calculate rates, smooths noisy data
- Cons: Imposes the most structure, need to be sure that the estimated distribution matches the data
- Examples: Weibull (monotonic decrease or increase), Gompertz (exponential increase), Exponential (constant)
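To make the non-parametric case concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The toy durations and the function name `kaplan_meier` are assumptions for illustration; in practice the SAS, Stata, or R routines would be used.

```python
def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct observed failure time.

    durations: follow-up time for each case
    events:    1 if the failure was observed, 0 if the case was censored
    """
    failure_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in failure_times:
        # Risk set: everyone whose follow-up lasted at least until t.
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): six cases, two of them censored.
durations = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for time, surv in curve:
    print(time, round(surv, 4))
```

No distributional form is assumed here: the curve only steps down at observed failure times, and censored cases affect it solely through the size of the risk set.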

Basic Concepts: Hazard & Survival Functions
Semi-Parametric Models
- Baseline hazard is not pre-determined, but it must be positive
- Pros: Covariates easily incorporated, less structure than parametric, smooths noisy data
- Cons: Does not provide the baseline hazard
  - Cannot calculate rates (absolute differences)
  - Can only interpret in terms of relative differentials
  - Any specification errors are absorbed into the coefficients
- Examples: Cox Proportional Hazards (most popular model)
Proportional Hazards Assumption
- The ratio of the hazard rates for any two groups is constant over time
- Cox models must satisfy this assumption
- So must some parametric models (Weibull, Gompertz, Exponential, etc.)
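One way to see what proportionality means is to compare hazard ratios over time for two groups. The sketch below uses Weibull hazards purely as an assumed example: when both groups share the same shape parameter the ratio is constant, and when the shapes differ it drifts with time. The parameter values and function names are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate at time t."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_ratios(times, group_a, group_b):
    """Hazard of group A relative to group B at each time point.

    group_a and group_b are (shape, scale) tuples.
    """
    return [
        weibull_hazard(t, *group_a) / weibull_hazard(t, *group_b) for t in times
    ]


times = [1.0, 5.0, 10.0]

# Same shape, different scale: hazards are proportional.
proportional = hazard_ratios(times, (1.5, 4.0), (1.5, 8.0))

# Different shapes: the ratio changes over time, so proportionality fails.
non_proportional = hazard_ratios(times, (1.5, 4.0), (0.8, 8.0))

print([round(r, 3) for r in proportional])
print([round(r, 3) for r in non_proportional])
```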


How To Estimate Hazard Models
- SAS: lifereg (parametric models), phreg (Cox models), lifetest (Kaplan-Meier), other user-written macros available
- Stata [3]: streg (parametric models), stcox (Cox models), sts list and sts graph (Kaplan-Meier), sts test (tests of equality of survivor functions), other specialty packages as .ado files
- R: package survival, with survreg (parametric models), coxph (Cox models), and survfit (Kaplan-Meier); other specialty packages (frailtypack, etc.)
- SPSS: coxreg (Cox models), km (Kaplan-Meier), no parametric models available?

[3] You must "stset" the data before estimating survival models in Stata. Type "help st" for details.

Stata Example: Exponential Model

Stata Example: Cox PH Models

SAS Example: Cox Proportional Hazards Model

    /* expos = follow-up time; dead = event indicator, with 0 marking censored cases */
    proc phreg data = nhis ;
      model expos*dead(0) = lths hs scol female age ;
    run ;

Analysis of Maximum Likelihood Estimates

    Parameter  DF  Parameter   Standard   Chi-Square  Pr > ChiSq  Hazard
                   Estimate    Error                              Ratio
    lths        1   0.67266    0.020010   1129.662    <.0001      1.959
    hs          1   0.45942    0.019990   528.4468    <.0001      1.583
    scol        1   0.35431    0.021880   262.1947    <.0001      1.425
    female      1  -0.44060    0.011830   1387.689    <.0001      0.644
    age         1   0.08639    0.000428   40766.7     <.0001      1.090
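The Hazard Ratio column in the PHREG output is the exponentiated parameter estimate. A short Python sketch, using the estimates from the table, reproduces the column; the dictionary and variable names are illustrative assumptions.

```python
import math

# Parameter estimates copied from the PROC PHREG output.
estimates = {
    "lths": 0.67266,
    "hs": 0.45942,
    "scol": 0.35431,
    "female": -0.44060,
    "age": 0.08639,
}

# Hazard ratio = exp(coefficient).
hazard_ratios = {name: math.exp(beta) for name, beta in estimates.items()}

# Percent difference in the hazard per one-unit increase in the covariate.
percent_change = {name: 100 * (hr - 1) for name, hr in hazard_ratios.items()}

for name in estimates:
    print(f"{name:7s} HR = {hazard_ratios[name]:.3f}  ({percent_change[name]:+.1f}%)")
```

Because the Cox model does not supply a baseline hazard, these numbers describe relative differences only, not absolute rates.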


Other Issues: Data Structure
- The data structure has important substantive implications
- The models shown here were estimated on individual-level data
- Models estimated on person-period data can be used to answer other substantive questions
  - Easy to calculate various life table functions: central death rates (m_x), probabilities of death (q_x), etc.
  - Easy to incorporate time-varying covariates (age, etc.)
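A minimal sketch of the person-period idea: each individual record is split into one row per year of age, after which central death rates (m_x) are just deaths divided by exposure within each age. The record layout `(id, entry_age, exit_age, died)`, the toy values, and the function names are assumptions for illustration, and entry ages are assumed to be whole numbers.

```python
from collections import defaultdict


def to_person_period(records):
    """Split each (id, entry_age, exit_age, died) record into yearly rows."""
    rows = []
    for pid, entry_age, exit_age, died in records:
        age = entry_age
        while age < exit_age:
            upper = min(age + 1, exit_age)
            rows.append({
                "id": pid,
                "age": age,                # time-varying covariate
                "exposure": upper - age,   # years at risk in this row
                # The event is recorded only in the person's last row.
                "event": int(bool(died) and upper == exit_age),
            })
            age += 1
    return rows


def central_death_rates(rows):
    """m_x = deaths at age x / person-years of exposure at age x."""
    deaths = defaultdict(float)
    exposure = defaultdict(float)
    for row in rows:
        deaths[row["age"]] += row["event"]
        exposure[row["age"]] += row["exposure"]
    return {age: deaths[age] / exposure[age] for age in sorted(exposure)}


# Toy individual-level data (hypothetical).
records = [
    ("p1", 60, 62.5, 1),
    ("p2", 60, 62, 0),
    ("p3", 61, 62, 1),
    ("p4", 62, 63, 0),
]

rows = to_person_period(records)
rates = central_death_rates(rows)
print(len(rows))
print(rates)
```

Since age is stored on every row, it updates automatically as each person moves through the observation interval, which is what makes time-varying covariates easy in this layout.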

Other Issues: Alternative Models
- Always test model assumptions, evaluate model fit, etc.
- Compare how well various models fit the data
  - Fit statistics: BIC, AIC, etc.
  - Fitted vs. observed values: do the distributions overlap?
  - Proportionality assumption (proportional hazards models)
- Other approaches often yield equivalent results
  - Count models: Poisson models, Negative Binomial models, etc.
  - Logistic regression: similar to Cox models, especially when few observations are censored
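For the fit statistics mentioned above, AIC and BIC can be computed directly from a model's maximized log-likelihood. The sketch below uses the standard definitions; the log-likelihood values, parameter counts, and sample size are made-up numbers for illustration only, and using the number of observations (rather than the number of events) as n in BIC is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
candidates = {
    "exponential": (-520.4, 3),
    "weibull": (-512.1, 4),
}
n_obs = 400  # hypothetical sample size

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n_obs)}
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])

for name, score in scores.items():
    print(name, round(score["aic"], 1), round(score["bic"], 1))
print("preferred by AIC:", best_by_aic)
```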