# Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Save this PDF as:

Size: px
Start display at page:

Download "Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods"

## Transcription

1 Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models: (1) Kaplan-Meier (2) Nelson-Aalen or Fleming-Harrington (via estimating the cumulative hazard) (3) Life-table (Actuarial Estimator) We will mainly consider the first two. 1

2 (1) The Kaplan-Meier Estimator The Kaplan-Meier (or KM) estimator is probably the most popular approach. Motivation (no censoring): Remission times (weeks) for 21 leukemia patients receiving control treatment (Table 1.1 of Cox & Oakes): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23 We estimate S(10), the probability that an individual survives to week 10 or later, by How would you calculate the standard error of the estimated survival? Ŝ(10) = ˆP (T > 10) = 8 21 = (Answer: se[ŝ(10)] = 0.106) What about Ŝ(8)? Is it or 8 21? 2

3 Values of t Ŝ(t) t < 1 21/21= t < 2 19/21= t < 3 17/21= t < 4 4 t < 5 5 t < 8 8 t < t < t < t < t < t < 23 In most software packages, the survival function is evaluated just after time t, i.e., at t +. In this case, we only count the individuals with T > t. 3

4 Survival Time Figure 1: Example for leukemia data (control arm) 4

5 Empirical Survival Function: When there is no censoring, the general formula is: # individuals with T > t n i=1 S n (t) = = I(T i > t) total sample size n Note that F n (t) = 1 S n (t) is the empirical CDF. Also I(T i > t) Bernoulli(S(t)), so that 1. S n (t) converges in probability to S(t) (consistency); 2. n{s n (t) S(t)} N(0, S(t)[1 S(t)]) in distribution. [Make sure that you know these.] 5

6 What if there is censoring? Consider the treated group from Table 1.1 of Cox and Oakes: 6, 6, 6, 6 +, 7, 9 +, 10, 10 +, 11 +, 13, 16, , 20 +, 22, 23, 25 +, 32 +, 32 +, 34 +, 35 + [Note: times with + are right censored] We know S(5)= 21/21, because everyone survived at least until week 5 or greater. But, we can t say S(7) = 17/21, because we don t know the status of the person who was censored at time 6. In a 1958 paper in the Journal of the American Statistical Association, Kaplan and Meier proposed a way to estimate S(t) nonparametrically, even in the presence of censoring. The method is based on the ideas of conditional probability. 6

7 [Reading:] A quick review of conditional probability Conditional Probability: Suppose A and B are two events. Then, P (A B) P (A B) = P (B) Multiplication law of probability: can be obtained from the above relationship, by multiplying both sides by P (B): P (A B) = P (A B) P (B) Extension to more than 2 events: Suppose A 1, A 2...A k are k different events. Then, the probability of all k events happening together can be written as a product of conditional probabilities: P (A 1 A 2... A k ) = P (A k A k 1... A 1 ) P (A k 1 A k 2... A 1 )... P (A 2 A 1 ) P (A 1 ) 7

8 Now, let s apply these ideas to estimate S(t): Intuition behind the Kaplan-Meier Estimator Think of dividing the observed timespan of the study into a series of fine intervals so that there is a separate interval for each time of death or censoring: D C C D D D Using the law of conditional probability, P (T > t) = j P (survive j-th interval I j survived to start of I j ) = j λ j where the product is taken over all the intervals preceding time t. 8

9 4 possibilities for each interval: (1) No death or censoring - conditional probability of surviving the interval is 1; (2) Censoring - assume they survive to the end of the interval (the intervals are very small), so that the conditional probability of surviving the interval is 1; (3) Death, but no censoring - conditional probability of not surviving the interval is # deaths (d) divided by # at risk (r) at the beginning of the interval. So the conditional probability of surviving the interval is 1 d/r; (4) Tied deaths and censoring - assume censorings last to the end of the interval, so that conditional probability of surviving the interval is still 1 d/r. General Formula for jth interval: It turns out we can write a general formula for the conditional probability of surviving the j-th interval that holds for all 4 cases: 1 d j r j 9

10 We could use the same approach by grouping the event times into intervals (say, one interval for each month), and then counting up the number of deaths (events) in each to estimate the probability of surviving the interval (this is called the lifetable estimate). However, the assumption that those censored last until the end of the interval wouldn t be quite accurate, so we would end up with a cruder approximation. Here as the intervals get finer and finer, the approximations made in estimating the probabilities of getting through each interval become more and more accurate, at the end the estimator converges to the true S(t) in probability (proof not shown here). This intuition explains why an alternative name for the KM is the product-limit estimator. 10

11 The Kaplan-Meier estimator of the survivorship function (or survival probability) S(t) = P (T > t) is: Ŝ(t) = j:τ j t = j:τ j t r j d j r j ( 1 d j r j ) where τ 1,...τ K is the set of K distinct uncensored failure times observed in the sample d j is the number of failures at τ j r j is the number of individuals at risk right before the j-th failure time (everyone who died or censored at or after that time). Furthermore, let c j be the number of censored observations between the j-th and (j + 1)-st failure times. Any censoring tied at τ j are included in c j, but not censorings tied at τ j+1. Note: two useful formulas are: (1) r j = r j 1 d j 1 c j 1 (2) r j = l j (c l + d l ) 11

12 Calculating the KM - leukemia treated group Make a table with a row for every death or censoring time: τ j d j c j r j 1 (d j /r j ) Ŝ(τ j ) = Note that: Ŝ(t) only changes at death (failure) times; Ŝ(t) is 1 up to the first death time; Ŝ(t) only goes to 0 if the last observation is uncensored; When there is no censoring, the KM estimator equals the empirical survival estimate 12

13 Survival Time Figure 2: KM plot for treated leukemia patients Output from a software KM Estimator: failure time: failure/censor: weeks remiss Beg. Net Survivor Std. Time Total Fail Lost Function Error [95% Conf. Int.]

14 (Note: the above is from Stata, but most software output similar KM tables.) 14

15 [Reading:] Redistribution to the right algorithm (Efron, 1967) There is another way to compute the KM estimator. In the absence of censoring, Ŝ(t) is just the proportion of individuals with T > t. The idea behind Efron s approach is to spread the contributions of censored observations out over all the possible times to their right. Algorithm: Step (1): arrange the n observation times (failures or censorings) in increasing order. If there are ties, put censored after failures. Step (2): Assign weight (1/n) to each time. Step (3): Moving from left to right, each time you encounter a censored observation, distribute its mass to all times to its right. Step (4): Calculate Ŝj by subtracting the final weight for time j from Ŝj 1 15

16 Example of redistribute to the right algorithm Consider the following event times: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7 The algorithm goes as follows: (Step 1) (Step 4) Times Step 2 Step 3a Step 3b Ŝ(τ j ) 2 1/9= /9= /9= /9= /9= /9= /9= /9= This comes out the same as the product-limit approach. 16

17 Properties of the KM estimator When there is no censoring, KM estimator is the same as the empirical estimator: # deaths after time t Ŝ(t) = n where n is the number of individuals in the study. As said before Ŝ(t) asymp. N (S(t), S(t)[1 S(t)]/n) How does censoring affect this? Ŝ(t) is still consistent for the true S(t); Ŝ(t) is still asymptotically normal; The variance is more complicated. The proofs can be done using the usual method (by writing as sum of i.i.d. terms plus o p (1)) but it is laborious, or it can be done using counting processes which was considered more elegant, or by empirical processes method which is very powerful for semiparametric inferences. 17

18 The KM estimator is also an MLE You can read in Cox and Oakes book Section 4.2. Here we need to think of the distribution function F (t) as an (infinite dimensional) parameter, and we try to find the ˆF (or Ŝ = 1 ˆF ) that maximizes a nonparametric likelihood. Such a MLE is called a NPMLE. As it turns out, such a ˆF (t) has to be discrete (in order to for the likelihood to be bounded), with masses only at the uncensored failure times {a j } j. Cox and Oakes book Section 4.2 (please read if you have time) shows that the right-censored data likelihood for such a discrete distribution can be written as L(λ) = g j=1 λ d j j (1 λ j) r j d j where λ j is the discrete hazard at a j. Note that the likelihood is the same as that of g independent binomials. Therefore, the maximum likelihood estimator of λ j is (why): ˆλ j = d j /r j 18

19 For a discrete survival distribution S(t) = (1 λ j ) j:a j t Now we plug in the MLE s of λ to estimate S(t) (why): Ŝ(t) = (1 ˆλ j ) This is the NPMLE of S. j:a j t = j:a j t ( 1 d ) j r j 19

20 Greenwood s formula for variance Note that the KM estimator is Ŝ(t) = where ˆλ j = d j /r j. j:τ j t (1 ˆλ j ) Since the ˆλ j s are just binomial proportions given r j s, then Var(ˆλ j ) ˆλ j (1 ˆλ j ) r j Also, the ˆλ j s are asymptotically independent. Since Ŝ(t) is a function of the λ j s, we can estimate its variance using the Delta method. Delta method: If Y n is (asymptotically) normal with mean µ and variance σ 2, g is differentiable and g (µ) 0, then g(y n ) is approximately normally distributed with mean g(µ) and variance [g (µ)] 2 σ 2. 20

21 Greenwood s formula (continued) Instead of dealing with Ŝ(t) directly, we will look at its log (why?): log[ŝ(t)] = log(1 ˆλ j ) j:τ j t Thus, by approximate independence of the ˆλ j s, Var(log[Ŝ(t)]) = Var[log(1 ˆλ j )] j:τ j t = ( ) 2 1 j:τ j t 1 ˆλ Var(ˆλj ) j = ( )2 1 ˆλj (1 ˆλ j ) 1 ˆλ j r j j:τ j t = j:τ j t = j:τ j t ˆλ j (1 ˆλ j )r j d j (r j d j )r j Now, Ŝ(t) = exp[log[ŝ(t)]]. Using Delta method once again, Var(Ŝ(t)) = [Ŝ(t)]2 Var [log[ŝ(t)] ] 21

22 Greenwood s Formula: Var(Ŝ(t)) = [Ŝ(t)]2 j:τ j t d j (r j d j )r j 22

23 Confidence intervals For a 95% confidence interval, we could use Ŝ(t) ± z 1 α/2 se[ŝ(t)] where se[ŝ(t)] is calculated using Greenwood s formula. Here t is fixed, and this is refered to as pointwise confidence interval. Problem: This approach can yield values > 1 or < 0. Better approach: Get a 95% confidence interval for L(t) = log( log(s(t))) Since this quantity is unrestricted, the confidence interval will be in the right range when we transform back: S(t) = exp( exp(l(t)). [ To see why this works: 0 Ŝ(t) 1 log[ŝ(t)] 0 0 log[ŝ(t)] ] log [ log[ŝ(t)] ] 23

24 Log-log Approach for Confidence Intervals: (1) Define L(t) = log( log(s(t))) (2) Form a 95% confidence interval for L(t) based on ˆL(t), yielding [ ˆL(t) A, ˆL(t) + A] (3) Since S(t) = exp( exp(l(t)), the confidence bounds for the 95% CI of S(t) are: [ ] exp{ eˆl(t)+a }, exp{ eˆl(t) A } (note that the upper and lower bounds switch) (4) Substituting ˆL(t) = log( log(ŝ(t))) back into the above bounds, we get confidence bounds of ([Ŝ(t)]eA, [Ŝ(t)]e A) 24

25 What is A? A is 1.96 se( ˆL(t)) To calculate this, we need to calculate Var( ˆL(t)) ] = Var [log( log(ŝ(t))) From our previous calculations, we know Var(log[Ŝ(t)]) = d j (r j d j )r j j:τ j t Applying the delta method again, we get: Var( ˆL(t)) = = Var(log( log[ŝ(t)])) 1 d j [log Ŝ(t)]2 (r j d j )r j j:τ j t We take the square root of the above to get se( ˆL(t)), 25

26 Survival Time Figure 3: KM Survival Estimate and Confidence intervals (type log-log ) Different software might use different approaches to calculate the CI. Software for Kaplan-Meier Curves Stata - stset and sts commands SAS - proc lifetest R - survfit() in survival package 26

27 R allows different types of CI (some are truncated to be between 0 and 1): 95 percent confidence interval is of type "log" time n.risk n.event survival std.dev lower 95% CI upper 95% CI percent confidence interval is of type "log-log" time n.risk n.event survival std.dev lower 95% CI upper 95% CI percent confidence interval is of type "plain" time n.risk n.event survival std.dev lower 95% CI upper 95% CI

28 Mean, Median, Quantiles based on the KM Mean is not well estimated with censored data, since we often don t observe the right tail. Median - by definition, this is the time, τ, such that S(τ) = 0.5. In practice, it is often defined as the smallest time such that Ŝ(τ) 0.5. The median is more appropriate for censored survival data than the mean. For the treated leukemia patients, we find: The median is thus 23. Ŝ(22) = Ŝ(23) = Lower quartile (25 th percentile): the smallest time (LQ) such that Ŝ(LQ) 0.75 Upper quartile (75 th percentile): the smallest time (UQ) such that Ŝ(UQ)

29 Left truncated KM estimate Suppose that there is left truncation, so the observed data is (Q i, X i, δ i ), i = 1,..., n. Now the risk set at any time t consists of subjects who have entered the study, and have not failed or been censored by that time, i.e. {i : Q i < t X i }. So r j = n i=1 I(Q i < τ j X i ). We still have Ŝ(t) = j:τ j t ( 1 d ) j r j where τ 1,...τ K is the set of K distinct uncensored failure times observed in the sample, d j is the number of failures at τ j, and r j is the number of individuals at risk right before the j-th failure time (everyone who had entered and who died or censored at or after that time). When min n i=1 Q i = t 0 > 0, then the KM estimates P (T > t T > t 0 ). 29

30 The left truncated KM is still an NPMLE (Wang, 1991). The Greenwood s formula for variance still applies. In R (and most other softwares) it is handled by something like Surv(time=Q, time2=x, event=δ). Read [required] Tsai, Jewell and Wang (1987). 30

31 (2) Nelson-Aalen (Fleming-Harrington) estimator Estimating the cumulative hazard If we can estimate Λ(t) = t 0 λ(u)du, the cumulative hazard at time t, then we can estimate S(t) = e Λ(t). Just as we did for the KM, think of dividing the observed time span of the study into a series of fine intervals so that there is only one event per interval: D C C D D D Λ(t) can then be approximated by a sum: Λ(t) λ j j j:τ j t where the sum is over intervals up to t, λ j is the value of the hazard in the j-th interval and j is the width of that interval. Since λ is approximately the conditional probability of dying in the interval, we can further estimate λ j j by d j /r j. 31

32 This gives the Nelson-Aalen estimator: ˆΛ NA (t) = d j /r j. j:τ j t It follows that ˆΛ(t), like the KM, changes only at the observed death (event) times. Example: D C C D D D r j n n n n-1 n-1 n-2 n-2 n-3 n-4 d j c j ˆλ(t j ) 0 0 1/n n 3 1 n 4 ˆΛ(t j ) 0 0 1/n 1/n 1/n 1/n 1/n Once we have ˆΛ NA (t), we can obtain the Fleming-Harrington estimator of S(t): Ŝ F H (t) = exp( ˆΛ NA (t)). 32

33 In general, the FH estimator of the survival function should be close to the Kaplan-Meier estimator, ŜKM(t). We can compare the Fleming-Harrington survival estimate to the KM estimate using a subgroup of the nursing home data: skm sfh In this example, it looks like the Fleming-Harrington estimator is slightly higher than the KM at every time point, but with larger datasets the two will typically be much closer. Question: do you think that the two estimators are asymptotically equivalent? 33

34 Note: We can also go the other way: we can take the Kaplan-Meier estimate of S(t), and use it to calculate an alternative estimate of the cumulative hazard function: ˆΛ KM (t) = log ŜKM(t) 34

35 [Reading:] (3) The Lifetable or Actuarial Estimator one of the oldest techniques around used by actuaries, demographers, etc. applies when the data are grouped Our goal is still to estimate the survival function, hazard, and density function, but this is sometimes complicated by the fact that we don t know exactly when during a time interval an event occurs. 35

36 There are several types of lifetable methods according to the data sources: Population Life Tables cohort life table - describes the mortality experience from birth to death for a particular cohort of people born at about the same time. People at risk at the start of the interval are those who survived the previous interval. current life table - constructed from (1) census information on the number of individuals alive at each age, for a given year and (2) vital statistics on the number of deaths or failures in a given year, by age. This type of lifetable is often reported in terms of a hypothetical cohort of 100,000 people. Generally, censoring is not an issue for Population Life Tables. Clinical Life tables - applies to grouped survival data from studies in patients with specific diseases. Because patients can enter the study at different times, or be lost to follow-up, censoring must be allowed. 36

37 Notation the j-th time interval is [t j 1, t j ) c j - the number of censorings in the j-th interval d j - the number of failures in the j-th interval r j is the number entering the interval Example: 2418 Males with Angina Pectoris (chest pain, from book by Lee, p.91) Year after Diagnosis j d j c j r j r j = r j c j /2 [0, 1) [1, 2) ( ) [2, 3) [3, 4) [4, 5) [5, 6) [6, 7) etc.. 37

38 Estimating the survivorship function If we apply the KM formula directly to the numbers in the table on the previous page, estimating S(t) as Ŝ(t) = ( 1 d ) j, r j j:τ j <t this approach is unsatisfactory for grouped data because it treats the problem as though it were in discrete time, with events happening only at 1 yr, 2 yr, etc. In fact, we should try to calculate the conditional probability of dying within the interval, given survival to the beginning of it. What should we do with the censored subjects? Let r j denote the effective number of subjects at risk. If we assume that censorings occur: at the beginning of each interval: r j = r j c j at the end of each interval: r j = r j on average halfway through the interval: r j = r j c j /2 The last assumption yields the Actuarial Estimator. It is appropriate if censorings occur uniformly throughout the interval. 38

39 Constructing the lifetable First, some additional notation for the j-th interval, [t j 1, t j ): Midpoint (t mj ) - useful for plotting the density and the hazard function Width (b j = t j t j 1 ) needed for calculating the hazard in the j-th interval Quantities estimated: Conditional probability of dying (event) ˆq j = d j /r j Conditional probability of surviving ˆp j = 1 ˆq j Cumulative probability of surviving at t j : Ŝ(t j ) = l j = l j ˆp l ( 1 d ) l r l 39

40 Other quantities estimated at the midpoint of the j-th interval: Hazard in the j-th interval (why) ˆλ(t mj ) = = d j b j (r j d j/2) ˆq j b j (1 ˆq j /2) density at the midpoint of the j-th interval (why) ˆf(t mj ) = Ŝ(t j 1) Ŝ(t j) b j = Ŝ(t j 1) ˆq j b j Note: Another way to get this is: ˆf(t mj ) = ˆλ(t mj )Ŝ(t mj) = ˆλ(t mj )[Ŝ(t j) + Ŝ(t j 1)]/2 40

41 E s t i m a t e d S u r v i v a l L o w e r L i m i t o f T i m e I n t e r v a l Figure 4: Life table estimate of survival 41

42 E s t i m a t e d h a z a r d L o w e r L i m i t o f T i m e I n t e r v a l Figure 5: Estimated discrete hazard 42

### Survival Analysis: Introduction

Survival Analysis: Introduction Survival Analysis typically focuses on time to event data. In the most general sense, it consists of techniques for positivevalued random variables, such as time to death

### Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

### 2 Right Censoring and Kaplan-Meier Estimator

2 Right Censoring and Kaplan-Meier Estimator In biomedical applications, especially in clinical trials, two important issues arise when studying time to event data (we will assume the event to be death.

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### Non-Parametric Estimation in Survival Models

Non-Parametric Estimation in Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005 We now discuss the analysis of survival data without parametric assumptions about the

### Competing Risks in Survival Analysis

Competing Risks in Survival Analysis So far, we ve assumed that there is only one survival endpoint of interest, and that censoring is independent of the event of interest. However, in many contexts it

### Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

### Comparison of Survival Curves

Comparison of Survival Curves We spent the last class looking at some nonparametric approaches for estimating the survival function, Ŝ(t), over time for a single sample of individuals. Now we want to compare

### Survival Distributions, Hazard Functions, Cumulative Hazards

Week 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution of a

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### Distribution (Weibull) Fitting

Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### Introduction to Survival Analysis

Introduction to Survival Analysis Introduction to Survival Analysis CF Jeff Lin, MD., PhD. December 27, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Introduction to Survival Analysis, 1 Introduction

### R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Faculty of Health Sciences R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### Life Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### Lecture 4 PARAMETRIC SURVIVAL MODELS

Lecture 4 PARAMETRIC SURVIVAL MODELS Some Parametric Survival Distributions (defined on t 0): The Exponential distribution (1 parameter) f(t) = λe λt (λ > 0) S(t) = t = e λt f(u)du λ(t) = f(t) S(t) = λ

### Linda Staub & Alexandros Gekenidis

Seminar in Statistics: Survival Analysis Chapter 2 Kaplan-Meier Survival Curves and the Log- Rank Test Linda Staub & Alexandros Gekenidis March 7th, 2011 1 Review Outcome variable of interest: time until

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### 200609 - ATV - Lifetime Data Analysis

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Topic 3 - Survival Analysis

Topic 3 - Survival Analysis 1. Topics...2 2. Learning objectives...3 3. Grouped survival data - leukemia example...4 3.1 Cohort survival data schematic...5 3.2 Tabulation of events and time at risk...6

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Censoring and truncation

Censoring and truncation ST3242: Introduction to Survival Analysis Alex Cook August 2008 ST3242 : Censoring and truncation 1/22 Definitions Censoring Censoring is when an observation is incomplete due

### Lecture 02: Statistical Inference for Binomial Parameters

Lecture 02: Statistical Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Sudhanshu K. Ghoshal, Ischemia Research Institute, San Francisco, CA

ABSTRACT: Multivariable Cox Proportional Hazard Model by SAS PHREG & Validation by Bootstrapping Using SAS Randomizers With Applications Involving Electrocardiology and Organ Transplants Sudhanshu K. Ghoshal,

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### Comparison of resampling method applied to censored data

International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### 5.1 Identifying the Target Parameter

University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Multiple random variables

Multiple random variables Multiple random variables We essentially always consider multiple random variables at once. The key concepts: Joint, conditional and marginal distributions, and independence of

### SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Objectives Introduce event history analysis Describe some common survival (hazard) distributions Introduce some useful Stata

### Nonparametric Statistics

Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Design and Analysis of Equivalence Clinical Trials Via the SAS System

Design and Analysis of Equivalence Clinical Trials Via the SAS System Pamela J. Atherton Skaff, Jeff A. Sloan, Mayo Clinic, Rochester, MN 55905 ABSTRACT An equivalence clinical trial typically is conducted

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Introduction to survival analysis

Chapter Introduction to survival analysis. Introduction In this chapter we examine another method used in the analysis of intervention trials, cohort studies and data routinely collected by hospital and

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### 1 SAMPLE SIGN TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1. A non-parametric equivalent of the 1 SAMPLE T-TEST.

Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Normal distribution. ) 2 /2σ. 2π σ

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters

MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters Inferences about a population parameter can be made using sample statistics for

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Checking proportionality for Cox s regression model

Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### National Joint Replacement Registry. Demographics and Outcomes of Shoulder Arthroplasty

National Joint Replacement Registry Demographics and Outcomes of Shoulder Arthroplasty SUPPLEMENTARY REPORT 2014 CONTENTS INTRODUCTION... 1 Data Collection and Validation... 1 Outcome Assessment... 1 Report

### Standard Deviation Estimator

CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Models Where the Fate of Every Individual is Known

Models Where the Fate of Every Individual is Known This class of models is important because they provide a theory for estimation of survival probability and other parameters from radio-tagged animals.

### Statistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

7 Statistical Intervals Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

### Point and Interval Estimates

Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### Confidence Intervals for the Area Under an ROC Curve

Chapter 261 Confidence Intervals for the Area Under an ROC Curve Introduction Receiver operating characteristic (ROC) curves are used to assess the accuracy of a diagnostic test. The technique is used

### Survival Analysis of Dental Implants. Abstracts

Survival Analysis of Dental Implants Andrew Kai-Ming Kwan 1,4, Dr. Fu Lee Wang 2, and Dr. Tak-Kun Chow 3 1 Census and Statistics Department, Hong Kong, China 2 Caritas Institute of Higher Education, Hong

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

### 4. Life Insurance Payments

4. Life Insurance Payments A life insurance premium must take into account the following factors 1. The amount to be paid upon the death of the insured person and its present value. 2. The distribution

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Properties of Future Lifetime Distributions and Estimation

Properties of Future Lifetime Distributions and Estimation Harmanpreet Singh Kapoor and Kanchan Jain Abstract Distributional properties of continuous future lifetime of an individual aged x have been studied.

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### I. CENSUS SURVIVAL METHOD

I. CENSUS SURVIVAL METHOD Census survival methods are the oldest and most widely applicable methods of estimating adult mortality. These methods assume that mortality levels can be estimated from the survival

### Confidence Intervals for Exponential Reliability

Chapter 408 Confidence Intervals for Exponential Reliability Introduction This routine calculates the number of events needed to obtain a specified width of a confidence interval for the reliability (proportion

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### STATISTICS FOR PSYCH MATH REVIEW GUIDE

STATISTICS FOR PSYCH MATH REVIEW GUIDE ORDER OF OPERATIONS Although remembering the order of operations as BEDMAS may seem simple, it is definitely worth reviewing in a new context such as statistics formulae.

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

### 1 The Brownian bridge construction

The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

### Let m denote the margin of error. Then

S:105 Statistical Methods and Computing Sample size for confidence intervals with σ known t Intervals Lecture 13 Mar. 6, 009 Kate Cowles 374 SH, 335-077 kcowles@stat.uiowa.edu 1 The margin of error The