# Sudhanshu K. Ghoshal, Ischemia Research Institute, San Francisco, CA

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 ABSTRACT: Multivariable Cox Proportional Hazard Model by SAS PHREG & Validation by Bootstrapping Using SAS Randomizers With Applications Involving Electrocardiology and Organ Transplants Sudhanshu K. Ghoshal, Ischemia Research Institute, San Francisco, CA The paper presents statistical procedures of multivariable model building for survival analysis of patients undergoing surgery with many preoperative risk factors. In the first example we have added electrocardiologic risk factors to the traditional clinical and demographic risk factors. All analyses were performed on SAS The procedure described here mainly concentrates on Cox's regression analysis with risk factors assumed to be constant over time. In the last section a more generalized version of Cox's proportional hazard model is presented. Some possible applications have been presented. 1.0 INTRODUCTION: The paper mainly discusses procedures for the analysis of survival data for patients undergoing non-cardiac surgery (example-1) and heart transplants (example-2). Our analysis included Cox's Multivariate Proportional Hazard Models (SAS PHREG) with stepwise selection process. The models were validated by bootstrapping based on Efron's technique and the samples were generated by SAS Randomizer. The next two sections contain brief description of the types of the proportional hazard models used in our study. 2.0 COX'S MODEL WITH RISK FACTORS ASSUMED TO BE CONSTANT OVER TIME Suppose the survival times of n individuals are available, where 'r' of these are deaths and remaining n- r are living. If we assume that there are p risk factors (explanatory variables) so that x=(x 1, x 2,.. x p )', Let h 0 (t) be the hazard function for an individual for whom the values of all the risk factor that make up the vector x are zero. The function h 0 (t) is called the baseline hazard function. The hazard function of the i-th individual can be written as h i (t) = ψ(x i )h 0 (t), where ψ(x i ) is a function of the values of the vector of explanatory variables of the i-th individual. The function ψ() may be interpreted as the hazard at time t for an individual whose vector of explanatory variables is x i, relative to the hazard for an individual for whom x=0. Since hazard function is nonnegative, ψ(x i ) may be expressed as ψ(x i ) = exp(β'x i ) where η i = β'x i= β 1 x 1i + β 2 x 2i + +β p x 1p for i= 1,2,3,...n, This is the most commonly used model for the analysis of survival data. So the general proportional hazard model becomes h i (t) = exp(β'x i )h 0 (t) All the risk factors are assumed to be constant over time η i =β 1 x 1i + β 2 x 2i + +β p x 1p is called the linear component of the model. It is also called the risk score or prognostic index for the i-th individual

2 Let us consider a case where the data consists of n observed survival times, denoted by t 1,t 2,...t n and δ i is a censoring function which is zero if the i- th survival time t i is right-censored and unity otherwise. The β coefficients in the proportional hazard model may be estimated by the method of maximum likelihood. Using Cox's analysis(1972) the corresponding likelihood function for the proportional hazard model is given by r {expβ'x(j) / ( exp( β'x i))} j= 1 l R(t (j)) where R(t i ) is the risk set at time t i. The corresponding log-likelihood function is given by n δi { β' xi - log exp( β' x) l } i= 1 l R(t) i The maximum likelihood estimates of the β-parameters in the proportional hazard model can be found by maximizing this log-likelihood function using numerical methods. A brief discussions are presented in example-1. The cases of ties are handled by Efron's technique in PROC PHREG. 3.0 COX'S GENERALIZED REGRESSION ANALYSIS WITH TIME DEPENDENT RISK FACTORS: Cox's generalized hazard function model may be explained as follows: In this case x(t)=(x 1 (t),x 2 (t)...x p (t)) ' the hazard function for the i-th individual is h i (t)=exp(β x i (t))h 0 (t) where i= 1,2,3,...n, and β x i (t)=β 1 x 1i (t)+β 2 x 2i (t)++β p x pi (t) is the value of the component of the linear predictor of the model for the i- th individual.h 0 (t) is the baseline hazard function for an individual for whom all the variables are zero at the time origin and remain at this same value through time. Here x ji (t) is the value of the j-th risk factor at time t for the i-th individual, some or all of the risk factors may be time dependent. Integrating the hazard function the survivor function for the i-th individual is given by Si(t) = exp{- exp( t 0 p j 1 β jx ji(u))h 0(u)du} The corresponding partial log-likelihood function may be written as n p δj{ βjx ji(t i) - log exp( βjxji( ti))} i= 1 j= 1 i R(ti) j= 1 p Particular cases of this model has been used in organ transplant studies. A brief discussions are presented in example-2. 4.O VALIDATION BY BOOTSTRAPPING: It is important to validate the models whenever possible Normally when we have large number into two parts: -training set of observations, the data may be split -validation set

3 to perform cross validation. However when we have limited amount of data we cannot split the data into two parts. However,it is possible to validate the model developed by bootstrapping. This is based on the technique developed by Efron. The concept may be described as follows: Suppose we have a random sample x = (x 1, x 2,.. x p) from an unknown probability distribution F.We wish to estimate a parameter of interest Θ = t(f) on the basis of x. For this we estimate Θ = s(x) from x. Problem is to determine how accurate is. Θ. The bias of the estimate is given by bias( Θ, Θ )=E F [s(x)] - t(f) Bootstrap methods depends on the notion of a bootstrap sample. A bootstrap sample x * 1, x2 *... xp * is defined to be a random sample of size n drawn with replacement from a population of n subjects (x 1, x 2.. x p ) So the bootstrap set (x * 1,x2 *,. xp * ) consists of a member of the original data set some appearing zero times, some appearing once, some appearing twice etc. Corresponding to a bootstrap data set x * there is a bootstrap replication of Θ * =s(x * ) The quantity s(x * ) is the result of applying the same function s(.) to as x * as was applied to x. For example if s(x) is the sample mean x _ then s(x * ) is the mean of the corresponding bootstrap data set 5.0 THE ALGORITHM FOR ESTIMATING STANDARD ERROR 1. Select B independent bootstrap samples x *1, x *2... x *B each consisting of n data values drawn with replacement from x = (x 1, x 2,..x p) According to Efron B will normally be in the range of Evaluate the bootstrap replication corresponding to each bootstrap sample: Θ (b)=s(x *b ) where b=1,2, B 3.Estimate the standard error S e (Θ) of the B replications where B seb Θ Θ = { [ *( b) - *(.)] /( B-1)} b= 1 2 1/2 by the sample standard deviation where Θ B * (.) Θ * = (b)/b b=1 The estimate of the bias is bias Θ(.) F) B = t(

4 6.0 BOOTSTRAP ESTIMATE OF PREDICTION ERROR: Select B independent bootstrap samples x *1, x *2, x *p as before. Apply logistic regression or Cox's regression with stepwise selection to find a best fit model to each of these bootstrap samples. The model is applied to the original data the prediction error err_orig. The model is applied to the x*i bootstrap sample data the apparent error is err_boot. optimism = err_orig - err_boot The overall estimate of optimism is the average of the B differences. Once an estimate of optimism is obtained, it is added to the apparent error rate to obtain an improved estimate of the prediction error. Example: Suppose we wish to estimate the expected value of the Sumers' D rank correlation coefficient between predicted and observed survival time. This may be done as follows: a. Develop the model using original population with say logistic regression stepwise selection procedure. Let Dapp denote the apparent D from this model. 2. develop the stepwise model on one of the B samples. The apparent D computed on this bootstrap model is Dboot. 3. Apply this model to the original data and calculate D on the original data, call it Dorig 4. The optimism in the fit is diff=dorig - Dboot 5. The step is repeated B times. 6. The average of diff is the average optimism call it opt_av The bootstrap corrected performance of the original stepwise model = Dapp + opt_av This is a nearly unbiased estimate of the expected value of the external predictive discrimination of the process which generated Dapp Similar strategy is applied for estimating the prediction error at time t in a survival model. Instead of computing Sumers D we compute the statistic D_Stat = difference between mean predicted 2-year survival probability and Kaplan- Meier 2-year survival estimate 7.0 Example 1: In this example we have described the statistical procedure of multivariable model building in a survival analysis of 336 surgery patients having many preoperative risk factors. The risk factors include demographic, traditional clinical and electrocardiologic variables. The procedure Proc Corr was employed to determine the correlation between the risk factors and interaction terms were added whenever necessary. The procedure described here concentrates on Cox's regression analysis with risk factors not varying with time. In the process of determining associations of various risk factors to patient mortality univariate LOGISTIC REGRESSION was applied. Our next step was to determine the best main effect multivariable model by fitting a subset of predictors ( with p value <0.1).We have employed Proc PHREG with stepwise selection process. Models were developed for the different periods of follow up (inhospital, 1 year, 2 year, etc). In the final models predictors with multivariable two-tailed p <.05 were retained. For each period of survivability prognostic risk scores were computed from the Cox regression coefficients. The distribution of prognostic scores was divided into three risk categories - low, moderate and high. The upper threshold was set equal to the average of medium and maximum values of the prognostic scores while the lower threshold was set equal to the average of mean and lower quartile of the prognostic scores. Kaplan-Meier survival curves, probability of survival were computed for the three risk categories and each year (period) of follow up. The models were validated by bootstrapping based on the generalization

5 of Efron's technique using SAS randomizer.cases of ties were handled by Efron's technique. BOOTSTRAPPING 200 samples were created from the original populations of the patients employing SAS randomizer RANUNI(seed, x). The estimated standard error of the mean of survival estimates of the samples was small for all the risk groups. Example -2: Crowley and Hu have shown how time dependent risk factors can be used in organ transplantation. Here the risk factor x 1 (t) takes the value 0 if the patient has not received a transplant at time t and unity otherwise, the hazard of death for the i-th individual at time t is given by h i (t) = exp{η i + β 1 x 1i (t)}h 0 (t) η i is a linear combination of other non-time-dependent explanatory variables whose values have been recorded at the time origin for the i-th individual, x 1i is the value of x 1 for the individual at time t. Cox and Oakes (1984) suggested an improvement to the hazard model. Details are omitted for lack of space. Acknowledgments: The views and results presented here are the personal opinion of the author and Ischemia Research Institute is not anyway responsible for them. Dr.Sudhanshu K. Ghoshal Ph.D, D.Sc Senior Research Scientist Ischemia Research Institute San Frncisco, CA References: Collett, D (1994) : Modeling Survival Data in Medical Research, London: Chapman & Hall Cox, D.R (1972) : Regression Models and Life Tables (with Discussion), Journal of the Royal Statistical Society, B34, Cox, D.R and Oakes, D (1984) : Analysis of Survival Data, Chapman and Hall, London Crowley, J and Hu, M.(1977) Covariance Analysis of heart transplant survival data. Journal of the American Statistical Association, 78, Efron B (1979) : Bootstrap Methods : Another Look at the Jackknife: Annals of Statistics 7: 1-26 Efron B (1981a) : Censored data and the bootstrap J of American Statistical Asso, 76: Efron B (1981b): Nonparametric standard error and confidence intervals: Canadian Journal of Stat,9, : Efron B and Tibshirani R(1986): Bootstrap Methods for standard errors : SIAM, Philadelpia Example-1: Results: Model:A. Cox's Proportional Hazard Model for 1 year survivability

6 No of Hazard p-value Coefficient Risk Factor Ratio Parameter 1. SDNN < 50 msec ASA Physical Status >= Ventricular Tachycardia Index Surgery for cancer (VT) 5.Moderate or Severe Limitation of Activity 6.Estimated Creatinine Clearance < 0.83 SDNN = standard deviation of normal to normal QRS intervals, ASA = American Society of Anesthesiologists In previous studies Clinicians found demographic & clinical variables were associative with mortality. It is nice to find the Electrocardiologic variables No 1 & 3 are also associative and included in the model. A. We used SAS randomizer to generate 200 random samples out of the original population of patients (336).. On univariate analysis of on the 200 samples we found the following No of times significant out of 200 % 1. SDNN < 50 msec 159/200 80% 2. ASA Physical Status >=4 106/200 53% 3. Ventricular Tachycardia 154/200 77% 4. Index Surgery for cancer 163/200 82% 5. Moderate or Severe 163/200 82% Limitation of Activity 6. Estimated Creatinine 180/200 90% Clearance < Couplet 82/200 41% 8. Couplet or VT 67/200 34% 9. Using Bronchodilator 111/200 56% Medication 10. Histchf 136/200 68% etc etc. Rest of them were lot less associative with mortality. Broncho & histchf were not significant in the original population & was not selected (forward selection) in the final model. Their p-values were >.1. We retained risk factors with p-values <.1 in univariate analysis & <.o5 in the final model. Fig -1 shows survivability with hi, moderate and low prognostic scores. Fig -2 One to five year survivability with respect to risk scores in general. Fig 3 shows survivability with or without electrocardiologic variables ( SDNN & VT).

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### Comparison of resampling method applied to censored data

International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### Parametric survival models

Parametric survival models ST3242: Introduction to Survival Analysis Alex Cook October 2008 ST3242 : Parametric survival models 1/17 Last time in survival analysis Reintroduced parametric models Distinguished

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Non-Parametric Estimation in Survival Models

Non-Parametric Estimation in Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005 We now discuss the analysis of survival data without parametric assumptions about the

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Predicting Customer Default Times using Survival Analysis Methods in SAS

Predicting Customer Default Times using Survival Analysis Methods in SAS Bart Baesens Bart.Baesens@econ.kuleuven.ac.be Overview The credit scoring survival analysis problem Statistical methods for Survival

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Part 3: Methods for Propensity Analysis

Part 3: Methods for Propensity Analysis SMDM Short Course on Propensity Charles Thomas Center for Health Care Research & Policy Outline Definition of the Propensity Score Estimating the Propensity Score

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### Lecture 4 PARAMETRIC SURVIVAL MODELS

Lecture 4 PARAMETRIC SURVIVAL MODELS Some Parametric Survival Distributions (defined on t 0): The Exponential distribution (1 parameter) f(t) = λe λt (λ > 0) S(t) = t = e λt f(u)du λ(t) = f(t) S(t) = λ

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS Enterprise Miner 13.1

Paper 11682-2016 Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS Enterprise Miner 13.1 Raja Rajeswari Veggalam, Akansha Gupta; SAS and OSU Data Mining Certificate

### Time varying (or time-dependent) covariates

Chapter 9 Time varying (or time-dependent) covariates References: Allison (*) p.138-153 Hosmer & Lemeshow Chapter 7, Section 3 Kalbfleisch & Prentice Section 5.3 Collett Chapter 7 Kleinbaum Chapter 6 Cox

### Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### Personalized Predictive Medicine and Genomic Clinical Trials

Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations

### Survival Analysis of Dental Implants. Abstracts

Survival Analysis of Dental Implants Andrew Kai-Ming Kwan 1,4, Dr. Fu Lee Wang 2, and Dr. Tak-Kun Chow 3 1 Census and Statistics Department, Hong Kong, China 2 Caritas Institute of Higher Education, Hong

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Survival Analysis And The Application Of Cox's Proportional Hazards Modeling Using SAS

Paper 244-26 Survival Analysis And The Application Of Cox's Proportional Hazards Modeling Using SAS Tyler Smith, and Besa Smith, Department of Defense Center for Deployment Health Research, Naval Health

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Design and Analysis of Equivalence Clinical Trials Via the SAS System

Design and Analysis of Equivalence Clinical Trials Via the SAS System Pamela J. Atherton Skaff, Jeff A. Sloan, Mayo Clinic, Rochester, MN 55905 ABSTRACT An equivalence clinical trial typically is conducted

### Simple Sensitivity Analyses for Matched Samples. CRSP 500 Spring 2008 Thomas E. Love, Ph. D., Instructor. Goal of a Formal Sensitivity Analysis

Goal of a Formal Sensitivity Analysis To replace a general qualitative statement that applies in all observational studies the association we observe between treatment and outcome does not imply causation

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG

Paper 3140-2015 An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG Iván Darío Atehortua Rojas, Banco Colpatria

### SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates

SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates Martin Wolkewitz, Ralf Peter Vonberg, Hajo Grundmann, Jan Beyersmann, Petra Gastmeier,

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Prognosis: diagnosing the future. The temporal dimension of diagnosis. Predictive models

Prognosis: diagnosing the future. The temporal dimension of diagnosis. Predictive models Gennaro D Amico Gastroenterology Unit V Cervello Hospital Palermo gedamico@libero.it Diagnosing the future The link

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### A Tutorial on Logistic Regression

A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Faculty of Health Sciences R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

### Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )

### ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### HA Territory-wide PCI Audit 2003-05

HA Territory-wide PCI Audit 23-5 5 PCI Audit Working Group Central Committee (Cardiac Services) HA Convention 26 Percutaneous Coronary Intervention Background HA AP target 2/3, coordinated by PCI Working

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### This supplementary material has been provided by the authors to give readers additional information about their work.

SUPPLEMENTAL MATERIAL Table S1. The logistic regression model used to calculate the propensity score. Table S2. Distribution of propensity score among the treat and control groups of the full and matched

### , has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Primary Prevention of Cardiovascular Disease with a Mediterranean diet

Primary Prevention of Cardiovascular Disease with a Mediterranean diet Alejandro Vicente Carrillo, Brynja Ingadottir, Anne Fältström, Evelyn Lundin, Micaela Tjäderborn GROUP 2 Background The traditional

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Checking proportionality for Cox s regression model

Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Multilevel Modeling of Complex Survey Data

Multilevel Modeling of Complex Survey Data Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 University of California, Los Angeles 2 Abstract We describe a multivariate, multilevel, pseudo maximum

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Effect of Risk and Prognosis Factors on Breast Cancer Survival: Study of a Large Dataset with a Long Term Follow-up

Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics 7-28-2012 Effect of Risk and Prognosis Factors on Breast Cancer Survival: Study

### UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

### Confidence Intervals for Exponential Reliability

Chapter 408 Confidence Intervals for Exponential Reliability Introduction This routine calculates the number of events needed to obtain a specified width of a confidence interval for the reliability (proportion

### Wes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }].

Wes, Delaram, and Emily MA75 Exercise 4.5 Consider a two-class logistic regression problem with x R. Characterize the maximum-likelihood estimates of the slope and intercept parameter if the sample for

### Examples on Variable Selection in PCA in Sensory Descriptive and Consumer Data

Examples on Variable Selection in PCA in Sensory Descriptive and Consumer Data Per Lea, Frank Westad, Margrethe Hersleth MATFORSK, Ås, Norway Harald Martens KVL, Copenhagen, Denmark 6 th Sensometrics Meeting

### Elevated heart rate at twelve months after heart transplantation is an independent predictor of long term mortality

Elevated heart rate at twelve months after heart transplantation is an independent predictor of long term mortality C. Tomas, MA Castel, E Roig, I. Vallejos, C. Plata, F. Pérez-Villa Cardiology Department,

### Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013

Model selection in R featuring the lasso Chris Franck LISA Short Course March 26, 2013 Goals Overview of LISA Classic data example: prostate data (Stamey et. al) Brief review of regression and model selection.

### Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Objectives Introduce event history analysis Describe some common survival (hazard) distributions Introduce some useful Stata

### Introduction to Fixed Effects Methods

Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed

### FINDING SUBGROUPS OF ENHANCED TREATMENT EFFECT. Jeremy M G Taylor Jared Foster University of Michigan Steve Ruberg Eli Lilly

FINDING SUBGROUPS OF ENHANCED TREATMENT EFFECT Jeremy M G Taylor Jared Foster University of Michigan Steve Ruberg Eli Lilly 1 1. INTRODUCTION and MOTIVATION 2. PROPOSED METHOD Random Forests Classification

### Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

### COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences. 2015-2016 Academic Year Qualification.

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences 2015-2016 Academic Year Qualification. Master's Degree 1. Description of the subject Subject name: Biomedical Data

### Linda Staub & Alexandros Gekenidis

Seminar in Statistics: Survival Analysis Chapter 2 Kaplan-Meier Survival Curves and the Log- Rank Test Linda Staub & Alexandros Gekenidis March 7th, 2011 1 Review Outcome variable of interest: time until

### What is the interpretation of R 2?

What is the interpretation of R 2? Karl G. Jöreskog October 2, 1999 Consider a regression equation between a dependent variable y and a set of explanatory variables x'=(x 1, x 2,..., x q ): or in matrix

### Data Analysis, Research Study Design and the IRB

Minding the p-values p and Quartiles: Data Analysis, Research Study Design and the IRB Don Allensworth-Davies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB