Lecture 4: Bivariate and Multivariate Failure Times

In many survival studies, failure times are clustered, matched, or repeatedly measured, and are therefore likely to be correlated. The choice of a statistical model depends on the correlation structure of the failure time measurements, and there are numerous ways to model the dependence of correlated survival data.

Example. Several individuals in one cluster share observed or unobserved risk factors; e.g., age at onset of a certain disease (cancer) among siblings.

Example. One individual experiences multiple failure events; e.g., failure times of the left and right eyes, or failure times of the two kidneys.

4.1 Bivariate failure times: Several dependence measures

In studies involving bivariate events, interest may focus on the marginal effects of treatment (exposure) or on the dependence structure between the two failure times. For example, in familial studies, investigators may be interested in the strength of disease aggregation, say, hypothesizing greater heritability for early-onset than for later-onset disease.

Dependence measures

Correlation coefficient. The traditional way of evaluating dependence between two random variables is the (Pearson) correlation coefficient:

\[ \mathrm{Corr}[T_1, T_2] = \frac{\mathrm{Cov}[T_1, T_2]}{\sqrt{\mathrm{Var}[T_1]\,\mathrm{Var}[T_2]}} = \frac{\mathrm{Cov}[T_1, T_2]}{\mathrm{SD}[T_1]\,\mathrm{SD}[T_2]}, \]

where $\mathrm{Cov}[T_1, T_2] = E[(T_1 - E[T_1])(T_2 - E[T_2])]$. Note that $-1 \le \mathrm{Corr}[T_1, T_2] \le 1$.

Kendall's tau (coefficient of concordance). For failure time data, it is natural to demand a rank-based measure that is invariant under both linear and nonlinear monotone increasing changes of the time scale. One easily interpreted measure with this rank-invariance property is Kendall's tau (coefficient of concordance). Let $(T_{11}, T_{12})$ and $(T_{21}, T_{22})$ be two independent pairs drawn from the joint distribution of $(T_1, T_2)$. Then

\[ \tau = E[\mathrm{sign}\{(T_{11} - T_{21})(T_{12} - T_{22})\}]. \]

A more transparent formulation for continuous failure times is

\[ \tau = P\{(T_{11} - T_{21})(T_{12} - T_{22}) > 0\} - P\{(T_{11} - T_{21})(T_{12} - T_{22}) < 0\} = 2P\{(T_{11} - T_{21})(T_{12} - T_{22}) > 0\} - 1. \]

Kendall's tau is a measure of rank correlation for bivariate dependence. If the agreement between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value 1. If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value $-1$. If $T_1$ and $T_2$ are independent, then $\tau = 0$.
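The rank invariance of Kendall's tau, in contrast to the Pearson correlation, can be illustrated with a short sketch. The paired times below are made-up, uncensored values chosen only for illustration, and the helper names `pearson` and `kendall_tau` are hypothetical; censored data would need a different estimator.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(x, y):
    """Sample Kendall's tau (no ties, no censoring): the average sign of
    (x_i - x_k)(y_i - y_k) over all pairs i < k."""
    n = len(x)
    total = 0
    for i in range(n):
        for k in range(i + 1, n):
            prod = (x[i] - x[k]) * (y[i] - y[k])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

# Hypothetical uncensored paired failure times (illustrative values only).
t1 = [1.2, 3.4, 0.7, 5.1, 2.2, 8.0]
t2 = [2.0, 2.9, 1.1, 9.5, 1.8, 7.3]
log_t1 = [math.log(v) for v in t1]
log_t2 = [math.log(v) for v in t2]

tau_raw = kendall_tau(t1, t2)          # computed on the original scale
tau_log = kendall_tau(log_t1, log_t2)  # computed after a monotone transform
r_raw = pearson(t1, t2)
r_log = pearson(log_t1, log_t2)

print(f"Kendall tau: raw={tau_raw:.4f}, log scale={tau_log:.4f}")
print(f"Pearson r:   raw={r_raw:.4f}, log scale={r_log:.4f}")
```

Because the logarithm preserves the ordering of the times, the two tau values coincide, whereas the Pearson correlation changes with the scale.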

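Cross ratio. In familial examples, researchers tend to believe that genetic influences may exist only at early ages. Global measures such as Kendall's tau are not ideal for addressing early versus late dependence. To address the question of local dependence, we need measures that evaluate dependence at a single time point, such as the cross ratio. For continuous $(T_1, T_2)$ with joint survival function $S(t_1, t_2)$ and joint density $f(t_1, t_2)$, define the bivariate hazard function $\lambda(t_1, t_2) = f(t_1, t_2)/S(t_1, t_2)$. The cross ratio at $(t_1, t_2)$ is defined as

\[ \theta(t_1, t_2) = \frac{\lambda(t_1 \mid T_2 = t_2)}{\lambda(t_1 \mid T_2 > t_2)} = \frac{f(t_1, t_2) \Big/ \left\{ -\frac{\partial}{\partial s_2} S(t_1, s_2)\big|_{s_2 = t_2} \right\}}{\left\{ -\frac{\partial}{\partial s_1} S(s_1, t_2)\big|_{s_1 = t_1} \right\} \Big/ S(t_1, t_2)} = \frac{S(t_1, t_2)\, f(t_1, t_2)}{\frac{\partial}{\partial s_1} S(s_1, t_2)\big|_{s_1 = t_1} \, \frac{\partial}{\partial s_2} S(t_1, s_2)\big|_{s_2 = t_2}}, \]

where $\lambda(t_1 \mid \cdot)$ denotes the conditional hazard of $T_1$ at time $t_1$.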

The cross ratio $\theta(t_1, t_2)$ is interpreted as the ratio of one's failure risk at time $t_1$ when his/her partner is known to have failed at time $t_2$ versus to have survived beyond $t_2$. The cross ratio measures the degree of dependence between $T_1$ and $T_2$; independence corresponds to $\theta(t_1, t_2) = 1$ for all $(t_1, t_2)$. When two failure times are exchangeable, such as the failure times of (identical) twins, the cross ratio is symmetric with respect to the two components; that is, the cross ratio for $(T_1, T_2)$ is the same as the cross ratio for $(T_2, T_1)$.
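As a rough numerical illustration of the definition, the sketch below approximates the cross ratio by central finite differences for two hypothetical survival functions with exponential margins: an independent pair and the gamma-frailty (Clayton) form given in Example 1 of Section 4.4. The rates, the value of `ALPHA`, the step size `h`, and all function names are assumptions made for this example. For the Clayton form the cross ratio is known to be constant and equal to $1 + \alpha$, so it cannot distinguish early from late dependence.

```python
import math

def cross_ratio(S, t1, t2, h=1e-4):
    """Finite-difference approximation of
    theta = S * f / (dS/dt1 * dS/dt2), with f the mixed second derivative of S."""
    d1 = (S(t1 + h, t2) - S(t1 - h, t2)) / (2 * h)
    d2 = (S(t1, t2 + h) - S(t1, t2 - h)) / (2 * h)
    f = (S(t1 + h, t2 + h) - S(t1 + h, t2 - h)
         - S(t1 - h, t2 + h) + S(t1 - h, t2 - h)) / (4 * h * h)
    return S(t1, t2) * f / (d1 * d2)

def independent(t1, t2):
    """Independent exponential margins with (assumed) rates 1 and 0.5."""
    return math.exp(-t1) * math.exp(-0.5 * t2)

ALPHA = 2.0  # assumed frailty variance for the illustration

def clayton(t1, t2):
    """Clayton joint survival function with the same exponential margins."""
    s1, s2 = math.exp(-t1), math.exp(-0.5 * t2)
    return (s1 ** -ALPHA + s2 ** -ALPHA - 1.0) ** (-1.0 / ALPHA)

theta_indep = cross_ratio(independent, 1.0, 2.0)
theta_clayton_early = cross_ratio(clayton, 0.5, 0.5)
theta_clayton_late = cross_ratio(clayton, 2.0, 3.0)

print(theta_indep, theta_clayton_early, theta_clayton_late)
```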

4.2 Nonparametric Estimation for Bivariate Distribution

Nonparametric estimation of a bivariate distribution is mathematically more complicated than Kaplan-Meier estimation for univariate survival data. Popular approaches include those of Dabrowska (1988) and of Prentice and Cai, among others. Dabrowska extended the univariate Kaplan-Meier approach by defining a bivariate hazard function and estimating the joint distribution through it. Prentice and Cai did not define a joint hazard. Instead, they gave a representation of the bivariate survival function in terms of the marginal survival functions and the covariance between the counting process martingales of the two components.

Remark: The Dabrowska estimator is discussed in the appendix (optional reading).

4.3 Copula Model for Bivariate Failure Times

One of the earliest families of distributions for correlated bivariate measurements is the copula family, in which the marginal distributions are uniform on the unit interval. The copula family includes many popular bivariate failure time models and has gained considerable attention in the statistical literature because of its flexibility in modelling.

In statistics, a copula is used as a general way of formulating a multivariate distribution so that various general types of dependence can be represented. The approach is based on the idea that a simple transformation can be applied to each marginal variable so that each transformed marginal variable has a uniform distribution. Once this is done, the dependence structure can be expressed as a multivariate distribution on the resulting uniforms, and a copula is precisely a multivariate distribution on marginally uniform random variables.

Suppose that $C(u_1, u_2)$ is a joint survival function on $[0, 1]^2$ with density $c(u_1, u_2)$, that is, $C : [0, 1] \times [0, 1] \to [0, 1]$. Let $(T_1, T_2)$ denote the paired failure times, and let $(S_1, S_2)$ and $(f_1, f_2)$ denote the corresponding marginal survival and density functions. Then the joint survival function of $(T_1, T_2)$ in the copula family is given by

\[ S(t_1, t_2) = C\{S_1(t_1), S_2(t_2)\}. \]

Archimedean copula model. The survival function in this subclass has the form

\[ S(t_1, t_2) = \phi\left[ \phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\} \right], \]

where $0 \le \phi \le 1$, $\phi(0) = 1$, $\phi' < 0$, and $\phi'' > 0$ (a convex decreasing function). If $\phi$ is the Laplace transform of the distribution of some random variable $W$, $\phi(t) = E(e^{-tW})$, the Archimedean copula (AC) model reduces to the proportional frailty model.

Copula models are formulated from the marginal distributions and the copula. This two-step approach to modelling is convenient because many tractable models are readily available for the marginal distributions. Copula models are also well suited to illustrating dependence. Other copula models include Clayton's family, Frank's family, the positive stable copula, etc. (Clayton, 1978; Hougaard, 1986; Frank, 1979).

4.4 Frailty models for multivariate failure times

A commonly used approach to modelling multivariate failure times, the frailty model, specifies independence among the failure times conditional on an unobserved positive-valued variable $W$, called the frailty. Assume that the hazard function of $T_{ij}$ given $W_i = w$ (frailty) is

\[ \lambda_j(t_j \mid W_i = w) = w\, \lambda_{0j}(t_j), \]

which is a proportional frailty model with baseline hazard function $\lambda_{0j}(\cdot)$. Let $B_j(\cdot)$ be the survival function corresponding to $\lambda_{0j}(\cdot)$.

Univariate inference. Conditioning on $W_i = w$, the survival function of $T_{ij}$ is $\{B_j(t_j)\}^w$, and the survival function of $(T_{i1}, \ldots, T_{im})$ is

\[ S(t \mid W_i = w) = \prod_{j=1}^{m} \{B_j(t_j)\}^w, \qquad t = (t_1, \ldots, t_m). \]

The unconditional survival function of $T_{ij}$ is

\[ S_j(t_j) = \phi\{-\log B_j(t_j)\}, \]

where $\phi(\cdot)$ is the Laplace transform of the random variable $W_i$, i.e., $\phi(t) = E(e^{-tW_i})$.

Conditioning on $W_i = w$, the hazard function of $T_{ij}$ is $\lambda_j(t_j \mid W_i = w) = w\, \lambda_{0j}(t_j)$. Extending the proportional hazards model, the more general proportional frailty model can be expressed as, for $j = 1, \ldots, m$,

\[ \lambda_j(t_j; x_{ij}, w_i) = w_i\, \lambda_{0j}(t_j) \exp(\beta x_{ij}). \]

Bivariate inference. The bivariate survival function satisfies

\[ S(t_1, t_2) = \int \{B_1(t_1) B_2(t_2)\}^w \, dF_W(w), \]

where $F_W$ is the distribution function of the frailty $W$. It follows that

\[ S(t_1, t_2) = \phi\left( -\log B_1(t_1) - \log B_2(t_2) \right), \]

where $\phi(\cdot)$ is the Laplace transform of the random variable $W$. Inferential procedures for frailty models have focused exclusively on likelihood-based approaches.
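The Laplace-transform representation can be checked by simulation. The sketch below is only an illustration under stated assumptions: a gamma frailty with mean 1 and variance `ALPHA`, exponential baseline hazards with rates `RATE1` and `RATE2` (so that $-\log B_j(t) = \text{rate}_j \, t$), and conditionally independent failure times given the frailty. All names and numerical values are hypothetical.

```python
import random

def laplace_gamma(u, alpha):
    """Laplace transform of a gamma frailty with mean 1 and variance alpha."""
    return (1.0 + alpha * u) ** (-1.0 / alpha)

def simulate_joint_survival(t1, t2, alpha, rate1, rate2, n, seed=0):
    """Monte Carlo estimate of P(T1 > t1, T2 > t2) under the shared frailty model."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        # gammavariate(shape, scale): shape 1/alpha, scale alpha -> mean 1, variance alpha
        w = rng.gammavariate(1.0 / alpha, alpha)
        # Given w, the two times are independent exponentials with hazards w * rate_j.
        x1 = rng.expovariate(w * rate1)
        x2 = rng.expovariate(w * rate2)
        if x1 > t1 and x2 > t2:
            count += 1
    return count / n

ALPHA, RATE1, RATE2 = 0.5, 1.0, 0.5
T1, T2 = 0.5, 1.0

# phi(-log B1(t1) - log B2(t2)) with exponential baselines
exact = laplace_gamma(RATE1 * T1 + RATE2 * T2, ALPHA)
approx = simulate_joint_survival(T1, T2, ALPHA, RATE1, RATE2, n=100_000)

print(f"Laplace-transform formula: {exact:.4f}, simulation: {approx:.4f}")
```

The simulated proportion should agree with the closed form up to Monte Carlo error, which shrinks as `n` grows.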

Bivariate distributions generated by frailty models are a subclass of the Archimedean distributions (Genest and MacKay, 1989, American Statistician), provided that $\phi(u)$ is a Laplace transform. With $B_j(t_j) = \exp[-\phi^{-1}(S_j(t_j))]$, the bivariate distribution can be written as

\[ S(t_1, t_2) = \int \prod_{j=1}^{2} \exp\left[ -w\, \phi^{-1}(S_j(t_j)) \right] dF_W(w) = \phi\left[ \phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\} \right]. \]

However, the converse statement is false, because there is no guarantee that the function $\phi(u)$ of an Archimedean distribution is a Laplace transform.

Example 1: Gamma frailty models (Clayton model). In this model, the frailty $W$ follows a gamma distribution with expectation 1 and variance $\alpha > 0$. The corresponding Laplace transform is

\[ \phi(u) = (1 + \alpha u)^{-1/\alpha}. \]

The failure times $(T_1, T_2)$ are positively correlated when $\alpha > 0$ and independent in the limit $\alpha \to 0$. The joint survival function can be written as

\[ S(t_1, t_2) = \left[ S_1(t_1)^{-\alpha} + S_2(t_2)^{-\alpha} - 1 \right]^{-1/\alpha}. \]
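A quick check that the Clayton form is the Archimedean construction with the gamma Laplace transform as generator can be sketched as follows. The sketch assumes the Laplace transform $(1 + \alpha u)^{-1/\alpha}$ of a gamma frailty with mean 1 and variance $\alpha$; the function names and the marginal survival values 0.7 and 0.4 are hypothetical.

```python
def archimedean_survival(phi, phi_inv, s1, s2):
    """Generic Archimedean form: phi(phi^{-1}(s1) + phi^{-1}(s2))."""
    return phi(phi_inv(s1) + phi_inv(s2))

def gamma_generator(alpha):
    """Gamma Laplace transform (mean 1, variance alpha) and its inverse."""
    def phi(u):
        return (1.0 + alpha * u) ** (-1.0 / alpha)

    def phi_inv(s):
        return (s ** (-alpha) - 1.0) / alpha

    return phi, phi_inv

def clayton_survival(s1, s2, alpha):
    """Closed-form Clayton joint survival in terms of the marginal survivals."""
    return (s1 ** (-alpha) + s2 ** (-alpha) - 1.0) ** (-1.0 / alpha)

phi, phi_inv = gamma_generator(1.5)
via_generator = archimedean_survival(phi, phi_inv, 0.7, 0.4)
closed_form = clayton_survival(0.7, 0.4, 1.5)

# As alpha approaches 0 the joint survival approaches the product S1 * S2.
near_independent = clayton_survival(0.7, 0.4, 1e-6)

print(via_generator, closed_form, near_independent)
```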

Example 2: Stable frailty models. Hougaard (1986) proposed a class of multivariate models in which the frailty $W$ follows the positive stable distribution with parameter $a$, so that the Laplace transform is

\[ \phi(u) = \exp(-u^{a}), \qquad 0 < a < 1. \]

The corresponding joint survival function is

\[ S(t_1, t_2) = \exp\left( -\left[ (-\log S_1(t_1))^{1/a} + (-\log S_2(t_2))^{1/a} \right]^{a} \right). \]

A notable property of the stable frailty model is that if the conditional hazards are proportional, then the hazards in the marginal distributions are also proportional, but with different baseline hazards and regression coefficients.
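The proportionality property can be made concrete with a small sketch. It assumes a conditional hazard $w \lambda_0 \exp(\beta x)$ with a constant baseline rate, so that the marginal survival function is $\phi(\Lambda_0(t) e^{\beta x}) = \exp\{-(\Lambda_0(t) e^{\beta x})^{a}\}$ and the marginal regression coefficient becomes $a\beta$. The function names and parameter values are hypothetical.

```python
import math

def stable_joint_survival(s1, s2, a):
    """Joint survival under positive stable frailty, in terms of the margins."""
    inner = (-math.log(s1)) ** (1.0 / a) + (-math.log(s2)) ** (1.0 / a)
    return math.exp(-(inner ** a))

def marginal_cum_hazard(t, x, a, beta, base_rate=1.0):
    """Marginal cumulative hazard -log S_j(t | x) when the conditional
    cumulative hazard is w * base_rate * t * exp(beta * x)."""
    return (base_rate * t * math.exp(beta * x)) ** a

A, BETA = 0.5, 0.8  # assumed stable index and conditional log hazard ratio

# The ratio of marginal cumulative hazards for x = 1 versus x = 0 does not depend on t.
ratio_early = marginal_cum_hazard(0.5, 1.0, A, BETA) / marginal_cum_hazard(0.5, 0.0, A, BETA)
ratio_late = marginal_cum_hazard(2.0, 1.0, A, BETA) / marginal_cum_hazard(2.0, 0.0, A, BETA)

joint_dependent = stable_joint_survival(0.7, 0.4, A)
joint_at_a_one = stable_joint_survival(0.7, 0.4, 1.0)  # a = 1 gives the product S1 * S2

print(ratio_early, ratio_late, math.exp(A * BETA))
print(joint_dependent, joint_at_a_one)
```

In this sketch the marginal hazard ratio is $\exp(a\beta)$, which is attenuated toward 1 relative to the conditional hazard ratio $\exp(\beta)$ because $0 < a < 1$.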

4.5 Regression models and methods

Two regression models are commonly used for clustered survival data to account for intra-cluster association: frailty models and marginal models. In frailty models, the dependence structure is explicitly specified through unobserved random quantities, the frailties, common to observations from the same cluster. Marginal models, in contrast, model the marginal failure time distribution and leave the structure of the intra-cluster association unspecified, but adjust for it in the inference.

Marginal regression models. The marginal model allows different baseline hazards for different strata, and it allows different parameters for different strata given the covariates (Wei et al., JASA 1989). In the marginal proportional hazards model, the event history $N(t)$ is not included in the conditioning information:

\[ \lambda(t \mid X(t)) = \lambda_0(t) \exp\{X(t)\beta\}. \]

The marginal model is generally well suited to identifying treatment effects and risk factors. Wei et al. adopt the usual estimation procedure to estimate the regression parameter $\beta$ under the independent censoring assumption; that is, the usual risk-set-based estimating equations are used to estimate $\beta$. A robust sandwich estimator derived from the estimating equations is used to estimate the variance-covariance matrix of $\hat\beta$.

Frailty regression models. A frailty model typically assumes that, conditional on possibly time-dependent covariates $X = x$ and frailty $Z = z$, $T_j$ is distributed with pdf $f_j(y \mid x, z, \theta)$, $\theta \in \Theta$. Denote the observed data in the sample by $\{(Y_{ij}, \delta_{ij}, X_{ij}) : j = 1, 2, \ldots, m_i,\ i = 1, 2, \ldots, n\}$, where $\delta_{ij}$ is a censoring indicator, $Y_{ij} = T_{ij}$ if $\delta_{ij} = 1$, and $Y_{ij} = C_{ij}$ if $\delta_{ij} = 0$. Assume that the cluster-specific latent variable $Z_i$ is distributed with pdf $h(z; \gamma)$, $\gamma \in \Gamma$, and that the censoring time $C_{ij}$ is independent of $(T_{ij}, Z_i)$ conditional on $X_{ij} = x_{ij}$. Further assume that the cluster size $M_i$ is independent of $(T_{ij}, C_{ij}, X_{ij}, Z_i)$. The likelihood function can be expressed as

\[ L = \prod_{i=1}^{n} \int \prod_{j=1}^{m_i} f_j(y_{ij} \mid x_{ij}, z_i; \theta)^{\delta_{ij}}\, S_j(y_{ij} \mid x_{ij}, z_i; \theta)^{1 - \delta_{ij}}\, h(z_i; \gamma)\, dz_i. \]

Remarks:
- In the literature, the EM algorithm has been used as the tool for finding the MLE $\hat\beta$. Convergence of the EM algorithm may be slow, depending on the amount of information available for the survival model, the censoring pattern, and the choice of starting values.
- In some cases, the EM algorithm does not converge at all.

Assuming the frailty is gamma distributed, we can construct the likelihood function for the model. If we assume a parametric form for the baseline hazard function, directly maximizing that likelihood function yields the maximum likelihood estimates, and inverting the resulting information matrix gives estimates of the variability of the parameter estimates. If the baseline hazard function is not parameterized, however, the EM algorithm yields semiparametric estimates. Obtaining variance estimates for the frailty parameter, regression coefficients, and cumulative baseline hazards from the observed nonparametric information matrix of a shared gamma frailty model raises various problems (Andersen et al., 1997, Biometrics).
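For a gamma frailty with mean 1 and variance $\alpha$ and a parametric baseline, the integral over the frailty has a closed form, which is what makes direct maximization feasible. The sketch below assumes the simplest case, a constant (exponential) baseline hazard `rate` and a single covariate; with $d_i$ events and total cumulative hazard $A_i$ in cluster $i$, the cluster contribution is $\prod_j h_{ij}^{\delta_{ij}} \cdot \frac{\Gamma(1/\alpha + d_i)}{\Gamma(1/\alpha)} \alpha^{d_i} (1 + \alpha A_i)^{-(1/\alpha + d_i)}$. The function names and the toy data are hypothetical, and no optimizer is included.

```python
import math

def cluster_loglik(times, events, covs, alpha, beta, rate):
    """Marginal log-likelihood of one cluster under a shared gamma frailty
    (mean 1, variance alpha) with exponential baseline hazard `rate`."""
    d = sum(events)
    # Sum of log conditional hazards (without the frailty) over observed events.
    log_haz = sum(e * (math.log(rate) + beta * x) for e, x in zip(events, covs))
    # Total cumulative hazard of the cluster (without the frailty).
    cum = sum(rate * t * math.exp(beta * x) for t, x in zip(times, covs))
    k = 1.0 / alpha
    return (log_haz
            + math.lgamma(k + d) - math.lgamma(k)
            + d * math.log(alpha)
            - (k + d) * math.log1p(alpha * cum))

def loglik(clusters, alpha, beta, rate):
    """Sum of cluster contributions; clusters are (times, events, covs) triples."""
    return sum(cluster_loglik(t, e, x, alpha, beta, rate) for t, e, x in clusters)

# Hypothetical toy data: three clusters of size two, event indicator 1 = failure.
toy_clusters = [
    ([1.0, 0.5], [1, 0], [1.0, 0.0]),
    ([2.3, 1.7], [1, 1], [0.0, 1.0]),
    ([0.4, 3.0], [0, 0], [1.0, 1.0]),
]

ll_a = loglik(toy_clusters, alpha=0.5, beta=0.3, rate=0.8)
ll_b = loglik(toy_clusters, alpha=1.0, beta=0.0, rate=0.5)
print(ll_a, ll_b)
```

In practice this function would be handed to a numerical optimizer over $(\alpha, \beta, \text{rate})$; that step is omitted here.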

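Appendix

Dabrowska Estimator (optional reading). The Dabrowska estimator (1988) overcomes some undesirable features of previously developed estimators, such as non-uniqueness, inconsistency, and lack of weak convergence theory. The bivariate hazard function $\lambda(t_1, t_2) = f(t_1, t_2)/S(t_1, t_2)$ alone is insufficient to determine the survival function in the bivariate case. The relation between the bivariate survival and hazard functions is

\[ \frac{\partial^2}{\partial t_1 \partial t_2} \log S(t_1, t_2) = \lambda(t_1, t_2) - \left\{ \frac{\partial}{\partial t_1} \log S(t_1, t_2) \right\} \left\{ \frac{\partial}{\partial t_2} \log S(t_1, t_2) \right\}, \]

or equivalently, in differential notation,

\[ \log S(dt_1, dt_2) = \lambda(t_1, t_2)\, dt_1\, dt_2 - \{\log S(dt_1, t_2)\}\{\log S(t_1, dt_2)\} = \frac{f(t_1, t_2)\, dt_1\, dt_2}{S(t_1, t_2)} - \frac{S(dt_1, t_2)}{S(t_1, t_2)} \cdot \frac{S(t_1, dt_2)}{S(t_1, t_2)}, \]

where $\log S(dt_1, dt_2) = \frac{\partial^2}{\partial t_1 \partial t_2} \log S(t_1, t_2)\, dt_1\, dt_2$, $\log S(dt_1, t_2) = \frac{\partial}{\partial t_1} \log S(t_1, t_2)\, dt_1$, $S(dt_1, t_2) = \frac{\partial}{\partial t_1} S(t_1, t_2)\, dt_1$, and similarly for the second argument.

Integrating the mixed differential over $(0, t_1] \times (0, t_2]$ and using $\log S(0, 0) = 0$ gives

\[
\begin{aligned}
& \int_0^{t_1}\!\!\int_0^{t_2} \log S(du, dv) + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \int_0^{t_2} \left[ \log S(t_1, dv) - \log S(0, dv) \right] + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \{ (\log S(t_1, t_2) - \log S(t_1, 0)) - (\log S(0, t_2) - \log S(0, 0)) \} + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \log S(t_1, t_2).
\end{aligned}
\]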

Thus, by taking the exponential transformation,

\[ S(t_1, t_2) = S(t_1, 0)\, S(0, t_2) \exp\left\{ \int_0^{t_1}\!\!\int_0^{t_2} \log S(du, dv) \right\}. \]

Observations are paired times $(T_1, T_2)$ with corresponding failure indicators $(D_1, D_2)$. The estimator is as follows. First define the bivariate risk set at time $(t_1, t_2)$:

\[ R(t_1, t_2) = \sum_i I(T_{i1} \ge t_1,\ T_{i2} \ge t_2). \]

The number of bivariate events occurring at time $(t_1, t_2)$ is

\[ K_{11}(t_1, t_2) = \sum_i D_{i1} D_{i2}\, I(T_{i1} = t_1,\ T_{i2} = t_2), \]

and the number of events for the first component among those for which the second component is still alive at time $t_2$ is

\[ K_{10}(t_1, t_2) = \sum_i D_{i1}\, I\{T_{i1} = t_1,\ T_{i2} \ge t_2\}. \]

Similarly to $K_{10}(t_1, t_2)$,

\[ K_{01}(t_1, t_2) = \sum_i D_{i2}\, I(T_{i1} \ge t_1,\ T_{i2} = t_2). \]

Then we define the following quantities relative to the risk set:

\[ L_{11}(t_1, t_2) = \frac{K_{11}(t_1, t_2)}{R(t_1, t_2)}, \qquad L_{10}(t_1, t_2) = \frac{K_{10}(t_1, t_2)}{R(t_1, t_2)}, \qquad L_{01}(t_1, t_2) = \frac{K_{01}(t_1, t_2)}{R(t_1, t_2)}. \]

The Dabrowska estimator is built on the two marginal K-M estimators and an association measure $H$:

\[ \hat{S}(t_1, t_2) = \prod_{u \le t_1} \{1 - L_{10}(u, 0)\} \prod_{v \le t_2} \{1 - L_{01}(0, v)\} \prod_{0 < u \le t_1,\ 0 < v \le t_2} \{1 - H(u, v)\}, \]

where

\[ H(t_1, t_2) = \frac{L_{10}(t_1, t_2)\, L_{01}(t_1, t_2) - L_{11}(t_1, t_2)}{\{1 - L_{10}(t_1, t_2)\}\{1 - L_{01}(t_1, t_2)\}}. \]

The estimated survival function may not be monotone. This can be explained by the complicated censoring and by the fact that three different portions of the data set are involved in the estimation.

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Package depend.truncation

Package depend.truncation Type Package Package depend.truncation May 28, 2015 Title Statistical Inference for Parametric and Semiparametric Models Based on Dependently Truncated Data Version 2.4 Date 2015-05-28 Author Takeshi Emura

More information

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards Week 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution of a

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS Tailiang Xie, Ping Zhao and Joel Waksman, Wyeth Consumer Healthcare Five Giralda Farms, Madison, NJ 794 KEY WORDS: Safety Data, Adverse

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Predicting Customer Default Times using Survival Analysis Methods in SAS

Predicting Customer Default Times using Survival Analysis Methods in SAS Predicting Customer Default Times using Survival Analysis Methods in SAS Bart Baesens Bart.Baesens@econ.kuleuven.ac.be Overview The credit scoring survival analysis problem Statistical methods for Survival

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Pricing of a worst of option using a Copula method M AXIME MALGRAT

Pricing of a worst of option using a Copula method M AXIME MALGRAT Pricing of a worst of option using a Copula method M AXIME MALGRAT Master of Science Thesis Stockholm, Sweden 2013 Pricing of a worst of option using a Copula method MAXIME MALGRAT Degree Project in Mathematical

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

More information

Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091) Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlo-based

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION

SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION STATISTICS IN MEDICINE, VOL. 8, 795-802 (1989) SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION F. Y. HSIEH* Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, Bronx, N Y 10461,

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information
