# BIAS EVALUATION IN THE PROPORTIONAL HAZARDS MODEL


E. A. COLOSIMO, A. F. SILVA, and F. R. B. CRUZ

Departamento de Estatística, Universidade Federal de Minas Gerais, Belo Horizonte - MG, Brazil

We consider two approaches for bias evaluation and reduction in the proportional hazards model proposed by Cox. The first is an analytical approach in which we derive the $n^{-1}$ bias term of the maximum partial likelihood estimator. The second consists of resampling methods, namely the jackknife and the bootstrap. We compare all methods through a comprehensive set of Monte Carlo simulations. The results suggest that bias-corrected estimators have better finite-sample performance than the standard maximum partial likelihood estimator. There is some evidence that the bootstrap correction is superior to the jackknife correction, but its performance is similar to that of the analytical estimator. Finally, an application illustrates the proposed approaches.

Keywords: Bias correction; bootstrap; jackknife; Weibull regression model.

## 1 Introduction

In the past years, special attention has been given to the proportional hazards model (PHM) proposed by Cox (1972). This model provides a flexible method for exploring the association of covariates with failure rates and for studying the effect of a covariate of interest, such as treatment, while adjusting for other covariates. It also allows for time-dependent covariates. Many applications of this model involve small sample sizes. For example, in a Phase II clinical trial with 20 patients and approximately 20% censoring, the effective sample size is about 16 patients.

Estimation of the coefficients of the model is based on the partial likelihood (Cox, 1975). Such estimates have biases that are typically of order $n^{-1}$ in large samples, where $n$ is the sample size. In small or moderate-sized samples such as the situation above, these biases can be large. It is then helpful to have rough estimates of their size and simple formulae for bias correction.

There has been considerable interest in recent years in bias evaluation and correction for maximum likelihood estimates. The basic methodology for calculating the $n^{-1}$ biases of maximum likelihood estimates has been applied to nonlinear regression models with normal errors (Box, 1971; Cook et al., 1986), binary response models (Sowden, 1971), logistic discrimination problems (McLachlan, 1980), generalized linear models (Cordeiro and McCullagh, 1991), generalized log-gamma regression models (Young and Bakir, 1987), nonlinear exponential family regression models (Paula, 1992), and multiplicative heteroscedastic regression models (Cordeiro, 1993). However, we could not find bias correction results for the maximum likelihood estimates related to the PHM.

The purpose of this paper is to present two approaches, analytical and resampling, for bias evaluation and reduction in the PHM. The paper is outlined as follows. In Section 2, we present the PHM and the maximum partial likelihood estimator. The bias of order $n^{-1}$ of this estimator is derived in Section 3, with the one-parameter special case treated in detail. In Section 4, we briefly describe the resampling techniques used, bootstrap and jackknife, before moving to the simulation study of Section 5. An application illustrates the proposed approaches in Section 6.

## 2 The PHM

The most popular form of the proportional hazards model, for covariates not dependent on time, uses the exponential form for the relative hazard, so that the hazard function is given by

$$\lambda(t) = \lambda_0(t)\exp(Z\beta), \tag{1}$$

where $\lambda_0(t)$, the baseline hazard function, is an unknown non-negative function of time, $Z$ is a row vector of covariates and $\beta$ is a $p$-vector of parameters to be estimated. Estimation of $\beta$ is based on the partial log-likelihood, which in the absence of ties is written for model (1) as

$$l(\beta) = \sum_{i=1}^{n} \delta_i \Big[ Z_i\beta - \log \sum_{j \in R_i} \exp(Z_j\beta) \Big], \tag{2}$$

where $R_i = \{k : t_k \ge t_i\}$ is the risk set at time $t_i$, $Z_i = (z_{i1},\ldots,z_{ip})$ is an observed value of $Z$, and $\delta_i$ is the failure indicator. The estimate of $\beta$ obtained by maximizing (2) is called the maximum partial likelihood estimate (MPLE). Maximizing (2) is equivalent to solving the equations defined by the score vector

$$\sum_{i=1}^{n} \delta_i \left[ z_{ik} - A_{ik}(\beta) \right] = 0, \quad k = 1,\ldots,p, \tag{3}$$

where

$$A_{ik}(\beta) = \frac{\sum_{j \in R_i} z_{jk}\exp(Z_j\beta)}{\sum_{j \in R_i} \exp(Z_j\beta)}.$$

The $kl$ element of the observed information matrix is given by

$$-\frac{\partial^2 l(\beta)}{\partial\beta_k\,\partial\beta_l} = \sum_{i=1}^{n} \delta_i \left[ B_{ikl}(\beta) - A_{ik}(\beta)A_{il}(\beta) \right], \tag{4}$$

where

$$B_{ikl}(\beta) = \frac{\sum_{j \in R_i} z_{jk}z_{jl}\exp(Z_j\beta)}{\sum_{j \in R_i} \exp(Z_j\beta)}.$$

The third-order derivatives of the partial log-likelihood are needed to obtain the term of order $n^{-1}$ of the bias. The $klm$ element is given by

$$\frac{\partial^3 l(\beta)}{\partial\beta_k\,\partial\beta_l\,\partial\beta_m} = -\sum_{i=1}^{n} \delta_i \left[ C_{iklm}(\beta) - A_{ik}(\beta)B_{ilm}(\beta) - A_{il}(\beta)B_{ikm}(\beta) - A_{im}(\beta)B_{ikl}(\beta) + 2A_{ik}(\beta)A_{il}(\beta)A_{im}(\beta) \right], \tag{5}$$

where

$$C_{iklm}(\beta) = \frac{\sum_{j \in R_i} z_{jk}z_{jl}z_{jm}\exp(Z_j\beta)}{\sum_{j \in R_i} \exp(Z_j\beta)}.$$

## 3 Bias of order $n^{-1}$

We denote the partial log-likelihood function (2) by $l$. We shall use the following tensor notation for mixed cumulants of the log-likelihood derivatives:

$$\kappa_{rs} = E\left(\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}\right), \quad \kappa_{rst} = E\left(\frac{\partial^3 l}{\partial\beta_r\,\partial\beta_s\,\partial\beta_t}\right), \quad \kappa_{r,s} = E\left(\frac{\partial l}{\partial\beta_r}\frac{\partial l}{\partial\beta_s}\right), \quad \kappa_{rs}^{(t)} = \frac{\partial\kappa_{rs}}{\partial\beta_t},$$

and so on. The tensor notation has the advantage of being a unified notation that includes both moments and cumulants as special cases (Lawley, 1956). All $\kappa$'s refer to a total over the sample and are, in general, of order $n$. Note that the Fisher information matrix has elements $\kappa_{r,s} = -\kappa_{rs}$, and let $\kappa^{r,s} = -\kappa^{rs}$ denote the corresponding elements of its inverse. The mixed cumulants satisfy certain equations that facilitate their calculation, such as $\kappa_{rst} = \kappa_{rs}^{(t)} - \kappa_{rs,t}$.

Let $B(\hat\beta_a)$ be the $n^{-1}$ bias of the $a$-th component of $\hat\beta$. From the general expression for the multiparameter $n^{-1}$ biases of the maximum likelihood estimator given by Cox and Snell (1968), we can write

$$B(\hat\beta_a) = \sum_{r,s,t} \kappa^{ar}\kappa^{st}\left( \kappa_{rs}^{(t)} - \frac{1}{2}\kappa_{rst} \right), \tag{6}$$

where the summations with respect to $r$, $s$, and $t$ are from 1 to $p$. A detailed discussion of this expression can be found in McCullagh (1987) and Cordeiro and McCullagh (1991).

To obtain the term of order $n^{-1}$ of the bias in expression (6), we need some mixed cumulants of the partial log-likelihood. That is, we must take expectations of the derivatives presented in Section 2. Calculating unconditional expectations would require a fuller specification of the censoring mechanism, and this information is not generally available. However, these expectations can be taken conditional on the entire history of failures and censorings up to each failure time $t$. This is the argument used to build the partial likelihood, and it allows a direct verification that the terms of $l$ have some of the desirable properties of the increments of the log-likelihood function.
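The risk-set averages $A$, $B$ and $C$ of Section 2 can be made concrete with a short sketch. The Python function below is an illustration only, not the authors' code: it assumes a single covariate, no tied failure times, and NumPy arrays as input, and the function name is hypothetical. It returns the partial log-likelihood (2) together with its first three derivatives.

```python
import numpy as np

def partial_likelihood_terms(t, delta, z, beta):
    """Partial log-likelihood (2) and its first three derivatives.

    Assumes one covariate and no tied failure times.
    t: observed times, delta: failure indicators (1 = failure), z: covariate.
    """
    t, delta, z = (np.asarray(a) for a in (t, delta, z))
    loglik = score = d2 = d3 = 0.0
    for i in np.flatnonzero(delta):          # only failures contribute
        risk = t >= t[i]                     # risk set R_i = {k : t_k >= t_i}
        e = np.exp(z[risk] * beta)
        w = e / e.sum()                      # weights exp(z_j beta) / sum over R_i
        A = np.sum(w * z[risk])              # A_i(beta)
        B = np.sum(w * z[risk] ** 2)         # B_i(beta)
        C = np.sum(w * z[risk] ** 3)         # C_i(beta)
        loglik += z[i] * beta - np.log(e.sum())
        score += z[i] - A                    # summand of equation (3)
        d2 -= B - A ** 2                     # second derivative: minus equation (4)
        d3 -= C - 3 * A * B + 2 * A ** 3     # equation (5) with k = l = m
    return loglik, score, d2, d3
```

Because each derivative is a sum over failures of a weighted moment within the risk set, the second derivative is minus a sum of weighted variances and is therefore never positive.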

In this way the observed and expected values of the derivatives of $l$ taken over a single risk set are identical (Cox and Oakes, 1984). In particular, $\kappa_{rs,t} = 0$ and hence $\kappa_{rs}^{(t)} = \kappa_{rst}$.

In the special case of the one-parameter PHM, expression (6) for the bias to order $n^{-1}$ simplifies to

$$B(\hat\beta) = -\frac{1}{2\kappa_{\beta\beta}^{2}}\left( \kappa_{\beta\beta\beta} - 2\kappa_{\beta\beta}^{(\beta)} \right) = \frac{\kappa_{\beta\beta\beta}}{2\kappa_{\beta\beta}^{2}}, \tag{7}$$

where

$$\kappa_{\beta\beta} = -\sum_{i=1}^{n} \delta_i \left[ B_{ikk}(\beta) - A_{ik}^{2}(\beta) \right] \quad \text{and} \quad \kappa_{\beta\beta\beta} = -\sum_{i=1}^{n} \delta_i \left[ C_{ikkk}(\beta) - 3A_{ik}(\beta)B_{ikk}(\beta) + 2A_{ik}^{3}(\beta) \right].$$

We can evaluate (7) at $\beta = \hat\beta$ and define a corrected estimator by

$$\tilde\beta_C = \hat\beta - B(\hat\beta). \tag{8}$$

## 4 Resampling Methods

In the resampling context, two frequently used methods are the jackknife (Quenouille, 1949, 1956) and the bootstrap (Efron, 1979; Efron and Tibshirani, 1993).

The jackknife procedure may be described as follows. Let $\beta$ be an unknown parameter and $T_1, T_2, \ldots, T_n$ a sample of $n$ i.i.d. observations with distribution function $F_\beta$, which depends on $\beta$. Suppose that a reasonably good (but biased) estimation method is used. Denote by $\hat\beta_{(i)}$, $i = 1,\ldots,n$, the estimate of $\beta$ obtained by removing the $i$-th observation, that is, $\hat\beta_{(i)} = \hat\beta(T_1,\ldots,T_{i-1},T_{i+1},\ldots,T_n)$. Let $\hat\beta$ be the estimate of $\beta$ based on all $n$ observations. Define the new estimates

$$\tilde\beta_i = n\hat\beta - (n-1)\hat\beta_{(i)} = \hat\beta - (n-1)(\hat\beta_{(i)} - \hat\beta), \quad i = 1,\ldots,n.$$

The bias-corrected jackknife estimate of $\beta$ is then the average of the $\tilde\beta_i$, $i = 1,\ldots,n$, that is,

$$\tilde\beta_J = n\hat\beta - (n-1)\hat\beta_{(\cdot)} = \hat\beta - (n-1)(\hat\beta_{(\cdot)} - \hat\beta), \tag{9}$$

where $(n-1)(\hat\beta_{(\cdot)} - \hat\beta)$ is the jackknife estimate of bias and $\hat\beta_{(\cdot)} = \sum_{i=1}^{n} \hat\beta_{(i)}/n$. The jackknife estimate $\tilde\beta_J$ eliminates the term of order $n^{-1}$ of the bias.

The (nonparametric) bootstrap procedure may be described as follows. Let the parameter of interest be written as the functional $\beta = t(F)$ of the distribution function $F$, and let $\hat\beta = t(\hat F)$ be its plug-in estimate, where $\hat F$ is the empirical distribution function of the data $t = (t_1,\ldots,t_n)$. The bias of $\hat\beta$ is defined as

$$\text{bias}_F = E_F(\hat\beta) - \beta = E_F(\hat\beta) - t(F).$$

Draw a bootstrap sample $t^* = (t_1^*,\ldots,t_n^*)$ from the empirical distribution function $\hat F$; that is, a random sample of size $n$ drawn with replacement from the original data $(t_1,\ldots,t_n)$. The bootstrap estimate of the bias of $\hat\beta$ is then defined as

$$\text{bias}_{\hat F} = E_{\hat F}(\hat\beta^*) - t(\hat F),$$

where $E_{\hat F}(\hat\beta^*)$ is the expectation of the bootstrap replication $\hat\beta^*$ under the empirical distribution function $\hat F$ and $t(\hat F)$ is the plug-in estimate of $\beta$.

The bootstrap estimate of the bias may be approximated by Monte Carlo simulation. Choose $B$ independent bootstrap samples $t^{*1}, t^{*2}, \ldots, t^{*B}$ from the empirical distribution $\hat F$. Evaluate the bootstrap replications $\hat\beta_b^*$, $b = 1,\ldots,B$, and approximate the expectation $E_{\hat F}(\hat\beta^*)$ by $\hat\beta_{(\cdot)}^* = \sum_{b=1}^{B} \hat\beta_b^*/B$. The bootstrap estimate of the bias of $\hat\beta$, of order $n^{-1}$, based on the $B$ replications is then given by

$$\text{bias}_B = \left( \sum_{b=1}^{B} \hat\beta_b^*/B \right) - \hat\beta.$$

Thus the (nonparametric) bootstrap bias-corrected estimate of $\beta$ is

$$\tilde\beta_B = 2\hat\beta - \hat\beta_{(\cdot)}^*. \tag{10}$$

We remark that there is a mistaken tendency to view $\hat\beta_{(\cdot)}^*$ itself as the bootstrap bias-corrected estimate (see Efron and Tibshirani, 1993, p. 138).

In our situation, right-censored data are of the form $\{(t_1,\delta_1),\ldots,(t_n,\delta_n)\}$, following the notation established in Section 2. The observed pairs $(t_i,\delta_i)$ are i.i.d. observations from a distribution $F$ on $\mathbb{R} \times \{0, 1\}$, and the plug-in estimate is $\hat\beta = t(\hat F)$, where $\hat F$ in this case is the Kaplan-Meier estimate (Kaplan and Meier, 1958). The bootstrap bias-corrected estimate $\tilde\beta_B$ is the same as that obtained in Equation (10), except that the individual data points are now the pairs $(t_i,\delta_i)$ (Efron, 1981).

## 5 Simulation Study

In this section we report Monte Carlo simulations comparing the performance of the usual MPLE and its corrected versions. For each experiment, we computed the following estimates: (i) the MPLE $\hat\beta$, (ii) the corrected estimate $\tilde\beta_C$, given by (8), (iii) the jackknife estimate $\tilde\beta_J$, given by (9), and (iv) the (nonparametric) bootstrap estimate $\tilde\beta_B$, given by (10). The simulation study was based on a Weibull regression model. The log-likelihood function (2) assumes no ties. Breslow's (1974) approximation for the log-likelihood function was used to handle ties in bootstrap samples.
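The four estimates listed above can be sketched for the one-parameter model as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Newton-Raphson solver, its step clipping, the treatment of degenerate resamples as missing, and all function names are choices made here. Inputs are assumed to be 1-D NumPy arrays. With tied times in a bootstrap sample, every tied failure uses the same risk set, which corresponds to the Breslow form of the partial likelihood.

```python
import numpy as np

def derivs(t, d, z, beta):
    """First three derivatives of the partial log-likelihood (one covariate)."""
    s = d2 = d3 = 0.0
    for i in np.flatnonzero(d):
        r = t >= t[i]                          # risk set at t_i
        w = np.exp(z[r] * beta)
        w = w / w.sum()
        A, B, C = (np.sum(w * z[r] ** k) for k in (1, 2, 3))
        s += z[i] - A
        d2 -= B - A ** 2
        d3 -= C - 3 * A * B + 2 * A ** 3
    return s, d2, d3

def mple(t, d, z, beta=0.0, tol=1e-8, max_iter=50):
    """Newton-Raphson solution of the score equation (3)."""
    for _ in range(max_iter):
        s, d2, _ = derivs(t, d, z, beta)
        if d2 == 0:                            # no information in this sample
            return float("nan")
        step = float(np.clip(s / d2, -1.0, 1.0))  # crude guard against overshooting
        beta -= step
        if abs(step) < tol:
            break
    return beta

def analytic(t, d, z):
    """Corrected estimate (8), with the bias (7) evaluated at the MPLE."""
    b = mple(t, d, z)
    _, k2, k3 = derivs(t, d, z, b)
    return b - k3 / (2 * k2 ** 2)

def jackknife(t, d, z):
    """Jackknife bias-corrected estimate (9)."""
    n, b = len(t), mple(t, d, z)
    keep = ~np.eye(n, dtype=bool)              # row i drops observation i
    loo = np.array([mple(t[k], d[k], z[k]) for k in keep])
    return n * b - (n - 1) * np.nanmean(loo)

def bootstrap(t, d, z, B=200, rng=None):
    """Bootstrap bias-corrected estimate (10), resampling whole observations."""
    rng = np.random.default_rng(rng)
    n, b = len(t), mple(t, d, z)
    reps = []
    for _ in range(B):
        idx = rng.integers(0, n, n)            # draw n indices with replacement
        reps.append(mple(t[idx], d[idx], z[idx]))
    return 2 * b - np.nanmean(reps)
```

Calling `analytic(t, d, z)`, `jackknife(t, d, z)` and `bootstrap(t, d, z, B=200, rng=0)` on the same data gives the three corrected estimates to compare against `mple(t, d, z)`. In very small or heavily censored samples the partial likelihood can be monotone, in which case no finite MPLE exists and this sketch is not reliable.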

Two independent sets of independent random variables $T = (T_1,\ldots,T_n)$ and $U = (U_1,\ldots,U_n)$ were generated for each repetition, and the lifetime $\min(T_i, U_i)$ and the indicator $\delta_i$ were recorded. Each $T_i$ is a realization of a one-parameter Weibull[$\rho$, $\exp(z_i\beta)$] random variable, and $U_i$, corresponding to the random censoring mechanism, is $U(0, \theta)$. The covariate $z$ was generated once as a standard normal and kept the same in all repetitions. The parameter $\beta$ was set equal to 1.0 and 10,000 replications were run for each simulation.

The bootstrap estimates were based on samples drawn with replacement from a censored sample from a Weibull model, $\{(x_1,d_1), (x_2,d_2), \ldots, (x_n,d_n)\}$, where

$$d_i = \begin{cases} 1, & \text{if } x_i \text{ is uncensored}, \\ 0, & \text{if } x_i \text{ is censored}. \end{cases}$$

The bootstrap estimates were computed using $N = 200$ bootstrap replications. Values larger than 200 were tried in other simulations (not shown), but the results were essentially the same.

The simulations were performed for several combinations of the sample size, $n = 10, 20, 30$, the proportion of censoring in the sample, $F = 0\%, 30\%, 60\%$, and the Weibull shape parameter, $\rho = 0.2, 0.5, 1.0, 2.0$. The proportion of censoring, $P(U_i < T_i)$, was obtained by controlling the value of the parameter $\theta$.

Table I displays the simulation sample means and the root mean square error (RMSE) for all four estimators. As expected, the bias of the MPLE increases when the sample size $n$ decreases or when the proportion of censoring $F$ increases. In general, the bias increases as the shape parameter of the Weibull distribution increases. The bias is particularly large for $F = 60\%, 30\%$ and $n = 10$.
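One replication of this design might be generated as in the sketch below. The Weibull parameterization is an assumption made here, since the text does not spell it out: the hazard is taken to be $\rho t^{\rho-1}\exp(z\beta)$, so that $T^{\rho}\exp(z\beta)$ is a unit exponential. The function name and arguments are hypothetical, and the calibration of $\theta$ to hit a target censoring proportion is not shown.

```python
import numpy as np

def simulate(n, rho, theta, beta=1.0, z=None, rng=None):
    """One censored sample: Weibull lifetimes with U(0, theta) censoring.

    Assumed hazard: rho * t**(rho - 1) * exp(z * beta).
    theta = np.inf gives an uncensored sample.
    """
    rng = np.random.default_rng(rng)
    if z is None:
        z = rng.standard_normal(n)              # covariate, standard normal
    # Inverse-transform sampling: T**rho * exp(z * beta) ~ Exp(1)
    T = (rng.exponential(size=n) / np.exp(z * beta)) ** (1.0 / rho)
    if np.isfinite(theta):
        U = rng.uniform(0.0, theta, size=n)     # random censoring times
    else:
        U = np.full(n, np.inf)                  # no censoring
    time = np.minimum(T, U)                     # observed lifetime min(T_i, U_i)
    delta = (T <= U).astype(int)                # 1 = failure, 0 = censored
    return time, delta, z
```

To keep the covariate fixed across repetitions, as in the study, `z` can be generated once and passed to every call. The observed censoring proportion is `1 - delta.mean()`, which can be monitored while adjusting `theta`.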

From Table I, there appears to be a substantial bias reduction when using the corrected estimator $\tilde\beta_C$ compared with the standard MPLE. A similar reduction occurs in the mean square error, which indicates that the corrected estimator $\tilde\beta_C$ does not inflate the variance.

In most of the cases tested, the resampling methods performed better than the standard MPLE. Except for cases with a very high censoring proportion ($F = 60\%$) and very small sample sizes ($n = 10$), the reductions in bias and RMSE were noticeable. These conclusions are consistent with the results reported by Ferrari and Silva (1997), in which simulation studies demonstrated that jackknife and bootstrap methods for bias correction may not work properly with very small sample sizes. The bootstrap bias-corrected estimator $\tilde\beta_B$ is better than the jackknife $\tilde\beta_J$ in terms of bias reduction and has the smallest RMSE. In general, $\tilde\beta_B$ and $\tilde\beta_C$ seem to have similar bias reduction performance.

A referee questioned whether the nature of the random censoring in this simulation study may lead to some blurring because of variation in the actual degree of censoring in the simulated samples. Another set of Monte Carlo simulations was performed for a fixed number of censored observations (type II censoring) under the same conditions as in Table I. The results obtained (not shown) are very similar to those presented in Table I.

Table I: Sample means and root mean square error (RMSE). Columns: $\rho$, $F$, $n$, $\hat\beta$, RMSE, $\tilde\beta_C$, RMSE, $\tilde\beta_J$, RMSE, $\tilde\beta_B$, RMSE.

## 6 Illustrative Example

Feigl and Zelen (1965) presented a data set of 17 patients who died of acute myelogenous leukemia. These patients formed a group identified by the presence of a morphologic characteristic of white cells. The survival time response $t$ is the time to death, measured in weeks from diagnosis, and the covariate $z$ is $\log_{10}$ of the initial white blood cell count. There were no censored observations. The association between $t$ and $z$ is the main aspect of interest.

Table II displays the estimates of the parameter $\beta$ associated with covariate $z$ and their respective 95% confidence intervals (CI). Jackknife and bootstrap confidence intervals are built from their empirical percentiles.

Table II: Point and 95% Confidence Estimates for the Example Data

| | $\hat\beta$ | $\tilde\beta_C$ | $\tilde\beta_J$ | $\tilde\beta_B$ |
|---|---|---|---|---|
| Estimate | | | | |
| S.E. | | | | |
| CI | (-2.36, -0.45) | (-2.36, -0.45) | (-4.88, 19.5) | (-2.59, 1.23) |

The MPLE $\hat\beta$ is close to the corrected estimate $\tilde\beta_C$ but not to the resampling bias-corrected estimates. According to the simulation results of Section 5, $\tilde\beta_B$ and $\tilde\beta_C$ are in general the least biased estimates. This seems to agree with the analysis performed by Cox and Snell (1981) using an exponential regression model. They obtained an estimate of [...] for $\beta$. On the other hand, the confidence interval based on the bootstrap is wider than those based on $\hat\beta$ and $\tilde\beta_C$. The main reason might be the asymmetric distribution of the survival times. These estimates also disagree in judging the importance of the covariate in explaining the response at a significance level of [...]. The jackknife confidence interval is not appropriate since it is based on just 17 resamples.
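An interval built from empirical percentiles, as described for the jackknife and bootstrap intervals of Table II, can be sketched as below. The text does not say which resampled quantities the percentiles were taken over, so the input here is simply assumed to be an array of resampled estimates, and the function name is hypothetical.

```python
import numpy as np

def percentile_interval(replications, level=0.95):
    """Empirical percentile interval from an array of resampled estimates."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(replications, [alpha, 1.0 - alpha])
    return float(lower), float(upper)
```

With only 17 leave-one-out estimates, the 2.5% and 97.5% percentiles are determined almost entirely by the two most extreme values, which is consistent with the remark that the jackknife interval is not appropriate here.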

## 7 Final Remarks

The main purpose of this paper is to present analytical and resampling methods for bias evaluation and reduction in the PHM proposed by Cox (1972), a model that has been useful in a considerable number of practical applications. Our simulation results, conducted for the special one-parameter PHM, suggest that bias-corrected estimates perform better than the standard maximum partial likelihood estimates. Although computationally intensive, resampling methods may be an attractive alternative for bias reduction, since they avoid the sophisticated mathematics commonly required by analytical methods. In particular, we show some evidence that the bootstrap-corrected estimate is superior to the jackknife-corrected estimate, but its performance is similar to that of the analytical estimator.

## Acknowledgment

The authors wish to thank an associate editor (Prof. John Hinde), a referee and Prof. Glaura Franco for their valuable comments and constructive suggestions on an earlier version of the paper. The work of Enrico Antônio Colosimo, Aldy Fernandes da Silva and Frederico Rodrigues Borges da Cruz is partially funded by the Brazilian agencies Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, and Fundação de Amparo à Pesquisa do Estado de Minas Gerais, FAPEMIG.

## References

Box, M. J. (1971). Bias in nonlinear estimation (with discussion). Journal of the Royal Statistical Society B, 33.

Breslow, N. E. (1974). Covariance analysis of censored survival data. Biometrics, 30.

Cook, R. D.; Tsai, C.-L.; Wei, B. C. (1986). Bias in nonlinear regression. Biometrika, 74.

Cordeiro, G. M. (1993). Bartlett corrections and bias correction for two heteroscedastic regression models. Communications in Statistics, Theory and Methods, 22.

Cordeiro, G. M.; McCullagh, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society B, 53.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society B, 34.

Cox, D. R. (1975). Partial likelihood. Biometrika, 62.

Cox, D. R.; Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London.

Cox, D. R.; Snell, E. J. (1968). A general definition of residuals (with discussion). Journal of the Royal Statistical Society B, 30.

Cox, D. R.; Snell, E. J. (1981). Applied Statistics: Principles and Examples. Chapman and Hall, London.

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7.

Efron, B. (1981). Censored data and the bootstrap. Journal of the American Statistical Association, 76.

Efron, B.; Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.

Feigl, P.; Zelen, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics, 21.

Ferrari, S. L. P.; Silva, A. F. (1997). Analytical and resampling-based bias corrections for one-parameter models. Technical Report RT-MAE 9732, Instituto de Matemática e Estatística da Universidade de São Paulo, Brazil.

Kaplan, E. L.; Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53.

Lawley, D. N. (1956). A generalized method for approximating to the distribution of the likelihood ratio criteria. Biometrika, 43.

McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.

McLachlan, G. J. (1980). A note on bias correction in maximum likelihood estimation with logistic discrimination. Technometrics, 22.

Paula, G. A. (1992). Bias correction for exponential family nonlinear models. Journal of Statistical Computation and Simulation, 40.

Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society B, 25.

Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43.

Sowden, R. R. (1971). Bias and accuracy of parameter estimates in a quantal response model. Biometrika, 58.

Young, D. H.; Bakir, S. T. (1987). Bias correction for a generalized log-gamma regression model. Technometrics, 29.

### Comparison of resampling method applied to censored data

International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### Parametric survival models

Parametric survival models ST3242: Introduction to Survival Analysis Alex Cook October 2008 ST3242 : Parametric survival models 1/17 Last time in survival analysis Reintroduced parametric models Distinguished

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### TESTING THE ONE-PART FRACTIONAL RESPONSE MODEL AGAINST AN ALTERNATIVE TWO-PART MODEL

TESTING THE ONE-PART FRACTIONAL RESPONSE MODEL AGAINST AN ALTERNATIVE TWO-PART MODEL HARALD OBERHOFER AND MICHAEL PFAFFERMAYR WORKING PAPER NO. 2011-01 Testing the One-Part Fractional Response Model against

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### Lecture 4 PARAMETRIC SURVIVAL MODELS

Lecture 4 PARAMETRIC SURVIVAL MODELS Some Parametric Survival Distributions (defined on t 0): The Exponential distribution (1 parameter) f(t) = λe λt (λ > 0) S(t) = t = e λt f(u)du λ(t) = f(t) S(t) = λ

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### The Variability of P-Values. Summary

The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Non-Parametric Estimation in Survival Models

Non-Parametric Estimation in Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005 We now discuss the analysis of survival data without parametric assumptions about the

### **BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

### UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1

Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Vladislav Beresovsky (hvy4@cdc.gov), Janey Hsiao National Center for Health Statistics, CDC

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### Checking proportionality for Cox s regression model

Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

### University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

### Bootstrapping Multivariate Spectra

Bootstrapping Multivariate Spectra Jeremy Berkowitz Federal Reserve Board Francis X. Diebold University of Pennsylvania and NBER his Draft August 3, 1997 Address correspondence to: F.X. Diebold Department

### Distribution (Weibull) Fitting

Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

### Reliability estimators for the components of series and parallel systems: The Weibull model

Reliability estimators for the components of series and parallel systems: The Weibull model Felipe L. Bhering 1, Carlos Alberto de Bragança Pereira 1, Adriano Polpo 2 1 Department of Statistics, University

### Prediction and Confidence Intervals in Regression

Fall Semester, 2001 Statistics 621 Lecture 3 Robert Stine 1 Prediction and Confidence Intervals in Regression Preliminaries Teaching assistants See them in Room 3009 SH-DH. Hours are detailed in the syllabus.

### A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

### Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics rsalakhu@utstat.toronto.edu http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS

THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS ANTONIO AGUIRRE UFMG / Department of Economics CEPE (Centre for Research in International Economics) Rua Curitiba, 832 Belo Horizonte

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum, 4, 2009, no. 38, 1883-1889 Note on the EM Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Chapter 1 Introduction to Econometrics

Chapter 1 Introduction to Econometrics Econometrics deals with the measurement of economic relationships. It is an integration of economics, mathematical economics and statistics with an objective to provide

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### 200609 - ATV - Lifetime Data Analysis

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

### An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes

Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes Yong Bao a, Aman Ullah b, Yun Wang c, and Jun Yu d a Purdue University, IN, USA b University of California, Riverside, CA, USA

### Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

### Confidence Intervals for Spearman's Rank Correlation

Chapter 808 Confidence Intervals for Spearman's Rank Correlation Introduction This routine calculates the sample size needed to obtain a specified width of Spearman's rank correlation coefficient confidence

### Cointegration. Basic Ideas and Key results. Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board

Cointegration Basic Ideas and Key results Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### Master programme in Statistics

Master programme in Statistics Björn Holmquist 1 1 Department of Statistics Lund University Cramérsällskapets årskonferens, 2010-03-25 Master programme Vad är ett Master programme? Breddmaster vs Djupmaster

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Abder OULIDI 1-2 Jean-Marie MARION 1 Hérvé GANACHAUD 3 1 Institut de Mathématiques Appliquées (IMA) Angers France 2 Institut

### R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Faculty of Health Sciences R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.

### Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

### A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Introduction to Monte-Carlo Methods

Introduction to Monte-Carlo Methods Bernard Lapeyre Halmstad January 2007 Monte-Carlo methods are extensively used in financial institutions to compute European options prices to evaluate sensitivities

### An Alternative Route to Performance Hypothesis Testing

EDHEC-Risk Institute 393-400 promenade des Anglais 06202 Nice Cedex 3 Tel.: +33 (0)4 93 18 32 53 E-mail: research@edhec-risk.com Web: www.edhec-risk.com An Alternative Route to Performance Hypothesis Testing

### A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt

### Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, 2011. Hull Univ. Business School

Duration Analysis Econometric Analysis Dr. Keshab Bhattarai Hull Univ. Business School April 4, 2011 Dr. Bhattarai (Hull Univ. Business School) Duration April 4, 2011 1 / 27 What is Duration Analysis?

### Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

### MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG

Paper 3140-2015 An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG Iván Darío Atehortua Rojas, Banco Colpatria

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### 1. The maximum likelihood principle 2. Properties of maximum-likelihood estimates

The maximum-likelihood method Volker Blobel University of Hamburg March 2005 1. The maximum likelihood principle 2. Properties of maximum-likelihood estimates

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Design and Analysis of Equivalence Clinical Trials Via the SAS System

Design and Analysis of Equivalence Clinical Trials Via the SAS System Pamela J. Atherton Skaff, Jeff A. Sloan, Mayo Clinic, Rochester, MN 55905 ABSTRACT An equivalence clinical trial typically is conducted

### Data Matching Optimal and Greedy

Chapter 13 Data Matching Optimal and Greedy Introduction This procedure is used to create treatment-control matches based on propensity scores and/or observed covariate variables. Both optimal and greedy

### The Delta Method and Applications

Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Parametric versus Semi/nonparametric Regression Models

Parametric versus Semi/nonparametric Regression Models Hamdy F. F. Mahmoud Virginia Polytechnic Institute and State University Department of Statistics LISA short course series- July 23, 2014 Hamdy Mahmoud

### Mälardalen University

Mälardalen University http://www.mdh.se 1/38 Value at Risk and its estimation Anatoliy A. Malyarenko Department of Mathematics & Physics Mälardalen University SE-72 123 Västerås, Sweden email: amo@mdh.se

### Model-based boosting in R

Model-based boosting in R Introduction to Gradient Boosting Matthias Schmid Institut für Medizininformatik, Biometrie und Epidemiologie (IMBE) Friedrich-Alexander-Universität Erlangen-Nürnberg Statistical

### Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PART I OVERVIEW AND BASIC APPROACHES

### An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14

### Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

### MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics

MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS title- course code: Program name: Contingency Tables and Log Linear Models Level Biostatistics Hours/week Ther. Recite. Lab. Others Total Master of Sci.

### Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

### Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Examples on Variable Selection in PCA in Sensory Descriptive and Consumer Data

Examples on Variable Selection in PCA in Sensory Descriptive and Consumer Data Per Lea, Frank Westad, Margrethe Hersleth MATFORSK, Ås, Norway Harald Martens KVL, Copenhagen, Denmark 6th Sensometrics Meeting

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the