# Comparison of resampling method applied to censored data

## Transcription

International Journal of Advanced Statistics and Probability

### 2. Methodology

We are interested in the uncertainty associated with the estimated values of certain parameters. Because the distribution of an estimator is often difficult to identify, we use the bootstrap and the Jackknife, two simulation techniques for estimating the standard deviation of an estimator and for constructing confidence intervals. In this paper we investigate models with simulated censored data and with a real dataset, using the Cox Regression Model. For the real dataset, we compare the estimators with the results obtained by Cho [3]. We simulate the average time from an Exponential distribution and estimate the parameters by maximum likelihood, aiming to verify the performance of the methods. The criteria used to evaluate the quality of the estimators are the mean squared error and the coverage probability. We evaluated the methods above on simulated data and on a real dataset, the Stanford heart transplant data [12].

#### 2.1. Bootstrap

Proposed by Efron [7], the bootstrap is a simulation technique to evaluate the standard error of the estimate of a parameter (Martinez et al. [10]). The idea is to resample the dataset many times with replacement. Let $\hat\theta = t(x)$ be the estimator of $\theta$ calculated from a sample $x = (x_1, \ldots, x_n)$. A bootstrap sample $x^{*(i)} = (x_1^*, \ldots, x_n^*)$ is a resample of size $n$ drawn with replacement from $(x_1, \ldots, x_n)$, where $i = 1, \ldots, B$ indexes the $B$ desired replicas. The bootstrap estimator of the variance of $\hat\theta$ is then

$$\widehat{Var}(\hat\theta) = \frac{1}{B-1} \sum_{i=1}^{B} \left( t(x^{*(i)}) - \hat\theta^{*} \right)^2, \tag{1}$$

where $t(x^{*(i)})$ is the estimator of $\theta$ based on the $i$-th bootstrap replica and $\hat\theta^{*} = B^{-1} \sum_{i=1}^{B} t(x^{*(i)})$.
For a sufficiently large number of replicas, we can calculate confidence intervals (with significance level $2\alpha$) for $\theta$ through the normal approximation, given by

$$CI_B[\hat\theta, (1-2\alpha)] = \left[ \hat\theta - z_\alpha [\widehat{Var}(\hat\theta)]^{1/2};\ \hat\theta + z_\alpha [\widehat{Var}(\hat\theta)]^{1/2} \right], \tag{2}$$

where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution, that is, $\Phi(z_\alpha) = 1 - \alpha$. We can also take the empirical distribution $\hat F$ of the bootstrap replicas as an approximation of the true distribution $F$ and build the confidence interval from the quantiles of $\hat F$:

$$CI_B[\hat\theta, (1-2\alpha)] = \left[ \hat F^{-1}(\alpha);\ \hat F^{-1}(1-\alpha) \right]. \tag{3}$$

#### 2.2. Jackknife

The Jackknife is a resampling technique like the bootstrap. The idea is to resample the initial sample by removing one or more observations in each replica. When a single observation is removed at a time, the number of replicas equals the sample size. The Jackknife samples $x_{(-i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ are the data with the $i$-th observation removed, and the estimator is $\hat\theta_{(\cdot)} = n^{-1} \sum_{i=1}^{n} t(x_{(-i)})$. The Jackknife estimator of the bias is $\widehat{bias}_{jack} = (n-1)(\hat\theta_{(\cdot)} - \hat\theta)$, and the Jackknife estimator of the variance of $\hat\theta$ is

$$\widehat{Var}(\hat\theta) = \frac{n-1}{n} \sum_{i=1}^{n} \left( t(x_{(-i)}) - \hat\theta_{(\cdot)} \right)^2, \tag{4}$$

where $t(x_{(-i)})$ is the estimator of $\theta$ based on the $i$-th Jackknife replica. Here we also have the empirical density of the replicas, and the confidence interval based on the normal approximation is constructed analogously to (2). As in the bootstrap, we estimate the standard error from the standard deviation of the Jackknife replicas.
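The estimators (1) and (4) and the intervals (2) and (3) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names, the `stat` callback (playing the role of $t(\cdot)$), and the order-statistic rule used for the percentile interval are assumptions made for the example.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_replicas(data, stat, n_boot, rng):
    """Compute stat on n_boot resamples of data drawn with replacement."""
    n = len(data)
    return [stat([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]


def bootstrap_variance(replicas):
    """Equation (1): sample variance of the replicas (divisor B - 1)."""
    return statistics.variance(replicas)


def normal_interval(estimate, variance, alpha):
    """Equation (2): estimate -/+ z * sqrt(variance), with Phi(z) = 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    half = z * variance ** 0.5
    return estimate - half, estimate + half


def percentile_interval(replicas, alpha):
    """Equation (3): empirical alpha and 1 - alpha quantiles of the replicas.

    Uses a simple symmetric order-statistic rule (an assumption; other
    quantile conventions exist).
    """
    ordered = sorted(replicas)
    k = int(alpha * len(ordered))
    return ordered[k], ordered[len(ordered) - 1 - k]


def jackknife_bias_variance(data, stat):
    """Jackknife bias estimate and equation (4) for a list of observations."""
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]  # leave one out
    theta_hat = stat(data)
    theta_dot = sum(reps) / n
    bias = (n - 1) * (theta_dot - theta_hat)
    variance = (n - 1) / n * sum((r - theta_dot) ** 2 for r in reps)
    return bias, variance


if __name__ == "__main__":
    rng = random.Random(2024)
    sample = [rng.expovariate(1.0) for _ in range(30)]  # toy data
    reps = bootstrap_replicas(sample, statistics.mean, 1000, rng)
    var_b = bootstrap_variance(reps)
    print(normal_interval(statistics.mean(sample), var_b, 0.025))
    print(percentile_interval(reps, 0.025))
    print(jackknife_bias_variance(sample, statistics.mean))
```

For the sample mean, the Jackknife variance reduces to the usual $s^2/n$ and the bias estimate is zero, which gives a quick sanity check of the formulas.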

#### 2.3. Cox Regression Model

The Cox Regression model is a hazard model that allows the analysis of survival data in the presence of covariables. Suppose we want to estimate the lifetime $t$ of patients submitted to two types of treatment, coded by $x$ ($x = 0$ for the standard treatment and $x = 1$ for the new treatment). We define $\lambda_0(t)$ as the failure rate under the standard treatment and $\lambda_1(t)$ as the failure rate under the new treatment. The ratio $K$ between these two failure rates is

$$\frac{\lambda_1(t)}{\lambda_0(t)} = K, \tag{5}$$

considered constant for every time $t$. Taking $K = \exp(\beta x)$ we have

$$\lambda(t) = \lambda_0(t) \exp\{\beta x\}, \tag{6}$$

the Cox model for a single covariable. A Cox model with $p$ covariables, vector of covariables $x = (x_1, x_2, \ldots, x_p)^T$ and vector of associated parameters $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ is defined by

$$\lambda(t) = \lambda_0(t)\, g(x^T \beta), \tag{7}$$

where $\lambda_0(t)$ is the baseline hazard function and $g(\cdot)$ is the link function, which can take several forms. Cox (1972) suggested

$$g(x^T \beta) = \exp(x^T \beta) = \exp\{\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\}. \tag{8}$$

More generally, the ratio of the failure rates of individuals $i$ and $j$ is

$$\frac{\lambda_i(t)}{\lambda_j(t)} = \frac{\lambda_0(t) \exp(x_i^T \beta)}{\lambda_0(t) \exp(x_j^T \beta)} = \exp\{x_i^T \beta - x_j^T \beta\}, \tag{9}$$

which does not depend on the time $t$.

The parameters of the Cox model are estimated by maximum likelihood. For a random sample $t = (t_1, t_2, \ldots, t_n)$ the likelihood function is

$$L(\beta) = \prod_{i=1}^{n} [f(t_i \mid x_i)]^{\delta_i} [S(t_i \mid x_i)]^{1-\delta_i} = \prod_{i=1}^{n} [\lambda(t_i \mid x_i)]^{\delta_i} S(t_i \mid x_i), \tag{10}$$

in which $\delta_i$ is the censoring indicator ($\delta_i = 1$ for an observed failure and $\delta_i = 0$ for a censored time) and $S(t)$ is the survival function. In the Cox model we have

$$S(t_i \mid x_i) = \exp\left\{ -\int_0^{t_i} \lambda_0(u) \exp\{x_i^T \beta\}\, du \right\} = [S_0(t_i)]^{\exp\{x_i^T \beta\}}, \tag{11}$$

thus the likelihood function is

$$L(\beta) = \prod_{i=1}^{n} \left[ \lambda_0(t_i) \exp\{x_i^T \beta\} \right]^{\delta_i} [S_0(t_i)]^{\exp\{x_i^T \beta\}}. \tag{12}$$

Later Cox proposed the partial maximum likelihood method (Cox [5]) to estimate the parameters of the model.
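As a worked special case of (12), which the text does not spell out, assume a constant baseline hazard $\lambda_0(t) = \lambda$, so that $S_0(t) = \exp(-\lambda t)$. This is the exponential setting used in the simulations; treating $\lambda$ as known is an assumption made here only to keep the step short. Taking logarithms in (12) gives the log-likelihood and its score:

```latex
\begin{align*}
L(\beta) &= \prod_{i=1}^{n} \left[\lambda \exp\{x_i^T\beta\}\right]^{\delta_i}
            \exp\left\{-\lambda t_i \exp\{x_i^T\beta\}\right\}, \\
\ell(\beta) &= \sum_{i=1}^{n} \delta_i \left(\log\lambda + x_i^T\beta\right)
            - \lambda \sum_{i=1}^{n} t_i \exp\{x_i^T\beta\}, \\
\frac{\partial \ell(\beta)}{\partial \beta}
         &= \sum_{i=1}^{n} x_i \left(\delta_i - \lambda t_i \exp\{x_i^T\beta\}\right).
\end{align*}
```

Censored observations ($\delta_i = 0$) contribute only through the survival term, which is how censoring enters the maximum likelihood estimate.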

### 3. Implementation and Simulation

#### 3.1. Generation of the survival times

From the survival function in (11) we obtain the distribution function of the Cox regression model:

$$F(t_i \mid x_i) = 1 - S(t_i \mid x_i) = 1 - [S_0(t_i)]^{\exp\{x_i^T \beta\}}. \tag{13}$$

According to Bender et al. [1], survival times for the Cox regression model can be generated by applying the inverse function method to (13) for the exponential, Weibull and Gompertz distributions.

1. Exponential Distribution. With exponentially distributed baseline times, the survival time is generated by

   $$T = -\frac{\log(u)}{\lambda \exp\{x_i^T \beta\}}, \tag{14}$$

   in which $u \sim U(0, 1)$. The survival times are exponentially distributed with scale parameters $\lambda \exp\{x_i^T \beta\}$, which depend on the regression coefficients and on the covariables.

2. Weibull Distribution. With Weibull distributed baseline times, the survival time is given by

   $$T = \left( -\frac{\log(u)}{\lambda \exp\{x_i^T \beta\}} \right)^{1/\nu}, \tag{15}$$

   in which $u \sim U(0, 1)$. The survival times are distributed with scale parameters $\lambda \exp\{x_i^T \beta\}$ and fixed parameter $\nu$; when $\nu = 1$ we have an exponential distribution.

3. Gompertz Distribution. With Gompertz distributed baseline times, the survival time is generated by

   $$T = \frac{1}{\alpha} \log\left[ 1 - \frac{\alpha \log(u)}{\lambda \exp(x_i^T \beta)} \right], \tag{16}$$

   in which $u \sim U(0, 1)$. The survival times are distributed with scale parameters $\lambda \exp\{x_i^T \beta\}$ and fixed parameter $\alpha$; in the limit $\alpha \to 0$ we have an exponential distribution.

#### 3.2. Resampling Process Method

The bootstrap resampling was carried out in two ways. The first resamples whole rows of the data matrix. The second is a stratified resampling that treats the censored and the uncensored observations as separate strata. The Jackknife resampling removes the rows of the data matrix one at a time.
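The three resampling schemes just described can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: each row is taken to be a tuple, and `is_censored` is a hypothetical predicate supplied by the caller to tell censored rows from uncensored ones.

```python
import random


def bootstrap_rows(rows, rng):
    """Plain bootstrap: draw len(rows) whole rows with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def stratified_bootstrap_rows(rows, is_censored, rng):
    """Stratified bootstrap: resample censored and uncensored rows separately,
    so every replica keeps the original number of censored observations."""
    censored = [r for r in rows if is_censored(r)]
    observed = [r for r in rows if not is_censored(r)]
    return bootstrap_rows(censored, rng) + bootstrap_rows(observed, rng)


def jackknife_rows(rows):
    """Jackknife: yield the data with one row removed at a time."""
    for i in range(len(rows)):
        yield rows[:i] + rows[i + 1:]


if __name__ == "__main__":
    rng = random.Random(7)
    # Toy rows of the form (time, delta), with delta = 0 marking censoring.
    rows = [(5.1, 1), (2.3, 0), (7.8, 1), (1.2, 0), (9.4, 1)]
    print(bootstrap_rows(rows, rng))
    print(stratified_bootstrap_rows(rows, lambda r: r[1] == 0, rng))
    print(sum(1 for _ in jackknife_rows(rows)))
```

Stratifying fixes the censoring proportion in every replica, whereas the plain row bootstrap lets it vary from replica to replica.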
Two datasets were generated using the following process. In the first dataset we generated only the response variable, i.e., the survival time after the transplant, according to the exponential model with rate $\lambda \exp\{x_i^T \beta\}$, using the covariables of the real dataset. The second dataset was generated in a stratified way according to the censoring status: the times follow an exponential distribution with rate $\lambda \exp\{x_i^T \beta\}$, and the ages follow a Poisson distribution with parameter $\mu$ equal to the mean age of the patients. Censoring was assigned at random in the same proportion as in the real data.
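The generators (14) to (16) from Section 3.1, which underlie the simulated datasets, can be written directly from the formulas. The sketch below uses only the Python standard library; the function names and the toy parameter values are assumptions, and the Gompertz version assumes $\alpha > 0$ so that the logarithm is always defined.

```python
import math
import random


def linear_predictor(x, beta):
    """Compute x^T beta for equal-length sequences x and beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def exponential_time(u, lam, x, beta):
    """Equation (14): T = -log(u) / (lam * exp(x^T beta))."""
    return -math.log(u) / (lam * math.exp(linear_predictor(x, beta)))


def weibull_time(u, lam, nu, x, beta):
    """Equation (15): the exponential draw raised to the power 1 / nu."""
    return exponential_time(u, lam, x, beta) ** (1.0 / nu)


def gompertz_time(u, lam, alpha, x, beta):
    """Equation (16), assuming alpha > 0; log1p keeps small alpha accurate."""
    rate = lam * math.exp(linear_predictor(x, beta))
    return math.log1p(-alpha * math.log(u) / rate) / alpha


if __name__ == "__main__":
    rng = random.Random(11)
    beta = (0.02,)   # toy coefficient, not a value from the paper
    ages = (45.0, 52.0, 38.0)
    for age in ages:
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        print(age, exponential_time(u, 0.01, (age,), beta))
```

Setting $\nu = 1$ in the Weibull generator, or letting $\alpha$ approach zero in the Gompertz generator, reproduces the exponential draw, matching the special cases noted in Section 3.1.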

### 4. Results and Discussions

The real dataset was used to illustrate the proposed method. Of the 184 cases, 71 are censored. The response variable is the survival time after heart transplant, and the covariables are the average age of the patients and its square root. The average age of the patients is … years, with a dispersion around the mean of …. The covariable T5 mismatch was not used because it is not significant. The Jackknife residual was calculated for the analysis of influential data points, and we found 4 outliers, as obtained in Cho et al. [3]. The influential data points were removed from the data.

Table 1: Real Data

| | Asymptotic | Bootstrap | Stratified bootstrap | Jackknife |
|---|---|---|---|---|
| $\beta_0$ | -0.… | … | … | … |
| $\beta_1$ | 0.… | … | … | … |
| CI(0.95; $\beta_0$) lower | -0.… | … | … | … |
| CI(0.95; $\beta_0$) upper | -0.… | … | … | … |
| CI(0.95; $\beta_1$) lower | 0.… | … | … | … |
| CI(0.95; $\beta_1$) upper | 0.… | … | … | … |
| Empirical CI(0.95; $\beta_0$) lower | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_0$) upper | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_1$) lower | 0 | 0.… | … | … |
| Empirical CI(0.95; $\beta_1$) upper | 0 | 0.… | … | … |

In Table 1 we observe that, for this dataset, the results using the Jackknife are better than those using the bootstrap, although the results are very close. Fig. 1 shows the asymptotic confidence intervals for the parameters $\beta_0$ and $\beta_1$, respectively.

Fig. 1: Asymptotic confidence interval for $\beta_0$ and $\beta_1$, considering exponential link function

For Table 2 the response variable, i.e., the survival time after transplant, was simulated. We took $\beta_0 = 0.…$ and $\beta_1 = 0.…$ as the true parameter values and replicated the simulation 1000 times to calculate the coverage probability, both through the asymptotic confidence interval and through the 0.025 and 0.975 percentiles. Using the asymptotic confidence interval, the coverage probability based on the Jackknife is better than that of the bootstrap. However, when we compare the coverage probability using the percentiles, the bootstrap gives the better result. Fig. 2 shows the asymptotic confidence intervals for the parameters $\beta_0$ and $\beta_1$, respectively, and Fig. 3 shows the coverage probability for the respective parameters.
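The coverage probability described above, the fraction of simulated datasets whose confidence interval contains the true parameter, can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy: no covariables and no censoring, a single exponential rate whose logarithm plays the role of an intercept, and an asymptotic interval built from the standard result that $\log\hat\lambda$ has approximate variance $1/n$. It is not the simulation design of the paper.

```python
import math
import random
from statistics import NormalDist


def coverage_probability(true_rate, n, n_sim, alpha, rng):
    """Fraction of n_sim simulated samples whose asymptotic interval for
    log(rate) contains log(true_rate); nominal level is 1 - 2 * alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    half_width = z / math.sqrt(n)          # asymptotic s.e. of log(rate_hat)
    true_log_rate = math.log(true_rate)
    hits = 0
    for _ in range(n_sim):
        times = [rng.expovariate(true_rate) for _ in range(n)]
        log_rate_hat = math.log(n / sum(times))   # MLE of the log-rate
        if log_rate_hat - half_width <= true_log_rate <= log_rate_hat + half_width:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    rng = random.Random(123)
    print(coverage_probability(true_rate=0.5, n=50, n_sim=1000,
                               alpha=0.025, rng=rng))
```

A method is judged well calibrated when this fraction is close to the nominal $1 - 2\alpha$; replacing the asymptotic interval with a bootstrap or Jackknife interval inside the loop gives the corresponding coverage for those methods.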

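Table 2: Data with simulated response

| | Asymptotic | Bootstrap | Stratified bootstrap | Jackknife |
|---|---|---|---|---|
| $\beta_0$ | -0.… | … | … | … |
| $\beta_1$ | 0.… | … | … | … |
| CI(0.95; $\beta_0$) lower | -0.… | … | … | … |
| CI(0.95; $\beta_0$) upper | -0.… | … | … | … |
| CI(0.95; $\beta_1$) lower | 0.… | … | … | … |
| CI(0.95; $\beta_1$) upper | 0.… | … | … | … |
| Coverage $\beta_0$ | 0.986 | 0.982 | 0.984 | 0.985 |
| Coverage $\beta_1$ | 0.957 | 0.956 | 0.955 | 0.960 |
| Empirical CI(0.95; $\beta_0$) lower | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_0$) upper | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_1$) lower | 0 | 0.… | … | … |
| Empirical CI(0.95; $\beta_1$) upper | 0 | 0.… | … | … |
| Empirical coverage $\beta_0$ | 0 | 0.981 | 0.980 | 0.152 |
| Empirical coverage $\beta_1$ | 0 | 0.949 | 0.949 | 0.112 |
| Bias $\beta_0$ | 0.… | … | … | … |
| Bias $\beta_1$ | -0.… | … | … | … |

Fig. 2: Asymptotic confidence interval for $\beta_0$ and $\beta_1$, considering the simulation of the survival time with exponential link function

Table 3: Simulated Data

| | Asymptotic | Bootstrap | Stratified bootstrap | Jackknife |
|---|---|---|---|---|
| $\beta_0$ | -0.… | … | … | … |
| $\beta_1$ | 0.… | … | … | … |
| CI(0.95; $\beta_0$) lower | -0.… | … | … | … |
| CI(0.95; $\beta_0$) upper | 0.… | … | … | … |
| CI(0.95; $\beta_1$) lower | -0.… | … | … | … |
| CI(0.95; $\beta_1$) upper | 0.… | … | … | … |
| Coverage $\beta_0$ | 0.949 | 0.956 | 0.954 | 0.957 |
| Coverage $\beta_1$ | 0.954 | 0.957 | 0.957 | 0.959 |
| Empirical CI(0.95; $\beta_0$) lower | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_0$) upper | 0 | 0.… | … | … |
| Empirical CI(0.95; $\beta_1$) lower | 0 | -0.… | … | … |
| Empirical CI(0.95; $\beta_1$) upper | 0 | 0.… | … | … |
| Empirical coverage $\beta_0$ | 0 | 0.948 | 0.952 | 0.097 |
| Empirical coverage $\beta_1$ | 0 | 0.957 | 0.952 | 0.105 |
| Bias $\beta_0$ | 0.… | … | … | … |
| Bias $\beta_1$ | -0.… | … | … | …000379 |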

Fig. 3: Coverage for $\beta_0$ and $\beta_1$, considering the simulation of the survival time with exponential link function

Fig. 4: Asymptotic confidence interval for $\beta_0$ simulating data, considering exponential link function

For Table 3 we generated the whole dataset. We took $\beta_0 = 0.…$ and $\beta_1 = 0.…$ as the true parameter values and replicated the simulation 1000 times to calculate the coverage probability, both through the asymptotic confidence interval and through the 0.025 and 0.975 percentiles. Using the asymptotic confidence interval, the coverage probability based on the Jackknife was better than that of the bootstrap. However, when we compare the coverage probability using the percentiles, the bootstrap gave the better result, as in Table 2. Fig. 4 shows the asymptotic confidence intervals for the parameters $\beta_0$ and $\beta_1$, respectively, and Fig. 5 shows the coverage probability for the respective parameters.

Fig. 5: Coverage for $\beta_0$ simulating data, considering exponential link function

### 5. Final Considerations

In this paper we obtained information about the performance of the evaluated methods that may support the decision of which simulation technique to use. We observe that the Jackknife residual is efficient for analyzing influential data points in the response variable.

### References

[1] Bender, R., Augustin, T., Blettner, M., Generating Survival Times to Simulate Cox Proportional Hazards Models, Sonderforschungsbereich, Vol. 386, (2003), pp. …

[2] Chernick, M. R., Bootstrap Methods: A Guide for Practitioners and Researchers, Wiley Series in Probability and Statistics, New York, (2007).

[3] Cho, H., Ibrahim, J. G., Sinha, D., Zhu, H., Bayesian Case Influence Diagnostics for Survival Models, Biometrics, Vol. 65, (2009), pp. …

[4] Colosimo, E. A., Giolo, S. R., Análise de sobrevivência aplicada, Edgard Blucher, (2006).

[5] Cox, D. R., Regression models and life-tables (with discussion), Journal of the Royal Statistical Society, Series B: Methodological, Vol. 34, (1972), pp. …

[6] Cox, D. R., Partial Likelihood, Biometrika, Vol. 62, (1975), pp. …

[7] Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, Vol. 7, (1979).

[8] Efron, B., The Jackknife, the Bootstrap and Other Resampling Plans, Department of Statistics, Stanford University, (1994).

[9] Efron, B., Tibshirani, R. J., An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability, Vol. 57, (1993).

[10] Martinez, E. Z., Louzada-Neto, F., Bootstrap confidence interval estimation, Rev. Mat. Estat. (São Paulo), Vol. 19, (2001), pp. …

[11] Miller, R. G., The Jackknife - A review, Biometrika, Vol. 61, (1974), pp. …

[12] Miller, R., Halpern, J., Regression with censored data, Biometrika, Vol. 69, (1982), pp. …

[13] Shao, J., Tu, D., The Jackknife and Bootstrap, Springer-Verlag, 281, (1995).

### BIAS EVALUATION IN THE PROPORTIONAL HAZARDS MODEL

BIAS EVALUATION IN THE PROPORTIONAL HAZARDS MODEL E. A. COLOSIMO, A. F. SILVA, and F. R. B. CRUZ Departamento de Estatística, Universidade Federal de Minas Gerais, 31270-901 - Belo Horizonte - MG, Brazil

### Sudhanshu K. Ghoshal, Ischemia Research Institute, San Francisco, CA

ABSTRACT: Multivariable Cox Proportional Hazard Model by SAS PHREG & Validation by Bootstrapping Using SAS Randomizers With Applications Involving Electrocardiology and Organ Transplants Sudhanshu K. Ghoshal,

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### The Bootstrap. 1 Introduction. The Bootstrap 2. Short Guides to Microeconometrics Fall Kurt Schmidheiny Unversität Basel

Short Guides to Microeconometrics Fall 2016 The Bootstrap Kurt Schmidheiny Unversität Basel The Bootstrap 2 1a) The asymptotic sampling distribution is very difficult to derive. 1b) The asymptotic sampling

### A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

### Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

### Confidence Intervals for Exponential Reliability

Chapter 408 Confidence Intervals for Exponential Reliability Introduction This routine calculates the number of events needed to obtain a specified width of a confidence interval for the reliability (proportion

### Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### The Variability of P-Values. Summary

The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

### Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Abder OULIDI 1-2 Jean-Marie MARION 1 Hérvé GANACHAUD 3 1 Institut de Mathématiques Appliquées (IMA) Angers France 2 Institut

### Orthogonal Distance Regression

Applied and Computational Mathematics Division NISTIR 89 4197 Center for Computing and Applied Mathematics Orthogonal Distance Regression Paul T. Boggs and Janet E. Rogers November, 1989 (Revised July,

### Nonparametric Predictive Methods for Bootstrap and Test Reproducibility

Nonparametric Predictive Methods for Bootstrap and Test Reproducibility Sulafah BinHimd A Thesis presented for the degree of Doctor of Philosophy Department of Mathematical Sciences University of Durham

### Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection

### A Comparison of Correlation Coefficients via a Three-Step Bootstrap Approach

ISSN: 1916-9795 Journal of Mathematics Research A Comparison of Correlation Coefficients via a Three-Step Bootstrap Approach Tahani A. Maturi (Corresponding author) Department of Mathematical Sciences,

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Experimental data analysis Lecture 3: Confidence intervals. Dodo Das

Experimental data analysis Lecture 3: Confidence intervals Dodo Das Review of lecture 2 Nonlinear regression - Iterative likelihood maximization Levenberg-Marquardt algorithm (Hybrid of steepest descent

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Properties of Future Lifetime Distributions and Estimation

Properties of Future Lifetime Distributions and Estimation Harmanpreet Singh Kapoor and Kanchan Jain Abstract Distributional properties of continuous future lifetime of an individual aged x have been studied.

### Resampling: Bootstrapping and Randomization. Presented by: Jenn Fortune, Alayna Gillespie, Ivana Pejakovic, and Anne Bergen

Resampling: Bootstrapping and Randomization Presented by: Jenn Fortune, Alayna Gillespie, Ivana Pejakovic, and Anne Bergen Outline 1. The logic of resampling and bootstrapping 2. Bootstrapping: confidence

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

### **BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum

What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum Tim Hesterberg Google timhesterberg@gmail.com November 19, 2014 Abstract I have three goals in this

### fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson

fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson Contents What Are These Demos About? How to Use These Demos If This Is Your First Time Using Fathom Tutorial: An Extended Example

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Faculty of Health Sciences R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.

### Lecture 7 Linear Regression Diagnostics

Lecture 7 Linear Regression Diagnostics BIOST 515 January 27, 2004 BIOST 515, Lecture 6 Major assumptions 1. The relationship between the outcomes and the predictors is (approximately) linear. 2. The error

### Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

### Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Parametric Survival Models

Parametric Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005, Summer 2010 We consider briefly the analysis of survival data when one is willing to assume a parametric

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Reliability estimators for the components of series and parallel systems: The Weibull model

Reliability estimators for the components of series and parallel systems: The Weibull model Felipe L. Bhering 1, Carlos Alberto de Bragança Pereira 1, Adriano Polpo 2 1 Department of Statistics, University

### Improved approximations for multilevel models with binary responses

Improved approximations for multilevel models with binary responses (Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### 0.1 Solutions to CAS Exam ST, Fall 2015

0.1 Solutions to CAS Exam ST, Fall 2015 The questions can be found at http://www.casact.org/admissions/studytools/examst/f15-st.pdf. 1. The variance equals the parameter λ of the Poisson distribution for

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### Non-Parametric Estimation in Survival Models

Non-Parametric Estimation in Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005 We now discuss the analysis of survival data without parametric assumptions about the

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### Checking proportionality for Cox s regression model

Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### Applications of R Software in Bayesian Data Analysis

Article International Journal of Information Science and System, 2012, 1(1): 7-23 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx

### A study on the bi-aspect procedure with location and scale parameters

통계연구(2012), 제17권 제1호, 19-26 A study on the bi-aspect procedure with location and scale parameters (Short Title: Bi-aspect procedure) Hyo-Il Park 1) Ju Sung Kim 2) Abstract In this research we propose a

### Nonparametric statistics and model selection

Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

### Survival Distributions, Hazard Functions, Cumulative Hazards

Week 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution of a

### The Proportional Hazard Model in Economics Jerry A. Hausman and Tiemen M. Woutersen MIT and Johns Hopkins University.

The Proportional Hazard Model in Economics Jerry A. Hausman and Tiemen M. Woutersen MIT and Johns Hopkins University February 2005 Abstract. This paper reviews proportional hazard models and how the thinking

### Delme John Pritchard

THE GENETICS OF ALZHEIMER S DISEASE, MODELLING DISABILITY AND ADVERSE SELECTION IN THE LONGTERM CARE INSURANCE MARKET By Delme John Pritchard Submitted for the Degree of Doctor of Philosophy at HeriotWatt

### Course 4 Examination Questions And Illustrative Solutions. November 2000

Course 4 Examination Questions And Illustrative Solutions Novemer 000 1. You fit an invertile first-order moving average model to a time series. The lag-one sample autocorrelation coefficient is 0.35.

### Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

### Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 Outline The problem of missing data and a principled

### Introduction to Monte-Carlo Methods

Introduction to Monte-Carlo Methods Bernard Lapeyre Halmstad January 2007 Monte-Carlo methods are extensively used in financial institutions to compute European options prices to evaluate sensitivities

### UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

### Nonparametric Bootstrap Tests for the Generalized Behrens-Fisher Problem. Paul Cotofrei University of Neuchatel

Nonparametric Bootstrap Tests for the Generalized Behrens-Fisher Problem Paul Cotofrei University of Neuchatel 1 Introduction Like other concepts and approaches that have dominated the statistical literature

### SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 2005 by the Society of Actuaries and the Casualty Actuarial Society

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Probability Calculator

Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical work. This procedure provides you with a set of electronic statistical tables that

### 200609 - ATV - Lifetime Data Analysis

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

### Estimation and attribution of changes in extreme weather and climate events

IPCC workshop on extreme weather and climate events, 11-13 June 2002, Beijing. Estimation and attribution of changes in extreme weather and climate events Dr. David B. Stephenson Department of Meteorology

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

Table of Contents: Overview of Model; Dispersion; Parameterization; Sigma-Restricted Model; Overparameterized Model; Reference Coding; Model Summary (Summary Tab); Summary

### Department of Economics

Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

### Logit Models for Binary Data

Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

### Tests for Two Survival Curves Using Cox's Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox's Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Distribution (Weibull) Fitting

Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Examining credit card consumption pattern

Examining credit card consumption pattern Yuhao Fan (Economics Department, Washington University in St. Louis) Abstract: In this paper, I analyze the consumer's credit card consumption data from a commercial

### PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

### Model Selection and Claim Frequency for Workers' Compensation Insurance

Model Selection and Claim Frequency for Workers' Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers' compensation insurance claim data where the aggregate

### Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### Monte Carlo Simulation (General Simulation Models)

Monte Carlo Simulation (General Simulation Models) STATGRAPHICS Rev. 9/16/2013 Summary Monte Carlo simulation is used to estimate the distribution of variables

### SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

### Schoenfeld Residuals

Residuals Schoenfeld Residuals Assume $p$ covariates and $n$ independent observations of time, covariates, and censoring, which are represented as $(t_i, x_i, c_i)$, where $i = 1, 2, \ldots, n$, and $c_i = 1$ for uncensored

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### UNIVERSITY OF OSLO. The Poisson model is a common model for claim frequency.

UNIVERSITY OF OSLO Faculty of mathematics and natural sciences Candidate no Exam in: STK 4540 Non-Life Insurance Mathematics Day of examination: December 9th, 2015 Examination hours: 09:00-13:00 This

### Centre for Central Banking Studies

Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### AUTOCORRELATED RESIDUALS OF ROBUST REGRESSION

AUTOCORRELATED RESIDUALS OF ROBUST REGRESSION Jan Kalina Abstract The work is devoted to the Durbin-Watson test for robust linear regression methods. First we explain consequences of the autocorrelation

### Linear Models for Continuous Data

Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear