Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

1 Applied Statistics
J. Blanchet and J. Wadsworth
Institute of Mathematics, Analysis, and Applications, EPF Lausanne
An MSc Course for Applied Mathematicians, Fall 2012

2 Outline
1. Model Comparison
2. Model Diagnostics in Proportional Hazards

3 Part I Model Comparison

4 Comparing Survival Curves: Two groups
Suppose that interest lies in whether two groups have different survival curves.
Plotting the curves gives an impression of whether they are the same.
What can we say more formally?

5 Comparing Survival Curves: Log rank test
The log-rank test allows us to compare survival curves for two groups.
It works by comparing the observed failures with the failures expected under the null hypothesis of no difference between groups.

6 Comparing Survival Curves: Log rank test
Recall the notation for ordered times $0 \le t_{(1)} \le \dots \le t_{(n)}$, with $n_{(i)}$ the number at risk just prior to time $t_{(i)}$, and $d_{(i)}$ the number of failures at time $t_{(i)}$.
Let $k = 1, 2$ represent the two groups and define
$n_{k,(i)}$ to be the number at risk just prior to time $t_{(i)}$ in group $k$, so that $n_{(i)} = n_{1,(i)} + n_{2,(i)}$;
$d_{k,(i)}$ to be the number of failures at time $t_{(i)}$ in group $k$, so that $d_{(i)} = d_{1,(i)} + d_{2,(i)}$;
$$E_k = \sum_i \frac{d_{(i)}\, n_{k,(i)}}{n_{(i)}}, \qquad O_k = \sum_i d_{k,(i)}$$
as the expected and observed numbers of failures in group $k$.

7 Comparing Survival Curves: Log rank test
Under the null hypothesis of no difference between groups, the statistic
$$X^2 = \frac{(O_1 - E_1)^2}{V} \sim \chi^2_1$$
approximately, as the sample size becomes large, where $V = \mathrm{var}(O_1 - E_1)$:
$$V = \sum_i \frac{d_{(i)}\, n_{1,(i)}\, n_{2,(i)}\, (n_{(i)} - d_{(i)})}{n_{(i)}^2\, (n_{(i)} - 1)}$$
This comes from the hypergeometric distribution.
(NB $O_1 - E_1 = -(O_2 - E_2)$, so either group can be used in the definition.)

8 Comparing Survival Curves: Log rank test
An alternative definition of the log-rank test statistic is
$$\sum_{k=1}^{2} \frac{(E_k - O_k)^2}{E_k},$$
which is also asymptotically $\chi^2_1$ under $H_0$.
Either version can be used, but the second definition may be a bit more conservative (it rejects the null less often).
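The two versions of the log-rank statistic defined above can be traced by hand on a small data set. The following is a minimal sketch in base R, not the code used inside survdiff; the toy data and the helper name logrank_by_hand are assumptions made purely for illustration.

```r
# Hypothetical toy data: follow-up time, event indicator
# (1 = failure, 0 = censored) and group label for ten subjects.
time   <- c(6, 7, 10, 15, 19, 25, 3, 8, 12, 14)
status <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0)
group  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)

logrank_by_hand <- function(time, status, group) {
  t_fail <- sort(unique(time[status == 1]))  # distinct failure times t_(i)
  O <- c(0, 0)   # observed failures per group
  E <- c(0, 0)   # expected failures per group under H0
  V <- 0         # hypergeometric variance of O_1 - E_1
  for (tt in t_fail) {
    # numbers at risk just prior to tt, and failures at tt, in each group
    n_k <- c(sum(time >= tt & group == 1), sum(time >= tt & group == 2))
    d_k <- c(sum(time == tt & status == 1 & group == 1),
             sum(time == tt & status == 1 & group == 2))
    n <- sum(n_k)
    d <- sum(d_k)
    O <- O + d_k
    E <- E + d * n_k / n
    # a risk set of size 1 contributes no variance (avoids 0/0)
    if (n > 1) V <- V + d * n_k[1] * n_k[2] * (n - d) / (n^2 * (n - 1))
  }
  list(O = O, E = E, V = V,
       X2_hyper = (O[1] - E[1])^2 / V,   # first definition
       X2_alt   = sum((E - O)^2 / E))    # second definition
}

res <- logrank_by_hand(time, status, group)
res
1 - pchisq(res$X2_hyper, df = 1)   # p-value for the first definition
```

On this toy data the second definition gives the smaller statistic, in line with the remark that it may be more conservative.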

9 Log rank test in R: survdiff {survival}
Recall the Recidivism data; assume that fin (indicator of financial aid) is the only covariate.

> survdiff(Surv(week, arrest) ~ fin, data=Rossi)
Call:
survdiff(formula = Surv(week, arrest) ~ fin, data = Rossi)

        N Observed Expected (O-E)^2/E (O-E)^2/V
fin=    ...
fin=    ...

 Chisq= 3.8  on 1 degrees of freedom, p= ...

The final column and the final line relate to the first test definition.
The penultimate column gives the statistics for the second test definition:

> ...
[1] 3.82
> 1-pchisq(3.82,df=1)
[1] ...

The difference between groups is marginally non-significant at the 5% level.

10 Comparing Survival Curves: More than two groups
The log-rank test can also be extended to compare more than two groups.
If there are $G$ groups then, asymptotically, the statistic
$$(O - E)^T V^{-1} (O - E) \sim \chi^2_{G-1},$$
where $O - E = (O_1 - E_1, \dots, O_{G-1} - E_{G-1})$ and $V$ is its variance-covariance matrix.
The formula
$$\sum_{k=1}^{G} \frac{(E_k - O_k)^2}{E_k}$$
is also approximately $\chi^2_{G-1}$ distributed.

11 Comparing Survival Curves
Care needs to be taken when comparing groups with the log rank test, as the distribution of other covariates may not be the same within the two groups.
E.g. what if all the financial aid went to those over 40 years of age?
This could cause us to infer a difference between groups which is actually related to the effects of these other covariates.
More generally, we usually have a more complicated model than two groups, and so we want to know how to compare models in the presence of multiple covariates.

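12 Comparing Survival Curves: Parametric models
Let $T$ denote the survival time, and $z$ a vector of covariates.
Suppose that we model $T \mid z \sim$ Weibull, so that
$$f(t \mid z) = \alpha \lambda^{\alpha} t^{\alpha - 1} e^{z\beta} \exp\{-(\lambda t)^{\alpha} e^{z\beta}\}, \quad t \ge 0.$$
The log-likelihood for $X_i = \min\{T_i, C_i\}$, $\delta_i = I(T_i < C_i)$, with $\theta = (\beta, \lambda, \alpha)$, is
$$\ell(\beta, \lambda, \alpha) = \sum_i \delta_i \log f(x_i \mid z_i, \theta) + \sum_i (1 - \delta_i) \log S(x_i \mid z_i, \theta).$$
If we want to test any hypotheses about $(\beta, \lambda, \alpha)$, then we can use likelihood ratio tests.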
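The censored log-likelihood above can be written out directly. The sketch below is base R and follows the parameterisation of the slide, with $S(t \mid z) = \exp\{-(\lambda t)^{\alpha} e^{z\beta}\}$; the function name weibull_loglik and the toy data are assumptions for illustration, and survreg itself uses a different (log-linear) parameterisation.

```r
# Censored-data log-likelihood for the Weibull model of the slide.
# x:     observed times min(T, C)
# delta: 1 = failure observed, 0 = censored
# z:     covariate matrix (one row per subject); beta: coefficient vector
weibull_loglik <- function(beta, lambda, alpha, x, delta, z) {
  eta  <- as.vector(z %*% beta)               # linear predictor z * beta
  logS <- -(lambda * x)^alpha * exp(eta)      # log S(x | z)
  logf <- log(alpha) + alpha * log(lambda) +
          (alpha - 1) * log(x) + eta + logS   # log f(x | z)
  # failures contribute log f, censored observations contribute log S
  sum(delta * logf + (1 - delta) * logS)
}

# Hypothetical toy data: five subjects, one binary covariate
x     <- c(2.1, 0.7, 3.5, 1.2, 4.0)
delta <- c(1, 1, 0, 1, 0)
z     <- matrix(c(0, 1, 0, 1, 1), ncol = 1)

weibull_loglik(beta = 0.3, lambda = 0.5, alpha = 1.2, x, delta, z)

# With alpha = 1 and beta = 0 the model is exponential with rate lambda
ll_exp <- weibull_loglik(beta = 0, lambda = 0.5, alpha = 1, x, delta, z)
```

Setting alpha = 1 and beta = 0 reduces the expression to the exponential log-likelihood, sum(delta) * log(lambda) - lambda * sum(x), which gives a quick sanity check.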

13 Likelihood ratio: Parametric models

> wei <- survreg(Surv(week,arrest) ~ fin + age + race + wexp + mar + paro + prio, data=Rossi)
> summary(wei)
Call:
survreg(formula = Surv(week, arrest) ~ fin + age + race + wexp + mar + paro + prio, data = Rossi)

              Value  Std. Error  z  p
(Intercept)   ...               ...e-21
fin           ...               ...e-02
age           ...               ...e-02
race          ...               ...e-01
wexp          ...               ...e-01
mar           ...               ...e-01
paro          ...               ...e-01
prio          ...               ...e-03
Log(scale)    ...               ...e-04

Scale= 0.712

14 Likelihood ratio: Parametric models
paro does not look significant; fit the model without it:

> wei2 <- survreg(Surv(week,arrest) ~ fin + age + race + wexp + mar + prio, data=Rossi)
> 2*(wei$loglik[2] - wei2$loglik[2])
[1] ...

This likelihood ratio test has 1 df, so the critical value (at the 5% level) is 3.84.
We would not reject the hypothesis that paro has no effect.

15 Likelihood ratio: Parametric models
Recall that the exponential model is a special case of the Weibull model.
Since it is nested, we can also do a likelihood ratio test for this:

> wei <- survreg(Surv(week,arrest) ~ fin + age + race + wexp + mar + paro + prio, data=Rossi)
> expn <- survreg(Surv(week,arrest) ~ fin + age + race + wexp + mar + paro + prio, dist="exponential", data=Rossi)
> 2*(wei$loglik[2]-expn$loglik[2])
[1] ...

This is highly significant on 1 df, so the exponential model is rejected in favour of the Weibull.
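The two likelihood ratio comparisons for parametric models can be wrapped in a small helper. This is a sketch under assumptions: it uses the lung data shipped with the survival package as a stand-in for the Rossi data (which lives in a separate package), and the helper name lrt is hypothetical.

```r
library(survival)  # survreg(), Surv() and the 'lung' data set

# Assumption: 'lung' replaces the Rossi data; age and sex have no missing
# values, so all three models are fitted to the same observations.
full    <- survreg(Surv(time, status) ~ age + sex, data = lung)  # Weibull (default)
reduced <- survreg(Surv(time, status) ~ sex, data = lung)        # drops age
expo    <- survreg(Surv(time, status) ~ age + sex, data = lung,
                   dist = "exponential")                         # shape fixed at 1

# Likelihood ratio statistic and p-value for two nested fits;
# loglik[2] is the maximised log-likelihood of the fitted model.
lrt <- function(big, small, df) {
  stat <- 2 * (big$loglik[2] - small$loglik[2])
  c(statistic = stat, df = df, p.value = 1 - pchisq(stat, df))
}

test_age  <- lrt(full, reduced, df = 1)  # H0: age has no effect
test_expo <- lrt(full, expo, df = 1)     # H0: exponential is adequate
crit      <- qchisq(0.95, df = 1)        # 5% critical value on 1 df

test_age
test_expo
crit
```

Each statistic is compared with crit; the nested models must be fitted to the same rows, which is why covariates with missing values are avoided here.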

16 Likelihood ratio: Cox PH model
Likelihood ratio tests are also applicable to the Cox Proportional Hazards partial likelihood.
The asymptotic distributions are the same, i.e. the likelihood ratio statistic for a test with $q$ constraints asymptotically follows a $\chi^2_q$ distribution.

## test for significance of paro under the Cox PH model
> mod0 <- coxph(Surv(week, arrest) ~ fin + age + race + wexp + mar + paro + prio, data=Rossi)
> mod1 <- coxph(Surv(week, arrest) ~ fin + age + race + wexp + mar + prio, data=Rossi)
> 2*(mod0$loglik[2]-mod1$loglik[2])
[1] ...

Not significant at the 5% level on 1 df.

17 Cox PH model: Other test statistics
The output for a fitted Cox PH model gives three test statistics which compare the fitted model to a null model.
These are the likelihood ratio, Wald, and score tests, all asymptotically $\chi^2$ under $H_0$.

> mod5 <- coxph(Surv(week, arrest) ~ age + prio, data=Rossi)
> summary(mod5)
Call:
coxph(formula = Surv(week, arrest) ~ age + prio, data = Rossi)

  n= 432, number of events= 114

      coef  exp(coef)  se(coef)  z  Pr(>|z|)
age   ...                           ... ***
prio  ...                           ... ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      exp(coef)  exp(-coef)  lower .95  upper .95
age   ...
prio  ...

Rsquare= ...   (max possible= ... )
Likelihood ratio test= ...  on 2 df,   p=2.657e-06
Wald test            = ...  on 2 df,   p=4.766e-06
Score (logrank) test = ...  on 2 df,   p=2.723e-06

The score test is the same as the log rank test when the only covariate is a two-group indicator.
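The link between the score test and the log rank test can be checked numerically. The sketch below uses a hypothetical toy data set with two groups and no tied times (with ties the two can differ slightly, depending on how coxph handles them); the component names logtest, waldtest and sctest are those returned by summary() for a coxph fit.

```r
library(survival)  # coxph(), survdiff(), Surv()

# Hypothetical toy data: two groups, all times distinct
toy <- data.frame(
  time   = c(6, 7, 10, 15, 19, 25, 3, 8, 12, 14),
  status = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  group  = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
)

fit <- coxph(Surv(time, status) ~ group, data = toy)
s   <- summary(fit)

# The three tests of the fitted model against the null model
s$logtest    # likelihood ratio
s$waldtest   # Wald
s$sctest     # score

# Log rank test for the same two groups
lr <- survdiff(Surv(time, status) ~ group, data = toy)

# Score statistic from the Cox fit next to the log rank chi-square
c(score = unname(s$sctest["test"]), logrank = lr$chisq)
```

The three statistics generally differ in finite samples even though they share the same asymptotic distribution.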

18 Part II Model Diagnostics in Proportional Hazards

19 Checking Proportional Hazards
The Schoenfeld residual for the $i$th subject on the $k$th covariate is
$$\hat r_{ik} = \delta_i \,(z_{ik} - \hat z_{x_i k}),$$
where $\hat z_{x_i k}$ is given by
$$\hat z_{x_i k} = \frac{\sum_{j \in R(x_i)} z_{jk}\, e^{z_j \hat\beta}}{\sum_{j \in R(x_i)} e^{z_j \hat\beta}}.$$
Scaled Schoenfeld residuals:
$$\hat r^{*}_{ik} = \frac{\hat r_{ik} - \mathrm{mean}(\hat r_{ik},\ i = 1, \dots, n)}{\mathrm{sd}(\hat r_{ik},\ i = 1, \dots, n)}$$
For a PH model, the (scaled) residuals $\hat r^{*}_{ik}$ should exhibit a random (i.e. unsystematic) pattern at each failure time.
Otherwise it suggests that the covariate effect is changing as time passes.
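A short sketch of how these residuals might be obtained in R, assuming the lung data from the survival package as a stand-in for the Rossi data. The standardisation follows the definition on the slide (centre, then divide by the standard deviation); note that the survival package's own type = "scaledsch" residuals use a different scaling, based on the estimated variance of the coefficients.

```r
library(survival)  # coxph(), Surv() and the 'lung' data set

# Assumption: 'lung' replaces the Rossi data; age and sex are complete
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Schoenfeld residuals: one row per observed failure, one column per covariate
sch <- residuals(fit, type = "schoenfeld")
dim(sch)

# Standardisation as on the slide, applied column by column
sch_std <- scale(sch)
colMeans(sch_std)        # centred columns
apply(sch_std, 2, sd)    # unit standard deviation
```

Censored subjects have delta = 0 and therefore contribute no row to the residual matrix.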

20 Checking Proportional Hazards: cox.zph {survival}

mod2 <- coxph(Surv(week, arrest) ~ fin + age + prio, data=Rossi)
cox.zph(mod2)

         rho  chisq  p
fin      ...
age      ...
prio     ...
GLOBAL   NA   ...

The function tests proportionality of all the predictors by looking at their interactions with time.
The column rho is the Pearson correlation between the scaled Schoenfeld residuals and time for each covariate.
The last row contains the global test for all the interactions tested at once.
A p-value below 0.05 is evidence against the proportionality assumption for that covariate (or, in the last row, for the model as a whole).
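As a hedged illustration of how the test results can be accessed programmatically, the sketch below fits a Cox model to the lung data (an assumption, standing in for the Rossi data) and reads the table component of the cox.zph result. Recent releases of the survival package appear to report chisq, df and p rather than the rho column shown above, so only the p column is relied on here.

```r
library(survival)  # coxph(), cox.zph(), Surv() and the 'lung' data set

# Assumption: 'lung' replaces the Rossi data
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

zp <- cox.zph(fit)   # tests of proportional hazards
zp$table             # one row per covariate plus a "GLOBAL" row

# Covariates whose test is significant at the 5% level
pvals   <- zp$table[, "p"]
flagged <- names(pvals)[pvals < 0.05]
flagged
```

plot(zp) then draws the smoothed residual plots, one panel per covariate.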

21 Checking Proportional Hazards: plot, cox.zph {survival}
Graphs of the scaled Schoenfeld residuals against time:

par(mfrow=c(2,2))
plot(cox.zph(mod2))

[Figure: three panels showing Beta(t) for fin, Beta(t) for age and Beta(t) for prio, each plotted against Time]

The curve is a smoothing spline with ±2 standard-error envelopes around the fit.
Systematic departures from a horizontal line are indicative of non-proportional hazards.
Here, there appears to be a trend in the plot for age, with the age effect declining with time.

22 Checking linearity
We assume in PH models that
$$\lambda(t; z) = \lambda_0(t)\, e^{z\beta}.$$
This means that $\log \lambda(t; z)$ is linearly dependent on the covariates $z$. Is this true?
The martingale residual for subject $i$ is
$$\hat M_i = \delta_i - e^{z_i \hat\beta} \int_0^{x_i} \hat\lambda_0(u)\, du.$$
For each $k$, the plot of $\hat M_i$ against $z_{ik}$, $i = 1, \dots, n$, should exhibit a random pattern with mean 0.

23 Checking linearity: residuals {survival}

res <- residuals(mod2, type="martingale")
X <- as.matrix(Rossi[,c("age", "prio")]) # matrix of covariates
for (j in 1:2) { # residual plots
  plot(X[,j], res, xlab=c("age", "prio")[j], ylab="residuals")
  abline(h=0, lty=2)
  lines(lowess(X[,j], res, iter=0))
}

[Figure: martingale residuals plotted against age and against prio, with lowess smooths]

Nonlinearity is slight here.
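Two basic properties of martingale residuals are useful when reading such plots: each residual is at most 1 (since $\delta_i \le 1$ and the integrated hazard is non-negative), and over the sample they sum to approximately zero. The sketch below checks both on the lung data, which is an assumption standing in for the Rossi data.

```r
library(survival)  # coxph(), Surv() and the 'lung' data set

# Assumption: 'lung' replaces the Rossi data; age and sex are complete
fit  <- coxph(Surv(time, status) ~ age + sex, data = lung)
mres <- residuals(fit, type = "martingale")

sum(mres)    # close to zero
max(mres)    # never above 1
min(mres)    # can be strongly negative for long-lived subjects

# Lowess smooth of the residuals against age, as used in the plots above
sm <- lowess(lung$age, mres, iter = 0)
range(sm$y)  # a smooth that stays near zero suggests little nonlinearity
```

The skewness of the residuals (bounded above, unbounded below) is one reason a smoother is added to the scatterplot rather than judging the raw points.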

24 Influential observations
We want to check the influence of each observation on the estimate $\hat\beta$.
Let $\hat\beta_i$ denote the estimated vector of coefficients computed on the sample with the $i$th subject deleted.
The idea is to check, for each $i$, which components of the vector $\hat\beta - \hat\beta_i$ have large absolute values.
This involves fitting $n + 1$ Cox regression models, which can be computationally expensive.
There is an approximation based on the fit obtained from the whole data: $\hat\beta - \hat\beta_i$ can be approximated by
$$\mathrm{dfbeta}_i = I(\hat\beta)^{-1} (\tilde r_{i1}, \dots, \tilde r_{iK})^T,$$
where $I(\hat\beta)$ is the observed Fisher information matrix and, for $k = 1, \dots, K$, $\tilde r_{ik}$ is a function of $\hat\beta$ and of the Schoenfeld residual $\hat r_{ik}$.
Plots of the quantities $\tilde r_{ik}$ against $i$ are used to gauge the influence of the $i$th subject on the $k$th covariate.

25
dfbeta <- residuals(mod2, type="dfbeta")
for (j in 1:3) {
  plot(dfbeta[,j], ylab=names(coef(mod2))[j])
  abline(h=0, lty=2)
}

[Figure: index plots of the dfbeta values for fin, age and prio]

Comparing the magnitudes of the largest dfbeta values to the regression coefficients (-0.35, -0.07, 0.1) suggests that none of the observations is terribly influential individually.
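To see how the dfbeta approximation relates to an actual leave-one-out refit, the following sketch compares the two for a single subject. It assumes the lung data from the survival package in place of the Rossi data, and only one extra model is refitted rather than all n.

```r
library(survival)  # coxph(), Surv() and the 'lung' data set

# Assumption: 'lung' replaces the Rossi data; age and sex are complete
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Approximate change in each coefficient when a subject is deleted:
# one row per subject, one column per coefficient
dfb <- residuals(fit, type = "dfbeta")
dim(dfb)

# Subject with the largest approximate influence on the age coefficient
i <- which.max(abs(dfb[, 1]))

# Exact change obtained by refitting without that subject
refit <- coxph(Surv(time, status) ~ age + sex, data = lung[-i, ])
rbind(approx = dfb[i, ], exact = coef(fit) - coef(refit))

# Largest approximate changes relative to the size of each coefficient
apply(abs(dfb), 2, max) / abs(coef(fit))
```

The last line expresses the largest dfbeta values as a fraction of the corresponding coefficients, which is the comparison made informally in the text.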
