Score Test of Proportionality Assumption for Cox Models Xiao Chen, Statistical Consulting Group UCLA, Los Angeles, CA
ABSTRACT

Assessing the proportional hazards assumption is an important step in validating a Cox model for survival data. This paper provides a macro program that implements a score test based on scaled Schoenfeld residuals using SAS PROC IML, with a choice of several functional forms of the time variable. An example demonstrates the use of the score test and of graphical tools in assessing the proportionality assumption.

INTRODUCTION

Cox proportional hazards regression models are widely used for analyzing survival data. A key assumption of the Cox model is that the effect of any predictor variable is constant over time. There are two types of tests of the proportionality assumption. The first type is based on time-varying covariates: a Wald test for individual predictors and a partial likelihood ratio test for the global test. These can be performed with PROC PHREG in SAS by creating time-varying covariates and using the TEST statement. The second type, presented here, is based on the scaled Schoenfeld residuals. In this case, testing the time-dependent covariates is equivalent to testing for a non-zero slope in a generalized linear regression of the scaled Schoenfeld residuals on functions of time. A non-zero slope indicates a violation of the proportional hazards assumption. We can also perform an overall test on multiple predictor variables. Common choices for the function of time are the log, rank and Kaplan-Meier transformations, together with the identity function; all of them are included in the macro program, phreg_score_test, presented here.

As with any regression, it is very helpful to graph the scaled Schoenfeld residuals against a time variable so that we can visually inspect possible patterns in addition to performing the tests of non-zero slopes.
Certain types of non-proportionality will not be detected by the tests of non-zero slopes alone but may become obvious in graphs of the residuals, for example a nonlinear (e.g., quadratic) relationship between the residuals and the function of time, or undue influence of outliers.

SCORE TEST BASED ON SCALED SCHOENFELD RESIDUALS

Schoenfeld residuals from a Cox model are defined for each predictor variable in the model, so the number of Schoenfeld residual variables equals the number of predictor variables. They are based on the contribution of each predictor variable to the log partial likelihood. Grambsch and Therneau (1994) show that scaled Schoenfeld residuals can be of great use in diagnostics for Cox regression models, especially in assessing the proportional hazards assumption. In theory, the scaled Schoenfeld residuals are the Schoenfeld residuals adjusted by the inverse of the covariance matrix of the Schoenfeld residuals. Grambsch and Therneau (1994) suggest that, under the assumption that the distribution of the predictor variable is similar in the various risk sets, the adjustment can be performed using the variance-covariance matrix of the parameter estimates divided by the number of events in the sample.

The null hypothesis of the test of proportional hazards based on the scaled Schoenfeld residuals is that the slope of the scaled Schoenfeld residuals against a function of time is zero for each predictor variable. Once the scaled Schoenfeld residuals are created, we can perform this test using a generalized linear regression approach.
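More precisely, the test statistic for an individual predictor variable u is

$$T_u = \frac{\left[\sum_i \delta_i \,\bigl(g(t_i) - \bar{g}\bigr)\, r_{s,i}\right]^2}{\Delta \, V_{uu} \sum_i \delta_i \,\bigl(g(t_i) - \bar{g}\bigr)^2}$$

In this formula, $r_s$ is the variable of scaled Schoenfeld residuals for the predictor of interest, $g(t)$ is the function of time chosen before the test, $\bar{g} = \sum_i \delta_i\, g(t_i) / \Delta$ is the mean of $g(t)$ over the event times (gbar in the macro code), $\delta_i$ is the event indicator, $\Delta$ is the total number of events and $V_{uu}$ is the estimated variance of the parameter estimate of the predictor variable of interest. The sum is taken over all the observations in the data. The statistic is asymptotically distributed as $\chi^2$ with 1 degree of freedom.

The test statistic for the overall test on p predictor variables is

$$T = \frac{\Delta \left[\sum_i \delta_i \,\bigl(g(t_i) - \bar{g}\bigr)\, r_i\right]^{\top} V \left[\sum_i \delta_i \,\bigl(g(t_i) - \bar{g}\bigr)\, r_i\right]}{\sum_i \delta_i \,\bigl(g(t_i) - \bar{g}\bigr)^2}$$

where $r_i$ is the vector of the unscaled Schoenfeld residuals of the predictors of interest and $V$ is the estimated variance-covariance matrix of their parameter estimates. It is asymptotically distributed as $\chi^2$ with p degrees of freedom.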
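As a language-neutral illustration of the two statistics above, the following is a minimal Python sketch. It is not the paper's SAS macro. The function name score_tests, its argument layout and the toy numbers are assumptions made for illustration only; the rows are assumed to contain events only, so the event indicator is always 1 and Δ is the number of rows. The p-value for 1 degree of freedom uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def score_tests(g, scaled, unscaled, V):
    """Score statistics for proportional hazards (illustrative sketch).

    g        : g(t_i) for each event (list of floats)
    scaled   : scaled Schoenfeld residuals, one row per event, p columns
    unscaled : unscaled Schoenfeld residuals, same shape
    V        : p x p covariance matrix of the parameter estimates
    Rows are assumed to be events only, so Delta = len(g).
    """
    delta = len(g)
    gbar = sum(g) / delta
    gc = [gi - gbar for gi in g]            # centered time function
    ss = sum(x * x for x in gc)             # sum of squares of g - gbar
    p = len(V)

    individual = []
    for u in range(p):
        top = sum(gc[i] * scaled[i][u] for i in range(delta)) ** 2
        chi2 = top / (delta * V[u][u] * ss)
        pval = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square, 1 df
        individual.append((chi2, pval))

    # Global statistic uses the unscaled residuals and the full matrix V.
    s = [sum(gc[i] * unscaled[i][u] for i in range(delta)) for u in range(p)]
    quad = sum(s[a] * V[a][b] * s[b] for a in range(p) for b in range(p))
    global_chi2 = delta * quad / ss          # compare with chi-square, p df
    return individual, global_chi2

# Toy, made-up numbers for a single predictor (not from the WHAS500 data).
g = [1.0, 2.0, 3.0, 4.0]
r = [[0.5], [-0.2], [0.1], [-0.4]]
V = [[0.04]]
rs = [[len(g) * V[0][0] * row[0]] for row in r]   # scaled = Delta * r * V
print(score_tests(g, rs, r, V))
```

With a single predictor the individual and global statistics coincide, which gives a quick consistency check on an implementation.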
AN EXAMPLE

The data set used for this example is taken from Applied Survival Analysis: Regression Modeling of Time to Event Data, Chapter 6. The data set can be downloaded following the link. The time-to-event variable is lenfol and the censor variable is fstat. The predictor variables that we will use for the example are age, bmi, hr (heart rate) and gender. In this example, we show how to create scaled Schoenfeld residuals manually and how to inspect graphically for possible deviation from the assumption of proportional hazards.

We first run the Cox model using PROC PHREG. This run creates two data sets: est, which contains the variance-covariance matrix and is created with the OUTEST= and COVOUT options, and res, which contains the Schoenfeld residuals for each predictor variable and is created with the OUTPUT statement.

    proc phreg data = whas500 outest=est covout;
      model lenfol*fstat(0) = age bmi hr gender;
      id id;
      output out=res ressch = age_r bmi_r hr_r gender_r;
    run;

In order to create the scaled Schoenfeld residuals, we need the total number of events. We use PROC SQL to sum the censor variable and store the result in a macro variable called total.

    proc sql noprint;
      select sum(fstat) into :total from whas500;
    quit;

Now we have all the information we need for adjusting the Schoenfeld residuals using PROC IML.

    proc iml;
      /* Schoenfeld residuals and time/censor variables, events only */
      use res;
      read all variables {age_r bmi_r hr_r gender_r} into L where (fstat = 1);
      read all variables {lenfol fstat} into X where (fstat = 1);
      /* variance-covariance matrix of the parameter estimates */
      use est;
      read all var {age bmi hr gender} into V where (_type_ = "COV");
      /* scaled residuals = number of events * residuals * covariance */
      ssr = (&total)*L*V;
      W = X || ssr;
      create p var {lenfol fstat sage_r sbmi_r shr_r sgender_r};
      append from W;
    quit;

At this point, a data set called p has been created. This data set has the time variable, the censor variable and all the scaled Schoenfeld residual variables.
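The scaling step in the PROC IML code above multiplies the matrix of Schoenfeld residuals by the covariance matrix of the estimates and by the number of events. A minimal Python sketch of that one step follows, under the assumption that the residual rows have already been restricted to events (as the where (fstat = 1) clause does). The function name scale_schoenfeld and the toy matrices are hypothetical and are not part of the paper's code.

```python
def scale_schoenfeld(residuals, V):
    """Return Delta * R * V, where R holds one row of Schoenfeld residuals
    per event and V is the covariance matrix of the parameter estimates.
    Assumes every row of `residuals` is an event, so Delta = len(residuals).
    """
    delta = len(residuals)
    p = len(V)
    return [
        [delta * sum(row[k] * V[k][j] for k in range(p)) for j in range(p)]
        for row in residuals
    ]

# Toy, made-up inputs: two events, two predictors.
R = [[1.0, 0.0],
     [0.0, 2.0]]
V = [[0.5, 0.1],
     [0.1, 0.2]]
print(scale_schoenfeld(R, V))
```

Each output column corresponds to one predictor, in the same order as the columns of the residual matrix, which mirrors how ssr lines up with sage_r, sbmi_r, shr_r and sgender_r in the data set p.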
To inspect the trend visually, we can also use a nonparametric smoothing technique such as the one provided by PROC LOESS, shown below for the scaled Schoenfeld residual variable of the predictor hr (heart rate). This process has to be repeated for the scaled Schoenfeld residual variable of each predictor variable in the model. For illustration, we show just one.

    proc loess data=p;
      model shr_r = lenfol / smooth=0.4;
      ods output OutputStatistics=myout;
    run;
Now we have done all the preparation for displaying the trend of the scaled Schoenfeld residuals for heart rate against the original time variable, lenfol.

    proc sort data = myout;
      by lenfol;
    run;

    symbol1 c = gray i = none v = circle h=.8;
    symbol2 c = black i = join v = none w=2.5;
    axis1 order=(-.1 to .15 by .05) minor=none label=(a=90 'Scaled Schoenfeld Residuals');
    axis2 order=(0 to 2400 by 400) label=('time') minor=none;

    proc gplot data = myout;
      plot DepVar*lenfol=1 Pred*lenfol=2 / vaxis = axis1 haxis = axis2 vref=0 overlay;
    run;
    quit;

The plot does not show a strong trend along the original time variable, even though the loess estimate shows a slight sign of a negative slope.

So far we have shown how to use PROC IML to create the scaled Schoenfeld residuals from the Schoenfeld residuals that SAS provides. We can also apply the macro program phreg_score_test to perform the test, as shown below.

    %phreg_score_test(lenfol, fstat, age bmi hr gender, data=whas500);
The first column of the output is the correlation of the scaled Schoenfeld residuals with the time variable. The second column is the test statistic defined previously. The global test tests simultaneously that all the slopes are zero. All the p-values are fairly large, so there is no evidence that any of the slopes differs from zero.

REMARK

Several common transformations of the time variable are available: rank, log and the Kaplan-Meier estimate. The default transformation of the macro program phreg_score_test is the identity function. To specify another transformation of time, use the option type= as shown in the examples below. Although some simulations have shown that the log-transformed time variable works quite well, there are other situations in which the different time variables behave differently. Which time variable to use is decided case by case, depending largely on the theory and focus of the researchers.

    %phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500);
    %phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="rank");
    %phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="logtime");
    %phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="km");

We include a segment of the macro program to show what is involved in the computation.

    %macro phreg_score_test(time, event, xvars, strata, weight=, data=_last_, type="time");
      /* build the list of residual variable names: <xvar>_r */
      %let xvar_r =;
      %let k = 1;
      %let v = %scan(&xvars, 1);
      %do %while ("&v"~="");
        %let xvar_r = &xvar_r &v._r;
        %let k = %eval(&k + 1);
        %let v = %scan(&xvars, &k);
      %end;
      %let varnames = &time &xvars &xvar_r;
      ods listing close;
      proc phreg data=&data covout outest=_est_ (drop=_lnlike_);
        model &time*&event(0) = &xvars;
        strata &strata;
        output out = _res_ (where = (&event=1)) ressch = &xvar_r;
      run;
      proc sort data = _res_;
        by &time;
      run;
      /* count the total number of events */
      proc sql noprint;
        select sum(&event) into :delta from &data;
      quit;
      ods listing;
      proc iml;
        reset noname printadv = 1;
        use _res_;
        read all variables {&xvar_r} into S;
        use _tvars_;
        read all variables {&time _logtime _Rtime s} into T;
        c = ncol(s);
        r = nrow(s);
        use _est_;
        read all var _num_ into V where (_TYPE_^="PARMS");
        read all var {_name_} into N where (_TYPE_^="PARMS");
        sv = J(r, c, 0);
        sv = &delta*s*v;                 /* scaled Schoenfeld residuals */
        %if (%upcase(&type)="TIME") %then %do;
          gbar = sum(t[,1])/&delta;
          top = J(c, 1, 0);
          do i = 1 to c;
            top[i] = sum((t[, 1]-gbar)#sv[, i])**2;
          end;
          bottom = J(c, 1, 1);
          do i = 1 to c;
            bottom[i] = &delta*t(t[,1]-gbar)*(t[,1]-gbar)*v[i,i];
          end;
          chi2 = top/bottom;
          X = J(c+1, 4, 0);
          print "Score test of proportional hazards assumption";
          print "Time variable: &time";
          do i = 1 to c;
            ct = T[, 1] - sum(t[,1])/&delta;
            norm_ct = sqrt(t(ct)*ct);
            csv = sv[, i] - sum(sv[,i])/&delta;
            n_csv = sqrt(t(csv)*csv);
            X[i, 1] = t(ct)*csv/(norm_ct*n_csv); /*correlation*/
            X[i, 2] = chi2[i]; /*cstat*/
            X[i, 3] = 1;
            X[i, 4] = 1 - probchi(chi2[i], 1); /*probchi2*/
          end; /* individual test*/
          rname = N//"Global test";
          cname = {"rho" "Chi-Square" "df" "P-value"};
          rowmat = J(1,c,1);
          a = (T[,1]-gbar)#S;
          do i = 1 to c;
            rowmat[i] = sum(a[, i]);
          end;
          global = &delta*(rowmat*v*t(rowmat))/(t(t[,1]-gbar)*(t[,1]-gbar));
          probchi2 = 1 - probchi(global, c);
          X[c+1, 1] = .;
          X[c+1, 2] = global;
          X[c+1, 3] = c;
          X[c+1, 4] = probchi2;
          print x[rowname=rname colname=cname format=12.3];
        %end;

CONCLUSION

This paper offers an implementation of the test of proportional hazards based on scaled Schoenfeld residuals. The implementation uses PROC IML and is embedded in a macro program. It offers both tests on individual predictors and a global test on all the variables of interest at once, and it offers four different transformations of the time variable. The macro program can be downloaded following the link. For more examples of using this macro program, visit the textbook example page for Chapter 6 of Applied Survival Analysis created by the Statistical Consulting Group at UCLA.
REFERENCES

P. M. Grambsch, T. M. Therneau, Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81, 1994.
D. W. Hosmer, Jr., S. Lemeshow and S. May, Applied Survival Analysis: Regression Modeling of Time to Event Data, 2nd Edition, 2008.
T. M. Therneau, P. M. Grambsch, Modeling Survival Data: Extending the Cox Model, Springer-Verlag, New York, 2000.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Xiao Chen
Statistical Consulting Group
UCLA Academic Technology Services
5308 Math Sciences
Box
Los Angeles, CA
Work Phone: (310)
Fax: (310)
[email protected]
Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
This chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
Introduction to Fixed Effects Methods
Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed
Alex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
Introduction to proc glm
Lab 7: Proc GLM and one-way ANOVA STT 422: Summer, 2004 Vince Melfi SAS has several procedures for analysis of variance models, including proc anova, proc glm, proc varcomp, and proc mixed. We mainly will
Predicting Customer Default Times using Survival Analysis Methods in SAS
Predicting Customer Default Times using Survival Analysis Methods in SAS Bart Baesens [email protected] Overview The credit scoring survival analysis problem Statistical methods for Survival
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY
TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online
SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY
SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Introduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Statistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
5 Correlation and Data Exploration
5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both
Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test
The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation
10. Comparing Means Using Repeated Measures ANOVA
10. Comparing Means Using Repeated Measures ANOVA Objectives Calculate repeated measures ANOVAs Calculate effect size Conduct multiple comparisons Graphically illustrate mean differences Repeated measures
Basic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
