Regression Modeling Strategies

Frank E. Harrell, Jr.
Regression Modeling Strategies
With Applications to Linear Models, Logistic Regression, and Survival Analysis
With 141 Figures
Springer

Contents

Preface
Typographical Conventions

1 Introduction
  Hypothesis Testing, Estimation, and Prediction
  Examples of Uses of Predictive Multivariable Modeling
  Planning for Modeling
  Emphasizing Continuous Variables
  Choice of the Model
  Further Reading

2 General Aspects of Fitting Regression Models
  Notation for Multivariable Regression Models
  Model Formulations
  Interpreting Model Parameters
  Nominal Predictors
  Interactions
  Example: Inference for a Simple Model
  Relaxing Linearity Assumption for Continuous Predictors
  Simple Nonlinear Terms
  Splines for Estimating Shape of Regression Function and Determining Predictor Transformations
  Cubic Spline Functions
  Restricted Cubic Splines
  Choosing Number and Position of Knots
  Nonparametric Regression
  Advantages of Regression Splines over Other Methods
  Recursive Partitioning: Tree-Based Models
  Multiple Degree of Freedom Tests of Association
  Assessment of Model Fit
  Regression Assumptions
  Modeling and Testing Complex Interactions
  Fitting Ordinal Predictors
  Distributional Assumptions
  Further Reading
  Problems

3 Missing Data
  Types of Missing Data
  Prelude to Modeling
  Missing Values for Different Types of Response Variables
  Problems with Simple Alternatives to Imputation
  Strategies for Developing Imputation Algorithms
  Single Conditional Mean Imputation
  Multiple Imputation
  Summary and Rough Guidelines
  Further Reading
  Problems

4 Multivariable Modeling Strategies
  Prespecification of Predictor Complexity Without Later Simplification
  Checking Assumptions of Multiple Predictors Simultaneously
  Variable Selection
  Overfitting and Limits on Number of Predictors
  Shrinkage
  Collinearity
  Data Reduction
  Variable Clustering
  Transformation and Scaling Variables Without Using Y
  Simultaneous Transformation and Imputation
  Simple Scoring of Variable Clusters
  Simplifying Cluster Scores
  How Much Data Reduction Is Necessary?
  Overly Influential Observations
  Comparing Two Models
  Summary: Possible Modeling Strategies
  Developing Predictive Models
  Developing Models for Effect Estimation
  Developing Models for Hypothesis Testing
  Further Reading

5 Resampling, Validating, Describing, and Simplifying the Model
  The Bootstrap
  Model Validation
  Introduction
  Which Quantities Should Be Used in Validation?
  Data-Splitting
  Improvements on Data-Splitting: Resampling
  Validation Using the Bootstrap
  Describing the Fitted Model
  Simplifying the Final Model by Approximating It
  Difficulties Using Full Models
  Approximating the Full Model
  Further Reading

6 S-Plus Software
  The S Modeling Language
  User-Contributed Functions
  The Design Library
  Other Functions
  Further Reading

7 Case Study in Least Squares Fitting and Interpretation of a Linear Model
  Descriptive Statistics
  Spending Degrees of Freedom/Specifying Predictor Complexity
  Fitting the Model Using Least Squares
  Checking Distributional Assumptions
  Checking Goodness of Fit
  Overly Influential Observations
  Test Statistics and Partial R²
  Interpreting the Model
  Problems

8 Case Study in Imputation and Data Reduction
  Data
  How Many Parameters Can Be Estimated?
  Variable Clustering
  Single Imputation Using Constants or Recursive Partitioning
  Transformation and Single Imputation Using transcan
  Data Reduction Using Principal Components
  Detailed Examination of Individual Transformations
  Examination of Variable Clusters on Transformed Variables
  Transformation Using Nonparametric Smoothers
  Multiple Imputation
  Further Reading
  Problems

9 Overview of Maximum Likelihood Estimation
  General Notions
  Simple Cases
  Hypothesis Tests
  Likelihood Ratio Test
  Wald Test
  Score Test
  Normal Distribution: One Sample
  General Case
  Global Test Statistics
  Testing a Subset of the Parameters
  Which Test Statistics to Use When
  Example: Binomial, Comparing Two Proportions
  Iterative ML Estimation
  Robust Estimation of the Covariance Matrix
  Wald, Score, and Likelihood-Based Confidence Intervals
  Bootstrap Confidence Regions
  Further Use of the Log Likelihood
  Rating Two Models, Penalizing for Complexity
  Testing Whether One Model Is Better than Another
  Unitless Index of Predictive Ability
  Unitless Index of Adequacy of a Subset of Predictors
  Weighted Maximum Likelihood Estimation
  Penalized Maximum Likelihood Estimation
  Further Reading
  Problems

10 Binary Logistic Regression
  Model
  Model Assumptions and Interpretation of Parameters
  Odds Ratio, Risk Ratio, and Risk Difference
  Detailed Example
  Design Formulations
  Estimation
  Maximum Likelihood Estimates
  Estimation of Odds Ratios and Probabilities
  Test Statistics
  Residuals
  Assessment of Model Fit
  Collinearity
  Overly Influential Observations
  Quantifying Predictive Ability
  Validating the Fitted Model
  Describing the Fitted Model
  S-PLUS Functions
  Further Reading
  Problems

11 Logistic Model Case Study 1: Predicting Cause of Death
  Preparation for Modeling
  Regression on Principal Components, Cluster Scores, and Pretransformations
  Fit and Diagnostics for a Full Model, and Interpreting Pretransformations
  Describing Results Using a Reduced Model
  Approximating the Full Model Using Recursive Partitioning
  Validating the Reduced Model

12 Logistic Model Case Study 2: Survival of Titanic Passengers
  Descriptive Statistics
  Exploring Trends with Nonparametric Regression
  Binary Logistic Model With Casewise Deletion of Missing Values
  Examining Missing Data Patterns
  Single Conditional Mean Imputation
  Multiple Imputation
  Summarizing the Fitted Model
  Problems

13 Ordinal Logistic Regression
  Background
  Ordinality Assumption
  Proportional Odds Model
  Model
  Assumptions and Interpretation of Parameters
  Estimation
  Residuals
  Assessment of Model Fit
  Quantifying Predictive Ability
  Validating the Fitted Model
  S-PLUS Functions
  Continuation Ratio Model
  Model
  Assumptions and Interpretation of Parameters
  Estimation
  Residuals
  Assessment of Model Fit
  Extended CR Model
  Role of Penalization in Extended CR Model
  Validating the Fitted Model
  S-PLUS Functions
  Further Reading
  Problems

14 Case Study in Ordinal Regression, Data Reduction, and Penalization
  Response Variable
  Variable Clustering
  Developing Cluster Summary Scores
  Assessing Ordinality of Y for each X, and Unadjusted Checking of PO and CR Assumptions
  A Tentative Full Proportional Odds Model
  Residual Plots
  Graphical Assessment of Fit of CR Model
  Extended Continuation Ratio Model
  Penalized Estimation
  Using Approximations to Simplify the Model
  Validating the Model
  Summary
  Further Reading
  Problems

15 Models Using Nonparametric Transformations of X and Y
  Background
  Generalized Additive Models
  Nonparametric Estimation of Y-Transformation
  Obtaining Estimates on the Original Scale
  S-PLUS Functions
  Case Study

16 Introduction to Survival Analysis
  Background
  Censoring, Delayed Entry, and Truncation
  Notation, Survival, and Hazard Functions
  Homogeneous Failure Time Distributions
  Nonparametric Estimation of S and Λ
  Kaplan-Meier Estimator
  Altschuler-Nelson Estimator
  Analysis of Multiple Endpoints
  Competing Risks
  Competing Dependent Risks
  State Transitions and Multiple Types of Nonfatal Events
  Joint Analysis of Time and Severity of an Event
  Analysis of Multiple Events
  S-PLUS Functions
  Further Reading
  Problems

17 Parametric Survival Models
  Homogeneous Models (No Predictors)
  Specific Models
  Estimation
  Assessment of Model Fit
  Parametric Proportional Hazards Models
  Model
  Model Assumptions and Interpretation of Parameters
  Hazard Ratio, Risk Ratio, and Risk Difference
  Specific Models
  Estimation
  Assessment of Model Fit
  Accelerated Failure Time Models
  Model
  Model Assumptions and Interpretation of Parameters
  Specific Models
  Estimation
  Residuals
  Assessment of Model Fit
  Validating the Fitted Model
  Buckley-James Regression Model
  Design Formulations
  Test Statistics
  Quantifying Predictive Ability
  S-PLUS Functions
  Further Reading
  Problems

18 Case Study in Parametric Survival Modeling and Model Approximation
  Descriptive Statistics
  Checking Adequacy of Log-Normal Accelerated Failure Time Model
  Summarizing the Fitted Model
  Internal Validation of the Fitted Model Using the Bootstrap
  Approximating the Full Model
  Problems

19 Cox Proportional Hazards Regression Model
  Model
  Preliminaries
  Model Definition
  Estimation of β
  Model Assumptions and Interpretation of Parameters
  Example
  Design Formulations
  Extending the Model by Stratification
  Estimation of Survival Probability and Secondary Parameters
  Test Statistics
  Residuals
  Assessment of Model Fit
  Regression Assumptions
  Proportional Hazards Assumption
  What to Do When PH Fails
  Collinearity
  Overly Influential Observations
  Quantifying Predictive Ability
  Validating the Fitted Model
  Validation of Model Calibration
  Validation of Discrimination and Other Statistical Indexes
  Describing the Fitted Model
  S-PLUS Functions
  Further Reading

20 Case Study in Cox Regression
  Choosing the Number of Parameters and Fitting the Model
  Checking Proportional Hazards
  Testing Interactions
  Describing Predictor Effects
  Validating the Model
  Presenting the Model
  Problems

Appendix
References
Index

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

More information

Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS

CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS Examples: Growth Modeling And Survival Analysis CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS Growth models examine the development of individuals on one or more outcome variables over time.

More information

SPSS Multivariable Linear Models and Logistic Regression

SPSS Multivariable Linear Models and Logistic Regression 1 SPSS Multivariable Linear Models and Logistic Regression Multivariable Models Single continuous outcome (dependent variable), one main exposure (independent) variable, and one or more potential confounders

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences Third Edition Jacob Cohen (deceased) New York University Patricia Cohen New York State Psychiatric Institute and Columbia University

More information

Semester 1 Statistics Short courses

Semester 1 Statistics Short courses Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical

More information

Examining a Fitted Logistic Model

Examining a Fitted Logistic Model STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

More information

REGRESSION MODELING STRATEGIES

REGRESSION MODELING STRATEGIES REGRESSION MODELING STRATEGIES Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine Nashville TN 37232 USA f.harrell@vanderbilt.edu biostat.mc.vanderbilt.edu/rms VANDERBILT

More information

INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

Delme John Pritchard

Delme John Pritchard THE GENETICS OF ALZHEIMER S DISEASE, MODELLING DISABILITY AND ADVERSE SELECTION IN THE LONGTERM CARE INSURANCE MARKET By Delme John Pritchard Submitted for the Degree of Doctor of Philosophy at HeriotWatt

More information

200609 - ATV - Lifetime Data Analysis

200609 - ATV - Lifetime Data Analysis Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Analysis of Microdata

Analysis of Microdata Rainer Winkelmann Stefan Boes Analysis of Microdata With 38 Figures and 41 Tables 4y Springer Contents 1 Introduction 1 1.1 What Are Microdata? 1 1.2 Types of Microdata 4 1.2.1 Qualitative Data 4 1.2.2

More information

MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4

MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4 SEVENTH EDITION MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4 A Global Perspective Joseph F. Hair, Jr. Kennesaw State University William C. Black Louisiana State University Barry J. Babin University of Southern

More information

Statistical Rules of Thumb

Statistical Rules of Thumb Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015

Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015 1 Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015 Instructor: Joanne M. Garrett, PhD e-mail: joanne_garrett@med.unc.edu Class Notes: Copies of the class lecture slides

More information

SEX DISCRIMINATION PROBLEM

SEX DISCRIMINATION PROBLEM SEX DISCRIMINATION PROBLEM 12. Multiple Linear Regression in SPSS In this section we will demonstrate how to apply the multiple linear regression procedure in SPSS to the sex discrimination data. The numerical

More information

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

More information

Computer-Aided Multivariate Analysis

Computer-Aided Multivariate Analysis Computer-Aided Multivariate Analysis FOURTH EDITION Abdelmonem Af if i Virginia A. Clark and Susanne May CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London New York Washington, D.C Contents Preface

More information

Methods for Meta-analysis in Medical Research

Methods for Meta-analysis in Medical Research Methods for Meta-analysis in Medical Research Alex J. Sutton University of Leicester, UK Keith R. Abrams University of Leicester, UK David R. Jones University of Leicester, UK Trevor A. Sheldon University

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Applied Regression Analysis and Other Multivariable Methods

Applied Regression Analysis and Other Multivariable Methods THIRD EDITION Applied Regression Analysis and Other Multivariable Methods David G. Kleinbaum Emory University Lawrence L. Kupper University of North Carolina, Chapel Hill Keith E. Muller University of

More information

CRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm

CRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm CRJ Doctoral Comprehensive Exam Statistics Friday August 23, 23 2:pm 5:3pm Instructions: (Answer all questions below) Question I: Data Collection and Bivariate Hypothesis Testing. Answer the following

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Statistical Tools for Nonlinear Regression

Statistical Tools for Nonlinear Regression S. Huet A. Bouvier M.-A. Poursat E. Jolivet Statistical Tools for Nonlinear Regression A Practical Guide With S-PLUS and R Examples Second Edition Springer Preface to the Second Edition Preface to the

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 16.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

When to Use Which Statistical Test

When to Use Which Statistical Test When to Use Which Statistical Test Rachel Lovell, Ph.D., Senior Research Associate Begun Center for Violence Prevention Research and Education Jack, Joseph, and Morton Mandel School of Applied Social Sciences

More information

Revenue Management and Survival Analysis in the Automobile Industry

Revenue Management and Survival Analysis in the Automobile Industry Andre Jerenz Revenue Management and Survival Analysis in the Automobile Industry With a foreword by Prof. Dr. Ulrich Tushaus GABLER EDITION WISSENSCHAFT List of Figures List of Tables Nomenclature xiii

More information

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations

More information

Customer and Business Analytic

Customer and Business Analytic Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Advanced Linear Modeling

Advanced Linear Modeling Ronald Christensen Advanced Linear Modeling Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization Second Edition Springer Preface to the Second Edition

More information

Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

More information

How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo

How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is Which analysis should I use There

More information

Lecture - 32 Regression Modelling Using SPSS

Lecture - 32 Regression Modelling Using SPSS Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

More information

Survival Analysis, Software

Survival Analysis, Software Survival Analysis, Software As used here, survival analysis refers to the analysis of data where the response variable is the time until the occurrence of some event (e.g. death), where some of the observations

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

More information

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds. Sept 03-23-05 22 2005 Data Mining for Model Creation Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.com page 1 Agenda Data Mining and Estimating Model Creation

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Graduate Programs in Statistics

Graduate Programs in Statistics Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX Phil Gibbs Advanced Analytics Manager SAS Technical Support November 22, 2008 UC Riverside What We Will Cover Today What is PROC

More information

Applied Missing Data Analysis in the Health Sciences. Statistics in Practice

Applied Missing Data Analysis in the Health Sciences. Statistics in Practice Brochure More information from http://www.researchandmarkets.com/reports/2741464/ Applied Missing Data Analysis in the Health Sciences. Statistics in Practice Description: A modern and practical guide

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Step 5: Conduct Analysis. The CCA Algorithm

Step 5: Conduct Analysis. The CCA Algorithm Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Log-transformed species abundances P Row-normalized species log abundances (chord distance) P Selected

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Personalized Predictive Medicine and Genomic Clinical Trials

Personalized Predictive Medicine and Genomic Clinical Trials Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Mgt 540 Research Methods: Data Analysis. Additional sources: http://web.utk.edu/~dap/random/order/start.htm

DEVELOPING AN ANALYTICAL PLAN

The Fundamentals of International Clinical Research Workshop, February 2004. Mario Chen, PhD, Family Health International. The Analysis Plan for a Study. Summary Analysis Plan:

Developing Risk Adjustment Techniques Using the SAS® System for Assessing Health Care Quality in the IMSystem®

Yanchun Xu, Andrius Kubilius, Joint Commission on Accreditation of Healthcare Organizations,

Least Squares Estimation

Sara A. van de Geer. Volume 2, pp. 1041-1045, in Encyclopedia of Statistics in Behavioral Science. ISBN-13: 978-0-470-86080-9, ISBN-10: 0-470-86080-4. Editors: Brian S. Everitt & David

Local classification and local likelihoods

November 18. k-nearest neighbors: The idea of local regression can be extended to classification as well. The simplest way of doing so is called nearest neighbor

Predictive Modeling Techniques in Insurance

Tuesday May 5, 2015. JF. Breton, Application Engineer. 2014 The MathWorks, Inc. Opening. Presenter: JF. Breton: 13 years of experience in predictive analytics

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Dr. Jon Starkweather, Research and Statistical Support consultant. This month

It is important to bear in mind that one of the first three subscripts is redundant since k = i - j + 3.

IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA. Stephen E. Fienberg, University of Minnesota; William M. Mason, University of Michigan. 1. INTRODUCTION

Regression III: Advanced Methods

Lecture 16: Generalized Additive Models. Bill Jacoby, Michigan State University. http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture: Introduce Additive Models

(and sex and drugs and rock 'n' roll) ANDY FIELD

DISCOVERING STATISTICS USING SPSS, THIRD EDITION. CONTENTS: Preface; How to use this book; Acknowledgements; Dedication; Symbols used in this book; Some maths revision

Multivariate Statistical Inference and Applications

ALVIN C. RENCHER, Department of Statistics, Brigham Young University. A Wiley-Interscience Publication. JOHN WILEY & SONS, INC. New York, Chichester, Weinheim

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

PREFACE. 1 INTRODUCTION: 1.1 Overview; 1.2 Definition; 1.3 Preparation; 1.3.1 Overview; 1.3.2 Accessing Tabular Data; 1.3.3 Accessing Unstructured Data; 1.3.4 Understanding the Variables and Observations

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University, Faculty of Economics and Political Science, Statistics Department, English Section. Prepared

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

ROAD MAP FOR TODAY. To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71-80. Weihua Guan, Stata Corporation. Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

Descriptive Statistics

Descriptive Statistics Primer: Descriptive statistics; Central tendency; Variation; Relative position; Relationships; Calculating descriptive statistics. Purpose: to describe or summarize

BayesX - Software for Bayesian Inference in Structured Additive Regression

Thomas Kneib, Faculty of Mathematics and Economics, University of Ulm; Department of Statistics, Ludwig-Maximilians-University Munich

C: LEVEL 800 {MASTERS OF ECONOMICS (ECONOMETRICS)}

1. EES 800: Econometrics I. Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

Inferential Statistics

Sampling and the normal distribution; Z-scores; Confidence levels and intervals; Hypothesis testing; Commonly used statistical methods. Descriptive statistics are

Linear Models and Conjoint Analysis with Nonlinear Spline Transformations

Warren F. Kuhfeld, Mark Garratt. Abstract: Many common data analysis models are based on the general linear univariate model, including

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences. 2015-2016 Academic Year Qualification.

Qualification: Master's Degree. 1. Description of the subject. Subject name: Biomedical Data

11. Analysis of Case-control Studies Logistic Regression

Research methods II. This chapter builds upon and further develops the concepts and strategies described in Ch. 6 of Mother and Child Health:

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Understand when to use multiple; Understand the multiple equation and what the coefficients represent; Understand different methods

Interpretation of Somers D under four simple models

Roger B. Newson. 03 September, 2014. 1 Introduction. Somers' D is an ordinal measure of association introduced by Somers (1962)[9]. It can be defined in terms

Package smoothhr. November 9, 2015

Encoding: UTF-8. Type: Package. Depends: R (>= 2.12.0), survival, splines. Title: Smooth Hazard Ratio Curves Taking a Reference Value. Version: 1.0.2. Date: 2015-10-29. Author: Artur

Nonlinear Regression:

Zurich University of Applied Sciences, School of Engineering, IDP Institute of Data Analysis and Process Design. Nonlinear Regression: A Powerful Tool With Considerable Complexity. Half-Day : Improved Inference

Predicting Customer Default Times using Survival Analysis Methods in SAS

Bart Baesens, Bart.Baesens@econ.kuleuven.ac.be. Overview: The credit scoring survival analysis problem; Statistical methods for Survival

Parametric versus Semi/nonparametric Regression Models

Hamdy F. F. Mahmoud, Virginia Polytechnic Institute and State University, Department of Statistics. LISA short course series, July 23, 2014

Statistical methods for the comparison of dietary intake

Appendix Y. Jianhua Wu, Petros Gousias, Nida Ziauddeen, Sonja Nicholson and Ivonne Solis-Trapala. Y.1 Introduction. This appendix provides an outline

Imputing Values to Missing Data

In federated data, between 30% and 70% of the data points will have at least one missing attribute; ignoring all records with a missing value wastes data. Remaining data

fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson

Contents: What Are These Demos About?; How to Use These Demos; If This Is Your First Time Using Fathom; Tutorial: An Extended Example

Learning outcomes. Knowledge and understanding. Competence and skills

Syllabus: Master's Programme in Statistics and Data Mining, 120 ECTS Credits. Aim: The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

A Bayesian hierarchical surrogate outcome model for multiple sclerosis

3rd Annual ASA New Jersey Chapter / Bayer Statistics Workshop. David Ohlssen (Novartis), Luca Pozzi and Heinz Schmidli (Novartis)

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

http://info.salford-systems.com/jsm-2015-ctw August 2015, Salford Systems. Course Outline: Demonstration of two classification

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Complex survey data refers to data obtained by stratification, cluster sampling and/or