Applied Regression Analysis and Other Multivariable Methods




THIRD EDITION

Applied Regression Analysis and Other Multivariable Methods

David G. Kleinbaum, Emory University
Lawrence L. Kupper, University of North Carolina, Chapel Hill
Keith E. Muller, University of North Carolina, Chapel Hill
Azhar Nizam, Emory University

An Alexander Kugushev Book

Duxbury Press
An Imprint of Brooks/Cole Publishing Company
An International Thomson Publishing Company

Pacific Grove, Albany, Belmont, Bonn, Boston, Cincinnati, Detroit, Johannesburg, London, Madrid, Melbourne, Mexico City, New York, Paris, Singapore, Tokyo, Toronto, Washington

Contents

1 CONCEPTS AND EXAMPLES OF RESEARCH
    1-1 Concepts
    1-2 Examples
    1-3 Concluding Remarks
    References

2 CLASSIFICATION OF VARIABLES AND THE CHOICE OF ANALYSIS
    2-1 Classification of Variables
    2-2 Overlapping of Classification Schemes
    2-3 Choice of Analysis
    References

3 BASIC STATISTICS: A REVIEW
    3-1 Preview
    3-2 Descriptive Statistics
    3-3 Random Variables and Distributions
    3-4 Sampling Distributions of $t$, $\chi^2$, and $F$
    3-5 Statistical Inference: Estimation
    3-6 Statistical Inference: Hypothesis Testing
    3-7 Error Rates, Power, and Sample Size
    Problems
    References

4 INTRODUCTION TO REGRESSION ANALYSIS
    4-1 Preview
    4-2 Association versus Causality
    4-3 Statistical versus Deterministic Models
    4-4 Concluding Remarks
    References

5 STRAIGHT-LINE REGRESSION ANALYSIS
    5-1 Preview
    5-2 Regression with a Single Independent Variable
    5-3 Mathematical Properties of a Straight Line
    5-4 Statistical Assumptions for a Straight-line Model
    5-5 Determining the Best-fitting Straight Line
    5-6 Measure of the Quality of the Straight-line Fit and Estimate of $\sigma^2$
    5-7 Inferences About the Slope and Intercept
    5-8 Interpretations of Tests for Slope and Intercept
    5-9 Inferences About the Regression Line $\mu_{Y|X} = \beta_0 + \beta_1 X$
    5-10 Prediction of a New Value of $Y$ at $X_0$
    5-11 Assessing the Appropriateness of the Straight-line Model
    Problems
    References

6 THE CORRELATION COEFFICIENT AND STRAIGHT-LINE REGRESSION ANALYSIS
    6-1 Definition of r
    6-2 r as a Measure of Association
    6-3 The Bivariate Normal Distribution
    6-4 r and the Strength of the Straight-line Relationship
    6-5 What r Does Not Measure
    6-6 Tests of Hypotheses and Confidence Intervals for the Correlation Coefficient
    6-7 Testing for the Equality of Two Correlations
    Problems
    References

7 THE ANALYSIS-OF-VARIANCE TABLE
    7-1 Preview
    7-2 The ANOVA Table for Straight-line Regression
    Problems

8 MULTIPLE REGRESSION ANALYSIS: GENERAL CONSIDERATIONS
    8-1 Preview
    8-2 Multiple Regression Models
    8-3 Graphical Look at the Problem
    8-4 Assumptions of Multiple Regression
    8-5 Determining the Best Estimate of the Multiple Regression Equation
    8-6 The ANOVA Table for Multiple Regression
    8-7 Numerical Examples
    Problems
    References

9 TESTING HYPOTHESES IN MULTIPLE REGRESSION
    9-1 Preview
    9-2 Test for Significant Overall Regression
    9-3 Partial F Test
    9-4 Multiple Partial F Test
    9-5 Strategies for Using Partial F Tests
    9-6 Tests Involving the Intercept
    Problems
    References

10 CORRELATIONS: MULTIPLE, PARTIAL, AND MULTIPLE PARTIAL
    10-1 Preview
    10-2 Correlation Matrix
    10-3 Multiple Correlation Coefficient
    10-4 Relationship of $R_{Y|X_1, X_2, \ldots, X_k}$ to the Multivariate Normal Distribution
    10-5 Partial Correlation Coefficient
    10-6 Alternative Representation of the Regression Model
    10-7 Multiple Partial Correlation
    10-8 Concluding Remarks
    Problems
    Reference

11 CONFOUNDING AND INTERACTION IN REGRESSION
    11-1 Preview
    11-2 Overview
    11-3 Interaction in Regression
    11-4 Confounding in Regression
    11-5 Summary and Conclusions
    Problems
    Reference

12 REGRESSION DIAGNOSTICS
    12-1 Preview
    12-2 Simple Approaches to Diagnosing Problems in Data
    12-3 Residual Analysis
    12-4 Treating Outliers
    12-5 Collinearity
    12-6 Scaling Problems
    12-7 Treating Collinearity and Scaling Problems
    12-8 Alternate Strategies of Analysis
    12-9 An Important Caution
    Problems
    References

13 POLYNOMIAL REGRESSION
    13-1 Preview
    13-2 Polynomial Models
    13-3 Least-squares Procedure for Fitting a Parabola
    13-4 ANOVA Table for Second-order Polynomial Regression
    13-5 Inferences Associated with Second-order Polynomial Regression
    13-6 Example Requiring a Second-order Model
    13-7 Fitting and Testing Higher-order Models
    13-8 Lack-of-fit Tests
    13-9 Orthogonal Polynomials
    13-10 Strategies for Choosing a Polynomial Model
    Problems

14 DUMMY VARIABLES IN REGRESSION
    14-1 Preview
    14-2 Definitions
    14-3 Rule for Defining Dummy Variables
    14-4 Comparing Two Straight-line Regression Equations: An Example
    14-5 Questions for Comparing Two Straight Lines
    14-6 Methods of Comparing Two Straight Lines
    14-7 Method I: Using Separate Regression Fits to Compare Two Straight Lines
    14-8 Method II: Using a Single Regression Equation to Compare Two Straight Lines
    14-9 Comparison of Methods I and II
    14-10 Testing Strategies and Interpretation: Comparing Two Straight Lines
    14-11 Other Dummy Variable Models
    14-12 Comparing Four Regression Equations
    14-13 Comparing Several Regression Equations Involving Two Nominal Variables
    Problems
    References

15 ANALYSIS OF COVARIANCE AND OTHER METHODS FOR ADJUSTING CONTINUOUS DATA
    15-1 Preview
    15-2 Adjustment Problem
    15-3 Analysis of Covariance
    15-4 Assumption of Parallelism: A Potential Drawback
    15-5 Analysis of Covariance: Several Groups and Several Covariates
    15-6 Comments and Cautions
    15-7 Summary
    Problems
    Reference

16 SELECTING THE BEST REGRESSION EQUATION
    16-1 Preview
    16-2 Steps in Selecting the Best Regression Equation
    16-3 Step 1: Specifying the Maximum Model
    16-4 Step 2: Specifying a Criterion for Selecting a Model
    16-5 Step 3: Specifying a Strategy for Selecting Variables
    16-6 Step 4: Conducting the Analysis
    16-7 Step 5: Evaluating Reliability with Split Samples
    16-8 Example Analysis of Actual Data
    16-9 Issues in Selecting the Most Valid Model
    Problems
    References

17 ONE-WAY ANALYSIS OF VARIANCE
    17-1 Preview
    17-2 One-way ANOVA: The Problem, Assumptions, and Data Configuration
    17-3 Methodology for One-way Fixed-effects ANOVA
    17-4 Regression Model for Fixed-effects One-way ANOVA
    17-5 Fixed-effects Model for One-way ANOVA
    17-6 Random-effects Model for One-way ANOVA
    17-7 Multiple-comparison Procedures for Fixed-effects One-way ANOVA
    17-8 Choosing a Multiple-comparison Technique
    17-9 Orthogonal Contrasts and Partitioning an ANOVA Sum of Squares
    Problems
    References

18 RANDOMIZED BLOCKS: SPECIAL CASE OF TWO-WAY ANOVA
    18-1 Preview
    18-2 Equivalent Analysis of a Matched Pairs Experiment
    18-3 Principle of Blocking
    18-4 Analysis of a Randomized-blocks Experiment
    18-5 ANOVA Table for a Randomized-blocks Experiment
    18-6 Regression Models for a Randomized-blocks Experiment
    18-7 Fixed-effects ANOVA Model for a Randomized-blocks Experiment
    Problems
    References

19 TWO-WAY ANOVA WITH EQUAL CELL NUMBERS
    19-1 Preview
    19-2 Using a Table of Cell Means
    19-3 General Methodology
    19-4 F Tests for Two-way ANOVA
    19-5 Regression Model for Fixed-effects Two-way ANOVA
    19-6 Interactions in Two-way ANOVA
    19-7 Random- and Mixed-effects Two-way ANOVA Models
    Problems
    References

20 TWO-WAY ANOVA WITH UNEQUAL CELL NUMBERS
    20-1 Preview
    20-2 Problems with Unequal Cell Numbers: Nonorthogonality
    20-3 Regression Approach for Unequal Cell Sample Sizes
    20-4 Higher-way ANOVA
    Problems
    References

21 ANALYSIS OF REPEATED MEASURES DATA
    21-1 Preview
    21-2 Examples
    21-3 General Approach for Repeated Measures ANOVA
    21-4 Overview of Selected Repeated Measures Designs and ANOVA-based Analyses
    21-5 Repeated Measures ANOVA for Unbalanced Data
    21-6 Other Approaches to Analyzing Repeated Measures Data
    Appendix 21-A Examples of SAS's GLM and MIXED Procedures
    Problems
    References

22 THE METHOD OF MAXIMUM LIKELIHOOD
    22-1 Preview
    22-2 The Principle of Maximum Likelihood
    22-3 Statistical Inference via Maximum Likelihood
    22-4 Summary
    Problems
    References

23 LOGISTIC REGRESSION ANALYSIS
    23-1 Preview
    23-2 The Logistic Model
    23-3 Estimating the Odds Ratio Using Logistic Regression
    23-4 A Numerical Example of Logistic Regression
    23-5 Theoretical Considerations
    23-6 An Example of Conditional ML Estimation Involving Pair-matched Data with Unmatched Covariates
    23-7 Summary
    Problems
    References

24 POISSON REGRESSION ANALYSIS
    24-1 Preview
    24-2 The Poisson Distribution
    24-3 An Example of Poisson Regression
    24-4 Poisson Regression: General Considerations
    24-5 Measures of Goodness of Fit
    24-6 Continuation of Skin Cancer Data Example
    24-7 A Second Illustration of Poisson Regression Analysis
    24-8 Summary
    Problems
    References

APPENDIX A TABLES
    A-1 Standard Normal Cumulative Probabilities
    A-2 Percentiles of the t Distribution
    A-3 Percentiles of the Chi-square Distribution
    A-4 Percentiles of the F Distribution
    A-5 Values of $\frac{1}{2}\ln\frac{1+r}{1-r}$
    A-6 Upper $\alpha$ Point of Studentized Range
    A-7 Orthogonal Polynomial Coefficients
    A-8 Bonferroni Corrected Jackknife and Studentized Residual Critical Values
    A-9 Critical Values for Leverages
    A-10 Critical Values for the Maximum of N Values of Cook's d(i) times $(n-k-1)$

APPENDIX B MATRICES AND THEIR RELATIONSHIP TO REGRESSION ANALYSIS

APPENDIX C ANOVA INFORMATION FOR FOUR COMMON BALANCED REPEATED MEASURES DESIGNS
    C-1 Balanced Repeated Measures Design with One Crossover Factor (Treatments)
    C-2 Balanced Repeated Measures Design with Two Crossover Factors
    C-3 Balanced Repeated Measures Design with One Nest Factor (Treatments)
    C-4 Balanced Repeated Measures Design with One Crossover Factor and One Nest Factor
    C-5 Balanced Two-group Pre/Posttest Design
    References

SOLUTIONS TO EXERCISES

INDEX