
Multiple Linear Regression

Quadratic Models

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic model by adding a second term to the equation:

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2.$$

This is a special case of the two-variable model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

with $x_1 = x$ and $x_2 = x^2$.
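As a quick check of the special-case claim, the following R sketch fits both forms and compares the estimates. The data are simulated; the seed, sample size, and coefficient values are assumptions made up for illustration and do not come from the slides.

```r
# Simulated data; all values here are illustrative assumptions.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 3 * x - 0.4 * x^2 + rnorm(50)

# Quadratic model in one variable
fit_quad <- lm(y ~ x + I(x^2))

# Two-variable first-order model with x1 = x and x2 = x^2
x1 <- x
x2 <- x^2
fit_two <- lm(y ~ x1 + x2)

# The two fits should give the same coefficient estimates
same_coef <- all.equal(unname(coef(fit_quad)), unname(coef(fit_two)))
print(same_coef)
```

Only the coefficient names differ between the two fits; the design matrices contain the same columns.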

Example: immune system and exercise

x = maximal oxygen uptake (VO2max, ml/(kg·min)); y = immunoglobulin level (IgG, mg/dl); data for 30 subjects (AEROBIC.txt).

Get the data and plot them:

    aerobic <- read.table("text/exercises&examples/aerobic.txt", header = TRUE)
    plot(aerobic[, c("MAXOXY", "IGG")])

Slight curvature suggests a linear model may not fit.

Check the linear model:

    plot(lm(IGG ~ MAXOXY, aerobic))

The graph of residuals against fitted values shows definite curvature.

Fit and summarize the quadratic model:

    aerobiclm <- lm(IGG ~ MAXOXY + I(MAXOXY^2), aerobic)
    summary(aerobiclm)

Output:

    Call:
    lm(formula = IGG ~ MAXOXY + I(MAXOXY^2), data = aerobic)

    Residuals:
         Min       1Q   Median       3Q      Max
    -185.375  -82.129    1.047   66.007  227.377

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1464.4042   411.4012  -3.560  0.00140 **
    MAXOXY         88.3071    16.4735   5.361 1.16e-05 ***
    I(MAXOXY^2)    -0.5362     0.1582  -3.390  0.00217 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 106.4 on 27 degrees of freedom
    Multiple R-squared:  0.9377,  Adjusted R-squared:  0.9331
    F-statistic: 203.2 on 2 and 27 DF,  p-value: < 2.2e-16

The quadratic term I(MAXOXY^2) is significant, so we reject the null hypothesis that the linear model is acceptable. Its coefficient is negative, which is consistent with the concavity of the curve.

The other two t-ratios test irrelevant hypotheses, because the quadratic term is important.

Extrapolation: the fitted curve has a maximum at

$$\text{MAXOXY} = \frac{88.3071}{2 \times 0.5362} \approx 82$$

and declines for higher MAXOXY, which seems unlikely to represent the real relationship.
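The location of the maximum comes from setting the derivative of the fitted quadratic, b1 + 2 * b2 * x, to zero. A minimal R sketch of that arithmetic, using the rounded estimates printed in the summary output (the variable names b1, b2, and x_max are introduced here for illustration):

```r
# Rounded estimates copied from the summary output
b1 <- 88.3071   # coefficient of MAXOXY
b2 <- -0.5362   # coefficient of I(MAXOXY^2)

# Setting the derivative b1 + 2 * b2 * x to zero gives the turning point
x_max <- -b1 / (2 * b2)
print(round(x_max, 1))
```

Because b2 is negative, the turning point is a maximum rather than a minimum.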

An alternative analysis

The graph of IGG against log(MAXOXY) is more linear:

    with(aerobic, plot(log(MAXOXY), IGG))
    aerobiclm2 <- lm(IGG ~ log(MAXOXY), aerobic)
    summary(aerobiclm2)

    # Overlay both fitted curves on the original scale:
    # quadratic fit in blue, log fit in red
    with(aerobic, plot(MAXOXY, IGG))
    with(aerobic, lines(sort(MAXOXY), fitted(aerobiclm)[order(MAXOXY)], col = "blue"))
    with(aerobic, lines(sort(MAXOXY), fitted(aerobiclm2)[order(MAXOXY)], col = "red"))

The fitted curve for the log model continues to increase indefinitely, but with diminishing slope.

Output:

    Call:
    lm(formula = IGG ~ log(MAXOXY), data = aerobic)

    Residuals:
         Min       1Q   Median       3Q      Max
    -165.455  -88.651   -2.395   55.756  218.934

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -4885.71     324.33  -15.06 5.87e-15 ***
    log(MAXOXY)  1653.38      83.07   19.90  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 107.6 on 28 degrees of freedom
    Multiple R-squared:  0.934,  Adjusted R-squared:  0.9316
    F-statistic: 396.1 on 1 and 28 DF,  p-value: < 2.2e-16
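To see how the two fitted curves behave differently under extrapolation, one can compare their slopes. The sketch below uses the rounded estimates from the two summary outputs; the MAXOXY values at which the slopes are evaluated are hypothetical and chosen only for illustration, and the function names are introduced here.

```r
# Rounded estimates copied from the two summary outputs
slope_log  <- function(x) 1653.38 / x               # derivative of b0 + b1 * log(x)
slope_quad <- function(x) 88.3071 - 2 * 0.5362 * x  # derivative of the quadratic fit

# Hypothetical MAXOXY values chosen only for illustration
x <- c(40, 60, 80, 100)
slopes <- data.frame(MAXOXY = x,
                     log_model = slope_log(x),
                     quadratic = slope_quad(x))
print(slopes)
```

The slope of the log model stays positive and shrinks as MAXOXY grows, while the slope of the quadratic model changes sign once MAXOXY passes the turning point.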

More Complex Models (ST 430/514)

Complete second-order model

When the first-order model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

is inadequate, the interaction model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

may be better, but sometimes a complete second-order model is needed:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2.$$
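In R formula notation, the complete second-order model can be written with `x1 * x2` for the main effects and interaction, plus `I()` terms for the squares. The sketch below inspects the resulting design matrix; the predictors are simulated and the names d and X are assumptions for illustration.

```r
# Simulated predictors; values are illustrative only
set.seed(2)
d <- data.frame(x1 = runif(20), x2 = runif(20))

# x1 * x2 expands to x1 + x2 + x1:x2; I() protects the squares
# from being interpreted as formula operators
X <- model.matrix(~ x1 * x2 + I(x1^2) + I(x2^2), data = d)
print(colnames(X))
```

The design matrix has six columns, one for each of the coefficients β0 through β5.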

Example: cost of shipping packages

Get the data and plot them:

    express <- read.table("text/exercises&examples/express.txt", header = TRUE)
    pairs(express)

Fit the complete second-order model and summarize it:

    expresslm <- lm(Cost ~ Weight * Distance + I(Weight^2) + I(Distance^2), express)
    summary(expresslm)
    plot(expresslm)

Output:

    Call:
    lm(formula = Cost ~ Weight * Distance + I(Weight^2) + I(Distance^2),
        data = express)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.86027 -0.19898 -0.00885  0.16531  0.94396

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)      8.270e-01  7.023e-01   1.178 0.258588
    Weight          -6.091e-01  1.799e-01  -3.386 0.004436 **
    Distance         4.021e-03  7.998e-03   0.503 0.622999
    I(Weight^2)      8.975e-02  2.021e-02   4.442 0.000558 ***
    I(Distance^2)    1.507e-05  2.243e-05   0.672 0.512657
    Weight:Distance  7.327e-03  6.374e-04  11.495 1.62e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.4428 on 14 degrees of freedom
    Multiple R-squared:  0.9939,  Adjusted R-squared:  0.9918
    F-statistic: 458.4 on 5 and 14 DF,  p-value: 5.371e-15
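To illustrate how the fitted second-order equation is evaluated, the sketch below computes a predicted cost by hand from the rounded estimates in the output. The package weight and distance are hypothetical inputs, not values taken from the data, and may lie outside the range the model was fitted on; with the fitted model object available, `predict(expresslm, newdata = ...)` would do the same job without rounding.

```r
# Rounded estimates copied from the summary output
b <- c(intercept = 8.270e-01, weight = -6.091e-01, distance = 4.021e-03,
       weight2 = 8.975e-02, distance2 = 1.507e-05, interaction = 7.327e-03)

# Hypothetical package; these inputs are assumptions, not values from the data
w <- 5
d <- 100

# Regressor values in the same order as the coefficients
x_terms <- c(1, w, d, w^2, d^2, w * d)
pred <- sum(b * x_terms)
print(pred)
```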

Qualitative Variables

A qualitative variable (or factor) is one that indicates membership of one of several categories.

E.g., a person's gender = male or female: a qualitative variable with two levels, indicating membership of one of two categories.

E.g., package type = Fragile, Semifragile, or Durable: three levels, corresponding to three categories.

We code a qualitative variable using indicator (dummy) variables:

Choose one level to use as a base or reference level, say male or Durable.

For each other level, create a variable

$$x_j = \begin{cases} 1 & \text{if this item is in this category} \\ 0 & \text{otherwise.} \end{cases}$$

For gender, there is only one other category, so the only indicator variable is

$$x = \begin{cases} 1 & \text{for a female} \\ 0 & \text{for a male.} \end{cases}$$

For packages, there are two other categories, so the indicator variables are

$$x_{\text{Fragile}} = \begin{cases} 1 & \text{for a Fragile package} \\ 0 & \text{otherwise,} \end{cases}$$

$$x_{\text{Semifragile}} = \begin{cases} 1 & \text{for a Semifragile package} \\ 0 & \text{otherwise.} \end{cases}$$

For any item, at most one of the indicator variables is non-zero, indicating a non-base category; if they are all zero, the item belongs to the base category.
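R builds these indicator variables automatically when a factor appears in a model formula. The sketch below shows the coding for a small hypothetical vector of package types (the vector and its name cargo_type are made up for illustration); by default the levels sort alphabetically, so Durable becomes the base level.

```r
# Hypothetical package types, for illustration only
cargo_type <- factor(c("Durable", "Fragile", "SemiFrag", "Fragile", "Durable"))

# Levels sort alphabetically, so Durable is the base level here
print(levels(cargo_type))

# Intercept column plus one indicator per non-base level
X <- model.matrix(~ cargo_type)
print(X)
```

A different base level could be chosen with `relevel()` before fitting, which changes what the intercept and the coefficients are compared against but not the fitted values.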

Example: shipment cost of packages, by type.

Get the data and plot them:

    # stringsAsFactors = TRUE reads CARGO as a factor, which plot() needs;
    # it is no longer the default in R >= 4.0
    cargo <- read.table("text/exercises&examples/cargo.txt", header = TRUE,
                        stringsAsFactors = TRUE)
    plot(COST ~ CARGO, cargo)

Fit and summarize the model:

    cargolm <- lm(COST ~ CARGO, cargo)
    summary(cargolm)

Output:

    Call:
    lm(formula = COST ~ CARGO, data = cargo)

    Residuals:
      Min    1Q Median    3Q   Max
    -2.20 -1.80  -1.00  1.05  4.24

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)      3.260      1.075   3.032   0.0104 *
    CARGOFragile     9.740      1.521   6.405 3.38e-05 ***
    CARGOSemiFrag    5.440      1.521   3.577   0.0038 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.404 on 12 degrees of freedom
    Multiple R-squared:  0.7745,  Adjusted R-squared:  0.7369
    F-statistic: 20.61 on 2 and 12 DF,  p-value: 0.0001315

Note that the intercept is the fitted value for CARGOFragile = 0 and CARGOSemiFrag = 0; that is, for Durable packages. The coefficients of CARGOFragile and CARGOSemiFrag measure the differences between those categories and Durable.

The overall model F-test is the same as the analysis of variance test:

    cargoaov <- aov(COST ~ CARGO, cargo)
    summary(cargoaov)

Output:

                Df Sum Sq Mean Sq F value   Pr(>F)
    CARGO        2 238.25  119.13   20.61 0.000132 ***
    Residuals   12  69.37    5.78
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
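The agreement between the two outputs can be checked by hand. The sketch below recomputes the F statistic, its p-value, and R-squared from the sums of squares in the analysis of variance table, and the fitted mean for each package type from the regression coefficients. All numbers are the rounded values printed in the outputs, so small rounding differences are expected; the variable names are introduced here for illustration.

```r
# Values copied from the lm and aov outputs
ss_model <- 238.25; df_model <- 2
ss_resid <- 69.37;  df_resid <- 12

# F statistic is the ratio of mean squares
f_stat <- (ss_model / df_model) / (ss_resid / df_resid)
# R-squared is the share of the total sum of squares explained by CARGO
r_squared <- ss_model / (ss_model + ss_resid)
# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Fitted means: the intercept is Durable; add each coefficient for the other types
means <- c(Durable = 3.260, Fragile = 3.260 + 9.740, SemiFrag = 3.260 + 5.440)

print(c(F = f_stat, R2 = r_squared, p = p_value))
print(means)
```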