Using splines in regression



Similar documents
Forecast. Forecast is the linear function with estimated coefficients. Compute with predict command

Package MBA. February 19, Index 7. Canopy LIDAR data

Statistical Models in R

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Piecewise Cubic Splines

Exploratory Data Analysis

Linear Models and Conjoint Analysis with Nonlinear Spline Transformations

ANALYSIS OF TREND CHAPTER 5

Functions: Piecewise, Even and Odd.

Using R for Linear Regression

Smoothing and Non-Parametric Regression

Week 5: Multiple Linear Regression

Trend and Seasonal Components

Computer Graphics. Geometric Modeling. Page 1. Copyright Gotsman, Elber, Barequet, Karni, Sheffer Computer Science - Technion. An Example.

Natural cubic splines

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Spatial Interpolation

Regression III: Advanced Methods

FUNCTIONAL DATA ANALYSIS: INTRO TO R s FDA

Fitting Subject-specific Curves to Grouped Longitudinal Data

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Causal Forecasting Models

CSE 167: Lecture 13: Bézier Curves. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011

1. The forward curve

Nonlinear relationships Richard Williams, University of Notre Dame, Last revised February 20, 2015

Lecture 3: Linear methods for classification

the points are called control points approximating curve

Simple Methods and Procedures Used in Forecasting

5. Multiple regression

Smoothing. Fitting without a parametrization

A short introduction to splines in least squares regression analysis

Interaction between quantitative predictors

Estimation of σ 2, the variance of ɛ

Nonlinear Regression Functions. SW Ch 8 1/54/

VI. Introduction to Logistic Regression

Discussion Section 4 ECON 139/ Summer Term II

Finite Element Formulation for Plates - Handout 3 -

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

Smoothing Terms in GAM Models

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

Numerical integration of a function known only through data points

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

GAM for large datasets and load forecasting

1.3 Algebraic Expressions

Computational Assignment 4: Discriminant Analysis

(Refer Slide Time: 1:42)

1 Lecture: Integration of rational functions by decomposition

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Regression and Programming in R. Anja Bråthen Kristoffersen Biomedical Research Group

expression is written horizontally. The Last terms ((2)( 4)) because they are the last terms of the two polynomials. This is called the FOIL method.

# load in the files containing the methyaltion data and the source # code containing the SSRPMM functions

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Lab 13: Logistic Regression

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs

AP CALCULUS AB 2006 SCORING GUIDELINES (Form B) Question 4

GLAM Array Methods in Statistics

Finite Element Formulation for Beams - Handout 2 -

Session 9 Case 3: Utilizing Available Software Statistical Analysis

Lecture 5 Principal Minors and the Hessian

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

CHAPTER 1 Splines and B-splines an Introduction

Computer Animation: Art, Science and Criticism

Quantile Regression under misspecification, with an application to the U.S. wage structure

Plot and Solve Equations

Support Vector Machines for Classification and Regression

We can display an object on a monitor screen in three different computer-model forms: Wireframe model Surface Model Solid model

Package smoothhr. November 9, 2015

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

Notating the Multilevel Longitudinal Model. Multilevel Modeling of Longitudinal Data. Notating (cont.) Notating (cont.)

Lecture 5 : The Poisson Distribution

How To Understand And Solve Algebraic Equations

Comparing Nested Models

Regression with a Binary Dependent Variable

An Introduction to Calculus. Jackie Nicholas

Semi-Log Model. Semi-Log Model

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Discrete Optimization

4.3 Lagrange Approximation

Time Series Analysis AMS 316

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

Geometric Modelling & Curves

These axioms must hold for all vectors ū, v, and w in V and all scalars c and d.

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

The KaleidaGraph Guide to Curve Fitting

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Linear and quadratic Taylor polynomials for functions of several variables.

Inner Product Spaces

Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is

Efficient Curve Fitting Techniques

1 Cubic Hermite Spline Interpolation

An Introduction to Modeling Longitudinal Data

Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics

Viewing Ecological data using R graphics

Partial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Pearson's Correlation Tests

Transcription:

Using splines in regression Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en US

Today s Lecture Spline models Penalized spline regression [More info: Harrel, Regression Modeling Strategies, Chapter 2, PDF handout]

Piecewise linear models A piecewise linear model (also called a change point model or broken stick model) contains a few linear components Outcome is linear over full domain, but with a different slope at different points Points where relationship changes are referred to as change points or knots Often there s one (or a few) potential change points

Piecewise linear models Suppose we want to estimate E(y x) = f (x) using a piecewise linear model. For one knot we can write this as E(y x) = β 0 + β 1 x + β 2 (x κ) + where κ is the location of the change point and (x κ) + =

Interpretation of regression coefficients β 0 = E(y x) = β 0 + β 1 x + β 2 (x κ) + β 1 = β 2 = β 1 + β 2 =

Estimation Piecewise linear models are low-dimensional (no need for penalization) Parameters are estimated via OLS The design matrix is...

Multiple knots Suppose we want to estimate E(y x) = f (x) using a piecewise linear model. For multiple knots we can write this as E(y x) = β 0 + β 1 x + K β k+1 (x κ k ) + k=1 where {κ k } K k=1 are the locations of the change points Note that knot locations are defined before estimating regression coefficients Also, regression coefficients are interpreted conditional on the knots.

Example: lidar data library(mass) library(semipar) ## Error: there is no package called SemiPar data(lidar) ## Warning: data set lidar not found y = lidar$logratio ## Error: object lidar not found range = lidar$range ## Error: object lidar not found qplot(range, y) ## Error: object y not found

Example: lidar data knots <- c(550, 625) mkspline <- function(k, x) (x - k > 0) * (x - k) X.des = cbind(1, range, sapply(knots, FUN = mkspline, x = range)) ## Error: non-numeric argument to binary operator colnames(x.des) <- c("intercept", "range", "range1", "range2") ## Error: object X.des not found lm.lin = lm(y ~ X.des - 1) ## Error: object y not found plot(range, y, xlab = "Range", ylab = "log ratio", pch = 18) ## Error: object y not found points(range, lm.lin$fitted.values, type = "l", col = "red", lwd = 2) ## Error: object lm.lin not found

Example: lidar data summary(lm.lin)$coef ## Error: object lm.lin not found

Piecewise quadratic and cubic models Suppose we want to estimate E(y x) = f (x) using a piecewise quadratic model. For multiple knots we can write this as E(y x) = β 0 + β 1 x + β 1 x 2 + K β k+2 (x κ k ) 2 + k=1 where {κ k } K k=1 are the locations of the change points Similar extension for cubics Piecewise quadratic models are smooth and have continuous first derivatives Often, knots taken as quintiles of the data.

Advantages of piecewise models Piecewise (linear, quadratic, etc) models have several advantages Easy construction of basis functions Flexible, and don t rely on determining an appropriate form for f (x) using standard functions Allow for significance testing on change point slopes Fairly direct interpretations

B-splines and natural splines Characteristics Both B-splines and natural splines similarly define a basis over the domain of x Can be constrained to have seasonal patterns They are made up of piecewise polynomials of a given degree, and have defined derivatives similarly to the piecewise defined functions Big advantage over linear splines: parameter estimation is often fairly robust to your choice of knots Big disadvantage over linear splines: harder to interpret specific coefficients

B-splines basis functions 6 E(y x) = β 0 + β j B j (x) j=1 B j (x) and j=1 B splines sum to 1 inside inner knots 6 Bj (x) 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 8 10 x

Example: lidar data require(splines) lm.bs3 = lm(y ~ bs(range, df = 3)) ## Error: object y not found plot(range, y, xlab = "Range", ylab = "log ratio", pch = 18) ## Error: object y not found points(range, lm.bs3$fitted.values, type = "l", col = "red", lwd = 2) ## Error: object lm.bs3 not found

Example: lidar data lm.bs5 = lm(y ~ bs(range, df = 5)) ## Error: object y not found plot(range, y, xlab = "Range", ylab = "log ratio", pch = 18) ## Error: object y not found points(range, lm.bs5$fitted.values, type = "l", col = "red", lwd = 2) ## Error: object lm.bs5 not found