A PRIMER ON REGRESSION SPLINES. 1. Overview

JEFFREY S. RACINE

B-splines constitute an appealing method for the nonparametric estimation of a range of statistical objects of interest. In this primer we focus our attention on the estimation of a conditional mean, i.e. the regression function.

A spline is a function that is constructed piece-wise from polynomial functions. The term comes from the tool used by shipbuilders and drafters to construct smooth shapes having desired properties. Drafters have long made use of a bendable strip fixed in position at a number of points that relaxes to form a smooth curve passing through those points. The malleability of the spline material combined with the constraint of the control points would cause the strip to take the shape that minimized the energy required for bending it between the fixed points, this being the smoothest possible shape.

We shall rely on a class of splines called B-splines ("basis-splines"). A B-spline function is the maximally differentiable interpolative basis function. The B-spline is a generalization of the Bézier curve (a B-spline with no interior knots is a Bézier curve). B-splines are defined by their order $m$ and number of interior knots $N$ (there are two endpoints which are themselves knots so the total number of knots will be $N+2$). The degree of the B-spline polynomial will be the spline order $m$ minus one (degree $= m-1$).

To best appreciate the nature of B-splines, we shall first consider a simple type of spline, the Bézier function, and then move on to the more flexible and powerful generalization, the B-spline itself. We begin with the univariate case in Section 2 where we consider the univariate Bézier function. In Section 3 we turn to the univariate B-spline function, and then in Section 4 we turn to the multivariate case where we also briefly mention how one could handle the presence of categorical predictors.

We presume that interest lies in regression spline methodology which differs in a number of ways from smoothing splines, both of which are popular in applied settings. The fundamental difference between the two approaches is that smoothing splines explicitly penalize roughness and use the data points themselves as potential knots whereas regression splines place knots at equidistant/equiquantile points. We direct the interested reader to Wahba (1990) for a treatment of smoothing splines.

Date: December 18, 2014. These notes are culled from a variety of sources. I am solely responsible for all errors. Suggestions are welcomed (racinej@mcmaster.ca).

2. Bézier curves

We present an overview of Bézier curves which form the basis for the B-splines that follow. We begin with a simple illustration, that of a quadratic Bézier curve.

Example 2.1. A quadratic Bézier curve. A quadratic Bézier curve is the path traced by the function $B(x)$, given points $\beta_0$, $\beta_1$, and $\beta_2$, where

$$B(x) = \beta_0 (1-x)^2 + 2\beta_1 (1-x)x + \beta_2 x^2 = \sum_{i=0}^{2} \beta_i B_i(x), \quad x \in [0,1].$$

The terms $B_0(x) = (1-x)^2$, $B_1(x) = 2(1-x)x$, and $B_2(x) = x^2$ are the bases which in this case turn out to be Bernstein polynomials (Bernstein (1912)). For our purposes the control points $\beta_i$, $i = 0,1,2$, will be parameters that could be selected by least squares fitting in a regression setting, but more on that later.

Consider the following simple example where we plot a quadratic Bézier curve with arbitrary control points:

[Figure: a quadratic Bézier curve, $B(x)$ plotted against $x$.]

For this simple illustration we set $\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = 2$.

Note that the derivative of this curve is

$$B'(x) = 2(1-x)(\beta_1 - \beta_0) + 2x(\beta_2 - \beta_1),$$

which is a polynomial of degree one. This example of a Bézier curve will also be seen to be a second-degree B-spline with no interior knots or, equivalently, a third-order B-spline with no interior knots. Using the terminology of B-splines, in this example we have a third-order B-spline ($m = 3$) which is of polynomial degree two ($m-1 = 2$) having highest derivative of polynomial degree one ($m-2 = 1$).
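The quadratic curve and its derivative can be checked numerically. The following is a minimal R sketch; the function names `bezier2` and `bezier2_deriv` and the control points are illustrative assumptions, not part of the original code.

```r
## Quadratic Bezier curve and its analytic first derivative.
## beta holds the three control points (beta_0, beta_1, beta_2);
## the values used below are arbitrary and purely illustrative.
bezier2 <- function(x, beta) {
  beta[1] * (1 - x)^2 + 2 * beta[2] * (1 - x) * x + beta[3] * x^2
}

bezier2_deriv <- function(x, beta) {
  2 * (1 - x) * (beta[2] - beta[1]) + 2 * x * (beta[3] - beta[2])
}

beta <- c(1, -1, 2)
x <- seq(0, 1, length.out = 101)

## The curve starts at beta_0 (x = 0) and ends at beta_2 (x = 1).
endpoints <- bezier2(c(0, 1), beta)

## Compare the analytic derivative with a central finite difference.
eps <- 1e-6
fd <- (bezier2(x + eps, beta) - bezier2(x - eps, beta)) / (2 * eps)
max_abs_diff <- max(abs(fd - bezier2_deriv(x, beta)))
```

The curve passes through the first and last control points only; the middle control point pulls the curve toward it without being interpolated.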

2.1. The Bézier curve defined. More generally, a Bézier curve of degree $n$ (order $m$) is composed of $m = n+1$ terms and is given by

$$B(x) = \sum_{i=0}^{n} \beta_i \binom{n}{i} (1-x)^{n-i} x^{i} = \sum_{i=0}^{n} \beta_i B_{i,n}(x), \tag{1}$$

where $\binom{n}{i} = \frac{n!}{(n-i)!\,i!}$, which can be expressed recursively as

$$B(x) = (1-x)\left(\sum_{i=0}^{n-1} \beta_i B_{i,n-1}(x)\right) + x\left(\sum_{i=1}^{n} \beta_i B_{i-1,n-1}(x)\right),$$

so a degree $n$ Bézier curve is a linear interpolation between two degree $n-1$ Bézier curves.

Example 2.2. A quadratic Bézier curve as a linear interpolation between two linear Bézier curves. The linear Bézier curve is given by $\beta_0(1-x) + \beta_1 x$, and above we showed that the quadratic Bézier curve is given by $\beta_0(1-x)^2 + 2\beta_1(1-x)x + \beta_2 x^2$. So, when $n = 2$ (quadratic), we have

$$B(x) = (1-x)\bigl(\beta_0(1-x) + \beta_1 x\bigr) + x\bigl(\beta_1(1-x) + \beta_2 x\bigr) = \beta_0(1-x)^2 + 2\beta_1(1-x)x + \beta_2 x^2.$$

This is essentially a modified version of the idea of taking linear interpolations of linear interpolations of linear interpolations and so on.

Note that the polynomials

$$B_{i,n}(x) = \binom{n}{i}(1-x)^{n-i}x^{i}$$

are called Bernstein basis polynomials of degree $n$ and are such that $\sum_{i=0}^{n} B_{i,n}(x) = 1$, unlike raw polynomials. [1]

The $m = n+1$ control points $\beta_i$, $i = 0,\dots,n$, are somewhat ancillary to the discussion here, but will figure prominently when we turn to regression as in a regression setting they will be the coefficients of the regression model.

[1] Naturally we define $x^0 = (1-x)^0 = 1$, and by raw polynomials we simply mean $x^j$, $j = 0,\dots,n$.
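The two properties stated in this subsection, the recursive form of the Bézier curve and the fact that the Bernstein basis sums to one, can be verified with a few lines of R. This is a sketch under stated assumptions: the helper names `bernstein`, `bezier_direct` and `bezier_recursive` and the cubic control points are hypothetical and chosen only for illustration.

```r
## Bernstein basis polynomial B_{i,n}(x) for i = 0, ..., n.
bernstein <- function(i, n, x) {
  choose(n, i) * (1 - x)^(n - i) * x^i
}

## Direct evaluation: sum_i beta_i * B_{i,n}(x), with beta = (beta_0, ..., beta_n).
bezier_direct <- function(x, beta) {
  n <- length(beta) - 1
  out <- numeric(length(x))
  for (i in 0:n) out <- out + beta[i + 1] * bernstein(i, n, x)
  out
}

## Recursive evaluation: interpolate linearly between the two degree n-1
## curves built from the first n and the last n control points.
bezier_recursive <- function(x, beta) {
  if (length(beta) == 1) return(rep(beta, length(x)))
  (1 - x) * bezier_recursive(x, beta[-length(beta)]) +
    x * bezier_recursive(x, beta[-1])
}

x <- seq(0, 1, length.out = 51)
beta <- c(1, -1, 2, 0.5)   # arbitrary cubic control points

## Each row of the basis matrix sums to one (partition of unity).
basis_mat <- sapply(0:3, function(i) bernstein(i, 3, x))
row_sums <- rowSums(basis_mat)

## The direct and recursive evaluations agree up to rounding error.
max_gap <- max(abs(bezier_direct(x, beta) - bezier_recursive(x, beta)))
```

The recursive version is exponential in the degree and is shown only because it mirrors the formula; the direct sum is the practical choice.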

Example 2.3. The quadratic Bézier curve basis functions. The figure below presents the bases $B_{i,n}(x)$ underlying a Bézier curve for $i = 0,\dots,2$ and $n = 2$.

[Figure: the three quadratic basis functions, $B(x)$ plotted against $x$.]

These bases are $B_{0,2}(x) = (1-x)^2$, $B_{1,2}(x) = 2(1-x)x$, and $B_{2,2}(x) = x^2$ and illustrate the foundation upon which the Bézier curves are built.

2.2. Derivatives of spline functions. From de Boor (2001) we know that the derivatives of spline functions can be simply expressed in terms of lower order spline functions. In particular, for the Bézier curve we have

$$B^{(l)}(x) = \sum_{i=0}^{n-l} \beta_i^{(l)} B_{i,n-l}(x),$$

where $\beta_i^{(0)} = \beta_i$, $0 \le i \le n$, and

$$\beta_i^{(l)} = (n-l)\left(\beta_{i+1}^{(l-1)} - \beta_i^{(l-1)}\right) / (t_i - t_{i-n+l}), \quad 0 \le i \le n-l.$$

See Zhou & Wolfe (2000) for details.

We now turn our attention to the B-spline function. This can be thought of as a generalization of the Bézier curve where we now allow for there to be additional breakpoints called interior knots.

3. B-splines

3.1. B-spline knots. B-spline curves are composed from many polynomial pieces and are therefore more versatile than Bézier curves. Consider $N+2$ real values $t_i$, called knots ($N \ge 0$ are called interior knots and there are always two endpoints, $t_0$ and $t_{N+1}$), with

$$t_0 \le t_1 \le \dots \le t_{N+1}.$$

When the knots are equidistant they are said to be uniform, otherwise they are said to be nonuniform. One popular type of knot is the quantile knot sequence where the interior knots are the quantiles from the empirical distribution of the underlying variable. Quantile knots guarantee that an equal number of sample observations lie in each interval while the intervals will have different lengths (as opposed to different numbers of points lying in equal length intervals).

Bézier curves possess two endpoint knots, $t_0$ and $t_1$, and no interior knots hence are a limiting case, i.e. a B-spline for which $N = 0$.

3.2. The B-spline basis function. Let $t = \{t_i \mid i \in \mathbb{Z}\}$ be a sequence of non-decreasing real numbers ($t_i \le t_{i+1}$) such that [2]

$$t_0 \le t_1 \le \dots \le t_{N+1}.$$

Define the augmented knot set

$$t_{-(m-1)} = \dots = t_0 \le t_1 \le \dots \le t_N \le t_{N+1} = \dots = t_{N+m},$$

where we have appended the lower and upper boundary knots $t_0$ and $t_{N+1}$ $n = m-1$ times (this is needed due to the recursive nature of the B-spline). If we wanted we could then reset the index for the first element of the augmented knot set (i.e. $t_{-(m-1)}$) so that the $N+2m$ augmented knots $t_i$ are now indexed by $i = 0,\dots,N+2m-1$ (see the example below for an illustration).

For each of the augmented knots $t_i$, $i = 0,\dots,N+2m-1$, we recursively define a set of real-valued functions $B_{i,j}$ (for $j = 0,1,\dots,n$, $n$ being the degree of the B-spline basis) as follows:

$$B_{i,0}(x) = \begin{cases} 1 & \text{if } t_i \le x < t_{i+1} \\ 0 & \text{otherwise,} \end{cases}$$

$$B_{i,j+1}(x) = \alpha_{i,j+1}(x) B_{i,j}(x) + [1 - \alpha_{i+1,j+1}(x)] B_{i+1,j}(x),$$

where

$$\alpha_{i,j}(x) = \begin{cases} \dfrac{x - t_i}{t_{i+j} - t_i} & \text{if } t_{i+j} \ne t_i \\ 0 & \text{otherwise.} \end{cases}$$

For the above computation we define 0/0 as 0.

Definitions. Using the notation above:
(1) the sequence $t$ is known as a knot sequence, and the individual term in the sequence is a knot.
(2) the functions $B_{i,j}$ are called the $i$-th B-spline basis functions of order $j$, and the recurrence relation is called the de Boor recurrence relation, after its discoverer Carl de Boor (de Boor (2001)).
(3) given any non-negative integer $j$, the vector space $V_j(t)$ over $\mathbb{R}$, generated by the set of all B-spline basis functions of order $j$ is called the B-spline of order $j$. In other words, the B-spline $V_j(t) = \operatorname{span}\{B_{i,j}(x) \mid i = 0,1,\dots\}$ over $\mathbb{R}$.
(4) Any element of $V_j(t)$ is a B-spline function of order $j$.

[2] This description is based upon the discussion found at http://planetmath.org/encyclopedia/bspline.html.
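As a worked instance of the recurrence, consider the linear case with no interior knots ($N = 0$, $m = 2$, $n = 1$) on $[0,1]$, so that the re-indexed augmented knots are $t_0 = t_1 = 0$ and $t_2 = t_3 = 1$. The derivation below is a sketch that uses only the definitions above, including the convention that $\alpha_{i,j}(x) = 0$ when $t_{i+j} = t_i$.

```latex
\begin{align*}
B_{0,0}(x) &= 0, \qquad
B_{1,0}(x) = \mathbf{1}\{0 \le x < 1\}, \qquad
B_{2,0}(x) = 0, \\
\alpha_{0,1}(x) &= 0 \ (\text{since } t_1 = t_0), \qquad
\alpha_{1,1}(x) = \frac{x - t_1}{t_2 - t_1} = x, \qquad
\alpha_{2,1}(x) = 0 \ (\text{since } t_3 = t_2), \\
B_{0,1}(x) &= \alpha_{0,1}(x) B_{0,0}(x) + [1 - \alpha_{1,1}(x)] B_{1,0}(x)
            = (1 - x)\,\mathbf{1}\{0 \le x < 1\}, \\
B_{1,1}(x) &= \alpha_{1,1}(x) B_{1,0}(x) + [1 - \alpha_{2,1}(x)] B_{2,0}(x)
            = x\,\mathbf{1}\{0 \le x < 1\}.
\end{align*}
```

On $[0,1)$ these are the $K = N + m = 2$ linear Bernstein polynomials $1-x$ and $x$, consistent with the statement that a B-spline with no interior knots is a Bézier curve. The half-open interval in $B_{i,0}$ is also why implementations typically treat the right boundary point separately.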

The first term $B_{0,n}$ is often referred to as the intercept. In typical spline implementations the option intercept=FALSE denotes dropping this term while intercept=TRUE denotes keeping it (recall that $\sum_{i=0}^{n} B_{i,n}(x) = 1$ which can lead to perfect multicollinearity in a regression setting; also see Zhou & Wolfe (2000) who instead apply shrinkage methods).

Example 3.4. A fourth-order B-spline basis function with three interior knots and its first derivative function. Suppose there are $N = 3$ interior knots given by $(0.25, 0.5, 0.75)$, the boundary knots are $(0, 1)$, and the degree of the spline is $n = 3$ hence the order is $m = 4$. The set of all knot points needed to construct the B-spline is

$$(0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1)$$

and the number of basis functions is $K = N + m = 7$. The seven cubic spline basis functions will be denoted $B_{0,3},\dots,B_{6,3}$. The figure below presents this example of a third degree B-spline with three interior knots along with its first derivative (the spline derivatives would be required in order to compute derivatives from the spline regression model).

[Figure: left panel, the B-spline basis B plotted against x; right panel, its first derivative B.deriv plotted against x.]

To summarize, in this illustration we have an order $m = 4$ (degree $= 3$) B-spline (left) with 4 sub-intervals (segments) using uniform knots ($N = 3$ interior knots, 5 knots in total (2 endpoint knots)) and its 1st-order derivative (right). The dimension of $B(x)$ is $K = N + m = 7$.

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline basis function.

3.3. The B-spline function. A B-spline of degree $n$ (of spline order $m = n+1$) is a parametric curve composed of a linear combination of basis B-splines $B_{i,n}(x)$ of degree $n$ given by

$$B(x) = \sum_{i=0}^{N+n} \beta_i B_{i,n}(x), \quad x \in [t_0, t_{N+1}]. \tag{2}$$
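The setup of Example 3.4 and the linear combination in (2) can be reproduced with `splineDesign()` from the `splines` package that ships with R, rather than with the simplified code in the appendix. This is a minimal sketch: the coefficient vector `beta` is arbitrary and purely illustrative, and the evaluation grid stops short of the right boundary to sidestep boundary handling.

```r
library(splines)  # ships with R; splineDesign() evaluates B-spline bases

degree <- 3
ord <- degree + 1                                # spline order m = 4
interior <- c(0.25, 0.5, 0.75)                   # N = 3 interior knots
knots <- c(rep(0, ord), interior, rep(1, ord))   # augmented knot set (11 knots)

x <- seq(0, 0.99, by = 0.01)                     # grid inside [0, 1)

## Basis matrix and first-derivative matrix, one column per basis function.
B <- splineDesign(knots, x, ord = ord)
B_deriv <- splineDesign(knots, x, ord = ord, derivs = rep(1L, length(x)))

K <- ncol(B)                                     # number of basis functions, N + m

## A B-spline function is a linear combination of the basis functions;
## these coefficients are arbitrary and for illustration only.
beta <- c(0, 1, -1, 2, 0.5, 1.5, 1)
curve_values <- drop(B %*% beta)
curve_deriv <- drop(B_deriv %*% beta)
```

Because the basis functions sum to one at every point, their first derivatives sum to zero, which gives a quick sanity check on any basis implementation.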

The $\beta_i$ are called control points or de Boor points. For an order $m$ B-spline having $N$ interior knots there are $K = N + m = N + n + 1$ control points (one when $j = 0$).

The B-spline order $m$ must be at least 2 (hence at least linear, i.e. degree $n$ is at least 1) and the number of interior knots must be non-negative ($N \ge 0$).

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline function.

4. Multivariate B-spline regression

The functional form of parametric regression models must naturally be specified by the user. Typically practitioners rely on raw polynomials and also often choose the form of the regression function (i.e. the order of the polynomial for each predictor) in an ad-hoc manner. However, raw polynomials are not sufficiently flexible for our purposes, particularly because they possess no interior knots which is where B-splines derive their unique properties. Furthermore, in a regression setting we typically encounter multiple predictors which can be continuous or categorical in nature, and traditional splines are for continuous predictors. Below we briefly describe a multivariate kernel weighted tensor product B-spline regression method (kernel weighting is used to handle the presence of the categorical predictors). This method is implemented in the R package crs (Racine & Nie (2011)).

4.1. Multivariate knots, intervals, and spline bases. In general we will have $q$ predictors, $X = (X_1,\dots,X_q)^T$. We assume that each $X_l$, $1 \le l \le q$, is distributed on a compact interval $[a_l, b_l]$, and without loss of generality, we take all intervals $[a_l, b_l] = [0,1]$. Let $G_l = G_l^{(m_l-2)}$ be the space of polynomial splines of order $m_l$. We note that $G_l$ consists of functions that (i) are polynomials of degree $m_l - 1$ on each of the sub-intervals $I_{j_l,l}$, $j_l = 0,\dots,N_l$; and (ii) for $m_l \ge 2$, are $m_l - 2$ times continuously differentiable on $[0,1]$.

Pre-select an integer $N_l = N_{n,l}$. Divide $[a_l, b_l] = [0,1]$ into $(N_l + 1)$ sub-intervals $I_{j_l,l} = [t_{j_l,l}, t_{j_l+1,l})$, $j_l = 0,\dots,N_l - 1$, $I_{N_l,l} = [t_{N_l,l}, 1]$, where $\{t_{j_l,l}\}_{j_l=1}^{N_l}$ is a sequence of equally-spaced points, called interior knots, given as

$$t_{-(m_l-1),l} = \dots = t_{0,l} = 0 < t_{1,l} < \dots < t_{N_l,l} < 1 = t_{N_l+1,l} = \dots = t_{N_l+m_l,l},$$

in which $t_{j_l,l} = j_l h_l$, $j_l = 0,1,\dots,N_l+1$, and $h_l = 1/(N_l+1)$ is the distance between neighboring knots.

Let $K_l = K_{n,l} = N_l + m_l$, where $N_l$ is the number of interior knots and $m_l$ is the spline order, and let $B_l(x_l) = \{B_{j_l,l}(x_l) : 1 - m_l \le j_l \le N_l\}^T$ be a basis system of the space $G_l$.
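The equally-spaced interior knots $t_j = j h$ can be generated directly and contrasted with the quantile knots described in Section 3.1. The sketch below uses a simulated predictor; the Beta(2, 5) draw, the seed, and the helper name `count_by_interval` are assumptions made only for illustration.

```r
set.seed(42)
x <- rbeta(1000, 2, 5)           # simulated, skewed predictor on [0, 1]

N <- 3                           # number of interior knots
h <- 1 / (N + 1)                 # distance between neighbouring uniform knots
uniform_knots <- (1:N) * h       # t_j = j * h, j = 1, ..., N
quantile_knots <- quantile(x, probs = (1:N) / (N + 1), names = FALSE)

## Count the observations falling in each of the N + 1 sub-intervals.
count_by_interval <- function(x, interior) {
  breaks <- c(0, interior, 1)
  as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
}

uniform_counts <- count_by_interval(x, uniform_knots)    # unequal counts
quantile_counts <- count_by_interval(x, quantile_knots)  # (nearly) equal counts

m <- 4                           # spline order (cubic)
K <- N + m                       # dimension of the univariate basis
```

With a skewed predictor the uniform knots leave the right-most intervals sparsely populated, whereas the quantile knots equalize the counts at the cost of unequal interval lengths.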

We define the space of tensor-product polynomial splines by $G = \bigotimes_{l=1}^{q} G_l$. It is clear that $G$ is a linear space of dimension $K_n = \prod_{l=1}^{q} K_l$. Then [3]

$$B(x) = \left[\left\{B_{j_1,\dots,j_q}(x)\right\}_{j_1 = 1-m_1,\dots,j_q = 1-m_q}^{N_1,\dots,N_q}\right]_{K_n \times 1} = B_1(x_1) \otimes \dots \otimes B_q(x_q)$$

is a basis system of the space $G$, where $x = (x_l)_{l=1}^{q}$. Let $\mathbf{B} = \left[\{B(X_1),\dots,B(X_n)\}^T\right]_{n \times K_n}$.

[3] The notation here may throw off those used to sums of the form $\sum_{i=1}^{n}$, $n > 0$ (i.e. sum indices that are positive integers), so consider a simple illustration that may defuse this issue. Suppose there are no interior knots ($N = 0$) and we consider a quadratic (degree $n$ equal to two hence the spline order is three). Then $\sum_{i=1-m}^{N}$ contains three terms having indices $i = -2, -1, 0$. In general the number of terms is the number of interior knots $N$ plus the spline order $m$, which we denote $K = N + m$. We could alternatively sum from 1 to $N+m$, or from 0 to $N+m-1$, or from 0 to $N+n$ (the latter being consistent with the Bézier curve definition in (1) and the B-spline definition in (2)).

4.2. Spline regression. In what follows we presume that the reader is interested in the unknown conditional mean in the following location-scale model,

$$Y = g(X, Z) + \sigma(X, Z)\varepsilon, \tag{3}$$

where $g(\cdot)$ is an unknown function, $X = (X_1,\dots,X_q)^T$ is a $q$-dimensional vector of continuous predictors, and $Z = (Z_1,\dots,Z_r)^T$ is an $r$-dimensional vector of categorical predictors. Letting $z = (z_s)_{s=1}^{r}$, we assume that $z_s$ takes $c_s$ different values in $D_s = \{0, 1, \dots, c_s - 1\}$, $s = 1,\dots,r$, and let $c_s$ be a finite positive constant. Let $(Y_i, X_i^T, Z_i^T)_{i=1}^{n}$ be an i.i.d. copy of $(Y, X^T, Z^T)$. Assume for $1 \le l \le q$, each $X_l$ is distributed on a compact interval $[a_l, b_l]$, and without loss of generality, we take all intervals $[a_l, b_l] = [0,1]$.

In order to handle the presence of categorical predictors, we define

$$l(Z_s, z_s, \lambda_s) = \begin{cases} 1, & \text{when } Z_s = z_s \\ \lambda_s, & \text{otherwise,} \end{cases} \qquad L(Z, z, \lambda) = \prod_{s=1}^{r} l(Z_s, z_s, \lambda_s) = \prod_{s=1}^{r} \lambda_s^{1(Z_s \ne z_s)}, \tag{4}$$

where $l(\cdot)$ is a variant of Aitchison & Aitken's (1976) univariate categorical kernel function, $L(\cdot)$ is a product categorical kernel function, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_r)^T$ is the vector of bandwidths for each of the categorical predictors. See Ma, Racine & Yang (under revision) and Ma & Racine (2013) for further details.

We estimate $\beta(z)$ by minimizing the following weighted least squares criterion,

$$\hat\beta(z) = \arg\min_{\beta \in \mathbb{R}^{K_n}} \sum_{i=1}^{n} \left\{Y_i - B(X_i)^T \beta\right\}^2 L(Z_i, z, \lambda).$$

Let $L_z = \operatorname{diag}\{L(Z_1, z, \lambda),\dots,L(Z_n, z, \lambda)\}$ be a diagonal matrix with $L(Z_i, z, \lambda)$, $1 \le i \le n$, as the diagonal entries. Then $\hat\beta(z)$ can be written as

$$\hat\beta(z) = \left(n^{-1}\mathbf{B}^T L_z \mathbf{B}\right)^{-1}\left(n^{-1}\mathbf{B}^T L_z \mathbf{Y}\right), \tag{5}$$

where $\mathbf{Y} = (Y_1,\dots,Y_n)^T$. $g(x,z)$ is estimated by $\hat g(x,z) = B(x)^T \hat\beta(z)$.

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline basis function and then uses least squares to construct the regression model for a simulated data generating process.

References

Aitchison, J. & Aitken, C. G. G. (1976), Multivariate binary discrimination by the kernel method, Biometrika 63(3), 413-420.
Bernstein, S. (1912), Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités, Comm. Soc. Math. Kharkov 13, 1-2.
de Boor, C. (2001), A practical guide to splines, Springer.
Ma, S. & Racine, J. S. (2013), Additive regression splines with irrelevant categorical and continuous regressors, Statistica Sinica 23, 515-541.
Ma, S., Racine, J. S. & Yang, L. (under revision), Spline regression in the presence of categorical predictors, Journal of Applied Econometrics.
R Development Core Team (2011), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL: http://www.r-project.org/
Racine, J. S. & Nie, Z. (2011), crs: Categorical Regression Splines. R package version 0.14-9.
Wahba, G. (1990), Spline Models for Observational Data, SIAM.
Zhou, S. & Wolfe, D. A. (2000), On derivative estimation in spline regression, Statistica Sinica 10, 93-108.

Appendix A. Sample R code for constructing B-splines

The following code uses recursion to compute the B-spline basis and B-spline function. Then a simple illustration demonstrates how one could immediately compute a least-squares fit using the B-spline. In the spirit of recursion, it has been said that "To iterate is human; to recurse divine." (L. Peter Deutsch).

R Code for Implementing B-spline basis functions and the B-spline itself.

## $Id: spline_primer.rnw,v 1.29 2013/01/22 17:43:52 jracine Exp jracine $

## April 23 2011. The code below is based upon an illustration that
## can be found in http://www.stat.tamu.edu/~sinha/research/note1.pdf
## by Dr. Samiran Sinha (Department of Statistics, Texas A&M). I am
## solely to blame for any errors and can be contacted at
## racinej@mcmaster.ca (Jeffrey S. Racine).

## This function is a (simplified) R implementation of the bs()
## function in the splines library and illustrates how the Cox-de Boor
## recursion formula is used to construct B-splines.

basis <- function(x, degree, i, knots) {
  if(degree == 0){
    ## Degree zero: indicator of the half-open interval [t_i, t_{i+1})
    B <- ifelse((x >= knots[i]) & (x < knots[i+1]), 1, 0)
  } else {
    ## Zero-width knot spans contribute nothing (0/0 is defined as 0)
    if((knots[degree+i] - knots[i]) == 0) {
      alpha1 <- 0
    } else {
      alpha1 <- (x - knots[i])/(knots[degree+i] - knots[i])
    }
    if((knots[i+degree+1] - knots[i+1]) == 0) {
      alpha2 <- 0
    } else {
      alpha2 <- (knots[i+degree+1] - x)/(knots[i+degree+1] - knots[i+1])
    }
    B <- alpha1*basis(x, (degree-1), i, knots) + alpha2*basis(x, (degree-1), (i+1), knots)
  }
  return(B)
}

bs <- function(x, degree=3, interior.knots=NULL, intercept=FALSE, Boundary.knots = c(0,1)) {
  if(missing(x)) stop("you must provide x")
  if(degree < 1) stop("the spline degree must be at least 1")
  Boundary.knots <- sort(Boundary.knots)
  interior.knots.sorted <- NULL
  if(!is.null(interior.knots)) interior.knots.sorted <- sort(interior.knots)
  ## Augmented knot set: each boundary knot appears degree+1 times
  knots <- c(rep(Boundary.knots[1], (degree+1)), interior.knots.sorted, rep(Boundary.knots[2], (degree+1)))
  K <- length(interior.knots) + degree + 1
  B.mat <- matrix(0,length(x),K)
  for(j in 1:K) B.mat[,j] <- basis(x, degree, j, knots)
  ## The half-open intervals exclude the upper boundary, so set it by hand
  if(any(x == Boundary.knots[2])) B.mat[x == Boundary.knots[2], K] <- 1
  if(intercept == FALSE) {
    return(B.mat[,-1])
  } else {
    return(B.mat)
  }
}

## A simple illustration that computes and plots the B-spline bases.

par(mfrow = c(2,1))

n <- 1000

x <- seq(0, 1, length=n)
B <- bs(x, degree=5, intercept = TRUE, Boundary.knots=c(0, 1))
matplot(x, B, type="l", lwd=2)

## Next, simulate data then construct a regression spline with a
## prespecified degree (in applied settings we would want to choose
## the degree/knot vector using a sound statistical approach).

dgp <- sin(2*pi*x)
y <- dgp + rnorm(n, sd=.1)

model <- lm(y~B-1)

plot(x, y, cex=.25, col="grey")
lines(x, fitted(model), lwd=2)
lines(x, dgp, col="red", lty=2)
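The illustration above is univariate and unweighted. The kernel-weighted estimator in (5) can be sketched in the same spirit for one continuous and one binary predictor. This is a minimal sketch under stated assumptions, not the implementation used in the crs package: it calls `splines::bs()` from the `splines` package shipped with R (not the simplified `bs()` defined above), and the helper names `basis_at`, `cat_kernel`, `beta_hat`, the simulated data generating process, and the fixed bandwidth `lambda` are all hypothetical.

```r
set.seed(123)
n <- 500
x <- runif(n)
z <- rbinom(n, 1, 0.5)                        # one binary categorical predictor
y <- sin(2 * pi * x) + 0.5 * z + rnorm(n, sd = 0.1)

## Cubic B-spline basis with three uniform interior knots (K = 7 columns).
basis_at <- function(x) {
  unclass(splines::bs(x, knots = c(0.25, 0.5, 0.75), degree = 3,
                      intercept = TRUE, Boundary.knots = c(0, 1)))
}
B <- basis_at(x)

## Categorical kernel from (4): weight 1 when Z_i equals z, lambda otherwise.
cat_kernel <- function(Z, z, lambda) ifelse(Z == z, 1, lambda)

## Weighted least squares solution (5) for a given cell z0.
## The n^{-1} factors in (5) cancel, so they are omitted.
beta_hat <- function(z0, lambda) {
  w <- cat_kernel(z, z0, lambda)
  BtW <- t(B * w)                             # B^T L_z without forming diag(w)
  drop(solve(BtW %*% B, BtW %*% y))
}

lambda <- 0.1                                 # hypothetical bandwidth, not data-driven
x_grid <- seq(0, 1, length.out = 50)
g_hat_z0 <- drop(basis_at(x_grid) %*% beta_hat(0, lambda))
g_hat_z1 <- drop(basis_at(x_grid) %*% beta_hat(1, lambda))
```

Setting `lambda` to 1 gives every observation equal weight, so the fit ignores the categorical predictor, while setting it to 0 uses only the observations in the cell of interest; intermediate values borrow strength across cells. In applied work the bandwidth would be chosen by a data-driven method rather than fixed in advance.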