Nonparametric Estimation: Smoothing and Data Visualization


Ronaldo Dias
Universidade Estadual de Campinas
1º Colóquio da Região Sudeste, April 2011


Preface

In recent years more and more data have been collected in order to extract information or to learn valuable characteristics about experiments, phenomena, observational facts, etc. This is what has been called learning from data. Due to their complexity, several datasets have been analyzed by nonparametric approaches. This field of Statistics imposes minimal assumptions to get useful information from data. In fact, nonparametric procedures usually let the data speak for themselves.

This work is a brief introduction to a few of the most useful procedures in nonparametric estimation, geared toward smoothing and data visualization. In particular, it describes the theory and the applications of nonparametric curve estimation (density and regression) problems, with emphasis on kernel, nearest neighbor, orthogonal series and smoothing spline methods. The text is designed for undergraduate students in mathematical sciences, engineering and economics. It requires at least one semester in calculus, probability and mathematical statistics.


Contents

Preface
List of Figures
1 Introduction
2 Kernel estimation
2.1 The Histogram
2.2 Kernel Density Estimation
The Nearest Neighbor Method
2.3 Some Statistical Results for Kernel Density Estimation
2.4 Bandwidth Selection
Reference to a Standard Distribution
Maximum likelihood Cross-Validation
Least-Squares Cross-Validation
2.5 Orthogonal series estimators
3 Kernel nonparametric Regression Method
3.1 k-nearest Neighbor (k-NN)
3.2 Local Polynomial Regression: LOWESS
3.3 Penalized Maximum Likelihood Estimation
Computing Penalized Log-Likelihood Density Estimates
Spline Functions
Acquiring the Taste
Logspline Density Estimation
Splines Density Estimation: A Dimensionless Approach
The thin-plate spline on R^d
Additive Models
Generalized Cross-Validation Method for Splines nonparametric Regression
Regression splines, P-splines and H-splines
Sequentially Adaptive H-splines
P-splines
Adaptive Regression via H-Splines Method
A Bayesian Approach to H-splines
Final Comments
Bibliography


List of Figures

2.1 Naive estimate constructed from Old Faithful geyser data with h = 0.1
2.2 Kernel density estimate constructed from Old Faithful geyser data with Gaussian kernel and h = 0.25
2.3 Bandwidth effect on kernel density estimates. The data set income was rescaled to have mean 1
2.4 Effect of the smoothing parameter k on the estimates
2.5 Comparison of two bandwidths, $\hat\sigma$ (the sample standard deviation) and $\hat R$ (the sample interquartile range) for the mixture $0.7\,N(-2, 1) + 0.3\,N(1, 1)$
2.6 Effect of the smoothing parameter K on the orthogonal series method for density estimation
3.1 Effect of bandwidths on Nadaraya-Watson kernel
3.2 Effect of the smoothing parameter k on the k-NN regression estimates
3.3 Effect of the smoothing parameter using LOWESS method
Basis Functions with 6 knots placed at x
Basis Functions with 6 knots placed at x
Logspline density estimation for $.5\,N(0,1) + .5\,N(5,1)$
Histogram, SSDE, Kernel and Logspline density estimates
True, tensor product, gam non-adaptive and gam adaptive surfaces
Smoothing spline fitting with smoothing parameter obtained by GCV method
Spline least square fittings for different values of K
Five thousand replicates of $y(x) = \exp(-x)\sin(\pi x/2)\cos(\pi x) + \epsilon$
Five thousand replicates of the affinity and the partial affinity for adaptive nonparametric regression using H-splines with the true curve
Density estimates of the affinity based on five thousand replicates of the curve $y_i = x_i^3 + \epsilon_i$ with $\epsilon_i \sim N(0, .5)$. Solid line is a density estimate using beta model and dotted line is a nonparametric density estimate
A comparison between smoothing splines (S-splines) and hybrid splines (H-splines) methods
Smooth spline and P-spline
H-spline fitting for airmiles data
6.8 Estimation results: a) Bayesian estimate with a = 17 and $\psi(K) = K^3$ (dotted line); b) (SS) smoothing splines estimate (dashed line). The true regression function is also plotted (solid line). The SS estimate was computed using the R function smooth.spline, from which 4 degrees of freedom were obtained and λ was computed by GCV
One hundred estimates of the curve 6.8 and a Bayesian confidence interval for the regression curve $g(t) = \exp(-t^2/2)\cos(4\pi t)$ with $t \in [0, \pi]$

Chapter 1: Introduction

Probably the most widely used procedure to describe a possible relationship among variables is the statistical technique known as regression analysis. It is always useful to begin the study of regression analysis with simple models. For this, assume that we have collected observations of a continuous variable Y at n values of a predictor variable T. Let $(t_j, y_j)$ be such that

$$y_j = g(t_j) + \varepsilon_j, \quad j = 1, \ldots, n, \tag{1.1}$$

where the random variables $\varepsilon_j$ are uncorrelated with mean zero and variance $\sigma^2$. Moreover, $g(t_j)$ are the values of some unknown function g computed at the points $t_1, \ldots, t_n$. In general, the function g is called the regression function or regression curve.

A parametric regression model assumes that the form of g is known up to a finite number of parameters. That is, we can write a parametric regression model as

$$y_j = g(t_j, \beta) + \varepsilon_j, \quad j = 1, \ldots, n, \tag{1.2}$$

where $\beta = (\beta_1, \ldots, \beta_p)^T \in \mathbb{R}^p$. Thus, determining a curve g from the data is equivalent to determining the vector of parameters β. One may notice that if g has a linear form, i.e., $g(t, \beta) = \sum_{j=1}^{p} \beta_j x_j(t)$, where $\{x_j(t)\}_{j=1}^{p}$ are the explanatory variables, e.g., $x_j(t) = t^{j-1}$ as in polynomial regression, then we are dealing with a linear parametric regression model.

Certainly, there are other methods of fitting curves to data. A collection of techniques known as nonparametric regression, for example, allows great flexibility in the possible form of the regression curve. In particular, these techniques assume no parametric form for g. In fact, a nonparametric regression model makes the assumption that the regression curve belongs to some infinite collection of curves. For example, g can be in the class of functions that are differentiable with square integrable second derivatives, etc. Consequently, in order to propose a nonparametric model one need only choose an appropriate space of functions in which one believes the regression curve lies. This choice is usually motivated by the degree of smoothness of g. Then, one uses the data to determine an element of this function space that can represent the unknown regression curve. Consequently, nonparametric techniques rely more heavily on the data for information about g than their parametric counterparts.

Unfortunately, nonparametric estimators have some disadvantages. In general, they are less efficient than parametric estimators when the parametric model is correctly specified. For most parametric estimators the risk decays to zero at a rate of $n^{-1}$, while for nonparametric estimators it decays at a rate of $n^{-\alpha}$, where the parameter $\alpha \in (0, 1)$ depends on the smoothness of g. For example, when g is twice differentiable the rate is usually $n^{-4/5}$. However, in the case where the parametric model is incorrectly specified, ad hoc, the rate $n^{-1}$ cannot be achieved. In fact, the parametric estimator does not even converge to the true regression curve.

Chapter 2: Kernel estimation

Suppose we have n independent measurements $\{(t_i, y_i)\}_{i=1}^{n}$; the regression equation is, in general, described as in (1.1). Note that the regression curve g is the conditional expectation of the response variable Y given the predictor variable T, that is, $g(t) = E[Y \mid T = t]$. When we try to approximate the mean response function g, we concentrate on the average dependence of Y on T = t. This means that we try to estimate the conditional mean curve

$$g(t) = E[Y \mid T = t] = \frac{\int y\, f_{TY}(t, y)\,dy}{f_T(t)}, \tag{2.1}$$

where $f_{TY}(t, y)$ denotes the joint density of (T, Y) and $f_T(t)$ the marginal density of T. In order to provide an estimate $\hat g(t)$ of g we need to obtain estimates of $f_{TY}(t, y)$ and $f_T(t)$. Consequently, density estimation methodologies will be described.

2.1 The Histogram

The histogram is one of the first, and one of the most common, methods of density estimation. It is important to bear in mind that the histogram is a smoothing technique used to estimate the unknown density, and hence it deserves some consideration.

Let us try to combine the data by counting how many data points fall into a small interval of length h. This kind of interval is called a bin. Observe that the well known dot plot of Box, Hunter and Hunter (1978) is a particular type of histogram where h = 0.

Without loss of generality, we consider a bin centered at 0, namely the interval $[-h/2, h/2)$, and let $F_X$ be the distribution function of X such that $F_X$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$. Consequently, the probability that an observation of X will fall into the interval $[-h/2, h/2)$ is given by

$$P(X \in [-h/2, h/2)) = \int_{-h/2}^{h/2} f_X(x)\,dx,$$

where $f_X$ is the density of X. A natural estimate of this probability is the relative frequency of the observations in this interval, that is, we count the number of observations falling into the interval and divide it by the total number of observations. In other words, given the data $X_1, \ldots, X_n$, we have

$$P(X \in [-h/2, h/2)) \approx \frac{1}{n}\,\#\{X_i \in [-h/2, h/2)\}.$$

Now, applying the mean value theorem for continuous bounded functions, we obtain

$$P(X \in [-h/2, h/2)) = \int_{-h/2}^{h/2} f(x)\,dx = f(\xi)\,h,$$

with $\xi \in [-h/2, h/2)$. Thus, we arrive at the following density estimate:

$$\hat f_h(x) = \frac{1}{nh}\,\#\{X_i \in [-h/2, h/2)\}, \quad \text{for all } x \in [-h/2, h/2).$$

Formally, suppose we observe random variables $X_1, \ldots, X_n$ whose unknown common density is f. Let k be the number of bins, and define $C_j = [x_0 + (j-1)h,\; x_0 + jh)$, $j = 1, \ldots, k$. Now, take $n_j = \sum_{i=1}^{n} I(X_i \in C_j)$, where the function $I(x \in A)$ is defined to be

$$I(x \in A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise,} \end{cases}$$

and $\sum_{j=1}^{k} n_j = n$. Then,

$$\hat f_h(x) = \frac{1}{nh} \sum_{j=1}^{k} n_j\, I(x \in C_j),$$

for all x. Here, note that the density estimate $\hat f_h$ depends upon the histogram bandwidth h. By varying h we can obtain different shapes of $\hat f_h$. For example, if one increases h, one is averaging over more data and the histogram appears smoother. When $h \to 0$, the histogram becomes a very noisy representation of the data (needle-plot, Härdle (1990)). In the opposite situation, when $h \to \infty$, the histogram becomes overly smooth (box-shaped). Thus, h is the smoothing parameter of this type of density estimate, and the question of how to choose the histogram bandwidth h turns out to be an important one in representing the data via the histogram. For details on how to estimate h see Härdle (1990).

2.2 Kernel Density Estimation

The motivation behind the histogram can be expanded quite naturally. For this, consider a weight function

$$K(x) = \begin{cases} \tfrac{1}{2}, & \text{if } |x| < 1, \\ 0, & \text{otherwise,} \end{cases}$$

and define the estimator

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \tag{2.2}$$

We can see that $\hat f$ extends the idea of the histogram. Notice that this estimate just places a box of side (width) 2h and height $(2nh)^{-1}$ on each observation and then sums to obtain $\hat f$. See Silverman (1986) for a discussion of this kind of estimator. It is not difficult to verify that $\hat f$ is not a continuous function and has zero derivative everywhere except at the jump points $X_i \pm h$. Besides having the undesirable character of nonsmoothness (Silverman (1986)), it could give a misleading impression to an untrained observer, since its somewhat ragged character might suggest several different bumps.

Figure 2.1 shows the nonsmooth character of the naive estimate. The data seem to have two major modes. However, the naive estimator suggests several different small bumps.

Figure 2.1: Naive estimate constructed from Old Faithful geyser data with h = 0.1 (horizontal axis: eruptions length; vertical axis: density estimate).

To overcome some of these difficulties, assumptions have been introduced on the function K. That is, K must be a nonnegative kernel function that satisfies the following property:

$$\int K(x)\,dx = 1.$$

In other words, K(x) is a probability density function, as for instance the Gaussian density. It then follows from the definition that $\hat f$ will itself be a probability density. In addition, $\hat f$ will inherit all the continuity and differentiability properties of the kernel K.
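As an illustration of estimator (2.2), the following is a minimal Python sketch, not the author's implementation. The function names (`box_kernel`, `gaussian_kernel`, `kde`, `integrate`) and the small sample are made up for this example. It evaluates the estimate with the naive box weight and with a Gaussian kernel, and checks numerically that the Gaussian-kernel estimate integrates to approximately one.

```python
import math

def box_kernel(u):
    """Naive weight function: 1/2 on (-1, 1), 0 elsewhere."""
    return 0.5 if abs(u) < 1 else 0.0

def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel estimate (2.2): (1 / (n h)) * sum_i K((x - X_i) / h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

def integrate(func, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical sample, made up for illustration only.
sample = [1.8, 2.0, 2.1, 3.9, 4.2, 4.5, 4.6]

naive_at_2 = kde(2.0, sample, 0.15, kernel=box_kernel)
smooth_at_2 = kde(2.0, sample, 0.3)
total_mass = integrate(lambda x: kde(x, sample, 0.3), -5.0, 12.0)
print(naive_at_2, smooth_at_2, total_mass)
```

Any other probability density could be passed as `kernel`; the estimate inherits its smoothness from that choice, as stated above.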

For example, if K is a Gaussian density then $\hat f$ will be a smooth curve with derivatives of all orders. Figure 2.2 exhibits the smoothness of $\hat f$ when a Gaussian kernel is used.

Figure 2.2: Kernel density estimate constructed from Old Faithful geyser data with Gaussian kernel and h = 0.25 (horizontal axis: eruptions length; vertical axis: density estimate).

Note that an estimate based on the kernel function places bumps on the observations, and the shape of those bumps is determined by the kernel function K. The bandwidth h sets the width around each observation, and this bandwidth controls the degree of smoothness of a density estimate. It is possible to verify that as $h \to 0$ the estimate becomes a sum of Dirac delta functions at the observations, while as $h \to \infty$ it eliminates all the local roughness and possibly important details are missed.

The data for Figure 2.3, labelled income, were provided by Charles Kooperberg. This data set consists of 7125 random samples of yearly net income in the United Kingdom (Family Expenditure Survey). The income data set is considerably large, so it is more of a challenge to computing resources, and there are severe outliers. The peak at 0.24 is due to the UK old age pension, which caused many people to have nearly identical incomes. The width of the peak is about 0.02, compared to the range 11.5 of the data. The rise of the density to the left of the peak is very steep.

There is a vast literature on kernel density estimation (Silverman (1986)) studying its mathematical properties and proposing several algorithms to obtain estimates based on it. This method of density estimation became, apart from the histogram, the most commonly used estimator. However, it has drawbacks when the underlying density has long tails (Silverman (1986)). What causes this problem is the fact that the bandwidth is fixed for all observations, not considering any local characteristic of the data. In order to solve this problem several other kernel density estimation methods were proposed, such as the nearest neighbor and the variable kernel. A detailed discussion and illustration of these methods can be found in Silverman (1986).

Figure 2.3: Bandwidth effect on kernel density estimates. The data set income was rescaled to have mean 1. (Panel title: Histogram of income data; horizontal axis: transformed data; vertical axis: Relative Frequency; curves: the R default bandwidth, h = .12, h = .25 and one further bandwidth.)

The Nearest Neighbor Method

The idea behind the nearest neighbor method is to adapt the amount of smoothing to local characteristics of the data. The degree of smoothing is then controlled by an integer k. Essentially, the nearest neighbor density estimator uses distances from the point x at which f(x) is estimated to the data points. For example, let $d(x_1, x)$ be the distance of data point $x_1$ from the point x, and for each x denote by $d_k(x)$ the distance from x to its kth nearest neighbor among the data points $x_1, \ldots, x_n$. The kth nearest neighbor density estimate is defined as

$$\hat f(x) = \frac{k}{2 n\, d_k(x)},$$

where n is the sample size and, typically, k is chosen to be proportional to $n^{1/2}$.

In order to understand this definition, suppose that the density at x is f(x). Then, one would expect about $2 r n f(x)$ observations to fall in the interval $[x - r, x + r]$ for each r > 0. Since, by definition, exactly k observations fall in the interval $[x - d_k(x), x + d_k(x)]$, an estimate of the density at x may be obtained by putting

$$k = 2\, d_k(x)\, n\, \hat f(x).$$

Note that while estimators like the histogram are based on the number of observations falling in a box of fixed width centered at the point of interest, the nearest neighbor estimate is inversely proportional to the size of the box needed to contain a given number of observations. In the tail of the distribution, the distance $d_k(x)$ will be larger than in the main part of the distribution, and so the problem of under-smoothing in the tails should be reduced.

Like the histogram, the nearest neighbor estimate is not a smooth curve. Moreover, the nearest neighbor estimate does not integrate to one, and the tails of $\hat f(x)$ die away at rate $x^{-1}$, in other words extremely slowly. Hence, this estimate is not appropriate if one is required to estimate the entire density. However, it is possible to generalize the nearest neighbor estimator in a manner related to the kernel estimate. The generalized kth nearest neighbor estimate is defined by

$$\hat f(x) = \frac{1}{n\, d_k(x)} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{d_k(x)}\right).$$

Observe that the overall amount of smoothing is governed by the choice of k, but the bandwidth used at any particular point depends on the density of observations near that point. Again, we face the problem of discontinuity of $\hat f$ at all the points where the function $d_k(x)$ has a discontinuous derivative. The precise integrability and tail properties will depend on the exact form of the kernel.

Figure 2.4 shows the effect of the smoothing parameter k on the density estimate. Observe that as k decreases the density estimate becomes rougher. This effect is analogous to h approaching zero in the kernel density estimator.

2.3 Some Statistical Results for Kernel Density Estimation

As a starting point one might want to compute the expected value of $\hat f$. For this, suppose we have $X_1, \ldots, X_n$ i.i.d. random variables with common density f, and let $K(\cdot)$ be a probability density function defined on the real line. Then we have, for a nonstochastic h,

$$E[\hat f(x)] = \frac{1}{nh} \sum_{i=1}^{n} E\!\left[K\!\left(\frac{x - X_i}{h}\right)\right] = \frac{1}{h}\, E\!\left[K\!\left(\frac{x - X_1}{h}\right)\right] = \frac{1}{h} \int K\!\left(\frac{x - u}{h}\right) f(u)\,du = \int K(y)\, f(x + yh)\,dy. \tag{2.3}$$

Now, let $h \to 0$. We see that $E[\hat f(x)] \to f(x) \int K(y)\,dy = f(x)$. Thus, $\hat f$ is an asymptotically unbiased estimator of f.
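Equation (2.3) can be checked numerically. The sketch below is illustrative only and rests on assumptions not made in the text: the true density f is taken to be standard normal, the kernel K is Gaussian, and the helper names are made up. Under those assumptions the integral in (2.3) equals the $N(0, 1 + h^2)$ density at x, so the expected value approaches f(x) as h shrinks.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def expected_kde(x, h, f=normal_pdf, kernel=normal_pdf, lim=8.0, steps=4000):
    """Midpoint-rule approximation of the integral of K(y) f(x + y h) dy
    over [-lim, lim], i.e. the right-hand side of (2.3)."""
    width = 2.0 * lim / steps
    total = 0.0
    for i in range(steps):
        y = -lim + (i + 0.5) * width
        total += kernel(y) * f(x + y * h)
    return total * width

# The expected value at x = 0 moves toward f(0) as the bandwidth shrinks.
for h in (1.0, 0.5, 0.1, 0.01):
    print(h, expected_kde(0.0, h), normal_pdf(0.0))
```

For a fixed h > 0 the expectation differs from f(x); this gap is the smoothing bias that the next part of the section quantifies.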

Figure 2.4: Effect of the smoothing parameter k on the estimates. (Panel title: observations from N(0.5, 0.1); horizontal axis: data; vertical axis: density; curves: true density, k = 40, k = 30 and one further value of k.)

To compute the bias of this estimator we have to make the assumption that the underlying density is twice differentiable and that the kernel satisfies the following conditions (Prakasa-Rao (1983)):

Condition 1. $\sup_x K(x) \le M < \infty$; $|x|\,K(x) \to 0$ as $|x| \to \infty$.

Condition 2. $K(x) = K(-x)$, $x \in (-\infty, \infty)$, with $\int x^2 K(x)\,dx < \infty$.

Then, by using a Taylor expansion of $f(x + yh)$, the bias of $\hat f$ in estimating f is

$$b_f[\hat f(x)] = \frac{h^2}{2}\, f''(x) \int y^2 K(y)\,dy + o(h^2).$$

We observe that, since we have assumed the kernel K is symmetric around zero, we have $\int y K(y)\, h f'(x)\,dy = 0$, and the bias is quadratic in h (Parzen (1962)).

Using a similar approach we obtain

$$\mathrm{Var}_f[\hat f(x)] = \frac{1}{nh}\, \|K\|_2^2\, f(x) + o\!\left(\frac{1}{nh}\right), \quad \text{where } \|K\|_2^2 = \int K(x)^2\,dx,$$

$$\mathrm{MSE}_f[\hat f(x)] = \frac{1}{nh}\, f(x)\, \|K\|_2^2 + \frac{h^4}{4} \left( f''(x) \int y^2 K(y)\,dy \right)^2 + o\!\left(\frac{1}{nh}\right) + o(h^4),$$

where $\mathrm{MSE}_f[\hat f]$ stands for the mean squared error of the estimator $\hat f$ of f. Hence, when the conditions $h \to 0$ and $nh \to \infty$ are assumed, $\mathrm{MSE}_f[\hat f] \to 0$, which means that the kernel density estimate is a consistent estimator of the underlying density f.

Moreover, the MSE balances the variance and the squared bias of the estimator in such a way that the variance term controls under-smoothing and the bias term controls over-smoothing. In other words, an attempt to reduce the bias increases the variance, making the estimate too noisy (under-smoothed). Conversely, minimizing the variance leads to a very smooth estimate (over-smoothed) with high bias.

2.4 Bandwidth Selection

It is natural to think of finding an optimal bandwidth by minimizing $\mathrm{MSE}_f[\hat f]$ over h > 0. Härdle (1990) shows that the asymptotic approximation, say $h^*$, for the optimal bandwidth is

$$h^* = \left( \frac{f(x)\, \|K\|_2^2}{(f''(x))^2 \left( \int y^2 K(y)\,dy \right)^2 n} \right)^{1/5} \propto n^{-1/5}. \tag{2.4}$$

The problem with this approach is that $h^*$ depends on two unknown functions, $f(\cdot)$ and $f''(\cdot)$. An approach to overcome this problem uses a global measure that can be defined as

$$\mathrm{IMSE}[\hat f] = \int \mathrm{MSE}_f[\hat f(x)]\,dx = \frac{1}{nh}\, \|K\|_2^2 + \frac{h^4}{4} \left( \int y^2 K(y)\,dy \right)^2 \|f''\|_2^2 + o\!\left(\frac{1}{nh}\right) + o(h^4). \tag{2.5}$$

IMSE is the well known integrated mean squared error of a density estimate. The optimal value of h considering the IMSE is defined as

$$h_{opt} = \arg\min_{h > 0} \mathrm{IMSE}[\hat f],$$

and it can be shown that

$$h_{opt} = c_2^{-2/5} \left( \int K^2(x)\,dx \right)^{1/5} \left( \|f''\|_2^2 \right)^{-1/5} n^{-1/5}, \tag{2.6}$$

where $c_2 = \int y^2 K(y)\,dy$. Unfortunately, (2.6) still depends on the second derivative of f, which measures the speed of fluctuations in the density f.

Reference to a Standard Distribution

A very natural way to get around the problem of not knowing $f''$ is to use a standard family of distributions to assign a value to the term $\|f''\|_2^2$ in expression (2.6). For example, assume that the density f belongs to the Gaussian family with mean µ and variance $\sigma^2$; then

$$\int (f''(x))^2\,dx = \sigma^{-5} \int (\varphi''(x))^2\,dx = \frac{3}{8}\, \pi^{-1/2} \sigma^{-5} \approx 0.212\, \sigma^{-5}, \tag{2.7}$$

where $\varphi(x)$ is the standard normal density. If one uses a Gaussian kernel, then

$$h_{opt} = (4\pi)^{-1/10} \left( \frac{3}{8}\, \pi^{-1/2} \right)^{-1/5} \sigma\, n^{-1/5} = \left( \frac{4}{3} \right)^{1/5} \sigma\, n^{-1/5} = 1.06\, \sigma\, n^{-1/5}. \tag{2.8}$$

Hence, in practice a possible choice for $h_{opt}$ is $1.06\, \hat\sigma\, n^{-1/5}$, where $\hat\sigma$ is the sample standard deviation.
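The reference rule (2.8) is simple to compute. The following Python sketch is an illustration only: the function name and the toy sample are made up, and `statistics.stdev` is used for the sample standard deviation. It also confirms that the constant $(4/3)^{1/5}$ rounds to 1.06.

```python
import math
import statistics

def normal_reference_bandwidth(data):
    """Rule (2.8): h = 1.06 * sigma_hat * n^(-1/5), with sigma_hat the
    sample standard deviation (n - 1 in the denominator)."""
    n = len(data)
    sigma_hat = statistics.stdev(data)
    return 1.06 * sigma_hat * n ** (-1.0 / 5.0)

# The constant in (2.8): (4/3)^(1/5), which rounds to 1.06.
constant = (4.0 / 3.0) ** (1.0 / 5.0)

# Hypothetical toy sample, for illustration only.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
h_toy = normal_reference_bandwidth(toy)
print(constant, h_toy)
```

The rule is derived for Gaussian data and a Gaussian kernel, so for multimodal or heavy-tailed data it should be read as a starting value rather than a final choice.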

If we want to make this estimate less sensitive to outliers, we have to use a more robust estimate of the scale parameter of the distribution. Let $\hat R$ be the sample interquartile range; then one possible choice for h is

$$\hat h_{opt} = 1.06\, \min\!\left( \hat\sigma,\; \frac{\hat R}{\Phi^{-1}(3/4) - \Phi^{-1}(1/4)} \right) n^{-1/5} = 1.06\, \min\!\left( \hat\sigma,\; \frac{\hat R}{1.349} \right) n^{-1/5}, \tag{2.9}$$

where Φ is the standard normal distribution function.

Figure 2.5 exhibits how a robust estimate of the scale can help in choosing the bandwidth. Note that by using $\hat R$ we have strong evidence that the underlying density has two modes.

Figure 2.5: Comparison of two bandwidths, $\hat\sigma$ (the sample standard deviation) and $\hat R$ (the sample interquartile range) for the mixture $0.7\,N(-2, 1) + 0.3\,N(1, 1)$. (Panel title: Histogram of a mixture of two normal densities; horizontal axis: data; vertical axis: Relative Frequency; curves: True, sigmahat, interquartile.)

Maximum likelihood Cross-Validation

Consider kernel density estimates $\hat f$ and suppose we want to test, for a specific h, the hypothesis

$$\hat f(x) = f(x) \quad \text{vs.} \quad \hat f(x) \neq f(x), \quad \text{for a fixed } x.$$

The likelihood ratio test would be based on the test statistic $f(x)/\hat f(x)$. For a good bandwidth this statistic should thus be close to 1. Alternatively, we would expect $E[\log( f(X)/\hat f(X) )]$ to be close to 0. Thus, a good bandwidth, which minimizes this measure of accuracy, is in effect optimizing the Kullback-Leibler distance:

$$d_{KL}(f, \hat f) = \int \log\!\left( \frac{f(x)}{\hat f(x)} \right) f(x)\,dx. \tag{2.10}$$

Of course, we are not able to compute $d_{KL}(f, \hat f)$ from the data, since we do not know f. But from a theoretical point of view, we can investigate this distance for the choice of an appropriate bandwidth h. When $d_{KL}(f, \hat f)$ is close to 0, this gives the best agreement with the hypothesis $\hat f = f$. Hence, we are looking for a bandwidth h which minimizes $d_{KL}(f, \hat f)$.

Suppose we are given a set of additional observations $X_i$, independent of the others. The likelihood for these observations is $\prod_i f(X_i)$. Substituting $\hat f$ into the likelihood we have $\prod_i \hat f(X_i)$, and the value of this statistic for different h would indicate which value of h is preferable, since the logarithm of this statistic is closely related to $d_{KL}(f, \hat f)$. Usually, we do not have additional observations. A way out of this dilemma is to base the estimate $\hat f$ on the subset $\{X_j\}_{j \neq i}$ and to calculate the likelihood for $X_i$. Denote the leave-one-out estimate by

$$\hat f_{h,i}(X_i) = (n-1)^{-1} h^{-1} \sum_{j \neq i} K\!\left( \frac{X_i - X_j}{h} \right).$$

Hence,

$$\prod_{i=1}^{n} \hat f_{h,i}(X_i) = (n-1)^{-n} h^{-n} \prod_{i=1}^{n} \sum_{j \neq i} K\!\left( \frac{X_i - X_j}{h} \right). \tag{2.11}$$

However, it is convenient to consider the logarithm of this statistic normalized by the factor $n^{-1}$, which gives the following criterion:

$$CV_{KL}(h) = \frac{1}{n} \sum_{i=1}^{n} \log[\hat f_{h,i}(X_i)] = \frac{1}{n} \sum_{i=1}^{n} \log\!\left[ \sum_{j \neq i} K\!\left( \frac{X_i - X_j}{h} \right) \right] - \log[(n-1)h]. \tag{2.12}$$

Naturally, we choose $h_{KL}$ such that

$$h_{KL} = \arg\max_{h} CV_{KL}(h). \tag{2.13}$$

Since we assumed that the $X_i$ are i.i.d., the scores $\log \hat f_{h,i}(X_i)$ are identically distributed and so

$$E[CV_{KL}(h)] = E[\log \hat f_{h,i}(X_i)].$$

Disregarding the leave-one-out effect, we can write

$$E[CV_{KL}(h)] \approx E\!\left[ \int \log[\hat f(x)]\, f(x)\,dx \right] = -E[d_{KL}(f, \hat f)] + \int \log[f(x)]\, f(x)\,dx. \tag{2.14}$$

The second term on the right-hand side does not depend on h. Then, we can expect that maximizing $CV_{KL}(h)$ approximates the optimal bandwidth that minimizes $d_{KL}(f, \hat f)$.

Maximum likelihood cross-validation has two shortcomings:

When we have identical observations at one point, we may obtain an infinite value of $CV_{KL}(h)$ and hence we cannot define an optimal bandwidth.

Suppose we use a kernel function with finite support, e.g., the interval $[-1, 1]$. If an observation $X_i$ is further from the other observations than the bandwidth h, the likelihood $\hat f_{h,i}(X_i)$ becomes 0. Hence the score function reaches the value $-\infty$. Maximizing $CV_{KL}(h)$ forces us to use a large bandwidth to prevent this degenerate case. This might lead to slight over-smoothing for the other observations.

Least-Squares Cross-Validation

Consider an alternative distance between $\hat f_h$ and f, the integrated squared error (ISE):

$$d_{ISE}(h) = \int (\hat f_h - f)^2(x)\,dx = \int \hat f_h^2(x)\,dx - 2 \int (\hat f_h f)(x)\,dx + \int f^2(x)\,dx,$$

$$d_{ISE}(h) - \int f^2(x)\,dx = \int \hat f_h^2(x)\,dx - 2 \int (\hat f_h f)(x)\,dx. \tag{2.15}$$

For the last term, observe that $\int (\hat f_h f)(x)\,dx = E_X[\hat f_h(X)]$, where the expectation is understood to be computed with respect to an additional and independent observation X. To estimate this term, define the leave-one-out estimate

$$\hat E_X[\hat f_h(X)] = \frac{1}{n} \sum_{i=1}^{n} \hat f_{h,i}(X_i). \tag{2.16}$$

This leads to the least-squares cross-validation criterion:

$$CV_{LS}(h) = \int \hat f_h^2(x)\,dx - \frac{2}{n} \sum_{i=1}^{n} \hat f_{h,i}(X_i). \tag{2.17}$$

The bandwidth minimizing this function is

$$h_{LS} = \arg\min_{h} CV_{LS}(h).$$

This cross-validation function is called an unbiased cross-validation criterion, since

$$E[CV_{LS}(h)] = E\!\left[ d_{ISE}(h) + 2\left( E_X[\hat f_h(X)] - \frac{1}{n} \sum_{i=1}^{n} \hat f_{h,i}(X_i) \right) \right] - \|f\|_2^2 = \mathrm{IMSE}[\hat f_h] - \|f\|_2^2. \tag{2.18}$$

An interesting question is how good the approximation of $d_{ISE}$ by $CV_{LS}$ is. To investigate this, define a sequence of bandwidths $h_n = h(X_1, \ldots, X_n)$ to be asymptotically optimal if

$$\frac{d_{ISE}(h_n)}{\inf_{h > 0} d_{ISE}(h)} \to 1 \quad \text{a.s. when } n \to \infty.$$
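A minimal sketch of the least-squares criterion (2.17) follows, assuming a Gaussian kernel so that the integral of the squared estimate has a closed form (the convolution of two standard normal densities is the N(0, 2) density). This is an illustration rather than the algorithm referenced in Härdle (1990); the function names, the grid search and the simulated sample are all assumptions of this example.

```python
import math
import random

def gauss(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_density(i, data, h):
    """Leave-one-out estimate f_{h,i}(X_i), built without observation i."""
    n = len(data)
    s = sum(gauss((data[i] - data[j]) / h) for j in range(n) if j != i)
    return s / ((n - 1) * h)

def integral_fhat_squared(data, h):
    """Closed form of the integral of f_h(x)^2 for a Gaussian kernel:
    (1 / (n^2 h)) * sum_i sum_j phi_2((X_i - X_j) / h), where phi_2 is
    the N(0, 2) density."""
    n = len(data)
    total = 0.0
    for xi in data:
        for xj in data:
            t = (xi - xj) / h
            total += math.exp(-t * t / 4.0) / (2.0 * math.sqrt(math.pi))
    return total / (n * n * h)

def cv_ls(data, h):
    """Least-squares cross-validation score (2.17)."""
    n = len(data)
    loo = sum(loo_density(i, data, h) for i in range(n))
    return integral_fhat_squared(data, h) - 2.0 * loo / n

def select_bandwidth(data, grid):
    """Grid search for the minimizer of CV_LS (a simple stand-in for arg min)."""
    return min(grid, key=lambda h: cv_ls(data, h))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(60)]  # simulated data
grid = [0.05 * j for j in range(1, 41)]               # h = 0.05, ..., 2.0
h_ls = select_bandwidth(sample, grid)
print(h_ls)
```

The double loop costs on the order of n squared kernel evaluations per candidate bandwidth, which is fine for a sketch but slow for large samples.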

It can be shown that if the density f is bounded then $h_{LS}$ is asymptotically optimal. As for maximum likelihood cross-validation, one can find in Härdle (1990) an algorithm to compute the least-squares cross-validation.

2.5 Orthogonal series estimators

Orthogonal series estimators approach the density estimation problem from a quite different point of view. While kernel estimators are closely related to statistical thinking, orthogonal series estimators rely on the ideas of approximation theory. Without loss of generality, let us assume that we are trying to estimate a density f on the interval [0, 1]. The idea is to use the theory of orthogonal series and then to reduce the estimation procedure to estimating the coefficients of the Fourier expansion of f.

Define the sequence $\phi_\nu(x)$ by

$$\phi_0(x) = 1, \qquad \phi_{2r-1}(x) = \sqrt{2}\, \cos 2\pi r x, \qquad \phi_{2r}(x) = \sqrt{2}\, \sin 2\pi r x, \qquad r = 1, 2, \ldots$$

It is well known that f can be represented as the Fourier series $\sum_{i=0}^{\infty} a_i \phi_i$, where, for each $i \ge 0$,

$$a_i = \int f(x)\, \phi_i(x)\,dx. \tag{2.19}$$

Now, suppose that X is a random variable with density f. Then (2.19) can be written $a_i = E\phi_i(X)$, and so an unbiased estimator of $a_i$ based on $X_1, \ldots, X_n$ is

$$\hat a_i = \frac{1}{n} \sum_{j=1}^{n} \phi_i(X_j).$$

Note that the series $\sum_{i=0}^{\infty} \hat a_i \phi_i$ converges to a sum of delta functions at the observations. To see this, let

$$\omega(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - X_i), \tag{2.20}$$

where δ is the Dirac delta function. Then, for each i,

$$\hat a_i = \int_0^1 \omega(x)\, \phi_i(x)\,dx,$$

and hence the $\hat a_i$ are exactly the Fourier coefficients of the function ω. The easiest way to smooth ω is to truncate the expansion $\sum \hat a_i \phi_i$ at some point. That is, choose K and define a density estimate $\hat f$ by

$$\hat f(x) = \sum_{i=0}^{K} \hat a_i\, \phi_i(x). \tag{2.21}$$

Note that the amount of smoothing is determined by K. A small value of K implies over-smoothing, a large value of K under-smoothing.
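The truncated series estimate (2.21) can be sketched in a few lines of Python. The code below is illustrative only: the function names and the sample on [0, 1] are made up, and the truncated sum is taken to start at the constant term $\phi_0$. It uses the trigonometric basis defined above and checks numerically that the estimate integrates to one, because $\hat a_0 = 1$ and every other basis function integrates to zero over [0, 1].

```python
import math

def phi(i, x):
    """Trigonometric basis on [0, 1]: phi_0 = 1,
    phi_{2r-1} = sqrt(2) cos(2 pi r x), phi_{2r} = sqrt(2) sin(2 pi r x)."""
    if i == 0:
        return 1.0
    r = (i + 1) // 2
    if i % 2 == 1:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * r * x)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * r * x)

def coefficients(data, K):
    """Sample Fourier coefficients a_hat_i = (1/n) sum_j phi_i(X_j), i = 0..K."""
    n = len(data)
    return [sum(phi(i, xj) for xj in data) / n for i in range(K + 1)]

def series_density(x, coefs):
    """Truncated series estimate (2.21), summing from the constant term."""
    return sum(a * phi(i, x) for i, a in enumerate(coefs))

# Hypothetical sample on [0, 1], for illustration only.
sample = [0.12, 0.35, 0.41, 0.48, 0.52, 0.55, 0.63, 0.80]
coefs = coefficients(sample, K=4)

steps = 1000
mass = sum(series_density((i + 0.5) / steps, coefs) for i in range(steps)) / steps
print(coefs[0], mass)
```

Unlike a kernel estimate built from a nonnegative kernel, a truncated series estimate is not guaranteed to be nonnegative everywhere.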

Figure 2.6: Effect of the smoothing parameter K on the orthogonal series method for density estimation. (Panel title: observations from N(.5, .1); horizontal axis: data; vertical axis: density; curves: True, K = 3, K = 10 and one further value of K.)

A more general approach is to choose a sequence of weights $\lambda_i$ such that $\lambda_i \to 0$ as $i \to \infty$. Then

$$\hat f(x) = \sum_{i=0}^{\infty} \lambda_i\, \hat a_i\, \phi_i(x).$$

The rate at which the weights $\lambda_i$ converge to zero determines the amount of smoothing.

For a non-finite interval we can take the weight function $a(x) = e^{-x^2/2}$ and orthogonal functions $\phi(x)$ proportional to Hermite polynomials.

The data in Figure 2.6 were provided to me by Francisco Cribari-Neto and consist of the variation rate of the ICMS tax (imposto sobre circulação de mercadorias e serviços, a Brazilian tax on the circulation of goods and services) for the city of Brasilia, D.F., from August 1994 to July 1999.


Chapter 3: Kernel nonparametric Regression Method

Suppose we have i.i.d. observations $\{(X_i, Y_i)\}_{i=1}^{n}$ and the nonparametric regression model given in equation (1.1). By equation (2.1), we know how to estimate the denominator by using the kernel density estimation method. For the numerator one can estimate the joint density using the multiplicative kernel

$$\hat f_{h_1, h_2}(x, y) = \frac{1}{n} \sum_{i=1}^{n} K_{h_1}(x - X_i)\, K_{h_2}(y - Y_i),$$

where $K_{h_1}(x - X_i) = h_1^{-1} K((x - X_i)/h_1)$ and $K_{h_2}(y - Y_i) = h_2^{-1} K((y - Y_i)/h_2)$. It is not difficult to show that

$$\int y\, \hat f_{h_1, h_2}(x, y)\,dy = \frac{1}{n} \sum_{i=1}^{n} K_{h_1}(x - X_i)\, Y_i.$$

Based on the methodology of kernel density estimation, Nadaraya (1964) and Watson (1964) suggested the following estimator $\hat g_h$ for g:

$$\hat g_h(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i)\, Y_i}{\sum_{j=1}^{n} K_h(x - X_j)}. \tag{3.1}$$

In general, the kernel function $K_h(x - X_j) = h^{-1} K((x - X_j)/h)$ is taken to be a probability density function symmetric around zero, and the parameter h is called the smoothing parameter or bandwidth.

Now, consider the model (1.1) and let $X_1, \ldots, X_n$ be i.i.d. random variables with density $f_X$ such that $X_i$ is independent of $\varepsilon_i$ for all $i = 1, \ldots, n$. Assume the conditions given in Section 2.3 and suppose that f and g are twice continuously differentiable in a neighborhood of the point x. Then, if $h \to 0$ and $nh \to \infty$ as $n \to \infty$, we have $\hat g_h(x) \to g(x)$ in probability. Moreover, suppose $E[|\varepsilon_i|^{2+\delta}] < \infty$ and $\int |K(x)|^{2+\delta}\,dx < \infty$ for some δ > 0; then

$$\sqrt{nh}\, \left( \hat g_h(x) - E[\hat g_h(x)] \right) \to N\!\left( 0,\; (f_X(x))^{-1} \sigma^2 \int (K(x))^2\,dx \right)$$

in distribution, where $N(\cdot, \cdot)$ stands for a Gaussian distribution (see details in Pagan and Ullah (1999)).

As an example, Figure 3.1 shows the effect of the choice of h on the Nadaraya-Watson procedure. The data consist of the speed of cars and the distances taken to stop. It is important to notice that the data were recorded in the 1920s. (These datasets can be found in the software R.) The Nadaraya-Watson kernel method can be extended to the multivariate regression problem by considering the multidimensional kernel density estimation method (see details in Scott (1992)).

Figure 3.1: Effect of bandwidths on Nadaraya-Watson kernel. (Horizontal axis: speed; vertical axis: dist; curves: h = 2 and one further bandwidth.)

3.1 k-nearest Neighbor (k-NN)

One may notice that regression by kernels is based on local averaging of the observations $Y_i$ in a fixed neighborhood of x. Instead of this fixed neighborhood, k-NN employs varying neighborhoods in the support of the X variable. That is,

$$\hat g_k(x) = \frac{1}{n} \sum_{i=1}^{n} W_{ki}(x)\, Y_i, \tag{3.2}$$

where

$$W_{ki}(x) = \begin{cases} n/k & \text{if } i \in J_x, \\ 0 & \text{otherwise,} \end{cases} \tag{3.3}$$

with $J_x = \{ i : X_i \text{ is one of the } k \text{ nearest observations to } x \}$.

It can be shown that the bias and variance of the k-NN estimator $\hat g_k$ with weights (3.3) are given, for a fixed x, by

$$E[\hat g_k(x)] - g(x) \approx \frac{1}{24\, (f(x))^3} \left[ g''(x)\, f(x) + 2\, g'(x)\, f'(x) \right] (k/n)^2 \tag{3.4}$$

and

$$\mathrm{Var}[\hat g_k(x)] \approx \frac{\sigma^2}{k}. \tag{3.5}$$
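Both estimators of this chapter are local averages of the $Y_i$, which the following Python sketch makes explicit. It is illustrative only: a Gaussian kernel is assumed for (3.1), the function names and toy data are made up, and ties in distance are broken by the order of the observations.

```python
import math

def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    """Estimator (3.1): kernel-weighted average of the Y_i. The 1/h factors
    of K_h cancel between numerator and denominator."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)  # can underflow to 0 if x is far from all X_i
    return sum(w * y for w, y in zip(weights, ys)) / total

def knn_regression(x, xs, ys, k):
    """Estimator (3.2)-(3.3): plain average of the Y_i whose X_i are the
    k nearest observations to x."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / k

# Hypothetical toy data lying on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

nw_mid = nadaraya_watson(3.0, xs, ys, h=1.0)
knn_mid = knn_regression(3.0, xs, ys, k=3)
print(nw_mid, knn_mid)
```

At the center of this symmetric toy design both estimates recover the value on the line; near the boundary a local average of a sloped curve is pulled toward the interior points.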

We observe that the bias is increasing and the variance is decreasing in the smoothing parameter k. To balance this trade-off one should choose $k \sim n^{4/5}$. For details, see Härdle (1990).

Figure 3.2 shows the effect of the parameter k on the regression curve estimates. Note that the curve estimate with k = 1 is less smooth than the curve estimate with k = 2. The data set consists of the revenue passenger miles flown by commercial airlines in the United States for each year from 1937 to 1960 and is available in R.

Figure 3.2: Effect of the smoothing parameter k on the k-NN regression estimates. (Panel title: airmiles data; vertical axis: Passenger miles flown by U.S. commercial airlines; curves: airmiles data, K = 1 and one further value of K.)

3.2 Local Polynomial Regression: LOWESS

Cleveland (1979) proposed the algorithm LOWESS, locally weighted scatter plot smoothing, as an outlier-resistant method based on local polynomial fits. The basic idea is to start with a local polynomial least squares fit (a k-NN type fitting) and then to use robust methods to obtain the final fit. Specifically, one can first fit a polynomial regression in a neighborhood of x, that is, find $\beta \in \mathbb{R}^{p+1}$ which minimizes

$$n^{-1} \sum_{i=1}^{n} W_{ki}(x) \left( y_i - \sum_{j=0}^{p} \beta_j x_i^j \right)^2, \tag{3.6}$$

where $W_{ki}$ denote the k-NN weights. Compute the residuals $\hat\epsilon_i$ and the scale parameter $\hat\sigma = \mathrm{median}(|\hat\epsilon_i|)$. Define robustness weights $\delta_i = K(\hat\epsilon_i / 6\hat\sigma)$, where $K(u) = (15/16)(1 - u^2)^2$ if $|u| \le 1$ and $K(u) = 0$ otherwise. Then, fit a polynomial regression as in (3.6) but with weights $\delta_i W_{ki}(x)$. Cleveland suggests that p = 1 provides a good balance between computational ease and the need for flexibility to reproduce patterns in the data. In addition, the smoothing parameter can be determined by cross-validation as in (2.13). Note that when using the R function lowess or loess, f acts as the smoothing parameter. Its relation to the k-NN nearest neighbor is given by

$$k = n f, \quad f \in (0, 1),$$

where n is the sample size.

Figure 3.3: Effect of the smoothing parameter using LOWESS method. (Panel title: lowess(cars); horizontal axis: speed; vertical axis: dist; curves: f = 2/3 and one further value of f.)

3.3 Penalized Maximum Likelihood Estimation

The method of penalized maximum likelihood in the context of density estimation consists of estimating a density f by minimizing a penalized likelihood score $L(f) + \lambda J(f)$, where $L(f)$ is a goodness-of-fit measure and $J(f)$ is a roughness penalty. This section is developed following historical results, beginning with Good and Gaskins (1971) and ending with the most recent result given by Gu (1993).

The maximum likelihood (M.L.) method has been used as a standard statistical procedure in the case where the underlying density f is known except for a finite number of parameters. It is well known that M.L. has optimal properties (asymptotically unbiased and asymptotically normally distributed) for estimating the unknown parameters. Thus, it would be interesting if such a standard technique could be applied in a more general scheme where no parametric form is assumed for the underlying density and f is only assumed to belong to a pre-specified family of density functions.
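Returning to the LOWESS steps of Section 3.2, the two-stage fit can be sketched as follows. This is a simplified illustration and not Cleveland's algorithm as implemented in R: it uses p = 1, uniform k-NN weights as in (3.6), the biweight function for the robustness weights, and rounds n f down to get k. The function names and the toy data are made up.

```python
import statistics

def biweight(u):
    """K(u) = (15/16) (1 - u^2)^2 for |u| <= 1, and 0 otherwise."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1 else 0.0

def local_linear(x0, xs, ys, weights):
    """Weighted least squares line (p = 1), evaluated at x0."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def lowess_sketch(xs, ys, f=2.0 / 3.0, robust_iters=1):
    """Local linear fit on k-NN neighborhoods, then refits with
    robustness weights delta_i = K(residual_i / (6 * median |residual|))."""
    n = len(xs)
    k = max(2, int(n * f))        # assumption: k is n * f rounded down
    delta = [1.0] * n
    fitted = list(ys)
    for _ in range(robust_iters + 1):
        new_fit = []
        for x0 in xs:
            d_k = sorted(abs(x - x0) for x in xs)[k - 1]
            inside = [1.0 if abs(x - x0) <= d_k else 0.0 for x in xs]
            w = [d * m for d, m in zip(delta, inside)]
            if sum(w) == 0.0:     # every neighbor was down-weighted to zero
                w = inside
            new_fit.append(local_linear(x0, xs, ys, w))
        fitted = new_fit
        resid = [abs(y - g) for y, g in zip(ys, fitted)]
        scale = statistics.median(resid)
        if scale == 0.0:          # perfect fit: nothing left to down-weight
            break
        delta = [biweight(r / (6.0 * scale)) for r in resid]
    return fitted

# Hypothetical data on the exact line y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
fit = lowess_sketch(xs, ys)
print(fit)
```

Cleveland's procedure also weights points by their distance within each neighborhood, which this sketch omits in order to stay close to (3.6).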

Let $X_1, \ldots, X_n$ be i.i.d. random variables with unknown density $f$. The likelihood function is given by

$$ L(f \mid X_1, \ldots, X_n) = \prod_{i=1}^{n} f(X_i). $$

The problem with this approach can be described by the following example. Recall the kernel estimate $\hat f_h(x)$, that is,

$$ \hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big), $$

and replace the bandwidth $h$ by $h/c$, where $c$ is a constant greater than 0, so that for the moment the bandwidth is $h/c$. Let $h$ be small enough that $|X_i - X_j|/(h/c) > M > 0$ for all $i \ne j$, and assume $K$ has been chosen so that $K(u) = 0$ if $|u| > M$. Then $\hat f_h(X_i) = \frac{c}{nh} K(0)$. If $c > 1/K(0)$ then $\hat f_h(X_i) > \frac{1}{nh}$. For fixed $n$, we can do this for all $X_i$ simultaneously. Thus $L \ge (\frac{1}{nh})^{n}$. Letting $h \to 0$, we have $L \to \infty$. That is, $L(f \mid X_1, \ldots, X_n)$ does not have a finite maximum over the class of all densities. Hence, the likelihood function can be made as large as one wants just by taking densities with the smoothing parameter approaching zero. Densities having this characteristic, e.g., bandwidth $h \to 0$, approximate delta functions, and the estimate degenerates into a sum of spikes (delta functions) at the data points. Therefore, without putting constraints on the class of all densities, the maximum likelihood procedure cannot be used properly.

One possible way to overcome the problem described above is to consider a penalized log-likelihood function. The idea is to introduce a penalty term on the log-likelihood function such that this penalty term quantifies the smoothness of $g = \log f$. Let us take, for instance, the functional $J(g) = \int (g'')^2$ as a penalty term. Then define the penalized log-likelihood function by

$$ L_\lambda(g) = \frac{1}{n} \sum_{i=1}^{n} g(X_i) - \lambda J(g), \qquad (3.7) $$

where $\lambda$ is the smoothing parameter which controls two conflicting goals: the fidelity to the data, given by $\sum_{i=1}^{n} g(X_i)$, and the smoothness, given by the penalty term $J(g)$.

The pioneering work on the penalized log-likelihood method is due to Good and Gaskins (1971), who suggested a Bayesian scheme in which the penalized log-likelihood (using their notation) becomes

$$ \omega = \omega(f) = L(f) - \Phi(f), $$

where $L = \sum_{i=1}^{n} g(x_i)$ and $\Phi$ is the smoothness penalty.
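The unboundedness argument above can be checked numerically. The following is a minimal sketch, assuming an Epanechnikov kernel (compact support, so $M = 1$) and a small made-up sample; the sample values and the function names `kernel`, `kde` and `log_likelihood` are illustrative choices, not part of the original text. Once $h$ is smaller than the smallest gap between data points, each $\hat f_h(X_i)$ equals $K(0)/(nh)$ and the log-likelihood grows without bound as $h$ shrinks.

```python
import math

def kernel(u):
    """Epanechnikov kernel: zero outside [-1, 1], with K(0) = 0.75."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(x, data, h):
    """Kernel density estimate with bandwidth h, evaluated at x."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

def log_likelihood(data, h):
    """Log-likelihood of the kernel estimate evaluated at its own data points."""
    return sum(math.log(kde(xi, data, h)) for xi in data)

# Illustrative sample (assumption): distinct points, smallest gap 0.3.
data = [0.1, 0.4, 0.9, 1.5, 2.2]

# The log-likelihood keeps increasing as the bandwidth shrinks.
for h in (0.5, 0.1, 0.01, 0.001):
    print(h, log_likelihood(data, h))
```

For bandwidths below the smallest gap the printed value is $n \log(K(0)/(nh))$, which diverges as $h \to 0$; this is the behaviour that the penalty term in (3.7) is meant to rule out.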
In order to simplify the notation, let $\int h$ have the same meaning as $\int_{-\infty}^{\infty} h(x)\,dx$. Now, consider the number of bumps in the density as the measure of roughness or smoothness. The first approach was to take the penalty term proportional to Fisher's information, that is, $\Phi(f) = \int (f')^2 / f$. By setting $f = \gamma^2$, $\Phi(f)$ becomes $4 \int (\gamma')^2$, and $f$ is then replaced by $\gamma$ in the penalized likelihood equation. In doing so, the constraint $f \ge 0$ is eliminated and the other constraint, $\int f = 1$, turns out to be equivalent to $\int \gamma^2 = 1$, with $\gamma \in L^2(-\infty, \infty)$. Good and Gaskins (1971) found that the penalty $4\alpha \int (\gamma')^2$ yielded density curves having portions that looked too straight. This fact can be explained by noting that the curvature depends also on the second derivatives. Thus $\int (\gamma'')^2$ should be included in the penalty term. The final roughness functional proposed was

$$ \Phi(f) = 4\alpha \int (\gamma')^2 + \beta \int (\gamma'')^2, $$

with $\alpha, \beta$ satisfying

$$ 2\alpha\sigma^2 + \frac{3}{4}\beta = \sigma^4, \qquad (3.8) $$

where $\sigma^2$ is either an initially guessed value of the variance or the sample variance estimated from the data. According to Good and Gaskins (1971), the basis for this constraint is the feeling that the class of normal distributions forms the smoothest class of distributions, the improper uniform distribution being the limiting form. Moreover, they pointed out that some justification for this feeling is that a normal distribution is the distribution of maximum entropy for a given mean and variance. The integral $\int (\gamma')^2$ is also minimized for a given variance when $f$ is normal (Good and Gaskins, 1971). They thought it reasonable to give the normal distribution special consideration and decided to choose $\alpha, \beta$ such that $\omega(\alpha, \beta; f)$ is maximized by taking the mean equal to $\bar x$ and the variance equal to $\sum_{i=1}^{N} (x_i - \bar x)^2 / (N - 1)$. That is, if $f$ is the $N(\mu, \sigma^2)$ density, then

$$ \int (\gamma')^2 = \frac{1}{4\sigma^2}, \qquad \int (\gamma'')^2 = \frac{3}{16\sigma^4}, $$

and hence we have

$$ \omega(\alpha, \beta; f) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2 - \frac{\alpha}{\sigma^2} - \frac{3\beta}{16\sigma^4}. $$

The score function $\omega(\alpha, \beta; f)$ is maximized when $\mu = \bar x$ and $\sigma$ is such that

$$ -N + \frac{\sum_{i=1}^{N} (x_i - \bar x)^2}{\sigma^2} + \frac{2\alpha}{\sigma^2} + \frac{3\beta}{4\sigma^4} = 0. \qquad (3.9) $$

If we put $\sigma^2 = \sum_{i=1}^{N} (x_i - \bar x)^2 / (N - 1)$, equation (3.9) becomes

$$ \sigma^4 (N - 1) + 2\alpha\sigma^2 + \frac{3}{4}\beta = \sigma^4 N, $$

and so we have the constraint (3.8).
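The two integrals used in this derivation, and the way the constraint (3.8) makes the sample variance a stationary point of (3.9), can be verified numerically. The sketch below is only a check under stated assumptions: the values of `mu`, `sigma` and `beta`, the sample `xs` and all function names are illustrative, and the integrals are approximated with a plain trapezoidal rule.

```python
import math

def gamma(x, mu, sigma):
    """gamma = sqrt(f) for the N(mu, sigma^2) density f."""
    return (2.0 * math.pi * sigma ** 2) ** -0.25 * math.exp(-(x - mu) ** 2 / (4.0 * sigma ** 2))

def d1(x, mu, sigma):
    """First derivative of gamma."""
    return -(x - mu) / (2.0 * sigma ** 2) * gamma(x, mu, sigma)

def d2(x, mu, sigma):
    """Second derivative of gamma."""
    factor = (x - mu) ** 2 / (4.0 * sigma ** 4) - 1.0 / (2.0 * sigma ** 2)
    return factor * gamma(x, mu, sigma)

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

def score_derivative(sigma2, alpha, beta, n, ss):
    """Left-hand side of (3.9), with ss the sum of squared deviations."""
    return -n + ss / sigma2 + 2.0 * alpha / sigma2 + 3.0 * beta / (4.0 * sigma2 ** 2)

mu, sigma = 1.0, 2.0                      # illustrative values (assumption)
lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma
i1 = integrate(lambda x: d1(x, mu, sigma) ** 2, lo, hi)
i2 = integrate(lambda x: d2(x, mu, sigma) ** 2, lo, hi)
print(i1, 1.0 / (4.0 * sigma ** 2))       # both close to 1 / (4 sigma^2)
print(i2, 3.0 / (16.0 * sigma ** 4))      # both close to 3 / (16 sigma^4)

xs = [2.0, 3.5, 1.0, 4.2, 2.8]            # illustrative sample (assumption)
n = len(xs)
xbar = sum(xs) / n
ss = sum((x - xbar) ** 2 for x in xs)
s2 = ss / (n - 1)                         # sample variance
beta = 1.0                                # arbitrary choice (assumption)
alpha = (s2 ** 2 - 0.75 * beta) / (2.0 * s2)   # solves constraint (3.8)
print(score_derivative(s2, alpha, beta, n, ss))  # close to zero
```

With $\alpha$ and $\beta$ tied together by (3.8), the left-hand side of (3.9) vanishes at the sample variance, which is exactly the special treatment of the normal distribution that Good and Gaskins asked for.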
Pursuing the idea of Good and Gaskins, Silverman (1982) proposed a similar method where the log density is estimated instead of the density itself. An advantage of Silverman's approach is that, by using the logarithm of the density and the augmented penalized likelihood functional, any density estimate obtained will automatically be positive and integrate to one. Specifically, let $(m_1, \ldots, m_k)$ be a sequence of natural numbers such that $1 \le \sum_{i=1}^{k} m_i \le m$, where $m > 0$ is such that $g^{(m-1)}$ exists and is continuous. Define a linear differential operator $D$ as

$$ D(g) = \sum c(m_1, \ldots, m_k) \Big(\frac{\partial}{\partial x_1}\Big)^{m_1} \cdots \Big(\frac{\partial}{\partial x_k}\Big)^{m_k} (g). $$

Now assume that at least one of the coefficients $c(m_1, \ldots, m_k) \ne 0$ for $\sum m_i = m$. Using this linear differential operator, define a bilinear functional $\langle \cdot, \cdot \rangle$ by

$$ \langle g_1, g_2 \rangle = \int D(g_1) D(g_2), $$

where the integral is taken over an open set $\Omega$ with respect to Lebesgue measure. Let $S$ be the set of real functions $g$ on $\Omega$ for which:

- the $(m-1)$th derivatives of $g$ exist everywhere and are piecewise differentiable,
- $\langle g, g \rangle < \infty$,
- $\int e^{g} < \infty$.

Given the data $X_1, \ldots, X_n$ i.i.d. with common density $f$, such that $g = \log f$, $\hat g$ is the solution, if it exists, of the optimization problem

$$ \max \Big\{ \frac{1}{n} \sum_{i=1}^{n} g(X_i) - \frac{\lambda}{2} \langle g, g \rangle \Big\}, \quad \text{subject to } \int e^{g} = 1, $$

and the density estimate is $\hat f = e^{\hat g}$. The null space of the penalty term is the set $\{g \in S : \langle g, g \rangle = 0\}$. Note that the null space of $\langle g, g \rangle$ is an exponential family with at most $(m - 1)$ parameters. For example, if $\langle g, g \rangle = \int (g^{(3)})^2$, then $g = \log f$ is in an exponential family with 2 parameters. See Silverman (1982).

Silverman presented an important result which reduces the constrained optimization problem to the computationally easier task of finding the minimum of an unconstrained variational problem. Precisely, for any $g$ in a class of smooth functions (see details in Silverman (1982)) and for any fixed positive $\lambda$, let

$$ \omega_0(g) = -\frac{1}{n} \sum_{i=1}^{n} g(X_i) + \frac{\lambda}{2} \int (g'')^2 $$

and

$$ \omega(g) = -\frac{1}{n} \sum_{i=1}^{n} g(X_i) + \int e^{g} + \frac{\lambda}{2} \int (g'')^2. $$

Silverman proved that the unconstrained minimizer of $\omega(g)$ is identical with the constrained minimizer of $\omega_0$ subject to $\int e^{g} = 1$, if such a minimizer exists.
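A short calculation indicates why the extra term $\int e^{g}$ enforces the constraint automatically. The sketch below is not Silverman's proof; it only assumes that the penalty is unchanged when a constant $c$ is added to $g$, which holds for $\int (g'')^2$, and it writes $c^{*}$ for the minimizing constant (a symbol introduced here for illustration).

```latex
\begin{align*}
\omega(g + c) &= -\frac{1}{n}\sum_{i=1}^{n} g(X_i) - c + e^{c}\int e^{g}
                 + \frac{\lambda}{2}\int (g'')^2, \\
\frac{\partial}{\partial c}\,\omega(g + c) &= -1 + e^{c}\int e^{g} = 0
  \quad\Longleftrightarrow\quad \int e^{g + c} = 1, \\
\omega(g + c^{*}) &= \omega_0(g + c^{*}) + 1
  \qquad \text{with } c^{*} = -\log \int e^{g}.
\end{align*}
```

Since the second derivative in $c$ is $e^{c}\int e^{g} > 0$, shifting any $g$ by $c^{*}$ can only decrease $\omega$, so a minimizer of $\omega$ must already satisfy $\int e^{g} = 1$; on that set $\omega$ and $\omega_0$ differ by the constant 1 and therefore share their minimizer.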

Computing Penalized Log-Likelihood Density Estimates

Based on Silverman's approach, O'Sullivan (1988) developed an algorithm which is a fully automatic, data-driven version of Silverman's estimator. Furthermore, the estimators obtained by O'Sullivan's algorithm are approximated by linear combinations of basis functions. As in the scheme of Good and Gaskins (1971), the estimate is expanded in a basis; O'Sullivan proposed that cubic B-splines with knots at the data points should be used as the basis functions. A summary of definitions and properties of B-splines is given in Chapter 4.

The basic idea of computing a density estimate provided by the penalized likelihood method is to construct approximations to it. Let $x_1, \ldots, x_n$ be the realizations of random variables $X_1, \ldots, X_n$ with common log density $g$. We are to solve finite-dimensional versions of (3.7) which are reasonable approximations to the infinite dimensional problem (Thompson and Tapia, 1990). Good and Gaskins (1971) based their computational scheme on the fact that, since $\gamma \in L^2(-\infty, \infty)$, for a given orthonormal system of functions $\{\phi_n\}$,

$$ \sum_{n=0}^{\infty} a_n \phi_n \xrightarrow{\text{m.s.}} \gamma \in L^2, \quad \text{with } \sum_{n=0}^{\infty} a_n^2 < \infty \text{ and } \{a_n\} \subset \mathbb{R}. $$

That is, $\gamma$ in $L^2$ can be arbitrarily well approximated by a linear combination of basis functions. In their paper, Hermite polynomials were used to build the basis functions. Specifically,

$$ \phi_n(x) = e^{-x^2/2} H_n(x)\, 2^{-n/2} \pi^{-1/4} (n!)^{-1/2}, $$

where

$$ H_n(x) = (-1)^n e^{x^2} \Big( \frac{d^n}{dx^n} e^{-x^2} \Big). $$

The log density estimator proposed by O'Sullivan (1988) is defined as the minimizer of

$$ -\frac{1}{n} \sum_{i=1}^{n} g(x_i) + \int_a^b e^{g(s)}\,ds + \lambda \int_a^b (g^{(m)})^2\,ds, \qquad (3.10) $$

for fixed $\lambda > 0$ and data points $x_1, \ldots, x_n$. The minimization is over a class of absolutely continuous functions on $[a, b]$ whose $m$th derivative is square integrable. Computational advantages of this log density estimator using approximations by cubic B-splines are:

- It is a fully automatic procedure for selecting an appropriate value of the smoothing parameter $\lambda$, based on an AIC type criterion.
- The banded structures induced by B-splines lead to an algorithm whose computational cost is linear in the number of observations (data points).
- It provides approximate pointwise Bayesian confidence intervals for the estimator.

A disadvantage of O'Sullivan's work is that it does not provide any comparison of performance with other available techniques. We see that the previous computational framework is unidimensional, although Silverman's approach can be extended to higher dimensions.
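The Hermite functions $\phi_n$ used by Good and Gaskins form an orthonormal system, which is what makes the expansion of $\gamma$ convenient. A minimal numerical check is sketched below; it assumes the standard three-term recurrence $H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)$ (equivalent to the derivative formula above), and the function names and integration limits are illustrative choices.

```python
import math

def hermite(n, x):
    """Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def phi(n, x):
    """Hermite function exp(-x^2/2) H_n(x) 2^(-n/2) pi^(-1/4) (n!)^(-1/2)."""
    norm = 2.0 ** (-n / 2.0) * math.pi ** -0.25 / math.sqrt(math.factorial(n))
    return math.exp(-0.5 * x * x) * hermite(n, x) * norm

def inner(m, n, lo=-12.0, hi=12.0, steps=4000):
    """Trapezoidal approximation of the L2 inner product of phi_m and phi_n."""
    h = (hi - lo) / steps
    total = 0.5 * (phi(m, lo) * phi(n, lo) + phi(m, hi) * phi(n, hi))
    for i in range(1, steps):
        x = lo + i * h
        total += phi(m, x) * phi(n, x)
    return total * h

# Gram matrix of the first four Hermite functions: close to the identity.
for m in range(4):
    print([round(inner(m, n), 6) for n in range(4)])
```

Because the Gram matrix is the identity, the coefficient $a_n$ of $\gamma$ is simply the inner product of $\gamma$ with $\phi_n$, and the constraint $\int \gamma^2 = 1$ becomes $\sum a_n^2 = 1$.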

Chapter 4 Spline Functions

4.1 Acquiring the Taste

Due to their simple structure and good approximation properties, polynomials are widely used in practice for approximating functions. For this purpose, one usually divides the interval $[a, b]$ in the function support into sufficiently small subintervals of the form $[x_0, x_1], \ldots, [x_k, x_{k+1}]$ and then uses a low degree polynomial $p_i$ for approximation over each interval $[x_i, x_{i+1}]$, $i = 0, \ldots, k$. This procedure produces a piecewise polynomial approximating function $s(\cdot)$;

$$ s(x) = p_i(x) \text{ on } [x_i, x_{i+1}], \quad i = 0, \ldots, k. $$

In the general case, the polynomial pieces $p_i(x)$ are constructed independently of each other and therefore do not constitute a continuous function $s(x)$ on $[a, b]$. This is not desirable if the interest is in approximating a smooth function. Naturally, it is necessary to require the polynomial pieces $p_i(x)$ to join smoothly at the knots $x_1, \ldots, x_k$, and to have all derivatives up to a certain order coincide at the knots. As a result, we get a smooth piecewise polynomial function, called a spline function.

Definition 4.1 The function $s(x)$ is called a spline function (or simply spline) of degree $r$ with knots at $\{x_i\}_{i=1}^{k}$ if $-\infty =: x_0 < x_1 < \ldots < x_k < x_{k+1} := \infty$, where $-\infty =: x_0$ and $x_{k+1} := \infty$ are set by definition, and

- for each $i = 0, \ldots, k$, $s(x)$ coincides on $[x_i, x_{i+1}]$ with a polynomial of degree not greater than $r$;
- $s(x), s'(x), \ldots, s^{(r-1)}(x)$ are continuous functions on $(-\infty, \infty)$.

The set $S_r(x_1, \ldots, x_k)$ of spline functions is called the spline space. Moreover, the spline space is a linear space with dimension $r + k + 1$ (Schumaker (1981)).

Definition 4.2 For a given point $x \in (a, b)$ the function

$$ (t - x)_+^{r} = \begin{cases} (t - x)^{r} & \text{if } t > x \\ 0 & \text{if } t \le x \end{cases} $$

is called the truncated power function of degree $r$ with knot $x$.

Hence, we can express any spline function as a linear combination of $r + k + 1$ basis functions. For this, consider a set of interior knots $\{x_1, \ldots, x_k\}$ and the basis functions $\{1, t, t^2, \ldots, t^{r}, (t - x_1)_+^{r}, \ldots, (t - x_k)_+^{r}\}$. Thus, a spline function is given by

$$ s(t) = \sum_{i=0}^{r} \theta_i t^{i} + \sum_{j=r+1}^{r+k} \theta_j (t - x_{j-r})_+^{r}. $$

It would be interesting if we could have basis functions that make it easy to compute the spline functions. It can be shown that B-splines form a basis of spline spaces (Schumaker (1981)). Also, B-splines have an important computational property: they are splines which have the smallest possible support. In other words, B-splines are zero on a large set. Furthermore, a stable evaluation of B-splines with the aid of a recurrence relation is possible.

Definition 4.3 Let $\Omega = \{x_j\}_{j \in \mathbb{Z}}$ be a nondecreasing sequence of knots. The $j$-th B-spline of order $k$ for the knot sequence $\Omega$ is defined by

$$ B_j^{k}(t) = (x_{k+j} - x_j)\,[x_j, \ldots, x_{k+j}]\,(x - t)_+^{k-1} \quad \text{for all } t \in \mathbb{R}, $$

where $[x_j, \ldots, x_{k+j}](x - t)_+^{k-1}$ is the $k$th divided difference of the function $(x - t)_+^{k-1}$, regarded as a function of $x$, evaluated at the points $x_j, \ldots, x_{k+j}$.

From Definition 4.3 we notice that $B_j^{k}(t) = 0$ for all $t \notin [x_j, x_{j+k}]$. It follows that only $k$ B-splines have any particular interval $[x_j, x_{j+1}]$ in their support. That is, of all the B-splines of order $k$ for the knot sequence $\Omega$, only the $k$ B-splines $B_{j-k+1}^{k}, B_{j-k+2}^{k}, \ldots, B_j^{k}$ might be nonzero on the interval $[x_j, x_{j+1}]$. (See de Boor (1978) for details.) Moreover, $B_j^{k}(t) > 0$ for all $t \in (x_j, x_{j+k})$ and $\sum_{j \in \mathbb{Z}} B_j^{k}(t) = 1$, that is, the B-spline sequence $B_j^{k}$ consists of nonnegative functions which sum up to 1 and provides a partition of unity. Thus, a spline function can be written as a linear combination of B-splines,

$$ s(t) = \sum_{j \in \mathbb{Z}} \beta_j B_j^{k}(t). $$

The value of the function $s$ at a point $t$ is simply the value of the sum $\sum_{j \in \mathbb{Z}} \beta_j B_j^{k}(t)$, which makes good sense since this sum has at most $k$ nonzero terms. Figure 4.1 shows an example of a B-splines basis and its compact support property. This property makes the computation of B-splines easier and numerically stable.
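The properties just listed (local support, at most $k$ nonzero terms, partition of unity) can be observed directly with the recurrence relation mentioned above. The sketch below assumes the Cox-de Boor form of that recurrence, a uniform knot sequence and zero-based indexing of the B-splines; the function name `bspline` and the chosen knots are illustrative.

```python
def bspline(j, k, t, knots):
    """Value at t of the j-th B-spline of order k (degree k - 1) on `knots`,
    computed with the Cox-de Boor recurrence."""
    if k == 1:
        # Order 1: indicator function of the half-open knot interval.
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = right = 0.0
    if knots[j + k - 1] > knots[j]:
        left = ((t - knots[j]) / (knots[j + k - 1] - knots[j])
                * bspline(j, k - 1, t, knots))
    if knots[j + k] > knots[j + 1]:
        right = ((knots[j + k] - t) / (knots[j + k] - knots[j + 1])
                 * bspline(j + 1, k - 1, t, knots))
    return left + right

knots = [float(i) for i in range(11)]   # uniform knots 0, 1, ..., 10 (assumption)
k = 4                                   # order 4, i.e. cubic B-splines
t = 4.3                                 # a point where all k splines are available
values = [bspline(j, k, t, knots) for j in range(len(knots) - k)]

print(values)                           # only k of the entries are nonzero
print(sum(values))                      # partition of unity
```

On a finite knot sequence the partition of unity holds only where $k$ B-splines overlap, here on $[3, 7]$; closer to the ends some of the B-splines that would be needed are missing.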
Of special interest is the set of natural splines of order $2m$, $m \in \mathbb{N}$, with $k$ knots at $x_j$. A spline function is a natural spline of order $2m$ with knots at $x_1, \ldots, x_k$ if, in addition to the properties implied by Definition 4.1, it satisfies an extra condition: $s$ is a polynomial of order $m$ outside of $[x_1, x_k]$. Consider the interval $[a, b] \subset \mathbb{R}$ and the knot sequence $a := x_0 < x_1 < \ldots < x_k < x_{k+1} := b$. Then

$$ NS^{2m} = \{ s \in S(P_{2m}) : s_0 = s|_{[a, x_1)} \in P_m \text{ and } s_k = s|_{[x_k, b)} \in P_m \} $$

is the natural polynomial spline space of order $2m$ with knots at $x_1, \ldots, x_k$. The name natural spline stems from the fact that, as a result of this extra condition, $s$ satisfies the so-called natural boundary conditions $s^{(j)}(a) = s^{(j)}(b) = 0$, $j = m, \ldots, 2m - 1$. Now, since the dimension of $S(P_{2m})$ is $2m + k$ and we have enforced $2m$ extra conditions to define $NS^{2m}$, it is natural to expect the dimension of $NS^{2m}$ to be $k$.

Figure 4.1: B-splines basis functions with 6 knots.

Actually, it is well known that $NS^{2m}$ is a linear space of dimension $k$. See details in Schumaker (1981). In some applications it may be possible to deal with natural splines by using a basis for $S(P_{2m})$ and enforcing the end conditions. For other applications it is desirable to have a basis for $NS^{2m}$ itself. To construct such a basis consisting of splines with small supports we just need functions based on the usual B-splines. In particular, when $m = 2$, we will be constructing basis functions for the natural cubic spline space, $NS^{4}$. Figure 4.2 shows an example of the natural splines basis.

Figure 4.2: Natural splines basis functions with 6 knots.

4.2 Logspline Density Estimation

Kooperberg and Stone (1991) introduced another type of algorithm to estimate a univariate density. This algorithm was based on the work of Stone (1990) and Stone and Koo (1985), where the theory of the logspline family of functions was developed.

Consider an increasing sequence of knots $\{t_j\}_{j=1}^{K}$, $K \ge 4$, in $\mathbb{R}$. Denote by $S_0$ the set of twice continuously differentiable real functions $s$ such that $s$ is a cubic polynomial in each interval of the form $(-\infty, t_1], [t_1, t_2], \ldots, [t_K, \infty)$. Elements in $S_0$ are the well-known cubic splines with knots at $\{t_j\}_{j=1}^{K}$. Notice that $S_0$ is a $(K + 4)$-dimensional linear space. Now, let $S \subset S_0$ be the subspace of functions $s$ that are linear on $(-\infty, t_1]$ and on $[t_K, \infty)$, so that the dimension of $S$ is $K$. Thus, $S$ has a basis of the form $1, B_1, \ldots, B_{K-1}$, such that $B_1$ is a linear function with negative slope on $(-\infty, t_1]$ and $B_2, \ldots, B_{K-1}$ are constant functions on the same interval. Similarly, $B_{K-1}$ is a linear function with positive slope on $[t_K, \infty)$ and $B_1, \ldots, B_{K-2}$ are constant on the interval $[t_K, \infty)$.

Let $\Theta$ be the parametric space of dimension $p = K - 1$, such that $\theta_1 < 0$ and $\theta_p > 0$ for $\theta = (\theta_1, \ldots, \theta_p) \in \mathbb{R}^{p}$. Then define

$$ c(\theta) = \log \Big( \int_{\mathbb{R}} \exp\Big( \sum_{j=1}^{K-1} \theta_j B_j(x) \Big)\,dx \Big) $$

and

$$ f(x; \theta) = \exp\Big\{ \sum_{j=1}^{K-1} \theta_j B_j(x) - c(\theta) \Big\}. $$

The $p$-parametric exponential family $f(\cdot, \theta)$, $\theta \in \Theta \subset \mathbb{R}^{p}$, of positive twice differentiable density functions on $\mathbb{R}$ is called the logspline family, and the corresponding log-likelihood function is given by

$$ L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta), \quad \theta \in \Theta. $$

The log-likelihood function $L(\theta)$ is strictly concave and hence the maximum likelihood estimator $\hat\theta$ of $\theta$ is unique, if it exists. We refer to $\hat f = f(\cdot, \hat\theta)$ as the logspline density estimate. Note that, since only the finite dimensional parameter $\theta$ is estimated, the logspline procedure is not essentially nonparametric. Thus, estimation of $\theta$ by Newton-Raphson, together with the small number of basis functions necessary to estimate a density, makes the logspline algorithm extremely fast when it is compared with the algorithm of Gu (1993) for smoothing spline density estimation.

In the logspline approach the number of knots is the smoothing parameter. That is, too many knots lead to a noisy estimate while too few knots give a very smooth curve. Based on their experience of fitting logspline models, Kooperberg and Stone provide a table with the number of knots based on the number of observations. No indication was found that the number of knots takes into consideration the structure of the data (number of modes, bumps, asymmetry, etc.). However, objective criteria for the choice of the number of knots, Stepwise Knot Deletion and Stepwise Knot Addition, are included in the logspline procedure.

For $1 \le j \le p$, let $B_j$ be a linear combination of the truncated power basis $(x - t_k)_+^{3}$ for the knot sequence $t_1, \ldots, t_K$, that is,

$$ B_j(x) = \beta_j x + \beta_{j0} + \sum_{k} \beta_{jk} (x - t_k)_+^{3}. $$

Then

$$ \sum_{j} \theta_j B_j(x) = \sum_{j} \theta_j \beta_j x + \sum_{j} \theta_j \beta_{j0} + \sum_{k} \sum_{j} \beta_{jk} \theta_j (x - t_k)_+^{3}. $$

Let $\sum_{j} \hat\theta_j \beta_{jk} = \beta_k^{T} \hat\theta$. Then, for $1 \le k \le K$, Kooperberg and Stone (1991) show that

$$ SE(\beta_k^{T} \hat\theta) = \sqrt{ \beta_k^{T} (I(\hat\theta))^{-1} \beta_k }, $$

where $I(\theta)$ is the Fisher information matrix obtained from the log-likelihood function. The knots $t_1$ and $t_K$ are considered permanent knots, and $t_k$, $2 \le k \le K - 1$, are nonpermanent knots. Then at any step delete (similarly for the addition step) that knot which has the smallest value of $|\beta_k^{T} \hat\theta| / SE(\beta_k^{T} \hat\theta)$. In this manner, we have a sequence of models which ranges from 2 to $p - 1$ knots. Now, denote by $\hat L_m$ the log-likelihood function of the $m$th model ($2 \le m + 2 \le p - 1$) evaluated at the maximum likelihood estimate for that model. To specify a stopping criterion, Kooperberg and Stone make use of the Akaike Information Criterion (AIC), that is, $AIC_{\alpha, m} = -2 \hat L_m + \alpha (p - m)$, and choose the $\hat m$ that minimizes $AIC_{3, m}$. There is no theoretical justification for choosing $\alpha = 3$. The choice was made, according to them, because this value of $\alpha$ makes the probability that $\hat f$ is bimodal when $f$ is Gamma(5) about 0.1. Figure 4.3 shows an example of logspline density estimation for a mixture of two normal densities.

It would be interesting to have an algorithm which combines the low computational cost of logsplines (due to B-splines and the estimation of their coefficients) and the performance of the automatic smoothing parameter selection developed by Gu (1993).

4.3 Splines Density Estimation: A Dimensionless Approach

Let $X_1, \ldots, X_n$ be a random sample from a probability density $f$ on a finite domain $\mathcal{X}$. Assuming that $f > 0$ on $\mathcal{X}$, one can make a logistic transformation $f = e^{g} / \int e^{g}$.
We know that this transformation is not one-to-one, and Gu and Qiu (1993) proposed side conditions on $g$ such as $g(x_0) = 0$, $x_0 \in \mathcal{X}$, or $\int_{\mathcal{X}} g = 0$. Given those conditions we have to find the minimizer of the penalized log-likelihood

$$ -\frac{1}{n} \sum_{i=1}^{n} g(X_i) + \log \int_{\mathcal{X}} e^{g} + \frac{\lambda}{2} J(g) \qquad (4.1) $$

in a Hilbert space $H$, where $J$ is a roughness penalty and $\lambda$ is the smoothing parameter. The space $H$ is such that the evaluation is continuous, so that the first term in (4.1) is continuous. The penalty term $J$ is a seminorm in $H$ with a null space $J_\perp$ of finite dimension $M \ge 1$. By taking a finite dimensional $J_\perp$ one prevents interpolation (i.e. the empirical distribution), and a quadratic $J$ makes the numerical solution of the variational problem (4.1) easier. Since $H$ is an infinite dimensional space, the minimizer of (4.1) is, in general, not computable. Thus, Gu and Qiu (1993) propose calculating the solution of the variational problem in a finite dimensional space, say $H_n$, where $n$ is the sample size. The performance of the smoothing spline estimator depends upon the choice of the smoothing parameter $\lambda$. Gu (1993) suggested a performance-oriented iteration procedure (a GCV-like procedure) which updates $g$ and $\lambda$ jointly according to a performance estimate. The performance is measured by a loss function which was taken as a symmetrized Kullback-Leibler distance between $e^{g} / \int e^{g}$ and $e^{g_0} / \int e^{g_0}$. Specifically, if one solves the variational problem (4.1) in $H_n$ by a standard Newton-Raphson procedure, then starting from a current iterate $\tilde g$, instead of calculating the next iterate with a fixed $\lambda$, one may choose a $\lambda$ that minimizes the loss function. Figure 4.4 exhibits the performance of SSDE for the Buffalo snow data. (This data set can be found in R.)

Figure 4.3: Logspline density estimation for 0.5N(0,1) + 0.5N(5,1) (curves: true density and logspline estimate).

Under this approach, one might ask the following questions: Is it possible to estimate a density using $K \le n$ basis functions instead of the original $n$ such that it reduces the computational cost of getting the solution of (4.1) significantly? How good would such an approximation be? Dias (1998) provided some answers to those questions by using the basis functions $B_i(x)$ given in Definition 4.3, which can be easily extended to a multivariate case by a tensor product.
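The reason side conditions are needed in the logistic transformation of Section 4.3 is that $g$ and $g + c$ give the same density for every constant $c$. The following sketch illustrates this on a grid over $[0, 1]$; the grid, the choice $g(x) = \sin(2\pi x)$, the shift 3.7 and the function name `density_from_g` are all illustrative assumptions, and the integral is approximated by a midpoint Riemann sum.

```python
import math

def density_from_g(g_values, dx):
    """Logistic transformation f = e^g / integral(e^g) on a uniform grid."""
    w = [math.exp(g) for g in g_values]
    total = sum(w) * dx
    return [wi / total for wi in w]

m = 1000
dx = 1.0 / m
grid = [(i + 0.5) * dx for i in range(m)]          # midpoints of [0, 1]

g1 = [math.sin(2.0 * math.pi * x) for x in grid]
g2 = [g + 3.7 for g in g1]                         # same function shifted by a constant
g3 = [g - g2[0] for g in g2]                       # side condition g(x_0) = 0 at x_0 = grid[0]

f1 = density_from_g(g1, dx)
f2 = density_from_g(g2, dx)
f3 = density_from_g(g3, dx)

print(max(abs(a - b) for a, b in zip(f1, f2)))     # the shift does not change f
print(sum(f1) * dx)                                # f integrates to one on the grid
```

Imposing $g(x_0) = 0$ (or $\int g = 0$) picks exactly one representative from each family $\{g + c\}$, which makes the minimizer of (4.1) unique without changing the density it represents.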

Figure 4.4: Histogram, SSDE, Kernel and Logspline density estimates (Buffalo snow data).


Chapter 5 The thin-plate spline on $\mathbb{R}^{d}$

There are many applications where an unknown function $g$ of one or more variables and a set of measurements are given such that

$$ y_i = L_i g + \epsilon_i, \qquad (5.1) $$

where $L_1, \ldots, L_n$ are linear functionals defined on some linear space $H$ containing $g$, and $\epsilon_1, \ldots, \epsilon_n$ are measurement errors usually assumed to be independently, identically and normally distributed with mean zero and unknown variance $\sigma^2$. Typically, the $L_i$ will be point evaluations of the function $g$.

Straightforward least squares fitting is often appropriate, but it produces a function which is not sufficiently smooth for some data fitting problems. In such cases, it may be better to look for a function which minimizes a criterion that involves a combination of goodness of fit and an appropriate measure of smoothness. Let $t = (x_1, \ldots, x_d)$, $t_i = (x_1(i), \ldots, x_d(i))$ for $i = 1, \ldots, n$, and take the evaluation functionals $L_i g = g(t_i)$; then the regression model (5.1) becomes

$$ y_i = g(x_1(i), \ldots, x_d(i)) + \epsilon_i. \qquad (5.2) $$

The thin-plate smoothing spline is the solution to the following variational problem. Find $g \in H$ to minimize

$$ L_\lambda(g) = \frac{1}{n} \sum_{i=1}^{n} (y_i - g(t_i))^2 + \lambda J_m^{d}(g), \qquad (5.3) $$

where $\lambda$ is the smoothing parameter which controls the trade-off between fidelity to the data and smoothness, with penalty term $J_m^{d}$. Note that when $\lambda$ is large a premium is being placed on smoothness and functions with large $m$th derivatives are penalized. In fact, $\lambda \to \infty$ gives an $m$th order polynomial regression fit to the data. Conversely, for small values of $\lambda$ more emphasis is put on goodness-of-fit, and in the limit case $\lambda \to 0$ we have interpolation. In general, in smoothing spline nonparametric regression the penalty term $J_m^{d}$ is given by

$$ J_m^{d}(g) = \sum_{\alpha_1 + \cdots + \alpha_d = m} \frac{m!}{\alpha_1! \cdots \alpha_d!} \int \cdots \int \Big( \frac{\partial^{m} g}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \Big)^2 \prod_{j} dx_j. $$

The condition $2m - d > 0$ is necessary and sufficient in order to have bounded evaluation functionals in $H$, i.e., for $H$ to be a reproducing kernel Hilbert space. Moreover, the null space of the penalty term $J_m^{d}$ is the $M$-dimensional space spanned by polynomials $\phi_1, \ldots, \phi_M$ of degree less than or equal to $m - 1$, e.g., $\phi_j(t) = t^{j-1}/(j-1)!$, for $j = 1, \ldots, m$.

Wahba (1990) has shown that, if $t_1, \ldots, t_n$ are such that least squares regression on $\phi_1, \ldots, \phi_M$ is unique, then (5.3) has a unique minimizer $g_\lambda$, with representation

$$ g_\lambda(t) = \sum_{i=1}^{n} c_i E_m(t, t_i) + \sum_{j=1}^{M} b_j \phi_j(t), \qquad (5.4) $$

that is, in matrix form at the design points, $g_\lambda = Qc + Tb$, where $T$ is an $n \times M$ matrix with entries $\phi_j(t_l)$ for $j = 1, \ldots, M$, $l = 1, \ldots, n$, and $Q$ is an $n \times n$ matrix with entries $E_m(t_l, t_i)$, for $i = 1, \ldots, n$. The function $E_m$ is a Green's function for the $m$-iterated Laplacian (Wahba (1990), Silverman and Green (1994)). For example, when $d = 1$, $E_m(t, t_i) = (t - t_i)_+^{m-1}/(m-1)!$.

The coefficients $c$ and $b$ can be determined by substituting (5.4) into (5.3). Thus, the optimization problem (5.3) subject to $T^{T} c = 0$ is reduced to a linear system of equations which is solved by a standard matrix decomposition such as the QR decomposition. The constraint $T^{T} c = 0$ is necessary to guarantee that, when computing the penalty term at $g_\lambda$, $J_m^{d}(g_\lambda)$ is conditionally positive definite (see Wahba (1990)). Efforts have been made to reduce substantially the computational cost of solving smoothing spline fitting problems by introducing the concept of H-splines (Luo and Wahba (1997) and Dias (1999)), where the number of basis functions and $\lambda$ act as the smoothing parameters.

A major conceptual problem with spline smoothing is that it is defined implicitly as the solution to a variational problem rather than as an explicit formula involving the data values. This difficulty can be resolved, at least approximately, by considering how the estimate behaves on large data sets. It can be shown from the quadratic nature of (5.3) that $g_\lambda$ is linear in the observations $y_i$, in the sense that there exists a weight function $H_\lambda(s, t)$ such that

$$ g_\lambda(s) = \sum_{i=1}^{n} y_i H_\lambda(s, t_i). \qquad (5.5) $$

It is possible to obtain the asymptotic form of the weight function, and hence an approximate explicit form of the estimate. For the sake of simplicity consider $d = 1$, $m = 2$ and suppose that the design points have local density $f(t)$ with respect to Lebesgue measure on $\mathbb{R}$. Assume the following conditions (Silverman (1984)):

1. $g \in H[a, b]$.
2. There exists an absolutely continuous distribution function $F$ on $[a, b]$ such that $F_n \to F$ uniformly as $n \to \infty$, where $F_n$ is the empirical distribution function of the design points.
3. $f = F'$, $0 < \inf_{[a,b]} f \le \sup_{[a,b]} f < \infty$.
4. The density has bounded first derivative on $[a, b]$.
5. With $a(n) = \sup_{[a,b]} |F_n - F|$, the smoothing parameter $\lambda$ depends on $n$ in such a way that $\lambda \to 0$ and $\lambda^{-1/4} a(n) \to 0$ as $n \to \infty$.

In particular, one can assume that the design points are regularly distributed with density $f$; that is, $t_i = F^{-1}((i - 1/2)/n)$. Then $\sup |F_n - F| = (1/2) n^{-1}$, so that $n^{4} \lambda \to \infty$ and $\lambda \to 0$ are needed for condition 5 to hold. Thus, as $n \to \infty$,

$$ H_\lambda(s, t) \approx \frac{1}{n} \frac{1}{f(t)} \frac{1}{h(t)} K\Big( \frac{s - t}{h(t)} \Big), $$

where the factor $1/n$ makes the weights in (5.5) sum approximately to one, the kernel function $K$ is given by

$$ K(u) = \frac{1}{2} \exp(-|u|/\sqrt{2}) \sin(|u|/\sqrt{2} + \pi/4), $$

and the bandwidth $h(t)$ satisfies

$$ h(t) = \lambda^{1/4} n^{-1/4} f(t)^{-1/4}. $$

Based on these formulas, we can see that the spline smoother is approximately a convolution smoothing method, but the data are not convolved with a kernel of fixed bandwidth; in fact, $h$ varies across the sample.

5.1 Additive Models

The additive model is a generalization of the usual linear regression model. What has made the linear model so popular for statistical inference is that it is additive in the predictor variables (explanatory variables). Once we have fitted the linear model we can examine the predictor variables separately, in the absence of interactions. Additive models are also additive in the predictor effects. An additive model is defined by

$$ y_i = \alpha + \sum_{j=1}^{p} g_j(t_j) + \epsilon_i, \qquad (5.6) $$

where $t_j$ are the predictor variables and, as defined before in Chapter 5, $\epsilon_i$ are uncorrelated measurement errors with $E[\epsilon_i] = 0$ and $Var[\epsilon_i] = \sigma^2$. The functions $g_j$ are unknown but assumed to be smooth functions lying in some metric space.

Chapter 5 describes a general framework for defining and estimating general nonparametric regression models which includes additive models as a special case. For this, suppose that $\Omega$ is the space of the vector predictor $t$ and assume that $H$ is a reproducing kernel Hilbert space. Hence $H$ has the decomposition

$$ H = H_0 + \sum_{k=1}^{p} H_k, \qquad (5.7) $$

where $H_0$ is spanned by $\phi_1, \ldots, \phi_M$ and $H_k$ has the reproducing kernel $E_k(\cdot, \cdot)$, defined in Chapter 5. The space $H_0$ is the space of functions that are not to be penalized in the optimization. For example, recall equation (5.3) and let $m = 2$; then $H_0$ is the space of linear functions in $t$.
The optimization problem becomes: for a given set of predictors $t_1, \ldots, t_n$, find the minimizer of

$$ \sum_{i=1}^{n} \Big\{ y_i - \sum_{k=0}^{p} g_k(t_i) \Big\}^2 + \sum_{k=1}^{p} \lambda_k \| g_k \|_{H_k}^2, \qquad (5.8) $$

with $g_k \in H_k$. Then the theory of reproducing kernels guarantees that a minimizer exists and has the form

$$ \hat g = \sum_{k=1}^{p} Q_k c_k + Tb, \qquad (5.9) $$

where $Q_k$ and $T$ are given in equation (5.4) and the vectors $c_k$ and $b$ are found by minimizing the finite dimensional penalized least squares criterion

$$ \Big\| y - Tb - \sum_{k=1}^{p} Q_k c_k \Big\|^2 + \sum_{k=1}^{p} \lambda_k c_k^{T} Q_k c_k. \qquad (5.10) $$

This general problem (5.9) can potentially be solved by a backfitting type algorithm as in Hastie and Tibshirani (1990).

Algorithm

1. Initialize $g_j = g_j^{(0)}$ for $j = 0, \ldots, p$.
2. Cycle $j = 0, \ldots, p, \ldots, j = 0, \ldots, p, \ldots$

$$ \hat g_j = S_j \Big( y - \sum_{k \ne j} \hat g_k(t_k) \Big) $$

3. Continue step 2 until the individual functions do not change.

Here $y = (y_1, \ldots, y_n)$, $S_j = Q_j (Q_j + \lambda_j I)^{-1}$ for $j = 1, \ldots, p$, and $S_0 = T (T^{T} T)^{-1} T^{T}$. One may observe that omitting the constant term $\alpha$ in (5.6) does not change the resulting estimates. An example of the gam method is given in Figure 5.1.

Figure 5.1: True, tensor product, gam non-adaptive and gam adaptive surfaces.

5.2 Generalized Cross-Validation Method for Splines nonparametric Regression

Without loss of generality, take $d = 1$ and $m = 2$. The solution of (5.3) depends strongly on the smoothing parameter. Craven and Wahba (1979) provide an automatic data-driven procedure to estimate $\lambda$. For this, let $g_\lambda^{[k]}$ be the minimizer of

$$ \frac{1}{n} \sum_{i \ne k} (y_i - g(t_i))^2 + \lambda \int (g''(u))^2\,du, $$

the optimization problem with the $k$th data point left out. Then, following Wahba's notation, the ordinary cross-validation function $V_0(\lambda)$ is defined as

$$ V_0(\lambda) = \frac{1}{n} \sum_{k=1}^{n} (y_k - g_\lambda^{[k]}(t_k))^2, \qquad (5.11) $$

and the leave-one-out estimate of $\lambda$ is the minimizer of $V_0(\lambda)$. To proceed, we need to describe the influence matrix. It is not difficult to show (see Wahba (1990)) that, for fixed $\lambda$, we have by (5.5) that $g_\lambda$ is linear in the observations $y_i$, that is, in matrix notation $g_\lambda = H_\lambda y$. At this stage, one may think that the computation of this problem is prohibitive, but Craven and Wahba (1979) give us a very useful mathematical identity, which will not be proved here:

$$ y_k - g_\lambda^{[k]}(t_k) = \frac{y_k - g_\lambda(t_k)}{1 - h_{kk}(\lambda)}, \qquad (5.12) $$

where $h_{kk}(\lambda)$ is the $k$th diagonal entry of $H_\lambda$. By substituting (5.12) into (5.11) we obtain a simplified form of $V_0$, that is,

$$ V_0(\lambda) = \frac{1}{n} \sum_{k=1}^{n} \frac{(y_k - g_\lambda(t_k))^2}{(1 - h_{kk}(\lambda))^2}. \qquad (5.13) $$

The right-hand side of (5.13) is easier to compute than (5.11); however, the GCV is even easier. Generalized cross-validation (GCV) is a method for choosing the smoothing parameter $\lambda$ which is based on leaving-one-out, but it has two advantages. It is easy to compute and it possesses some important theoretical properties that would be impossible to prove for leaving-one-out, although, as pointed out by Wahba, in many cases the GCV and leaving-one-out estimates will give similar answers. The GCV function is defined by

$$ V(\lambda) = \frac{1}{n} \sum_{k=1}^{n} \frac{(y_k - g_\lambda(t_k))^2}{(1 - \bar h_{kk}(\lambda))^2} = \frac{\frac{1}{n} \| (I - H_\lambda) y \|^2}{\big[ \frac{1}{n} \mathrm{tr}(I - H_\lambda) \big]^2}, \qquad (5.14) $$

where $\bar h_{kk}(\lambda) = (1/n)\,\mathrm{tr}(H_\lambda)$, with $\mathrm{tr}(H_\lambda)$ standing for the trace of $H_\lambda$. Note that $V(\lambda)$ is a weighted version of $V_0(\lambda)$. In addition, if $h_{kk}(\lambda)$ does not depend on $k$, then $V_0(\lambda) = V(\lambda)$ for all $\lambda > 0$.

It is important to observe that GCV is a predictive mean square error criterion. Define the predictive mean square error $T(\lambda)$ as

$$ T(\lambda) = \frac{1}{n} \sum_{i=1}^{n} (L_i g_\lambda - L_i g)^2, \qquad (5.15) $$

where $L_i$ is the evaluation functional defined at the beginning of this chapter; the GCV estimate of $\lambda$ is an estimate of the minimizer of (5.15). Consider the expected value of $T(\lambda)$,

$$ E[T(\lambda)] = \frac{1}{n} \sum_{i=1}^{n} E[(L_i g_\lambda - L_i g)^2]. \qquad (5.16) $$

The GCV theorem (Wahba (1990)) says that if $g$ is in a reproducing kernel Hilbert space then there is a sequence of minimizers $\tilde\lambda(n)$ of $E V(\lambda)$ that comes close to achieving the minimum possible value of the expected mean square error $E[T(\lambda)]$ as $n \to \infty$. That is, let the expectation inefficiency $I_n^{*}$ be defined as

$$ I_n^{*} = \frac{E[T(\tilde\lambda(n))]}{E[T(\lambda^{*})]}, $$

where $\lambda^{*}$ is the minimizer of $E[T(\lambda)]$. Then, under mild conditions such as the ones described and discussed by Golub, Heath and Wahba (1979) and Craven and Wahba (1979), we have $I_n^{*} \downarrow 1$ as $n \to \infty$.

Figure 5.2 shows the scatter plot of the revenue passenger miles flown by commercial airlines in the United States for each year from 1937 to 1960. (This data set can be found in the software R.) The smoothing parameter $\lambda$ was computed by the GCV method through the R function smooth.spline().
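The leave-one-out identity (5.12) and the difference between $V_0$ and $V$ can be illustrated on the simplest linear smoother. The sketch below does not fit a smoothing spline: it assumes the limiting case $\lambda \to \infty$ with $m = 2$, in which the fit reduces to the least squares line, so that $H_\lambda$ is the ordinary hat matrix with trace 2. The data and all function names are illustrative.

```python
def fit_line(t, y):
    """Least squares line; returns (intercept, slope)."""
    n = len(t)
    tb = sum(t) / n
    yb = sum(y) / n
    sxx = sum((ti - tb) ** 2 for ti in t)
    slope = sum((ti - tb) * (yi - yb) for ti, yi in zip(t, y)) / sxx
    return yb - slope * tb, slope

def hat_diagonal(t):
    """Diagonal entries h_kk of the influence (hat) matrix of the line fit."""
    n = len(t)
    tb = sum(t) / n
    sxx = sum((ti - tb) ** 2 for ti in t)
    return [1.0 / n + (ti - tb) ** 2 / sxx for ti in t]

def ocv_direct(t, y):
    """V_0 as in (5.11): refit n times, each time leaving one point out."""
    n = len(t)
    total = 0.0
    for k in range(n):
        a, b = fit_line(t[:k] + t[k + 1:], y[:k] + y[k + 1:])
        total += (y[k] - (a + b * t[k])) ** 2
    return total / n

def ocv_shortcut(t, y):
    """V_0 as in (5.13): a single fit plus the diagonal of the influence matrix."""
    a, b = fit_line(t, y)
    h = hat_diagonal(t)
    return sum(((yi - (a + b * ti)) / (1.0 - hi)) ** 2
               for ti, yi, hi in zip(t, y, h)) / len(t)

def gcv(t, y):
    """V as in (5.14): every h_kk is replaced by the average tr(H)/n."""
    a, b = fit_line(t, y)
    n = len(t)
    rss = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    mean_h = sum(hat_diagonal(t)) / n
    return (rss / n) / (1.0 - mean_h) ** 2

# Illustrative data (assumption).
t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.4, 7.7]

print(ocv_direct(t, y), ocv_shortcut(t, y))  # the two values agree
print(gcv(t, y))                             # similar to, but not equal to, V_0
```

The shortcut needs one fit instead of $n$, and GCV needs only the trace of the influence matrix rather than its individual diagonal entries, which is the computational advantage described above.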

Figure 5.2: Smoothing spline fitting with smoothing parameter obtained by GCV method (airmiles data: passenger miles flown by U.S. commercial airlines).


Chapter 6 Regression splines, P-splines and H-splines

6.1 Sequentially Adaptive H-splines

In regression splines, the idea is to approximate $g$ by an element of a finite dimensional subspace of $W$ spanned by basis functions $B_1, \ldots, B_K$, $K \le n$. That is,

$$ g \approx g_K = \sum_{j=1}^{K} c_j B_j, $$

where the parameter $K$ controls the flexibility of the fitting. A very common choice for basis functions is the set of cubic B-splines (de Boor, 1978). The B-splines basis functions provide a numerically superior scheme of computation and have the main feature that each $B_j$ has compact support. In practice, this means that the resulting matrix with entries $B_{i,j} = B_j(t_i)$, for $j = 1, \ldots, K$ and $i = 1, \ldots, n$, is banded and can be evaluated stably. Unfortunately, the main difficulty when working with regression splines is to select the number and the positions of a sequence of breakpoints called knots, where the piecewise cubic polynomials are tied to enforce continuity and lower order continuous derivatives. (See Schumaker (1972) for details.) Regression splines are attractive because of their computational scheme, where standard linear model techniques can be applied. But the smoothness of the estimate cannot easily be varied continuously as a function of a single smoothing parameter (Hastie and Tibshirani, 1990). In particular, when $\lambda = 0$ we have the regression spline case, where $K$ is the parameter that controls the flexibility of the fitting. To exemplify the action of $K$ on the estimated curve, let us consider an example by simulation with $y(t) = \exp(-t) \sin(\pi t/2) \cos(\pi t) + \epsilon$ with $\epsilon \sim N(0, .05)$. The curve estimates were obtained by the least squares method with four different numbers of basis functions, which are the cubic B-splines. Figure 6.1 shows the effect of varying the number of basis functions on the estimation of the true curve. Note that the number of basis functions is the same as the number of knots since it is assumed that we are dealing with the natural cubic splines space. Observe that small values of $K$ make the estimate smoother and hence oversmoothing may occur. Large values of $K$ may cause undersmoothing.
Figure 6.1: Spline least squares fits for different values of $K$ (100 observations from $y(x) = -\exp(-x)\sin(\pi x/2)\cos(\pi x) + N(0, .025)$; the true curve and fits including $K = 4$ and $K = 12$).

In smoothing techniques, the number of basis functions is chosen to be as large as the number of observations, and the smoothing parameter is then left to control the smoothness (Bates and Wahba, 1982). Here a different approach is taken. The H-splines method, introduced by Dias (1994) for nonparametric density estimation, combines ideas from regression splines and smoothing splines by finding the number of basis functions and the smoothing parameter iteratively, according to a criterion that is described below. With the point evaluation functionals $L_i g = g(t_i)$, equation (6.4) becomes

$$A_\lambda(g) = \sum_{i=1}^{n} (y_i - g(t_i))^2 + \lambda \int (g'')^2. \qquad (6.1)$$

Assume that $g \approx g_K = \sum_{i=1}^{K} c_i B_i = Xc$, so that $g_K \in H_K$, where $H_K$ denotes the space of natural cubic splines (NCS) spanned by the basis functions $\{B_i\}_{i=1}^{K}$ and $X$ is an $n \times K$ matrix with entries $X_{ij} = B_j(t_i)$, for $i = 1, \ldots, n$ and $j = 1, \ldots, K$. Then, the numerical problem is to find a vector $c = (c_1, \ldots, c_K)^T$ that minimizes

$$A_\lambda(c) = \|y - Xc\|_2^2 + \lambda\, c^T \Omega c,$$

where $\Omega$ is a $K \times K$ matrix with entries $\Omega_{ij} = \int B_i''(t) B_j''(t)\,dt$ and $y = (y_1, \ldots, y_n)^T$. Standard calculations (de Boor, 1978) provide $c$ as a solution of the linear system

$$(X^T X + \lambda \Omega)\, c_\lambda = X^T y.$$

Note that the linear system now involves $K \times K$ matrices instead of the $n \times n$ matrices that arise in smoothing splines. Both $K$ and $\lambda$ control the trade-off between smoothness and fidelity to the data. Figure 6.2 shows, for $\lambda > 0$, an example of the relationship between $K$ and $\lambda$. Note that when the number of basis functions increases, the smoothing parameter decreases to a point and then increases with $K$. That is, for large values of $K$, the smoothing parameter $\lambda$ becomes larger in order to enforce smoothness.
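The step from the penalized criterion to the linear system can be made explicit. The short derivation below only assumes that $\Omega$ is symmetric, which follows from its definition as a matrix of inner products of second derivatives.

```latex
\begin{aligned}
A_\lambda(c) &= (y - Xc)^T (y - Xc) + \lambda\, c^T \Omega c, \\
\nabla_c A_\lambda(c) &= -2 X^T (y - Xc) + 2 \lambda\, \Omega c = 0 \\
\Longrightarrow \quad (X^T X + \lambda \Omega)\, c_\lambda &= X^T y .
\end{aligned}
```

The Hessian $2(X^T X + \lambda \Omega)$ is positive semidefinite for $\lambda \ge 0$, so any solution of the system is a minimizer, and the solution is unique whenever $X$ has full column rank.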

Figure 6.2: Five thousand replicates of $y(x) = \exp(-x)\sin(\pi x/2)\cos(\pi x) + \epsilon$ (smoothing parameter plotted against the number of knots).

Based on the facts described previously, the idea is to provide a procedure that estimates the smoothing parameter and the number of basis functions iteratively. Consider the following algorithm.

Algorithm 6.1
(1) Let $K_0$ be the initial number of basis functions and fix $\lambda_0$.
(2) Compute $c_{\lambda_0}$ by solving $(X^T X + \lambda_0 \Omega)\, c_{\lambda_0} = X^T y$.
(3) Find $\hat\lambda$ which minimizes
$$\mathrm{GCV}(\lambda) = \frac{n^{-1} \sum_{i=1}^{n} (y_i - g_{K_0,\lambda}(t_i))^2}{\left[1 - n^{-1}\,\mathrm{tr}(A(\lambda))\right]^2},$$
where $A(\lambda) = X (X^T X + \lambda \Omega)^{-1} X^T$.
(4) Compute $g_{K_0,\hat\lambda} = A(\hat\lambda)\, y$.
(5) Increment the number of basis functions by one and repeat steps (2) to (4) in order to get $g_{K_0+1,\hat\lambda}$.
(6) For a real number $\delta > 0$, if a distance $d(g_{K_0,\hat\lambda}, g_{K_0+1,\hat\lambda}) < \delta$, stop the procedure.

The number $\delta$ can be determined empirically according to the particular distance $d(\cdot, \cdot)$. Note that each time the number of basis functions $K$ is incremented by one, the numerator of GCV changes, and hence this procedure provides an optimal smoothing parameter $\lambda$ for the estimate $g_K$ based on $K$ basis functions.
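Algorithm 6.1 can be sketched in code as follows. This is an illustration under several assumptions, not the author's implementation: the natural cubic spline basis is written in truncated-power form, $\Omega$ is approximated by the trapezoid rule, knots are placed at sample quantiles, $\hat\lambda$ is searched over a fixed grid, and the stopping rule compares the squared pseudo-Hellinger distance (defined later in this section) with $\delta$. All function names and default values are hypothetical.

```python
import numpy as np

def ncs_basis(x, knots, second_derivative=False):
    """Natural cubic spline basis in truncated-power form, one column per knot."""
    x = np.asarray(x, dtype=float)
    K, last = len(knots), knots[-1]

    def d(k):
        # Power 3 gives the basis term; power 1 times 6 gives its second derivative.
        p, scale = (1, 6.0) if second_derivative else (3, 1.0)
        num = np.clip(x - knots[k], 0, None) ** p - np.clip(x - last, 0, None) ** p
        return scale * num / (last - knots[k])

    if second_derivative:
        cols = [np.zeros_like(x), np.zeros_like(x)]
    else:
        cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def penalty_matrix(knots, m=2001):
    """Omega_ij = integral of B_i'' B_j'', approximated by the trapezoid rule."""
    grid = np.linspace(knots[0], knots[-1], m)
    w = np.full(m, grid[1] - grid[0])
    w[[0, -1]] *= 0.5
    B2 = ncs_basis(grid, knots, second_derivative=True)
    return B2.T @ (w[:, None] * B2)

def gcv_select(X, Omega, y, lambdas):
    """Steps (2)-(4): solve the K x K system for each lambda, keep the GCV minimizer."""
    n, XtX, Xty = len(y), X.T @ X, X.T @ y
    best = None
    for lam in lambdas:
        M = XtX + lam * Omega
        c = np.linalg.solve(M, Xty)
        trace_A = np.trace(np.linalg.solve(M, XtX))   # tr(A) = tr(M^{-1} X'X)
        gcv = np.mean((y - X @ c) ** 2) / (1.0 - trace_A / n) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, c)
    return best[1], best[2]

def pseudo_hellinger_sq(f, g):
    """d^2(f, g) = 2 (1 - rho(f, g)) for curves sampled on a common uniform grid."""
    rho = np.sum(np.abs(f * g)) / np.sqrt(np.sum(f ** 2) * np.sum(g ** 2))
    return 2.0 * (1.0 - rho)

def h_spline(t, y, K0=3, K_max=20, delta=1e-3, lambdas=np.logspace(-6, 2, 30)):
    """Steps (1), (5), (6): grow K until successive fits are close."""
    grid = np.linspace(t.min(), t.max(), 400)
    previous = None
    for K in range(K0, K_max + 1):
        knots = np.quantile(t, np.linspace(0.0, 1.0, K))   # assumed knot placement
        lam, c = gcv_select(ncs_basis(t, knots), penalty_matrix(knots), y, lambdas)
        curve = ncs_basis(grid, knots) @ c
        if previous is not None and pseudo_hellinger_sq(previous, curve) < delta:
            break
        previous = curve
    return K, lam, grid, curve

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 3.0, 100))
y = np.exp(-t) * np.sin(np.pi * t / 2) * np.cos(np.pi * t) + rng.normal(0.0, np.sqrt(0.05), 100)
K, lam, grid, curve = h_spline(t, y)
print(f"stopped at K={K} with lambda={lam:.2e}")
```

The trace of $A(\lambda)$ is computed as $\mathrm{tr}\{(X^T X + \lambda\Omega)^{-1} X^T X\}$, so every linear solve in the loop is of order $K$ rather than $n$. A B-spline basis would be the better-conditioned choice in practice; the truncated-power form is used here only because it is short to write down.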

The aim is to find a criterion able to tell when to stop increasing the number of basis functions, that is, to find the dimension of the natural cubic spline space in which one looks for the approximation of the solution of (6.4). For this, let us define the following transformation. Given any function $g$ in $W_2^2[a, b]$, take

$$t_g = \frac{g^2}{\int g^2},$$

so that $t_g \ge 0$ and $\int t_g = 1$. For any functions $f, g \in W_2^2[a, b]$, define a pseudo-distance closely related to the square of the Hellinger distance,

$$d^2(f, g) = \int \left(\sqrt{t_f} - \sqrt{t_g}\right)^2 = 2\,(1 - \rho(f, g)),$$

where

$$\rho(f, g) = \int \sqrt{t_f\, t_g} = \frac{\int \sqrt{f^2 g^2}}{\sqrt{\int f^2 \int g^2}} = \frac{\int |f g|}{\sqrt{\int f^2 \int g^2}}$$

is the affinity between $f$ and $g$. It is not difficult to see that $0 \le \rho(f, g) \le 1$ for all $f, g \in W_2^2[a, b]$. Note that $d^2(f, g)$ is minimum when $\rho(f, g) = 1$, i.e., $\left(\int f^2 \int g^2\right)^{1/2} = \int |f g|$, which holds only if $\alpha |f| + |g| = 0$ for some $\alpha$. Increasing the number of basis functions $K$ by one, the procedure stops when $g_{K,\hat\lambda} \approx g_{K+1,\hat\lambda}$ in the sense of the partial affinity,

$$\rho(g_{K,\hat\lambda}, g_{K+1,\hat\lambda}) = \frac{\int |g_{K,\hat\lambda}\, g_{K+1,\hat\lambda}|}{\sqrt{\int g_{K,\hat\lambda}^2 \int g_{K+1,\hat\lambda}^2}},$$

where the dependence of $\lambda$ on $K$ is omitted for the sake of simplicity.

Simulations were performed in order to verify the behavior of the affinity and the partial affinity. Figure 6.3 shows a typical example given by the underlying function $y(x) = \exp(-x)\sin(\pi x/2)\cos(\pi x) + \epsilon$. One may notice that the affinity is a concave function of the number of basis functions (knots) and that the partial affinity approaches one quickly. Moreover, numerical experiments have shown that the maximum of the affinity and the stabilization of the partial affinity coincide. This means that increasing $K$ arbitrarily not only increases the computational cost but also fails to provide the best fitted curve (in the pseudo-Hellinger norm). It would be useful to have the distribution of the affinity between the true curve and the estimate produced by the adaptive H-splines method. A previous study (Dias, 1996) showed an empirical unimodal density with support on $[0, 1]$, skewed to the left, suggesting a beta model.
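The affinity and the pseudo-distance are easy to evaluate numerically once both curves are sampled on a common uniform grid, because the grid spacing cancels in every ratio. The sketch below is illustrative only: the perturbed curve `g` is a hypothetical stand-in for an estimate, and the integrals are replaced by plain sums.

```python
import numpy as np

def affinity(f, g):
    """rho(f, g) = int |f g| / sqrt(int f^2 * int g^2) on a common uniform grid."""
    return np.sum(np.abs(f * g)) / np.sqrt(np.sum(f ** 2) * np.sum(g ** 2))

def pseudo_hellinger_sq(f, g):
    """d^2(f, g) = int (sqrt(t_f) - sqrt(t_g))^2 with t_f = f^2 / int f^2."""
    root_tf = np.abs(f) / np.sqrt(np.sum(f ** 2))
    root_tg = np.abs(g) / np.sqrt(np.sum(g ** 2))
    return np.sum((root_tf - root_tg) ** 2)

x = np.linspace(0.0, 3.0, 1001)
f = np.exp(-x) * np.sin(np.pi * x / 2) * np.cos(np.pi * x)
g = f + 0.05 * np.sin(5 * x)            # hypothetical "estimate" of f

rho = affinity(f, g)
d2 = pseudo_hellinger_sq(f, g)
print(rho, d2, 2 * (1 - rho))           # the last two values agree
```

Because only $|f|$ and $|g|$ enter, the affinity of $f$ with any nonzero multiple of itself, including $-f$, equals one. This is the sense in which $d$ is a pseudo-distance rather than a distance.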
To illustrate, five thousand replicates with sample sizes 20, 100, 200 and 500 were taken from a test function $y_i = x_i^3 + \epsilon_i$, where $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. $N(0, 5)$. Figure 6.4 shows the empirical affinity distribution (unimodal, skewed to the left, with range between 0 and 1), together with a nonparametric density estimate obtained by the kernel method and a parametric one based on a beta model whose parameters were estimated by the method of moments. Similar results were obtained for several other test functions, and some of them are exhibited in Figure 2.5, which brings more evidence to support a beta model.
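The method-of-moments fit of a beta model mentioned above has a closed form: with sample mean $m$ and sample variance $v$, the estimates are $\hat\alpha = m\,(m(1-m)/v - 1)$ and $\hat\beta = (1-m)\,(m(1-m)/v - 1)$. A minimal sketch follows, in which simulated Beta(8, 2) values play the role of affinity replicates; these simulated values are an assumption for illustration and are not the data of the study.

```python
import numpy as np

def beta_method_of_moments(sample):
    """Match the sample mean and variance to those of a Beta(alpha, beta) model."""
    m = np.mean(sample)
    v = np.var(sample, ddof=1)
    common = m * (1.0 - m) / v - 1.0
    if common <= 0:
        raise ValueError("sample variance too large for a beta model")
    return m * common, (1.0 - m) * common

rng = np.random.default_rng(2)
affinities = rng.beta(8.0, 2.0, size=5000)    # left-skewed values in (0, 1)
alpha_hat, beta_hat = beta_method_of_moments(affinities)
print(alpha_hat, beta_hat)
```

An estimate with $\hat\alpha > \hat\beta$ corresponds to a density skewed to the left, the shape described for the empirical affinity distribution.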

Figure 6.3: Five thousand replicates of the affinity and the partial affinity for adaptive nonparametric regression using H-splines with the true curve (panels: partial affinity against knots; affinity against knots).

Figure 6.4: Density estimates of the affinity based on five thousand replicates of the curve $y_i = x_i^3 + \epsilon_i$ with $\epsilon_i \sim N(0, .5)$ (panels: affinity for n = 20, 100, 200 and 500). Solid line is a density estimate using a beta model and dotted line is a nonparametric density estimate.

Figure 6.5 shows that, in general, the H-splines method performs similarly to smoothing splines. But, as mentioned before, the H-splines approach solves a linear system of order $K$, while smoothing splines have to solve a linear system of order $n \ge K$.

Figure 6.5: A comparison between smoothing splines (S-splines) and hybrid splines (H-splines) methods (100 observations from $y(x) = \exp(-2x)\sin(2\pi x) + N(0, .1)$; curves: true, H-splines, S-splines).

6.2 P-splines

The basic idea of P-splines, proposed by Eilers and Marx (1996), is to use a considerable number of knots and to control the smoothness through a difference penalty on the coefficients of adjacent B-splines. For this, let us consider a simple regression model $y(x) = g(x) + \varepsilon$, where $\varepsilon$ is a random variable with a symmetric distribution with mean zero and finite variance. Assume that the regression curve $g$ can be well approximated by a linear combination of, without loss of generality, cubic B-splines, denoted by $B(x) = B(x; 3)$. Specifically, given $n$ data points $(x_i, y_i)$ and a set of $K$ B-splines $B_j(\cdot)$, we take $g(x_i) = \sum_{j=1}^{K} a_j B_j(x_i)$. The penalized least squares problem is then to find a vector of coefficients $a = (a_1, \ldots, a_K)$ that minimizes

$$\mathrm{PLS}(a) = \sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=1}^{K} a_j B_j(x_i) \Big\}^2 + \lambda \int \Big\{ \sum_{j=1}^{K} a_j B_j''(x) \Big\}^2 dx.$$
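The difference-penalty idea stated at the start of this section can be sketched as follows: the integral penalty is replaced by the sum of squared second differences of adjacent coefficients, which leads to the linear system $(B^T B + \lambda D^T D)\,a = B^T y$ with $D$ a difference matrix. The code is a sketch under assumptions: equally spaced knots extended beyond the data range, a Cox-de Boor recursion written directly in NumPy, and a simulated data set resembling, but not reproducing, the example of Figure 6.6. The function names and the noise level are hypothetical.

```python
import numpy as np

def bspline_basis(x, a, b, n_segments, degree=3):
    """B-spline design matrix on equally spaced knots (Cox-de Boor recursion)."""
    h = (b - a) / n_segments
    knots = a + h * np.arange(-degree, n_segments + degree + 1)
    x = np.asarray(x, dtype=float)[:, None]
    B = ((knots[:-1] <= x) & (x < knots[1:])).astype(float)       # degree 0
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                   # shape (n, n_segments + degree)

def pspline_fit(B, y, lam, order=2):
    """Minimize ||y - B a||^2 + lam * ||D a||^2 with D the order-th difference matrix."""
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)
    return np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.0, 10.0, 100))
y = np.sin(2 * np.pi * x / 10) + 0.5 + rng.normal(0.0, 0.3, 100)   # assumed noise level

B = bspline_basis(x, 0.0, 10.0, n_segments=20)
a_smooth = pspline_fit(B, y, lam=1.0)
a_stiff = pspline_fit(B, y, lam=1e9)           # very heavy penalty
fit_smooth, fit_stiff = B @ a_smooth, B @ a_stiff
```

With a second-order penalty and a very large $\lambda$, the fitted curve is pushed toward the least squares straight line, which illustrates the polynomial limit for large $\lambda$ reported by Eilers and Marx (1996).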

Following de Boor (1978), the second derivative satisfies

$$\sum_{j=1}^{K} a_j B_j''(x_i; 3) = h^{-2} \sum_{j=1}^{K} \Delta^2 a_j\, B_j(x_i; 1),$$

where $h$ is the distance between knots and $\Delta^2 a_j = \Delta(\Delta a_j) = \Delta(a_j - a_{j-1})$. The P-splines method penalizes higher-order finite differences of the coefficients of adjacent B-splines. That is, it minimizes

$$\sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=1}^{K} a_j B_j(x_i) \Big\}^2 + \lambda \sum_{j=m+1}^{K} \left(\Delta^m a_j\right)^2.$$

Eilers and Marx (1996) show that the difference penalty is a good discrete approximation to the integrated square of the $m$th derivative, and that with this penalty moments of the data are conserved and polynomial regression models occur as limits for large values of $\lambda$. Figure 6.6 shows a comparison of smooth.spline and P-spline estimates on a simulated example.

Figure 6.6: smooth.spline and P-spline estimates (simulated data from $\sin(2\pi x/10) + .5 + N(0, .71)$; curves: true, smooth.spline, P-spline).

6.3 Adaptive Regression via H-Splines Method

In smoothing techniques, the number of basis functions is chosen to be as large as the number of observations, and then the smoothing parameter is chosen to control the flexibility of the fit (Bates and Wahba, 1982). The H-splines method for nonparametric regression (Luo and Wahba (1997), Dias (1998) and Dias (1999)) combines some features of regression splines and of traditional smoothing splines to obtain a hybrid smoothing procedure, which is usually implemented with large data sets and displays a desirable form of spatial adaptability when the underlying function is spatially inhomogeneous in its degree of complexity. Basically, choosing the number of basis functions, for instance by the GCV criterion, does most of the work of balancing bias and variance. But there is a more important reason to use a penalized regression, namely numerical stability. It is well known that as the number of basis functions increases, the regression problem becomes more ill-conditioned, which makes the numerical computation less stable. The basis functions used are, in general, cubic spline bases, which have larger correlations among them than linear spline bases, hence the ill-conditioning problem is more serious. The penalized regression step acts as a remedy for this.

Similarly to smoothing splines, take the penalty term $J(g)$ as $\int (g'')^2$, the point evaluation functionals $L_i g = g(t_i)$, $y = (y_1, \ldots, y_n)^T$ and $g = (g(t_1), \ldots, g(t_n))^T$, and assume that $g \approx g_{K,\theta} = \sum_{i=1}^{K} \theta_i B_i = X_K \theta$, so that $g_{K,\theta} \in H_K$, where $H_K$ denotes the space of natural cubic splines (NCS) spanned by the basis functions $\{B_i\}_{i=1}^{K}$ and $X_K$ is an $n \times K$ matrix with entries $(X_K)_{ij} = B_j(t_i)$, for $i = 1, \ldots, n$ and $j = 1, \ldots, K$. Then, the numerical problem is to find a vector $\theta = (\theta_1, \ldots, \theta_K)^T$ that minimizes equation (5.3),

$$L_\lambda(\theta) = \|y - X_K \theta\|_2^2 + \lambda\, \theta^T \Omega\, \theta, \qquad (6.2)$$

where now $\Omega$ is a $K \times K$ matrix with entries $\Omega_{ij} = \int B_i''(t) B_j''(t)\,dt$. Standard calculations (de Boor, 1978) provide $\theta$ as a solution of the linear system

$$(X_K^T X_K + \lambda \Omega)\, \theta_\lambda = X_K^T y.$$

Note that the linear system now involves $K \times K$ matrices instead of the $n \times n$ matrices that arise in smoothing splines.
Both $K$ and $\lambda$ control the trade-off between smoothness and fidelity to the data. By construction, H-splines is more adaptive than the regular smoothing splines method. Simulations (see Dias (1999)) show that the H-splines method performs better even for small data sets (50 observations) and relatively large variance in the measurement errors.

6.4 A Bayesian Approach to H-splines

We have seen that, since the pioneering work of Craven and Wahba (1979), several methods have been proposed to estimate an unknown regression curve $g$ nonparametrically by using splines. Kimeldorf and Wahba (1970) and Wahba (1983) gave an attractive Bayesian interpretation for an estimate $\hat g$ of the unknown curve $g$. They showed that $\hat g$ can be viewed as a Bayes estimate of $g$ with respect to a certain prior on the class of all smooth functions. The Bayesian approach allows one not only to estimate the unknown function, but also to provide error bounds by constructing the corresponding Bayesian confidence intervals (Wahba, 1983). In this section we allow two smoothing parameters, as Luo and Wahba (1997) and Dias (1999) did. However, instead of going through the difficulties of specifying them precisely in an ad hoc manner, they are allowed to vary according to prior information. In this way, the procedure becomes more capable of providing an adequate fit.

Figure 6.7: H-spline fitting for airmiles data (passenger miles flown by U.S. commercial airlines; curves: data, H-splines).

Suppose we have the regression model

$$y_i = g(t_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where the $\varepsilon_i$ are uncorrelated with a $N(0, \sigma^2)$ distribution. Moreover, assume that the parametric form of the regression curve $g$ is unknown. Then the likelihood of $g$ given the observations $y$ is

$$l_y(g) \propto (\sigma^2)^{-n/2} \exp\Big\{ -\frac{1}{2\sigma^2} \|y - g\|^2 \Big\}. \qquad (6.3)$$

The Bayesian justification of penalized maximum likelihood is to place a prior density proportional to $\exp\{ -\frac{\lambda}{2} \int (g'')^2 \}$ over the space of all smooth functions (see details in Silverman and Green (1994) and Kimeldorf and Wahba (1970)). However, the infinite-dimensional case has a paradox alluded to by Wahba (1983). Silverman (1985) proposed a finite-dimensional Bayesian formulation to avoid the paradoxes and difficulties involved in the infinite-dimensional case. For this, let $g_{K,\theta} = \sum_{i=1}^{K} \theta_i B_i = X_K \theta_K$, with a knot sequence placed at order statistics. A complete Bayesian approach would assign prior distributions to the coefficients of the expansion, to the knot positions, to the number of knots and to $\sigma^2$. A Bayesian approach to hybrid splines nonparametric regression assigns priors for $g_K$, $K$, $\lambda$ and $\sigma^2$. Given a realization of $K$, the interior knots are placed at order statistics. This well-known procedure in nonparametric regression reduces the computational cost substantially and avoids the difficult problem of optimizing the knot positions. Any other procedure has to take into account the fact that changes in the knot positions might cause considerable change in the function $g$ (see details in Wahba (1982) for ill-posed problems in splines nonparametric regression). Moreover, in theory the number of basis functions (which is a linear function of the number of knots) can be as large as the sample size. But then one has to solve a system of $n$ equations instead of $K$. In an attempt to keep the computational cost down, one might want $K$ to be as small as possible, and hence over-smoothing may occur. For any $K$ large enough, $\lambda$ keeps the balance between over-smoothing and under-smoothing. Thus, with $g = g_K$, the penalized likelihood becomes

$$l_p \propto (\sigma^2)^{-n/2} \exp\Big\{ -\frac{1}{2\sigma^2} \|y - g_K\|^2 \Big\} \exp\Big\{ -\frac{\lambda}{2} \int (g_K'')^2 \Big\}, \qquad (6.4)$$

where $g_K = g_{K,\theta} = X_K \theta_K$. The reason why we suppress the subindex $\theta$ in $g_{K,\theta}$ will be explained later in this section. Note that maximization of the penalized likelihood is equivalent to minimization of (5.3).

For this proposed Bayesian setup we have a prior for $(g_K, K, \lambda, \sigma^2)$ of the form

$$p(g_K, K, \lambda, \sigma^2) = p(g_K \mid K, \lambda)\, p(K, \lambda)\, p(\sigma^2) = p(g_K \mid K, \lambda)\, p(\lambda \mid K)\, p(K)\, p(\sigma^2), \qquad (6.5)$$

where

$$p(g_K \mid K, \lambda) \propto \exp\Big\{ -\frac{\lambda}{2} \int (g_K'')^2 \Big\},$$

$$p(\sigma^2) \propto \frac{1}{(\sigma^2)^{u+1}} \exp\{ -v/\sigma^2 \}, \qquad u > 0,\ v > 0,$$

$$p(K) = \frac{\exp\{-a\}\, a^K / K!}{1 - \exp\{-a\}(1 + q)}, \qquad K = 1, \ldots, K^*, \qquad \text{where } q = \sum_{j=K^*+1}^{\infty} a^j / j!,$$

and

$$p(\lambda \mid K) = \psi(K) \exp\{ -\psi(K)\lambda \},$$

with $\psi$ any smooth function of $K$. It is well known that, for $\lambda > 0$, when the number of basis functions increases the smoothing parameter decreases to a point and then increases with $K$. That is, for large values of $K$ the smoothing parameter $\lambda$ becomes larger to enforce smoothness (see details in Dias (1999)). Therefore, functions $\psi$ that satisfy these requirements are recommended. In particular, a flexible class is given by $\psi(K) = K^b \exp(-cK)$ for suitably chosen hyperparameters $b$ and $c$. Undesirably large values of $K$ can be excluded by fixing $K^*$ appropriately, or be made unlikely by fixing $a$ accordingly; that is, large values of $K$ are controlled by the hyperparameters $a$ and $K^*$ of the prior $p(K)$. The choice of $a$ is governed by the prior expectation of the structure of the underlying curve, such as maxima, minima, inflection points, etc. We suggest that the reader follow some of the rules recommended by Wegman and Wright (1983).
These recommendations are based on the assumption of fitting a cubic spline, the most popular case, and are summarized below.

1. Extrema should be centered in intervals and inflection points should be located near knot points.
2. No more than one extremum and one inflection point should fall between knots (because a cubic could not fit more).
3. Knot points should be located at data points.

Note that $g_K = X_K \theta_K$ is completely determined if we know the coefficients $\theta_K$. Hence, the overall parameter space $\Xi$ can be written as a countable union of subspaces $\Xi_K = \{\xi : \xi = (\theta_K, \phi) \in (\mathbb{R}^K \times \{1, \ldots, K^*\} \times [0, \infty)^2)\}$ with $\phi = (K, \lambda, \sigma^2)$. Thus, the posterior is given by

$$\pi(\xi \mid y) \propto l_p(\xi \mid y)\, p(\xi). \qquad (6.6)$$

In order to sample from the posterior $\pi(\xi \mid y)$ we have to consider the varying dimensionality of this problem. Hence one has to design move types between the subspaces $\Xi_K$. However, assigning a prior to $g_K$, or equivalently to the coefficients $\theta_K$, leads to a serious computational difficulty pointed out by Denison, Mallick and Smith (1998), where a comparative study was developed. They suggest that using the least squares estimates for the vector $\theta_K$ leads to a non-significant deterioration in performance for overall curve estimation. Similarly, given a realization of $(K, \lambda, \sigma^2)$, we minimize the penalized least squares objective function (6.2) to obtain the estimates $\hat\theta_K = \hat\theta_K(y)$ for the vector $\theta_K$, and consequently we have an estimate $\hat g_K = X_K \hat\theta_K$. Thus, there is no prior assigned to this vector of parameters, and so we write $g_K$ for $g_{K,\theta}$. Having obtained $\hat\theta_K$, we approximate the marginal posterior $\pi(\phi \mid y)$ by the conditional posterior

$$\pi(\phi \mid y, \hat\theta_K) \propto l_p(\phi \mid y, \hat\theta_K)\, p(\phi), \qquad (6.7)$$

with

$$p(\phi) = p(K, \lambda, \sigma^2) = p(\lambda \mid K)\, p(K)\, p(\sigma^2).$$

Note that if one assigns independent normal distributions to the parameters $\theta_K$, it is not difficult to obtain the marginal posterior $\pi(\phi \mid y)$, and approximations are not necessary. However, the results will be very similar. To solve the problem of sampling from the posterior $\pi(\phi \mid y, \hat\theta_K)$, Dias and Gamerman (2002) used the reversible jump methodology (Green (1995)). This technique is beyond the level of this book and will not be explained here, but the interested reader will find the details of the algorithm in Dias and Gamerman (2002).
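To make the pieces of (6.7) concrete, the unnormalized log posterior of $\phi = (K, \lambda, \sigma^2)$ can be assembled from the penalized likelihood (6.4) and the three priors. The sketch below takes the residual sum of squares $\|y - \hat g_K\|^2$ and the roughness $\int (\hat g_K'')^2$ as precomputed inputs. Two things are assumptions: the prior for $\sigma^2$ is read as an inverse-gamma kernel, and every numerical value in the usage line is hypothetical. The function names are not from Dias and Gamerman (2002), and the reversible jump sampler itself is not shown.

```python
import math

def log_prior_K(K, a, K_star):
    """Poisson(a) truncated to {1, ..., K_star}, written on the log scale."""
    if not 1 <= K <= K_star:
        return -math.inf
    log_pmf = lambda k: -a + k * math.log(a) - math.lgamma(k + 1)
    log_norm = math.log(sum(math.exp(log_pmf(k)) for k in range(1, K_star + 1)))
    return log_pmf(K) - log_norm

def log_prior_lambda(lam, K, b, c):
    """Exponential prior with rate psi(K) = K^b exp(-c K)."""
    if lam < 0:
        return -math.inf
    rate = K ** b * math.exp(-c * K)
    return math.log(rate) - rate * lam

def log_prior_sigma2(sigma2, u, v):
    """Assumed inverse-gamma kernel: (sigma2)^-(u+1) * exp(-v / sigma2)."""
    return -(u + 1) * math.log(sigma2) - v / sigma2

def log_posterior(rss, roughness, n, K, lam, sigma2, a, K_star, b, c, u, v):
    """Unnormalized log of pi(phi | y, theta_hat) as in (6.7)."""
    log_lik = -0.5 * n * math.log(sigma2) - rss / (2 * sigma2) - 0.5 * lam * roughness
    return (log_lik + log_prior_lambda(lam, K, b, c)
            + log_prior_K(K, a, K_star) + log_prior_sigma2(sigma2, u, v))

# Hypothetical inputs; a = 17 and psi(K) = K^3 (b = 3, c = 0) echo Figure 6.8.
value = log_posterior(rss=18.0, roughness=35.0, n=50, K=12, lam=0.01, sigma2=0.36,
                      a=17.0, K_star=40, b=3.0, c=0.0, u=1.0, v=1.0)
print(value)
```

The normalizing sum in `log_prior_K` equals $1 - \exp\{-a\}(1 + q)$, so the function agrees with the closed form of $p(K)$ without evaluating the infinite series $q$.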
In Figure 6.8 we present a simulated example to verify the estimates provided by this approach. The final estimate is

$$\hat y_+(t_i) = \frac{1}{100} \sum_{j=1}^{100} \hat y_j(t_i).$$

Figure 6.9 exhibits approximate Bayesian confidence intervals for the true regression curve $g$, computed as follows. Let $y(t_i)$ and $\hat y(t_i)$ be a particular model and its estimate provided by the proposed method, with $i = 1, \ldots, n$, where $n$ is the sample size. For each $i = 1, \ldots, n$, the fitted vectors $(\hat y_1(t_i), \hat y_2(t_i), \ldots, \hat y_{100}(t_i))^T$ form random samples, and from those vectors the lower and upper quantiles were computed in order to obtain the confidence intervals.

Figure 6.8 exhibits an example of how useful a Bayesian approach to hybrid splines nonparametric regression can be. It describes a situation where prior information indicates that the underlying curve has large curvature, the variance of the measurement errors is not too small, and the traditional smoothing methods, e.g. smoothing splines, might not be able to capture all the structure of the true regression curve. By using vague but proper priors for the smoothing parameters $K$ and $\lambda$ and for the variance $\sigma^2$, this Bayesian approach provides a much better fit than the traditional smoothing splines approach does.

Figure 6.8: Estimation results (50 observations from $y = \exp(-t^2/2)\cos(4\pi t) + N(0, .36)$): a) Bayesian estimate with $a = 17$ and $\psi(K) = K^3$ (dotted line); b) (SS) smoothing splines estimate (dashed line). The true regression function is also plotted (solid line). The SS estimate was computed using the R function smooth.spline, from which 4 degrees of freedom were obtained, and $\lambda$ was computed by GCV.

Figure 6.9 shows one hundred curves sampled from the posterior (after burn-in) and an approximate 95% Bayesian confidence interval for the regression curve $g(t) = \exp(-t^2/2)\cos(4\pi t)$ with $t \in [0, \pi]$. On the right panel of this figure we see the curve estimate, which is an approximation to the posterior mean, and the 2.5% and 97.5% percentile curves.

Figure 6.9: One hundred estimates of the curve of Figure 6.8 and a Bayesian confidence interval for the regression curve $g(t) = \exp(-t^2/2)\cos(4\pi t)$ with $t \in [0, \pi]$ (panels: the last 100 curves sampled; 95% "confidence" interval with average, .025 quantile and .975 quantile).
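Given a set of sampled curves evaluated at common points, the average curve and the pointwise percentile curves described above take only a few lines to compute. In the sketch below the one hundred "posterior draws" are simulated as the true curve plus smooth random perturbations. This is an assumption made purely to have input data; it is not the output of the reversible jump sampler.

```python
import numpy as np

def pointwise_band(curves, level=0.95):
    """curves has shape (n_draws, n_points); returns mean, lower and upper curves."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(curves, [tail, 1.0 - tail], axis=0)
    return curves.mean(axis=0), lower, upper

rng = np.random.default_rng(3)
t = np.linspace(0.0, np.pi, 200)
g = np.exp(-t ** 2 / 2) * np.cos(4 * np.pi * t)

# Stand-in for 100 curves sampled after burn-in (hypothetical perturbations).
draws = g + 0.1 * rng.normal(size=(100, 1)) * np.sin(t) + 0.05 * rng.normal(size=(100, 1))

mean_curve, lower, upper = pointwise_band(draws)
```

The band is pointwise: each $t_i$ is treated separately, so the band as a whole does not carry a simultaneous 95% statement about the entire curve.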


Chapter 7
Final Comments

Compared to parametric techniques, nonparametric modeling has more flexibility, since it allows one to choose from an infinite-dimensional class of functions to which the underlying regression curve is assumed to belong. In general, this type of choice depends on the unknown smoothness of the true curve. But in most cases one can assume mild restrictions, such as that the regression curve has an absolutely continuous first derivative and a square integrable second derivative. Nevertheless, nonparametric estimators are less efficient than parametric ones when the parametric model is valid. For many parametric estimators the mean square error goes to zero at rate $n^{-1}$, while nonparametric estimators have rate $n^{-\alpha}$, $\alpha \in [0, 1]$, where $\alpha$ depends on the smoothness of the underlying curve. When the postulated parametric model is not valid, many parametric estimators cannot attain the rate $n^{-1}$. In fact, those estimators will not converge to the true curve. One of the advantages of the adaptive basis function procedures, e.g., H-splines methods, is the ability to vary the amount of smoothing in response to the inhomogeneous curvature of the true function at different locations. Those methods have been very successful in capturing the structure of the unknown function. In general, nonparametric estimators are good candidates when one does not know the form of the underlying curve.


Bibliography

Bates, D. and Wahba, G. (1982). Computational Methods for Generalized Cross-Validation with large data sets, Academic Press, London.
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, John Wiley and Sons (New York, Chichester).
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots, J. Amer. Statist. Assoc. 74(368).
Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions, Numerische Mathematik 31.
de Boor, C. (1978). A Practical Guide to Splines, Springer Verlag, New York.
Denison, D. G. T., Mallick, B. K. and Smith, A. F. M. (1998). Automatic Bayesian curve fitting, Journal of the Royal Statistical Society B 60.
Dias, R. (1994). Density estimation via h-splines, University of Wisconsin-Madison. Ph.D. dissertation.
Dias, R. (1996). Sequential adaptive nonparametric regression via H-splines. Technical Report RP 43/96, University of Campinas, June. Submitted.
Dias, R. (1998). Density estimation via hybrid splines, Journal of Statistical Computation and Simulation 60.
Dias, R. (1999). Sequential adaptive non parametric regression via H-splines, Communications in Statistics: Computations and Simulations 28.
Dias, R. and Gamerman, D. (2002). A Bayesian approach to hybrid splines nonparametric regression, Journal of Statistical Computation and Simulation 72(4).
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties, Statist. Sci. 11(2). With comments and a rejoinder by the authors.
Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21(2).
Good, I. J. and Gaskins, R. A. (1971). Nonparametric roughness penalties for probability densities, Biometrika 58.
Green, P. J. (1995). Reversible jump Markov Chain Monte Carlo computation and Bayesian model determination, Biometrika 82.
Gu, C. (1993). Smoothing spline density estimation: A dimensionless automatic algorithm, J. of the Amer. Stat'l. Assn. 88.
Gu, C. and Qiu, C. (1993). Smoothing spline density estimation: theory, Ann. of Statistics 21.
Härdle, W. (1990). Smoothing Techniques With Implementation in S, Springer-Verlag (Berlin, New York).
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Chapman and Hall.
Kimeldorf, G. S. and Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, The Annals of Mathematical Statistics 41.
Kooperberg, C. and Stone, C. J. (1991). A study of logspline density estimation, Computational Statistics and Data Analysis 12.
Luo, Z. and Wahba, G. (1997). Hybrid adaptive splines, Journal of the American Statistical Association 92.
Nadaraya, E. A. (1964). On estimating regression, Theory of Probability and its Applications 10.
O'Sullivan, F. (1988). Fast computation of fully automated log-density and log-hazard estimators, SIAM J. on Scientific and Stat'l. Computing 9.
Pagan, A. and Ullah, A. (1999). Nonparametric Econometrics, Cambridge University Press, Cambridge.
Parzen, E. (1962). On estimation of a probability density function and mode, Ann. of Mathematical Stat. 33.
Prakasa-Rao, B. L. S. (1983). Nonparametric Functional Estimation, Academic Press (Duluth, London).
Schumaker, L. L. (1972). Spline Functions and Approximation Theory, Birkhauser.
Schumaker, L. L. (1981). Spline Functions: Basic Theory, Wiley-Interscience, NJ.
Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice, and Visualization, John Wiley and Sons (New York, Chichester).
Silverman, B. W. (1982). On the estimation of a probability density function by the maximum penalized likelihood method, Ann. of Statistics 10.
Silverman, B. W. (1984). Spline smoothing: The equivalent variable kernel method, Ann. of Statistics 12.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting, Journal of the Royal Statistical Society, Series B, Methodological 47.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall (London).
Silverman, B. W. and Green, P. J. (1994). Nonparametric Regression and Generalized Linear Models, Chapman and Hall (London).
Stone, C. J. (1990). Large-sample inference for log-spline models, Ann. of Statistics 18.
Stone, C. J. and Koo, C.-Y. (1985). Logspline density estimation, Contemporary Mathematics.
Thompson, J. R. and Tapia, R. A. (1990). Nonparametric Function Estimation, Modeling and Simulation, SIAM, PA.
Wahba, G. (1982). Constrained regularization for ill posed linear operator equations, with applications in meteorology and medicine, in S. S. Gupta and J. O. Berger (eds), Statistical Decision Theory and Related Topics III, in two volumes, Vol. 2, Academic Press, New York and London.
Wahba, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline, JRSS-B, Methodological 45.
Wahba, G. (1990). Spline Models for Observational Data, SIAM, PA.
Watson, G. S. (1964). Smooth regression analysis, Sankhya A 26.
Wegman, E. J. and Wright, I. W. (1983). Splines in statistics, Journal of the American Statistical Association 78.

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 Stats 202: Data Mining and Analysis Lester Mackey September 23, 2015 (Slide credits: Sergi Bacallad) 1 / 24 Annuncements

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA 3..5

More information

Statistical Analysis (1-way ANOVA)

Statistical Analysis (1-way ANOVA) Statistical Analysis (1-way ANOVA) Cntents at a glance I. Definitin and Applicatins...2 II. Befre Perfrming 1-way ANOVA - A Checklist...2 III. Overview f the Statistical Analysis (1-way tests) windw...3

More information

Applied Spatial Statistics: Lecture 6 Multivariate Normal

Applied Spatial Statistics: Lecture 6 Multivariate Normal Applied Spatial Statistics: Lecture 6 Multivariate Nrmal Duglas Nychka, Natinal Center fr Atmspheric Research Supprted by the Natinal Science Fundatin Bulder, Spring 2013 Outline additive mdel Multivariate

More information

Data Analytics for Campaigns Assignment 1: Jan 6 th, 2015 Due: Jan 13 th, 2015

Data Analytics for Campaigns Assignment 1: Jan 6 th, 2015 Due: Jan 13 th, 2015 Data Analytics fr Campaigns Assignment 1: Jan 6 th, 2015 Due: Jan 13 th, 2015 These are sample questins frm a hiring exam that was develped fr OFA 2012 Analytics team. Plan n spending n mre than 4 hurs

More information

1.3. The Mean Temperature Difference

1.3. The Mean Temperature Difference 1.3. The Mean Temperature Difference 1.3.1. The Lgarithmic Mean Temperature Difference 1. Basic Assumptins. In the previus sectin, we bserved that the design equatin culd be slved much easier if we culd

More information

Student Academic Learning Services Page 1 of 7. Statistics: The Null and Alternate Hypotheses. A Student Academic Learning Services Guide

Student Academic Learning Services Page 1 of 7. Statistics: The Null and Alternate Hypotheses. A Student Academic Learning Services Guide Student Academic Learning Services Page 1 f 7 Statistics: The Null and Alternate Hyptheses A Student Academic Learning Services Guide www.durhamcllege.ca/sals Student Services Building (SSB), Rm 204 This

More information

Group Term Life Insurance: Table I Straddle Testing and Imputed Income for Dependent Life Insurance

Group Term Life Insurance: Table I Straddle Testing and Imputed Income for Dependent Life Insurance An American Benefits Cnsulting White Paper American Benefits Cnsulting, LLC 99 Park Ave, 25 th Flr New Yrk, NY 10016 212 716-3400 http://www.abcsys.cm Grup Term Life Insurance: Table I Straddle Testing

More information

Spatial basis risk and feasibility of index based crop insurance in Canada: A spatial panel data econometric approach

Spatial basis risk and feasibility of index based crop insurance in Canada: A spatial panel data econometric approach Spatial basis risk and feasibility f index based crp insurance in Canada: A spatial panel data ecnmetric apprach Millin A.Tadesse (Ph.D.), University f Waterl, Statistics and Actuarial Science, Canada.

More information

CHECKING ACCOUNTS AND ATM TRANSACTIONS

CHECKING ACCOUNTS AND ATM TRANSACTIONS 1 Grades 6-8 Lessn 1 CHECKING ACCOUNTS AND ATM TRANSACTIONS Tpic t Teach: This lessn is intended fr middle schl students in sixth thrugh eighth grades during a frty minute time perid. The lessn teaches

More information

UNIVERSITY OF CALIFORNIA MERCED PERFORMANCE MANAGEMENT GUIDELINES

UNIVERSITY OF CALIFORNIA MERCED PERFORMANCE MANAGEMENT GUIDELINES UNIVERSITY OF CALIFORNIA MERCED PERFORMANCE MANAGEMENT GUIDELINES REFERENCES AND RELATED POLICIES A. UC PPSM 2 -Definitin f Terms B. UC PPSM 12 -Nndiscriminatin in Emplyment C. UC PPSM 14 -Affirmative

More information

The Importance Advanced Data Collection System Maintenance. Berry Drijsen Global Service Business Manager. knowledge to shape your future

The Importance Advanced Data Collection System Maintenance. Berry Drijsen Global Service Business Manager. knowledge to shape your future The Imprtance Advanced Data Cllectin System Maintenance Berry Drijsen Glbal Service Business Manager WHITE PAPER knwledge t shape yur future The Imprtance Advanced Data Cllectin System Maintenance Cntents

More information
