
A Survey on Nonparametric Time Series Analysis

by Siegfried Heiler

Contents

1 Introduction
2 Nonparametric regression
3 Kernel estimation in time series
4 Problems of simple kernel estimation and restricted approaches
5 Locally weighted regression
6 Application of locally weighted regression to time series
7 Parameter selection
8 Time series decomposition with locally weighted regression
References

1 Introduction

In this survey we discuss the application of some nonparametric techniques to time series. There is indeed a long tradition of applying nonparametric methods in time series analysis, and this holds true not only for certain test situations such as runs tests for randomness of a stochastic sequence, permutation tests or certain rank tests. An old and established technique in time series analysis is periodogram analysis. Although the periodogram is an asymptotically unbiased estimate of the spectral density of an underlying stationary process, it is well known that it is not consistent. Therefore, already in the early fifties, smoothing the periodogram directly with a so-called spectral window, or using a system of weights according to a lag window with which the empirical autocovariances are multiplied in the calculation of the Fourier transform, was introduced. Quite a number of different windows were proposed, and with respect to the window width similar rules hold for achieving consistent estimates as the ones we will shortly discuss in the context of nonparametric regression later in this text. Nonparametric spectral estimation is extensively treated in many textbooks on time series analysis, to which the interested reader is referred. Hence it will not be treated further in this survey. Another area where nonparametric ideas have long been applied is the smoothing and decomposition of seasonal time series. Local polynomial regression can be traced back to 1931 (R. R. Macaulay). A. Fisher (1937) and H. L. Jones (1943) discussed a local least squares fit under the side condition that a locally constant periodic function (for modelling seasonal fluctuations) be annihilated, and already in 1960 J. Bongard developed a unified principle, derived from a local regression approach, for treating the interior and the boundary part (with and without seasonal variations) of a time series.
These ideas will be taken up again in section 8, since they represent an attractive alternative to smoothing and seasonal decomposition procedures based on linear time series models. The aim of this survey is to present some basic concepts of nonparametric regression, including locally weighted regression, with special emphasis on their application to time series. Nonparametric regression has become an area with an abundance of new methodological proposals and developments in recent years. It is not the intention of this paper to give a comprehensive overview of the subject. We rather want to concentrate on the basic ideas only. The reader interested in different aspects may be referred to a survey paper by Härdle, Lütkepohl and Chen (1997), where more specific areas, proposals and further references can be found. The ARMA model is a typical linear time series model. Threshold autoregression (TAR) models and their variants are specific types of nonlinear models. ARCH and GARCH type models are also of a very specific nonlinear type, designed to capture volatility phenomena. In contrast to that, in nonparametric regression no assumption is made about the form of the regression function. Only some smoothness conditions are required. The complexity of the model is determined completely by the data. One lets the data speak for themselves.

Thereby one avoids subjectivity in selecting a specific parametric model. But the gain in flexibility has a price: one has to choose bandwidths. We come back to this later. Besides this, a higher complexity in the mathematical argumentation is involved. However, asymptotic considerations will not be discussed in detail in this survey. Because of their flexibility, nonparametric regression techniques may serve as a first step in the process of finding an adequate parametric model. If no such model can be found which describes the underlying structure adequately, then the results of nonparametric estimation may be used directly for forecasting or for describing the characteristics of the time series.

2 Nonparametric regression

Since forecasting is an important objective of many time series analyses, estimating the conditional distribution, or some of its characteristics, plays a considerable role. For point prediction the conditional mean or median is of particular interest. In order to obtain confidence or prediction intervals, estimates of conditional variances or conditional quantiles are also needed. The latter are also of interest in studying volatility in financial time series. The first step is therefore to look at nonparametric estimation of densities and conditional densities. Let x ∈ ℝ be a random variable whose distribution has a density f and let x_1, ..., x_n be a random sample from x. Then a kernel density estimator for f is given by

    f_n(x) = \frac{1}{n h_n} \sum_{i=1}^n K\Big(\frac{x_i - x}{h_n}\Big).   (2.1)

Here K is a so-called kernel function, i.e. a symmetric density assigning weights to the observations x_i which decrease with the distance between x and x_i. Some popular kernel functions are listed in Table 2.1 and exhibited in Figure 2.1. The first five have the interval [-1, 1] as support, whereas the Gaussian kernel has infinite support. h_n is the bandwidth, which drives the size of the local neighbourhood included in the estimation of f at x. The bandwidth depends on the sample size n and has to fulfil h_n → 0 and n h_n → ∞ for n → ∞ as necessary conditions for consistency. But for practical applications this asymptotic condition is not very helpful. A very small bandwidth will lead to a wiggly course of the estimated density, whereas a large bandwidth yields a smooth course but will possibly flatten out interesting details. Bandwidth selection will be dealt with in section 7.

Table 2.1: Selected kernel functions

    Name          Kernel
    Uniform       (1/2) 1_{[-1,1]}(u)
    Triangle      (1 - |u|) 1_{[-1,1]}(u)
    Epanechnikov  (3/4)(1 - u^2) 1_{[-1,1]}(u)
    Bisquare      (15/16)(1 - 2u^2 + u^4) 1_{[-1,1]}(u)
    Triweight     (35/32)(1 - 3u^2 + 3u^4 - u^6) 1_{[-1,1]}(u)
    Gaussian      (1/\sqrt{2\pi}) \exp(-u^2/2)

A k_n-nearest neighbour (k_n-NN) estimator of f is obtained by substituting the fixed bandwidth h_n in (2.1) by the random variable H_{n,k_n}(x), measuring the distance between x and the k_n-nearest observation among the x_i, i = 1, ..., n. Nearest neighbour estimators have the property that the number of observations used for the local approach is fixed. This is an advantage if the x-space shows a greatly unbalanced design. On the other hand the bias varies from point to point due to the variable local bandwidth. For x ∈ ℝ^p a kernel K : ℝ^p → ℝ is needed in (2.1). In this case either product kernels

    K(u) = \prod_{j=1}^p K_j(u_j)

with kernels K_j : ℝ → ℝ and bandwidth h_j in coordinate j, j = 1, ..., p, or norm kernels K(u) = K*(||u||) with a suitable norm on ℝ^p are used. In connection with time series applications, product kernels are frequently applied,

    f_n(x) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^p \frac{1}{h_j} K_j\Big(\frac{x_{ij} - x_j}{h_j}\Big),   (2.2)

and h_j = \hat{\sigma}_j with an estimated standard deviation in the j-th coordinate is a popular choice for the bandwidths. Let now (y, x) with y ∈ ℝ, x ∈ ℝ^p be a random vector with joint density f(y, x) and let f_X(x) be the marginal density of x. Then the conditional density g(y|x) = f(y, x)/f_X(x) can be estimated by inserting a kernel density estimator or a corresponding nearest neighbourhood estimator in the numerator and denominator of g(y|x).

Figure 2.1: Some popular kernel functions in practice.

With the choice of a kernel function K* : ℝ^{p+1} → ℝ, K*(y, x) = K_1(y) K(x), and bandwidths h_1 resp. h, we obtain the kernel estimator for the conditional density

    g_n(y|x) = \frac{\frac{1}{n h_1 h^p} \sum_{i=1}^n K_1\Big(\frac{y_i - y}{h_1}\Big) K\Big(\frac{x_i - x}{h}\Big)}{\frac{1}{n h^p} \sum_{i=1}^n K\Big(\frac{x_i - x}{h}\Big)}.   (2.3)

An estimator for the conditional mean m(x) = \int_{-\infty}^{\infty} y\, g(y|x)\, dy is obtained when we replace g in the integral by its estimator g_n.

For K_1 being a symmetric density, this immediately yields

    m_n(x) = \frac{\sum_{i=1}^n y_i\, K\Big(\frac{x - x_i}{h}\Big)}{\sum_{i=1}^n K\Big(\frac{x - x_i}{h}\Big)}.   (2.4)

This is the well-known Nadaraya-Watson nonparametric regression estimator (NW-estimator; Nadaraya, 1964; Watson, 1964). We see that it can be written as a weighted mean

    m_n(x) = \sum_{i=1}^n y_i\, w_{n,i}(x, x_1, ..., x_n),   (2.5)

where the random weights depend on the point x and the random variables x_1, ..., x_n. Apart from conditional means, conditional quantiles are also of interest in various time series applications. Let

    F(y|x) = \int_{-\infty}^{y} g(t|x)\, dt   (2.6)

denote the conditional distribution function of y given x. Then the conditional α-quantile at x, q_α(x), is defined as

    q_α(x) = \inf\{y ∈ ℝ \mid F(y|x) \ge α\}, \quad 0 < α < 1.   (2.7)

If g(·|x) is strictly positive, then of course q_α(x) is the unique solution of F(y|x) = α, i.e. q_α(x) = F^{-1}(α|x). One possible procedure for estimating q_α is to take the empirical α-quantile of an estimator F_n(·|x) according to (2.7). Let F_1(z) = \int_{-\infty}^{z} K_1(u)\, du be the distribution function pertaining to the kernel K_1. Then the estimated conditional distribution, obtained by integrating g_n(·|x) from -∞ to y, is given by

    F_n(y|x) = \frac{\sum_{i=1}^n F_1\Big(\frac{y - y_i}{h_1}\Big)\, K\Big(\frac{x_i - x}{h}\Big)}{\sum_{i=1}^n K\Big(\frac{x_i - x}{h}\Big)}.   (2.8)

Let us assume that K_1 has support [-1, 1]. Then we have

    F_1\Big(\frac{y - y_i}{h_1}\Big) = \begin{cases} 1 & \text{for } y_i \le y - h_1, \\ 0 & \text{for } y_i \ge y + h_1, \end{cases}

so that in this case

    F_n(y|x) = \frac{\sum_{i=1}^n 1_{(-\infty,\, y-h_1]}(y_i)\, K\Big(\frac{x_i - x}{h}\Big) + \sum_{i=1}^n 1_{(y-h_1,\, y+h_1)}(y_i)\, F_1\Big(\frac{y - y_i}{h_1}\Big)\, K\Big(\frac{x_i - x}{h}\Big)}{\sum_{i=1}^n K\Big(\frac{x_i - x}{h}\Big)}.   (2.9)

One can see that the estimation contains only observations in the regressor space lying in a band around x. The first sum on the right hand side includes observations whose y-values are less than or equal to y - h_1. The second sum contains observations with y_i-values in a neighbourhood of y. In contrast to a usual empirical distribution function, here observations greater than y also obtain a positive weight. Of particular interest may be the median regression function q_{1/2} for asymmetric distributions, as an alternative to ordinary regression based on the mean. Another interesting application may be the estimation of q_{α/2} and q_{1-α/2} in order to get predictive intervals. These can be compared with intervals obtained from parametric models, which lack the possibility to evaluate the bias due to mis-specification of the model. Taking some boundary corrections into account, for a not too unbalanced design the second sum in (2.9) can be approximated by \sum_{i=1}^n 1_{(y-h_1,\, y]}(y_i)\, K\big(\frac{x_i - x}{h}\big), so that the conditional distribution function is estimated by

    \tilde{F}_n(y|x) = \frac{\sum_{i=1}^n 1_{(-\infty,\, y]}(y_i)\, K\Big(\frac{x_i - x}{h}\Big)}{\sum_{i=1}^n K\Big(\frac{x_i - x}{h}\Big)}.   (2.10)

This estimator was considered for x ∈ ℝ by Horvath and Yandell (1988), who proved asymptotic results for the i.i.d. case. Abberger (1996) derives from (2.10) the empirical quantile function

    q_{n,α}(x) = \inf\{y ∈ ℝ \mid \tilde{F}_n(y|x) \ge α\}, \quad 0 < α < 1,   (2.11)

and investigates the behaviour of \tilde{F}_n and q_{n,α} in applications to stationary time series.

3 Kernel estimation in time series

When a kernel or NN estimator is applied to dependent data, as is the case in time series, then it is affected only by the dependence among the observations in a small window and not by that between all data. This fact reduces the dependence between the estimates, so that many of the techniques developed for independent data can be applied in these cases as well. This was called the whitening by windowing principle by Hart (1996). A typical situation for an application to a time series {z_t} is that the regressor vector x consists of past time series values,

    x_t = (z_{t-1}, ..., z_{t-p})',   (3.1)

which leads to the very general nonparametric autoregression model

    z_t = m(z_{t-1}, ..., z_{t-p}) + a_t, \quad t = p+1, p+2, ...,   (3.2)

with {a_t} a white noise sequence. Of course x_t might also include time series values of other predictive variables like leading indicators. An indispensable requirement for proving asymptotic properties of kernel estimates in this and related situations is that the underlying processes are stationary. Another condition is that the memory of these underlying processes decreases with the distance between events and that the rate of decay can be bounded from above by so-called mixing conditions. Strong mixing conditions are used by Robinson (1983, 1986). Collomb (1984, 1985) worked with so-called φ- or uniform mixing conditions. We will not present these fairly complicated asymptotic considerations here. But we would like to remark that these mixing conditions are hard to check in practice.
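For concreteness, data from a nonlinear autoregression of the form (3.2) can be generated as in the following minimal sketch. The tanh-type choice of m, the Gaussian white noise, the burn-in device and all names are illustrative assumptions of ours, not taken from the survey:

```python
import numpy as np

def simulate_nar(n, m, sigma=1.0, p=1, burn=200, seed=0):
    """Simulate z_t = m(z_{t-1}, ..., z_{t-p}) + a_t (model (3.2)) with
    Gaussian white noise {a_t}; an initial burn-in stretch is discarded so
    that the retained series is close to the stationary regime."""
    rng = np.random.default_rng(seed)
    z = np.zeros(burn + n)
    for t in range(p, burn + n):
        # pattern handed to m is (z_{t-1}, ..., z_{t-p})
        z[t] = m(z[t - p:t][::-1]) + sigma * rng.normal()
    return z[burn:]

# example: a contractive (hence stationary) nonlinear mean function
z = simulate_nar(5000, lambda x: 0.5 * np.tanh(x[0]))
```

Any bounded, contractive m keeps the simulated process stable; such simulated series are a convenient test bed for the estimators discussed below.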
In contrast to linear autoregressive models of the form z_t = φ_1 z_{t-1} + ... + φ_p z_{t-p} + a_t, and in a certain sense also to threshold autoregression, where the autoregressive parameters vary according to some threshold variable, the model (3.2) is more general and flexible, and its estimation may lead to insights which can be helpful in choosing an appropriate parametric (possibly nonlinear) model afterwards.
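A minimal sketch of the generic Nadaraya-Watson estimator (2.4), which reappears in the autoregressive setting as (3.3), might look as follows; the Epanechnikov kernel from Table 2.1 and all names are our own illustrative choices:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel from Table 2.1: (3/4)(1 - u^2) on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def nw(x, xs, ys, h):
    """Nadaraya-Watson estimator (2.4): kernel-weighted mean of the y_i,
    with weights decreasing in the distance between x and x_i."""
    w = epanechnikov((np.asarray(xs, dtype=float) - x) / h)
    return float(np.sum(w * np.asarray(ys, dtype=float)) / np.sum(w))
```

Being a weighted mean as in (2.5), the estimate always lies between the smallest and the largest y_i receiving positive weight.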

For x ∈ ℝ^p, x_t as in (3.1), and weights

    w_{n,t}(x) = \frac{K\Big(\frac{x_t - x}{h}\Big)}{\sum_{s=p+1}^n K\Big(\frac{x_s - x}{h}\Big)},

the Nadaraya-Watson estimator in model (3.2) is given by

    m_n(x) = \sum_{t=p+1}^n z_t\, w_{n,t}(x).   (3.3)

For x equal to the last observed pattern, x = (z_n, z_{n-1}, ..., z_{n-p+1})', this provides a one-step ahead predictor for z_{n+1} which allows a very intuitive interpretation. Given the course of the time series observed over the last p instants, the predictor is a weighted mean of all those time series values in the past which followed a course pattern that is similar to the last observed one. The weights depend on how close the pattern observed in the past comes to the pattern given by (z_n, ..., z_{n-p+1})'. A k-step ahead predictor is given if z_t in (3.3) is replaced by z_{t+k-1}:

    m_{n,k} = \sum_{t=p+1}^{n-k+1} z_{t+k-1}\, w_{n,t}(x), \quad k = 1, 2, ... .   (3.4)

This predictor does not use the variables z_{n+1}, ..., z_{n+k-1}, which are unknown but may contain information about the conditional expectation E(z_{n+k} | (z_n, ..., z_{n-p+1})'). They might be replaced by estimates in a multistep procedure which consists of a succession of one-step ahead forecasts. This procedure can lead to a smaller mean squared error than the direct predictor (3.4). For a different proposal see Chen (1996). Up to now we have only considered the autoregressive case where the regressor vector contains past time series values. The case of vector autoregression, where for each individual (scalar) time series past values of related time series or leading indicators are also included in the regressor vector, can be treated in a similar way as nonparametric autoregression, although the number of components in x is restricted due to the "curse of dimensionality", to which we come back later. If the regressor vector x_t = (z_{t-1}, ..., z_{t-p})' is used in estimating conditional distribution functions and conditional quantiles, as e.g. in (2.10) and (2.11), then we arrive at quantile autoregression.
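The pattern interpretation of the one-step predictor (3.3) can be sketched as follows. The Gaussian product kernel and all names are our own illustrative assumptions; the survey itself leaves the kernel choice open:

```python
import numpy as np

def nw_ar_forecast(z, p, h):
    """One-step ahead NW predictor (3.3): a weighted mean of those past values
    z_t whose preceding course pattern (z_{t-1}, ..., z_{t-p}) is close to the
    last observed pattern (z_n, ..., z_{n-p+1})."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    target = z[n - p:][::-1]                  # last observed pattern
    num = 0.0
    den = 0.0
    for t in range(p, n):
        pattern = z[t - p:t][::-1]            # pattern preceding z_t
        # product Gaussian kernel weight for the pattern distance
        w = float(np.exp(-0.5 * np.sum(((pattern - target) / h) ** 2)))
        num += w * z[t]
        den += w
    return num / den
```

On an exactly periodic series the target pattern recurs exactly, so the predictor essentially averages the values that followed those recurrences and reproduces the next value of the cycle.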
The median autoregression q_{n,1/2} may serve as an alternative to the mean autoregression (3.3). In financial data one is often interested in the behaviour of quantiles in the tails. For instance, the value at risk of a certain asset is measured by looking at low quantiles (α = 0.01 or α = 0.05) of the conditional distribution of the corresponding series of returns.
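The estimators (2.10) and (2.11) underlying such quantile autoregressions can be sketched in a few lines; Epanechnikov weights and all names are our own illustrative choices:

```python
import numpy as np

def cond_cdf(y, x, xs, ys, h):
    """~F_n(y|x) of (2.10): kernel-weighted empirical distribution function."""
    u = (np.asarray(xs, dtype=float) - x) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return float(np.sum(w * (np.asarray(ys, dtype=float) <= y)) / np.sum(w))

def cond_quantile(alpha, x, xs, ys, h):
    """q_{n,alpha}(x) of (2.11): smallest observed y with ~F_n(y|x) >= alpha."""
    for y in np.sort(ys):
        if cond_cdf(y, x, xs, ys, h) >= alpha:
            return float(y)
    return float(np.max(ys))
```

With x_t = (z_{t-1}, ..., z_{t-p})' as regressor and a product kernel, the same two functions yield conditional quantiles of returns, e.g. the low quantiles used for value-at-risk.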

Abberger (1996) applied quantile autoregression to time series of daily stock returns. In order to assess such models, forecast error cannot serve as a criterion, since quantiles are not observable. Abberger proposed the criterion

    R(α) = 1 - \frac{\sum_{t=1}^n \rho_α\big(z_t - q_α(x_t)\big)}{\sum_{t=1}^n \rho_α\big(z_t - q_α\big)},   (3.5)

where

    \rho_α(u) = α\, 1_{[0,\infty)}(u)\, u + (α - 1)\, 1_{(-\infty,0)}(u)\, u   (3.6)

is the loss function introduced by Koenker and Bassett (1978) in their seminal paper on quantile regression, and q_α is the unconditional α-quantile of the corresponding distribution. R(α) is constructed according to the R^2 criterion in ordinary regression. It assumes values between zero and one, where R(α) = 0 if q_α(x_t) = q_α for all x_t, and R(α) = 1 if z_t = q_α(x_t) for all t and all α, i.e. if the distribution of {z|x} is a one-point distribution. Figure 3.1 and Table 3.1 illustrate the behaviour of R(α) with a simulated conical data set of 500 observations. The observations are heteroscedastic and have mean zero. The correlation between x and y is -0.002. In Table 3.1 empirical R(α)-values for different α are exhibited. They are calculated by replacing in (3.5) q_α(x_t) by its kernel estimator q_{n,α}(x_t) and q_α by the empirical unconditional quantile of the first t-1 data values z_1, ..., z_{t-1}. The latter can be interpreted as a naive forecast of q_α(x_t). The findings of Abberger (1996, 1997) for several German stock returns were R(α)-values close to zero for the median, increasing in a U-shaped form towards the boundary areas around α = 0.01 respectively α = 0.99.

Table 3.1: R(α) values for the data in Figure 3.1.

ARCH and GARCH models represent a very specific kind of parametric modelling for studying the phenomenon of volatility. A flexible alternative to the combination of an ARMA model with ARCH or GARCH residuals is given by the conditional heteroscedastic autoregressive nonlinear (CHARN) model

    z_t = m(x_t) + σ(x_t)\, ε_t,   (3.7)

studied by Härdle and Yang (1996) or Härdle, Tsybakov and Yang (1997).

Figure 3.1: Simulated heteroskedastic data, n = 500.

Here x_t = (z_{t-1}, ..., z_{t-p})' is again the autoregressive vector (3.1) and ε_t is a random variable with mean zero and variance one. σ^2(x) is called the volatility function. Given an estimator for m, e.g. the NW-estimator m_n according to (3.3), it was suggested that σ^2(x) can be estimated by

    σ_n^2(x_t) = g_n(x_t) - m_n^2(x_t),   (3.8)

where

    g_n(x) = \frac{\sum_{t=1}^n z_t^2\, K\Big(\frac{x_t - x}{h}\Big)}{\sum_{t=1}^n K\Big(\frac{x_t - x}{h}\Big)} = \sum_{t=1}^n z_t^2\, w_{n,t}(x).   (3.9)

Since the estimator (3.8) is based on a difference, it can happen that from time to time a negative variance estimate results. This can be avoided if the volatility function is estimated on the basis of residuals. See (7.10), the discussion there, and Feng and Heiler (1998a).
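A minimal sketch of the difference estimator (3.8)-(3.9), with the Epanechnikov kernel and names as our own illustrative assumptions:

```python
import numpy as np

def _nw(x, xs, vals, h):
    # NW smooth of vals against xs with the Epanechnikov kernel, as in (3.3)
    u = (np.asarray(xs, dtype=float) - x) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return float(np.sum(w * np.asarray(vals, dtype=float)) / np.sum(w))

def volatility_sq(x, xs, zs, h):
    """sigma_n^2(x) = g_n(x) - m_n(x)^2, eqs. (3.8)-(3.9). Being a difference
    of two smooths, the result can occasionally turn out negative; the survey
    notes that residual-based estimation avoids this."""
    zs = np.asarray(zs, dtype=float)
    return _nw(x, xs, zs ** 2, h) - _nw(x, xs, zs, h) ** 2
```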

In the context of time series analysis, not only past values of the time series itself or of related series may occur as regressor variables, but also the time index itself, in which case x_t = t, or some functions of the time index like polynomials or trigonometric functions. This leads to smoothing approaches. In the case m(x_t) = m(t) the NW estimator at t consists of a weighted mean of the time series values in a neighbourhood [t - h, t + h] of z_t with nonrandom weights. Polynomials and trigonometric functions in t are used in decomposing a seasonal time series into trend-cyclical and seasonal components according to an unobserved components model. This application will be studied in section 8 after the discussion of locally weighted regression. In the area of quantile estimation the regressor x_t = t leads to quantile smoothing. This technique was used by Abberger (1996, 1997) in order to compare the results of a nonparametric procedure for stock returns with those of a GARCH model, evaluated with an S-Plus package under the standard assumption of an underlying Gaussian distribution. As an example we take daily discrete DAX returns, defined as z_t = (price_t - price_{t-1})/price_{t-1}, exhibited in Figure 3.2. Since the Gaussian distribution is completely determined by mean and variance, conditional quantiles can easily be calculated from the outcomes of the GARCH model estimation. The results are depicted in Figures 3.3 and 3.4 for the lower and upper quartiles and for the 0.1 and 0.9 quantiles, respectively. Two messages can be learned from the results. The first is that the asymmetric behaviour of volatility, which is revealed by the nonparametric approach, will remain completely hidden by the choice of a wrong parametric model which is offered as the default option by the package. In the presented example, which is not untypical for stock returns, volatility is a phenomenon which has mainly to do with movements in the lower tails of the conditional distributions.
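Abberger's criterion (3.5) with the check function (3.6), used above to assess such quantile fits, is straightforward to state in code (a sketch with our own names; the symbol R(α) follows the reconstruction used in the text):

```python
import numpy as np

def check_loss(u, alpha):
    """Koenker-Bassett check function (3.6)."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0.0, alpha * u, (alpha - 1.0) * u)

def r_criterion(z, q_cond, q_uncond, alpha):
    """Criterion (3.5): one minus the ratio of conditional to unconditional
    check-function losses; 0 when the conditional quantile never improves on
    the unconditional one, 1 for a perfect fit."""
    num = check_loss(np.asarray(z, dtype=float) - np.asarray(q_cond, dtype=float), alpha).sum()
    den = check_loss(np.asarray(z, dtype=float) - np.asarray(q_uncond, dtype=float), alpha).sum()
    return float(1.0 - num / den)
```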
The second finding in the figures is that kernel smoothing is very robust towards aberrant and erratic observations in the course of the time series, whereas GARCH models react very sensitively to them.

Figure 3.2: Time series of daily DAX returns from Jan. 2, 1986 to Aug. 13, 1991.

Figure 3.3: Estimation of 0.25- and 0.75-quantiles of daily DAX returns.

4 Problems of simple kernel estimation and restricted approaches

The nonparametric approaches we have treated so far suffer from two drawbacks. One is the so-called "curse of dimensionality"; the other is increased bias in cases of a highly clustered design density and particularly at the boundaries of the x-space. Curse of dimensionality describes the fact that in higher dimensional regression problems the subspace of ℝ^{p+1} spanned by the data is rather empty, i.e., there are only few observations in the neighbourhood of a point x ∈ ℝ^p. In practice this happens to be the case already for p > 2. Several proposals have been made to cope with the curse of dimensionality problem. We will describe only two of them very shortly. The first consists of decomposing ℝ^p into a class of J disjoint course patterns A_j, j = 1, ..., J, with the aid of a non-hierarchical cluster analysis. These J disjoint sets then serve as the states of a homogeneous Markov chain. In the model

    m(x_t) = E[z_t \mid x_t ∈ A_j] \quad \text{for } x_t ∈ A_j, \quad j = 1, ..., J,

Figure 3.4: Estimation of 0.10- and 0.90-quantiles of daily DAX returns.

with x_t being the autoregressive vector (3.1), m is estimated by

    m_n(x_t) = N_j^{-1} \sum_{s=1}^n z_s\, 1_{A_j}(x_s),

where N_j is the number of course patterns of length p from the time series in A_j. Here the estimator is an unweighted mean of all values following courses in pattern class A_j. Markov chain models of this type were first used by S. Yakowitz (1979b) for analysing time series of water runoff in rivers. Asymptotic properties for this type of model are discussed by Collomb (1980, 1983). Gouriéroux and Monfort (1992) examined a corresponding model for economic time series by incorporating volatility. They called their model

    z_t = \sum_{j=1}^J α_j\, 1_{A_j}(x_t) + \sum_{j=1}^J β_j\, 1_{A_j}(x_t)\, ε_t

a qualitative threshold ARCH model. Another proposal to cope with the curse of dimensionality is given by the so-called generalized additive models, studied by Hastie and Tibshirani (1990), which are defined as

    z_t = m_0 + \sum_{j=1}^p m_j(z_{t-i_j}) + a_t.

The components m_j are again of a general form. For estimation, so-called backfitting algorithms such as the alternating conditional expectation algorithm (ACE) of Breiman and Friedman (1985) or the BRUTO algorithm of Hastie and Tibshirani (1990) may be used. The main idea of backfitting goes as follows. In the above model,

    E\Big[ z_t - m_0 - \sum_{j \ne k} m_j(z_{t-i_j}) \,\Big|\, z_{t-i_k} \Big] = m_k(z_{t-i_k}).

Hence the variable in square brackets can be used to obtain a nonparametric estimate for m_k(z_{t-i_k}). But of course the other m_j are unknown as well, so that the estimation procedure has to be iterated until all the m_{n,j} converge. For a more detailed study of generalized additive models the reader is referred to the book of Hastie and Tibshirani as well as to the two interesting papers by Chen and Tsay in JASA (1993). For further discussion and other approaches see also Härdle, Lütkepohl and Chen (1997). Quite a few proposals can be found in the literature dealing with the bias problem of NW-estimators close to the boundary and in cases of an unbalanced design in the x-space. Gasser and Müller (1979, 1984) suggested for the case p = 1 a system of variable weights, Gasser, Müller and Mammitzsch (1985) developed asymmetric boundary kernels, and Messer and Goldstein (1993) suggested variable kernels which automatically get deformed and thus reduce the bias in the boundary area. Yang (1981) and Stute (1984) suggested a symmetrized k-NN estimator and Michels (1992) proposed boundary kernels for bias reduction which can be carried over to the case p > 1. We do not discuss the above mentioned proposals in more detail, since the mentioned disadvantages can be repaired by using locally weighted regression.

5 Locally weighted regression

Locally weighted, respectively local polynomial, regression was introduced into the statistical literature by Stone (1977) and Cleveland (1979).
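Before continuing, the backfitting iteration described verbally in section 4 can be written out as a sketch. The NW inner smoother, the fixed number of sweeps, the centring convention and all names are our own illustrative assumptions:

```python
import numpy as np

def backfit(y, X, h, n_iter=20):
    """Backfitting sketch for an additive model y = m_0 + sum_j m_j(x_j) + error:
    each component m_j is re-estimated by an NW smooth of the partial residuals
    (the bracketed variable in the conditional-expectation identity), iterated
    over a fixed number of sweeps."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    m0 = float(y.mean())
    fits = np.zeros((n, p))
    for _ in range(n_iter):
        for k in range(p):
            partial = y - m0 - fits.sum(axis=1) + fits[:, k]
            # Epanechnikov NW smooth of the partial residuals against X[:, k]
            U = (X[:, k][None, :] - X[:, k][:, None]) / h
            W = np.where(np.abs(U) <= 1.0, 0.75 * (1.0 - U ** 2), 0.0)
            fits[:, k] = W @ partial / W.sum(axis=1)
            fits[:, k] -= fits[:, k].mean()   # centre each m_j for identifiability
    return m0, fits
```

In practice the loop would be stopped when successive component estimates change by less than a tolerance rather than after a fixed sweep count.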
The statistical properties have since been investigated in papers by Tsybakov (1986), Fan (1993), Fan and Gijbels (1992, 1995), Ruppert and Wand (1994) and many others. A detailed description may be found in the book of Fan and Gijbels (1996). For the sake of simplicity we start with the assumption that the regressor x is a scalar. For a better understanding we regard the data as being generated by a location-scale model

    y = m(x) + σ(x)\, ε,   (5.1)

akin to the one considered in (3.7), where the ε are independent with E(ε) = 0, Var(ε) = 1, and m(x_0) = E(y | x = x_0).

m is assumed to be smooth in the sense that the (r+1)-th derivative exists at x_0, so that it can be expanded in a Taylor series around x_0,

    m(x) = m(x_0) + (x - x_0)\, m'(x_0) + ... + (x - x_0)^r\, \frac{m^{(r)}(x_0)}{r!} + R_r(x),   (5.2)

with the remainder term

    R_r(x) = (x - x_0)^{r+1}\, \frac{m^{(r+1)}\big(x_0 + θ(x - x_0)\big)}{(r+1)!}, \quad 0 < θ < 1.   (5.3)

With

    β_j(x_0) = \frac{m^{(j)}(x_0)}{j!}, \quad j = 0, 1, ..., r,   (5.4)

we arrive at a local polynomial representation for m,

    m(x) \approx \sum_{j=0}^r β_j(x_0)\, (x - x_0)^j.   (5.5)

This approach motivates the nonparametric estimation of m as a local polynomial by solving the least squares problem

    \min_{β ∈ ℝ^{r+1}} \sum_{i=1}^n \Big[ y_i - \sum_{j=0}^r β_j (x_i - x)^j \Big]^2 K\Big(\frac{x_i - x}{h}\Big).

With the design matrix X_x having the n rows [1, x_i - x, ..., (x_i - x)^r], the diagonal weight matrix W_x = diag\big(K\big(\frac{x_i - x}{h}\big)\big) and the vector y = (y_1, ..., y_n)', the solution at x is given by

    \hat{β}(x) = (X_x' W_x X_x)^{-1} X_x' W_x y,   (5.6)

and with e_j being the j-th unit vector in ℝ^{r+1} we see immediately that

    \hat{m}(x) = \hat{β}_0(x) = e_1' (X_x' W_x X_x)^{-1} X_x' W_x y   (5.7)

and that with

    \hat{m}^{(j)}(x) = \hat{β}_j(x)\, j! = j!\, e_{j+1}' (X_x' W_x X_x)^{-1} X_x' W_x y, \quad j = 1, ..., r,   (5.8)

an estimator for the j-th derivative of m is given. The case r = 0 yields the Nadaraya-Watson estimator (3.3). Let u = (R_r(x_1), ..., R_r(x_n))' be the residual vector containing the remainder terms according to (5.3) at the data points. Then the conditional bias of \hat{β}(x) is given by

    B\big(\hat{β}(x)\big) = (X_x' W_x X_x)^{-1} X_x' W_x u,

and with Σ_x = W_x^2\, diag\big(σ^2(x_i)\big) its conditional covariance matrix is

    Var\big(\hat{β}(x)\big) = (X_x' W_x X_x)^{-1} (X_x' Σ_x X_x)(X_x' W_x X_x)^{-1}.

The above two expressions cannot be used directly, since they contain the unknown vector u of remainder terms and the unknown diagonal matrix Σ_x. A first order asymptotic expansion of the variance and the bias term uses the moments of K and K^2, denoted by

    μ_j = \int u^j K(u)\, du \quad \text{and} \quad ν_j = \int u^j K^2(u)\, du,

which are contained in the matrices

    S = (μ_{j+l})_{0 \le j,l \le r}, \quad \tilde{S} = (μ_{j+l+1})_{0 \le j,l \le r}, \quad S^* = (ν_{j+l})_{0 \le j,l \le r},

and the vectors c_r = (μ_{r+1}, ..., μ_{2r+1})', \tilde{c}_r = (μ_{r+2}, ..., μ_{2r+2})'. For an i.i.d. sample (y_1, x_1), ..., (y_n, x_n) with the marginal density f(x) > 0 and with f, m^{(r+1)} and

σ^2 continuous in a neighbourhood of x, we obtain for h → 0 and nh → ∞ the asymptotic conditional variance

    Var\big(\hat{m}^{(j)}(x)\big) = e_{j+1}' S^{-1} S^* S^{-1} e_{j+1}\, \frac{(j!)^2\, σ^2(x)}{f(x)\, n h^{1+2j}} + o_p\Big(\frac{1}{n h^{1+2j}}\Big).   (5.9)

For the asymptotic conditional bias we have to distinguish between the cases where r - j is odd and where r - j is even. For r - j odd we have

    Bias\big(\hat{m}^{(j)}(x)\big) = e_{j+1}' S^{-1} c_r\, \frac{j!}{(r+1)!}\, m^{(r+1)}(x)\, h^{r+1-j} + o_p(h^{r+1-j}).   (5.10)

For r - j even the asymptotic bias is

    Bias\big(\hat{m}^{(j)}(x)\big) = e_{j+1}' S^{-1} \tilde{c}_r\, \frac{j!}{(r+2)!} \Big\{ m^{(r+2)}(x) + (r+2)\, m^{(r+1)}(x)\, \frac{f'(x)}{f(x)} \Big\}\, h^{r+2-j} + o_p(h^{r+2-j}),   (5.11)

provided that f' and m^{(r+2)} are continuous in a neighbourhood of x and n h^3 → ∞. As a very interesting fact we notice the difference in asymptotic bias between r - j odd and r - j even. For instance, we have for the NW-estimator (r = 0, j = 0)

    B\big(m_n(x)\big) = \big[ m''(x)/2 + m'(x)\, f'(x)/f(x) \big]\, μ_2\, h^2 + o_p(h^2),

whereas for the local linear approach we obtain

    B\big(\hat{m}(x)\big) = \frac{m''(x)\, μ_2}{2}\, h^2 + o_p(h^2).

We see that the bias of the local linear estimator has a simpler structure. The linear term in the bias expansion vanishes, whereas the expression for the variance is the same in both cases and given by ν_0\, σ^2(x)/(f(x)\, n h). The bias of the NW-estimator depends not only on m'', but also on m' and the score function -f'/f. This is the reason why an unbalanced design leads to an increased bias. Similar considerations hold for higher order polynomials. In practice this means that for

estimating m it is sufficient to consider r = 1 or r = 3, and for m' only r = 2 or r = 4 should be considered. In many applications r = j + 1 suffices. Fitting a higher order polynomial will possibly reduce the bias, but on the other hand the variance will increase, since more parameters have to be estimated locally. If the regressor x is a vector rather than a scalar, in most cases a local linear approach is chosen, since in this case the step from r = 1 to r = 3 leads to a strong increase in the number of parameters to be estimated locally, which entails an unacceptable increase in variance. Since

    \hat{β}_j(x) = e_{j+1}' \hat{β} = e_{j+1}' (X_x' W_x X_x)^{-1} X_x' W_x y = \sum_{i=1}^n w_{ni}^j\Big(\frac{x_i - x}{h}\Big)\, y_i,   (5.12)

for estimating β_j(x) = m^{(j)}(x)/j! we have a similar expression as a weighted mean, like for the NW-estimator (3.3). The weights depend on the observations x_i and on the location of x in the design space. It can be seen easily that the weights w_{ni}^j(u_t) = w_{ni}^j\big(\frac{x_i - x}{h}\big) satisfy the discrete moment conditions

    \sum_{i=1}^n \Big(\frac{x_i - x}{h}\Big)^q\, w_{ni}^j\Big(\frac{x_i - x}{h}\Big) = δ_{jq} \quad \text{with } 0 \le j, q \le r.

As a consequence of this, the sample bias for estimating a polynomial with degree less than or equal to r is zero. The variance of \hat{m}^{(j)}(x) is given by

    Var\big(\hat{m}^{(j)}(x)\big) = (j!)^2 \sum_{i=1}^n w_{ni}^j\Big(\frac{x_i - x}{h}\Big)^2\, σ^2(x_i).

The kernel with the weights w_{ni}^j(u_t) is called the active kernel. A first order approximation to the w_{ni}^j is obtained if (X_x' W_x X_x) is replaced by the moment matrix S. The resulting kernel

    \tilde{K}^{(j)}(u) = e_{j+1}' S^{-1} (1, u, ..., u^r)'\, K(u)   (5.13)

is called the equivalent kernel. It satisfies the corresponding moment conditions

    \int u^q\, \tilde{K}^{(j)}(u)\, du = δ_{jq}, \quad 0 \le j, q \le r.   (5.14)

For instance, for the case r = 1, j = 0 we have \tilde{K}^{(0)}(u) = K(u), and for r = 2, j = 1 (estimation of m'),

    \tilde{K}^{(1)}(u) = μ_2^{-1}\, u\, K(u).

This means that for estimating m itself in the interior of the x-space the effective kernel is equal to the chosen symmetric kernel function itself, whereas for estimating the first derivative \tilde{K}^{(1)} is a skew function. As a general result, \tilde{K}^{(j)} is symmetric for j even and skew for j odd. In terms of equivalent kernels, the asymptotic conditional variance and the asymptotic conditional bias (for r - j odd) are

    Var\big(\hat{m}^{(j)}(x)\big) = \frac{(j!)^2\, σ^2(x)}{f(x)\, n h^{1+2j}} \int \tilde{K}^{(j)}(u)^2\, du + o_p\big(n^{-1} h^{-1-2j}\big)   (5.15)

and

    Bias\big(\hat{m}^{(j)}(x)\big) = \frac{j!}{(r+1)!}\, m^{(r+1)}(x)\, h^{r+1-j} \int u^{r+1}\, \tilde{K}^{(j)}(u)\, du + o_p(h^{r+1-j}).   (5.16)

The big advantage of local polynomial regression over other smoothing methods consists in the automatic adaptation of the active resp. equivalent kernel to the estimation situation in the boundary area. If x is scalar and x_min = min(x_i), x_max = max(x_i), then for a given bandwidth h the interior of the x-space is given by all observations in the interval [x_min + h, x_max - h]. For all x in this interval the equivalent kernels \tilde{K}^{(j)} have the above mentioned symmetry resp. asymmetry property. In the left boundary part [x_min, x_min + h) the number of left neighbours in a local neighbourhood of a point x will be small compared to the number of right neighbours, and for x = x_min we have only right neighbours. Corresponding considerations hold for the right boundary part (x_max - h, x_max]. For x ∈ ℝ^p (p > 1) the boundary area will often cover an important part of the whole design space. For r - j odd the active resp. equivalent kernels automatically adapt to the skew data situation in the boundary area.
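A compact sketch of the local polynomial estimator (5.6)-(5.8), illustrating the exact-reproduction property implied by the discrete moment conditions (a polynomial of degree at most r is fitted without sample bias); names and the kernel choice are ours:

```python
import math
import numpy as np

def local_poly(x, xs, ys, h, r=1):
    """Local polynomial fit (5.6): weighted least squares of degree r around x
    with Epanechnikov weights. Returns the estimates j! * beta_j of m^(j)(x),
    j = 0, ..., r  (cf. (5.7)-(5.8))."""
    u = np.asarray(xs, dtype=float) - x
    w = np.where(np.abs(u / h) <= 1.0, 0.75 * (1.0 - (u / h) ** 2), 0.0)
    Xd = np.vander(u, r + 1, increasing=True)  # rows [1, x_i - x, ..., (x_i - x)^r]
    XtW = Xd.T * w                             # X' W
    beta = np.linalg.solve(XtW @ Xd, XtW @ np.asarray(ys, dtype=float))
    return np.array([math.factorial(j) * beta[j] for j in range(r + 1)])
```

Setting r = 0 reproduces the NW-estimator; r = 1 gives the local linear fit whose bias, as shown above, is free of the design-dependent term.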
The situation in the right boundary area is illustrated in Figure 5.1 for the Epanechnikov kernel K(u) = (3/4)(1 - u^2)_+ for a local linear estimation of m (r = 1, j = 0) and a local quadratic estimation of m' (r = 2, j = 1). We see how the weighting systems get deformed towards the boundary. The pictures for the left boundary area are symmetric to those in Figure 5.1. Since the size of the local neighbourhood shrinks towards the boundary, the bias part of the mean squared error (MSE) will be lower in the boundary area than in the interior. On the other hand the variance part will increase, since fewer observations are included in the local estimation and

22 5 LOCALLY WEIGHTED REGRESSION 22 Figure 5.1 Activekernels derived from te Epanecnikov kernel wit n = 30 at te rigt boundary for (a) r =1 j =0 and (b) r =2 j =1: Estimation at interior points (sort dases), at x = x ; 15 (dases and points), at x ; 6 (long dases) and at te boundary point x (solid line). also due to te increasing deformation of te weigting system towards te boundary. Usually, te increase in variance overcompensates te reduction of te bias, particularly if m 00 remains rougly te same in te boundary area. As a conseqence, te MSE will increase towards te boundary. Te increase will be even more pronounced for iger order polynomials. For x 2 IR p te local linear t is given as te solution of te least squares criterion nx y i i ; 0 ; 0 2 xi ; x (x i ; x) K were K is a p{variate kernel. Wit te design matrix X x wit rows 1 (xi1 ; x 1 ) ::: (x ip ; x p ) te solution as te same form as in (5.7). Let K be a product kernel composed of te same univariate kernel and bandwidt in eac coordinate and let H m (x) be te Hessian matrix of te second derivatives of m. Ten we get an asymptotic expression for te variance and te bias in te interior (see Ruppert and Wand, 1994) Var ^m(x) = 0 2 (x) f(x)n p + o p(n p ) (5.17)

and

$$\operatorname{Bias}\bigl(\hat m(x)\bigr) = \frac{h^2\,\mu_2(K)}{2}\,\operatorname{tr}\{H_m(x)\} + o_p(h^2), \tag{5.18}$$

where $\mu_2(K) = \int u^2 K(u)\,du$. The above considerations about the advantage of a local linear approach compared to the local constant estimation, about its design adaptation property and about its automatic boundary adaptation hold for the multivariate case in a similar way.

Up to now we considered local least squares regression to estimate the mean function $m$. But the idea of locally weighted regression turns out to be a very versatile tool for estimation in a variety of situations. Yu and Jones (1998) consider the estimation of the conditional distribution function $F(y\,|\,x)$. Let $F_1(u) = \int_{-\infty}^{u} K_1(v)\,dv$ be the distribution function pertaining to a symmetric kernel density $K_1$ and let $h_2$ be a bandwidth. Yu and Jones consider a local linear approach for $F(y\,|\,x)$ which is motivated by the approximations

$$E\Bigl[F_1\Bigl(\frac{y_0 - y_i}{h_2}\Bigr)\,\Big|\,x_i = x_0\Bigr] \approx F(y_0\,|\,x_0)$$

and

$$F(y_0\,|\,x) \approx F(y_0\,|\,x_0) + \dot F(y_0\,|\,x_0)(x - x_0) = \beta_0 + \beta_1(x - x_0),$$

where $\dot F(y_0\,|\,x) = \partial F(y_0\,|\,x)/\partial x$. This suggests the least squares approach

$$\sum_{i=1}^{n}\Bigl[F_1\Bigl(\frac{y - y_i}{h_2}\Bigr) - \beta_0 - \beta_1(x_i - x)\Bigr]^2 K\!\left(\frac{x_i - x}{h_1}\right),$$

where $K$ is a second kernel with bandwidth $h_1$. The solution

$$\tilde F_{h_1 h_2}(y\,|\,x) = \hat\beta_0 = e_1'\,(X_x' W_x X_x)^{-1} X_x' W_x\,\tilde y \tag{5.19}$$

with $\tilde y = \bigl(F_1\bigl(\tfrac{y - y_1}{h_2}\bigr),\ \ldots,\ F_1\bigl(\tfrac{y - y_n}{h_2}\bigr)\bigr)'$ is called a local linear double-kernel smoothing by the authors. The estimator is continuous and has zero as left boundary value (for $y \to -\infty$) and 1 as right boundary value. It can happen that the estimator ranges outside $[0,1]$. But this does not, as the authors say, give problems when estimating the $\theta$-quantile $q_\theta$ by $\tilde q_\theta(x) = \tilde F^{-1}_{h_1 h_2}(\theta\,|\,x)$. This estimator involves the problem that two bandwidths $h_1$ and $h_2$ have to be chosen. For a possible procedure with $h_2 < h_1$ we refer to the paper.

Fan, Yao and Tong (1996) considered a related idea for estimating the conditional density itself. The approximation

$$E\Bigl[\frac{1}{h_2}K_1\Bigl(\frac{y_0 - y_i}{h_2}\Bigr)\,\Big|\,x_i = x\Bigr] \approx g(y_0\,|\,x_0) + \dot g(y_0\,|\,x_0)(x - x_0) = \beta_0 + \beta_1(x - x_0)$$

with $\dot g(y\,|\,x) = \partial g(y\,|\,x)/\partial x$ leads to the least squares criterion

$$\sum_{i=1}^{n}\Bigl[\frac{1}{h_2}K_1\Bigl(\frac{y_i - y}{h_2}\Bigr) - \beta_0 - \beta_1(x_i - x)\Bigr]^2 K\!\left(\frac{x_i - x}{h_1}\right) \tag{5.20}$$

with the solution $\hat g(y\,|\,x) = \hat\beta_0$ as in (5.19), where now the vector $\tilde y$ is $\tilde y = \bigl(\tfrac{1}{h_2}K_1\bigl(\tfrac{y_1 - y}{h_2}\bigr),\ \ldots,\ \tfrac{1}{h_2}K_1\bigl(\tfrac{y_n - y}{h_2}\bigr)\bigr)'$. The local constant approach leads to the traditional estimator (2.3). Fan, Yao and Tong also consider the case of a local quadratic approach for estimating the first derivative. We will not pursue this case further here, since for the quadratic term $p(p+1)/2$ more parameters have to be estimated.

In all local regression approaches so far we used the least squares criterion. Let us now look at cases where instead of the square function another convex loss function $\rho: \mathbb{R} \to \mathbb{R}$ is used which has a unique minimum at zero, and let $m_\rho(x) = \operatorname{argmin}_{\beta_0} E[\rho(y - \beta_0)\,|\,x]$. $\rho(u) = u^2$ yields the conditional expectation which we analyzed mostly so far. $\rho(u) = |u|$ yields the conditional median. This is just the special case $\theta = 1/2$ of the loss function $\rho_\theta(u) = |u| + (2\theta - 1)u$, already mentioned in (3.6). $\rho_\theta$ was introduced by Koenker and Bassett for parametric quantile estimation. The function $\rho_\theta(u)$ for various $\theta$ is exhibited in Figure 5.2.
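As a quick numerical check (an illustration of our own, not from the paper), the loss $\rho_\theta(u) = |u| + (2\theta-1)u$ is minimized empirically at the $\theta$-quantile of the data:

```python
import numpy as np

def check_loss(u, theta):
    """Koenker-Bassett type check function rho_theta(u) = |u| + (2*theta - 1)*u;
    for theta = 1/2 it reduces to the absolute value |u|."""
    return np.abs(u) + (2 * theta - 1) * u

rng = np.random.default_rng(0)
y = rng.normal(size=2000)
theta = 0.75

# Minimize the empirical check loss over a grid of candidate values a;
# the minimizer should coincide (up to grid resolution) with the 0.75-quantile.
grid = np.linspace(-3.0, 3.0, 1201)
losses = np.array([check_loss(y - a, theta).sum() for a in grid])
a_star = grid[np.argmin(losses)]

print(a_star, np.quantile(y, theta))
```

This is exactly the mechanism exploited by local quantile regression: replacing the square loss by $\rho_\theta$ in the local criterion moves the local fit from the conditional mean to the conditional $\theta$-quantile.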

Figure 5.2. $\rho_\theta(u)$ according to Koenker and Bassett for several $\theta$.

In robustness considerations, $\rho$-functions were introduced which increase less rapidly than the square function and for which $\rho' = \psi$ is the so-called $\psi$-function. See Huber (1981) or Hampel et al. (1986). A local constant estimator for $m_\rho$ is

$$\hat m_\rho(x) = \operatorname*{argmin}_{\beta_0} \sum_{i=1}^{n} \rho(y_i - \beta_0)\,K\!\left(\frac{x_i - x}{h}\right).$$

The known drawbacks of a local constant approach are that it cannot adapt to unbalanced design situations and that it has adverse boundary effects which require boundary corrections. This suggests a local linear approach instead, which leads to the estimator $\hat m_\rho(x) = \hat\beta_0$, where

$$(\hat\beta_0, \hat\beta) = \operatorname*{argmin}_{\beta_0, \beta} \sum_{i=1}^{n} \rho\bigl(y_i - \beta_0 - \beta'(x_i - x)\bigr)\,K\!\left(\frac{x_i - x}{h}\right). \tag{5.21}$$

For a $\rho$-function belonging to a robustness class, such as Huber's M-type estimators, known methods for robust estimation can be applied in order to solve the minimum problem (5.21). We would like to remark that the use of kernels automatically safeguards against large deviations in the design space. For nonparametric robust M-, L- and R-estimation in a time series setting see Michels (1992). For a local $\theta$-quantile regression with the function (3.6), the local solution in (5.21) can be evaluated by solving a linear programming problem, as was shown in the paper of Koenker and Bassett (1978). An algorithm for evaluating this can be found in Koenker and d'Orey (1987). For the case of a general convex $\rho$-function and i.i.d. observations, asymptotic normality is proved in Fan, Hu and Truong (1994). The $\theta$-quantile estimation according to (5.21) is also considered by Yu and Jones (1998) and compared with the estimator (5.19). For reasons of practical performance the authors prefer the double smoothing approach (5.19). They also give an asymptotic expression for the mean squared error for scalar $x$, which for the solution of (5.21) is given by

$$\operatorname{MSE}\bigl(\hat q_\theta(x)\bigr) = \operatorname{Bias}^2\bigl(\hat q_\theta(x)\bigr) + \operatorname{Var}\bigl(\hat q_\theta(x)\bigr) = \Bigl[\frac{h^2\,\mu_2(K)}{2}\,q_\theta''(x)\Bigr]^2 + \frac{\theta(1-\theta)\int K^2(u)\,du}{n h\, f(x)\, f\bigl(q_\theta(x)\,|\,x\bigr)^2}.$$

These expressions are used for suggestions of bandwidth choice. The cases of robust locally linear regression and of quantile regression are also considered in Fan and Gijbels (1996).

6 Applications of locally weighted regression to time series

Local linear or higher order polynomial regression, originally mainly considered for independent data, can be applied in the same way to stationary processes with certain memory restrictions. The reasons are the same as those mentioned at the beginning of section 3. Given two (dependent) random variables $x_s$ and $x_t$ and a point $x$ in the design space, the random variables $\frac{1}{h}K\bigl(\frac{x_s - x}{h}\bigr)$ and $\frac{1}{h}K\bigl(\frac{x_t - x}{h}\bigr)$ are nearly uncorrelated as $h \to 0$. This is the whitening by windowing principle, and it is worthwhile mentioning that this property is not shared by parametric estimators. To handle memory restrictions, mixing conditions (strong mixing, uniform mixing or $\beta$-mixing) are used in the proofs of consistency and asymptotic normality. They give a bound on the maximal dependence between events being at least $k$ instants apart from each other. Short term dependence does not have

much effect on local regression. But local polynomial techniques are also applicable under weak dependence in the medium or long term. If suitable mixing conditions are fulfilled, local polynomial estimators for dependent data have the same asymptotic properties as for independent data. Of course the bias is not influenced by dependence, whereas the variance terms are affected. In proving asymptotic equivalence, the task then consists in showing that the additional terms due to nonvanishing covariances between the variables are of smaller order asymptotically.

For a local linear estimation of $m(x) = m(x_1, \ldots, x_p)$ in the autoregressive model (3.2), the design matrix and the vector $y$ have the form

$$X_x = \begin{pmatrix} 1 & z_p - x_1 & \cdots & z_1 - x_p \\ \vdots & \vdots & & \vdots \\ 1 & z_{n-1} - x_1 & \cdots & z_{n-p} - x_p \end{pmatrix}, \qquad y = \begin{pmatrix} z_{p+1} \\ \vdots \\ z_n \end{pmatrix},$$

and with $(x_t - x)' = (z_{t-1} - x_1, \ldots, z_{t-p} - x_p)'$ the estimator can be evaluated as in (5.7). For $x = x_{n+1} = (z_n, \ldots, z_{n-p+1})'$, $\hat m(x_{n+1}) = \hat\beta_0$ yields the one-step ahead predictor. A direct $k$-step ahead predictor is given if $y = (z_{p+k}, \ldots, z_n)'$ and if the last row of the $X_x$-matrix is $(1,\ z_{n-k} - z_n,\ \ldots,\ z_{n-k-p+1} - z_{n-p+1})$. But in this case a succession of one-step ahead predictions seems preferable, as already mentioned in section 3. Asymptotic normality results for locally linear autoregression can be found in Härdle, Tsybakov and Yang (1997) and in Fan and Gijbels (1996).

For the CHARN model $z_t = m(x_t) + \sigma(x_t)\varepsilon_t$, the function $g(x_t)$ according to (3.9) can be estimated in a similar way as above, where only in the vector $y$ the time series values are replaced by their squares. Asymptotic normality for this case is shown in Härdle and Tsybakov (1997). For a residual based estimator of $\sigma^2(x)$ see (7.10) or Feng and Heiler (1998a).
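A minimal sketch of the one-step ahead predictor just described (the simulated model, the function names and all settings are our own illustrative assumptions), here for a scalar lag $p = 1$:

```python
import numpy as np

def local_linear_predict(z, x0, h):
    """One-step ahead predictor m_hat(x0) for z_t = m(z_{t-1}) + eps_t:
    a local linear fit of z_t on z_{t-1}, evaluated at the point x0."""
    x, y = z[:-1], z[1:]                       # regressor: lagged series
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov weights
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                              # beta_0 = m_hat(x0)

rng = np.random.default_rng(1)
m = lambda v: 0.6 * np.tanh(v)                 # a smooth, stationary AR function
z = np.zeros(3000)
for t in range(1, z.size):
    z[t] = m(z[t - 1]) + 0.2 * rng.normal()

x0 = 0.3
print(local_linear_predict(z, x0, h=0.3), m(x0))   # estimate vs. true m(x0)
```

For the forecast itself one would evaluate the fit at $x_0 = z_n$; iterating this step gives the succession of one-step ahead predictions preferred in the text over the direct $k$-step predictor.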
The local linear estimation of a conditional density in a time series setting with the before mentioned double smoothing procedure as in (5.19) is considered in Fan, Yao and Tong (1996) and in Fan and Gijbels (1996), where also asymptotic results can be found. For the estimation of the conditional distribution function according to the proposal of Yu and Jones (1998) as in (5.19), and for a general solution of (5.21), asymptotic results are known for independent data. See the papers of Yu and Jones (1998), Härdle and Gasser (1984) and Tsybakov (1986). For dependent data we have not yet found formally published proofs. But considering the whitening by windowing effect makes it clear that for these cases consistency results will hold under suitable mixing conditions.

7 Parameter selection

One of the first questions to be answered in the application of kernel smoothing is which type of kernel to use for different choices of $r$ and $j$. It is well known that for $r-j$ odd, in the interior of the $x$-space the Epanechnikov kernel $K(u) = \tfrac34(1-u^2)_+$ is the one which minimizes the mean squared error in the class of all nonnegative, symmetric and Lipschitz continuous functions, and that for the endpoints $\underline x$ and $\bar x$ the triangular kernels $(1-u)\,\mathbf 1_{[0,1]}(u)$ resp. $(1+u)\,\mathbf 1_{[-1,0]}(u)$ are optimal. For other points in the boundary area optimal solutions are not known. It is easy to see that when looking at the variance only, the uniform kernel $\tfrac12\,\mathbf 1_{[-1,1]}(u)$ is the one minimizing the variance. It is well known that in practice the choice of the kernel is not very important compared to the choice of the bandwidth. The Epanechnikov kernel will therefore be a good choice in many cases. Nonetheless, in practice smoother kernels like the bisquare or the triweight are often preferred. This has to do with the degree of smoothness, since the kernel estimates inherit the smoothness properties of the kernel. According to the degree of smoothness as introduced by Müller (1984), the uniform kernel has degree zero (not continuous), the triangle and the Epanechnikov kernel have degree 1 (continuous, but first derivative not continuous), the bisquare and the triweight have degrees 2 and 3, respectively, and the Gaussian kernel has degree $\infty$.

The most crucial task in kernel smoothing is bandwidth selection. Much ink has been spilled on papers concerning this problem. It is hence impossible to give a comprehensive survey here. Instead we will discuss only a few basic ideas.
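The degrees of smoothness listed above can be checked directly at the support boundary $|u| = 1$, where the loss of smoothness occurs (a small illustration of our own; the normalizing constants are the standard densities on $[-1,1]$):

```python
import numpy as np

# Common compact-support kernel densities on [-1, 1]
K = {
    "uniform":      lambda u: 0.5 * np.ones_like(u),
    "epanechnikov": lambda u: 0.75 * (1.0 - u**2),
    "bisquare":     lambda u: (15.0 / 16.0) * (1.0 - u**2) ** 2,
    "triweight":    lambda u: (35.0 / 32.0) * (1.0 - u**2) ** 3,
}

eps = 1e-6
edge = {}
for name, k in K.items():
    u1, u0 = np.array([1.0]), np.array([1.0 - eps])
    value = k(u1)[0]                      # nonzero only if the kernel jumps at 1
    slope = (k(u1)[0] - k(u0)[0]) / eps   # one-sided derivative at u = 1
    edge[name] = (value, slope)
    print(f"{name:13s} K(1) = {value:+.4f}   K'(1-) = {slope:+.4f}")

# degree 0: the uniform kernel itself jumps at the boundary;
# degree 1: Epanechnikov is continuous but its derivative jumps;
# degree >= 2: bisquare and triweight also have a continuous derivative there.
```

A smoother kernel therefore buys a visually smoother estimate, while the choice between these kernels matters far less for the MSE than the bandwidth does.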
The aim is to choose bandwidths such that the conditional mean squared error, given by

$$\operatorname{MSE}\bigl(\hat m^{(j)}(x)\bigr) = \operatorname{Bias}^2\bigl(\hat m^{(j)}(x)\bigr) + \operatorname{Var}\bigl(\hat m^{(j)}(x)\bigr), \tag{7.1}$$

becomes minimal. We have to distinguish between a locally optimal bandwidth and a globally optimal, constant bandwidth. It is clear that a large bandwidth will lead to a low variance but a high bias. Decreasing the bandwidth will increase the variance but reduce the bias. An optimal bandwidth is achieved when the changes in bias and variance balance. Using the asymptotic expressions (5.15) and (5.16) for the conditional variance and bias, minimizing (7.1) with respect to $h$ yields for the (asymptotically) optimal bandwidth at $x$ for a scalar $x$

$$h_n(x) = C_{r,j}(K)\left[\frac{\sigma^2(x)}{\bigl(m^{(r+1)}(x)\bigr)^2 f(x)}\cdot\frac1n\right]^{1/(2r+3)}, \tag{7.2}$$

where the constant

$$C_{r,j}(K) = \left[\frac{\bigl((r+1)!\bigr)^2\,(2j+1)\int \tilde K_{(j)}(u)^2\,du}{2(r+1-j)\,\bigl\{\int u^{r+1}\tilde K_{(j)}(u)\,du\bigr\}^2}\right]^{1/(2r+3)} \tag{7.3}$$

depends only on $r$, $j$ and the kernel used and can be calculated beforehand. In time series applications we are mainly interested in a constant, global bandwidth, for which the integrated mean squared error (IMSE)

$$\int \Bigl[\operatorname{Bias}^2\bigl(\hat m^{(j)}(x)\bigr) + \operatorname{Var}\bigl(\hat m^{(j)}(x)\bigr)\Bigr]\, w(x)\,dx$$

is chosen as criterion, where $w$ is a weight function going to zero at the boundaries to avoid boundary effects. Minimizing the IMSE with respect to $h$ yields the optimal global bandwidth

$$h_n = C_{r,j}(K)\left[\frac{\int \frac{\sigma^2(x)}{f(x)}\,w(x)\,dx}{\int \bigl\{m^{(r+1)}(x)\bigr\}^2 w(x)\,dx}\cdot\frac1n\right]^{1/(2r+3)}. \tag{7.4}$$

For local linear estimation of $m$ when $x$ is a $p$-vector and the same bandwidth is chosen in each coordinate, a similar expression can be derived (see Feng and Heiler, 1998a). Here $h_n = c_0\, n^{-1/(p+4)}$, where

$$c_0 = \left[\frac{\sigma^2(x)}{f(x)\,\bigl(\operatorname{tr}\{H_m(x)\}\bigr)^2}\right]^{1/(p+4)}$$

(up to a constant depending only on the kernel) and $H_m(x)$ is the matrix of second derivatives of $m$.

All these expressions contain quantities which are unknown and are therefore not accessible in practice. So-called plug-in techniques substitute these quantities by pilot estimates. For more details see Ruppert, Sheather and Wand (1995).

A simple procedure of bandwidth selection for independent data, first developed to find the smoothing parameter in spline smoothing, is cross validation. Let $\hat m_{-i}(x_i)$ be the so-called leave-one-out estimator of $m$ at $x_i$, where the observation $(y_i, x_i)$ is not used in the estimation procedure. Then the criterion is

$$CV(h) = n^{-1}\sum_{i=1}^{n}\bigl[y_i - \hat m_{-i}(x_i)\bigr]^2 \tag{7.5}$$

and $h_{CV} = \operatorname{argmin}_h CV(h)$ is the cross validation bandwidth selector. The idea can also be used for $x \in \mathbb{R}^p$ and for estimating derivatives. See Härdle (1990) for details. It can be shown that $h_{CV}$ converges almost surely to the IMSE optimal bandwidth, but the convergence rate of $n^{-1/10}$ is very low.

The cross validation idea was developed for independent data. In a time series setting it is suggested to replace the leave-one-out estimator by a "leave block out" estimator, where for estimating at $x_i$ not only the $i$th observation is omitted, but a whole block of data around $(y_i, x_i)$. This idea was used by Abberger (1995, 1996) in smoothing the conditional $\theta$-quantile, where the square function is replaced by the $\rho_\theta$-function (3.6).

Let $\sigma^2$ be the variance of the residuals in an i.i.d. sample and, in the time series case, the unconditional variance of the stationary process. Rice (1983, 1984) proposed a criterion $R$ which for a general linear smoother is given by

$$R(h) = RSS(h) - \hat\sigma^2 + 2\hat\sigma^2\, n^{-1}\sum_{i=1}^{n} w_{ni}(x_i), \tag{7.6}$$

where the $w_{ni}$ are the actual weights for estimating $m(x_i)$, $\hat\sigma^2$ is an estimate of $\sigma^2$ and

$$RSS(h) = n^{-1}\sum_{i=1}^{n}\bigl[y_i - \hat m_h(x_i)\bigr]^2 \tag{7.7}$$

is the mean residual sum of squares. Under the assumption that $\hat\sigma^2$ is a consistent estimator, Rice (1984) showed that the proposed selector $h_R = \operatorname{argmin}_h R(h)$ is asymptotically optimal in the sense that $(h_R - h_0)/h_0 \to 0$ in probability, where $h_0$ is the minimizer of the mean averaged squared error

$$MASE(h) = n^{-1} E\Bigl\{\sum_{i=1}^{n}\bigl[\hat m_h(x_i) - m(x_i)\bigr]^2\Bigr\}.$$

The rate of convergence of $h_R$ is the same low rate $n^{-1/10}$ as for the cross validation solution $h_{CV}$. The main difference between the two is that $h_R$ involves an estimate of $\sigma^2$, whereas $h_{CV}$ does not. For $\hat\sigma^2$ Rice proposed an estimator based on first differences, whereas Gasser et al. (1986) suggested taking second differences (since they annihilate a local linear mean value function),

$$\hat\sigma^2_G = \frac{2}{3(n-2)}\sum_{i=1}^{n-2}\Bigl[y_{i+1} - \tfrac12(y_i + y_{i+2})\Bigr]^2. \tag{7.8}$$

An estimator based on a general difference sequence $D_m = \{d_0, d_1, \ldots, d_m\}$ such that $\sum_{j=0}^{m} d_j = 0$ and $\sum_{j=0}^{m} d_j^2 = 1$ was considered by Hall et al. (1990). The variance estimator based on $D_m$ is then

$$\hat\sigma^2_m = (n - m)^{-1}\sum_{i=1}^{n-m}\Bigl(\sum_{j=0}^{m} d_j\, y_{j+i}\Bigr)^2. \tag{7.9}$$

Fan and Gijbels (1995) suggest the residual sum of squares criterion (RSC), which is based on a local estimator of the conditional variance derived under a local homogeneity assumption,

$$\hat\sigma^2(x) = \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2\, K\bigl(\frac{x_i - x}{h}\bigr)}{\operatorname{tr}\bigl[W_x - W_x X_x (X_x' W_x X_x)^{-1} X_x' W_x\bigr]}. \tag{7.10}$$

With this the RSC is defined as

$$RSC(x, h) = \hat\sigma^2(x)\bigl[1 + (r+1)V\bigr], \tag{7.11}$$

where $V$ is the first diagonal element of the matrix

$$(X_x' W_x X_x)^{-1}\,(X_x' W_x^2 X_x)\,(X_x' W_x X_x)^{-1}.$$

$V^{-1}$ reflects the effective number of local data points. RSC admits the following interpretation: if $h$ is too large, then the bias is large and hence also $\hat\sigma^2(x)$; when the bandwidth is too small, then $V$ will be large. Therefore RSC protects against extreme choices of $h$. The minimizer of $E[RSC(x, h)]$ can be approximated by

$$h_{n0}(x) = \left[\frac{a_0\,\sigma^2(x)}{2\,C_r\,\beta_{r+1}^2\, n f(x)}\right]^{1/(2r+3)}, \tag{7.12}$$

where $a_0$ denotes the first diagonal element of the matrix $S^{-1} S^* S^{-1}$, i.e. $a_0 = \int \tilde K^2(u)\,du$, and $C_r = \mu_{2r+2} - c_r' S^{-1} c_r$ with the definitions given in section 5 and $\beta_{r+1} = m^{(r+1)}(x)/(r+1)!$. $h_{n0}(x)$ differs from the optimal bandwidth in (7.2) by an adjusting constant which depends only on $r$, $j$ and the kernel used. Hence the latter can be evaluated as

$$h_n(x) = \operatorname{Ad}_{j,r}\, h_{n0}(x), \tag{7.13}$$

where

$$\operatorname{Ad}_{j,r} = \left[\frac{(2j+1)\,C_r \int \bigl(\tilde K_{(j)}(u)\bigr)^2\,du}{(r+1-j)\,\bigl\{\int u^{r+1}\tilde K_{(j)}(u)\,du\bigr\}^2 \int \tilde K^2(u)\,du}\right]^{1/(2r+3)}.$$

For the Epanechnikov and the Gaussian kernel these constants are tabulated for various $r$ and $j$ in Fan and Gijbels (1996).

For a global bandwidth the minimizer $\hat h$ of the integrated RSC,

$$IRSC(h) = \int RSC(x, h)\,dx,$$

is taken, which in practice breaks down to evaluating a mean over certain grid points $x_{i1}, \ldots, x_{im}$. $\hat h$ is also selected from among a number of grid points in an interval $[h_{\min}, h_{\max}]$. The global bandwidth is then given by

$$\hat h_{j,r} = \operatorname{Ad}_{j,r}\,\hat h. \tag{7.14}$$

The RSC criterion also suffers from a low convergence rate. Therefore the following refined bandwidth selection procedure is suggested. It is a double smoothing (DS) procedure. The pilot smoothing consists in fitting a polynomial of order $r+2$ and selecting $\hat h_{j,r}$ as above. With the bandwidth $\hat h_{r+1,r+2}$, estimates $\hat\beta_{r+1}$, $\hat\beta_{r+2}$ and $\hat\sigma^2(x)$ are evaluated. With these pilot estimates, in a second stage

$$\widehat{MSE}_{j,r}(x, h) = \widehat{\operatorname{Bias}}{}^2_{j,r}(x) + \widehat{\operatorname{Var}}_{j,r}(x)$$

is evaluated, where $\widehat{\operatorname{Bias}}_{j,r}(x)$ denotes the $(j+1)$th element of the estimated bias vector and $\widehat{\operatorname{Var}}_{j,r}(x)$ is the $(j+1)$th diagonal element of the matrix $(X_x' W_x X_x)^{-1}(X_x' W_x^2 X_x)(X_x' W_x X_x)^{-1}\hat\sigma^2(x)$. With $S_{n,l} = \sum_{i=1}^{n} K\bigl(\frac{x_i - x}{h}\bigr)(x_i - x)^l$ the bias vector is estimated by

$$\hat b_r(x) = (X_x' W_x X_x)^{-1}\begin{pmatrix} \hat\beta_{r+1} S_{n,r+1} + \hat\beta_{r+2} S_{n,r+2} \\ \vdots \\ \hat\beta_{r+1} S_{n,2r+1} + \hat\beta_{r+2} S_{n,2r+2} \end{pmatrix}.$$

In order to avoid collinearity effects it is suggested to modify the vector on the right side by putting $S_{n,r+3} = \cdots = S_{n,2r+2} = 0$, which yields

$$\hat b_r(x) = (X_x' W_x X_x)^{-1}\begin{pmatrix} \hat\beta_{r+1} S_{n,r+1} + \hat\beta_{r+2} S_{n,r+2} \\ \hat\beta_{r+1} S_{n,r+2} \\ \vdots \\ 0 \end{pmatrix}.$$

The global refined bandwidth selector is then given by the minimizer $\hat h^{R}_{j,r}$ of

$$\int \widehat{MSE}_{j,r}(x, h)\,dx. \tag{7.15}$$

This refined technique leads to an important improvement over the RSC bandwidth selector.

For a balanced design, i.e. for equally spaced $x$ values, Heiler and Feng (1998) propose a simple double smoothing procedure, where in the pilot estimation step the R-criterion

is used. In Feng and Heiler (1998b) a further improvement of this proposal can be found, where a variance estimator based on the bootstrap idea is used. Equally spaced $x$ values are for instance given in a time series setting where the regressor is the time index or a function of the time index. This kind of smoothing will be discussed in the next section.

For order selection in a time series autoregression model with $x_t = (z_{t-1}, \ldots, z_{t-p})$ and $\hat m_{-t}(x)$ being the leave-one-out estimator according to (5.7), Cheng and Tong (1992) use the cross validation criterion

$$CV(p) = (n - r + 1)^{-1}\sum_t \bigl[z_t - \hat m_{-t}(x_t)\bigr]^2 w(x_t), \tag{7.16}$$

where $w$ is a weight function to avoid boundary effects. Due to the curse of dimensionality it may be advisable not to take all lagged values $z_{t-1}, \ldots, z_{t-p}$ into account, but to look for a subset of lagged values which yields the best forecasts. For a lag constellation $x_t(i) = (z_{t-i_1}, \ldots, z_{t-i_p})'$ Tjøstheim and Auestad (1994) propose to use the final prediction error

$$FPE(x_t(i)) = n^{-1}\sum_t \bigl[z_t - \hat m(x_t(i))\bigr]^2\, c(i), \tag{7.17}$$

where the factor is

$$c(i) = \frac{1 + (n h^p)^{-1}\,\kappa_0\, b_p(i)}{1 - (n h^p)^{-1}\bigl[2K^p(0) - \kappa_0\bigr]\, b_p(i)}, \qquad \kappa_0 = \Bigl\{\int K^2(u)\,du\Bigr\}^p,$$

and

$$b_p(i) = n^{-1}\sum_t \frac{w^2(x_t(i))}{\hat f(x_t(i))},$$

$\hat f(x_t(i))$ being a multivariate kernel density estimator. FPE in (7.17) is essentially a sum of squares of one-step ahead prediction errors multiplied by a factor that penalizes small bandwidths and a large order $p$.
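The cross validation selector (7.5), together with the leave-block-out variant suggested above for dependent data, can be sketched as follows (the data-generating model, the local constant smoother and the bandwidth grid are our own illustrative choices):

```python
import numpy as np

def nw_fit(x, y, x0, h, exclude=()):
    """Local constant (Nadaraya-Watson) fit at x0, omitting the indexed points."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    w[list(exclude)] = 0.0
    return np.sum(w * y) / np.sum(w)

def cv_score(x, y, h, block=0):
    """CV(h) as in (7.5); block > 0 omits a whole block around (y_i, x_i),
    the 'leave block out' variant suggested for dependent data."""
    n = len(x)
    resid = [y[i] - nw_fit(x, y, x[i], h,
                           exclude=range(max(0, i - block), min(n, i + block + 1)))
             for i in range(n)]
    return float(np.mean(np.square(resid)))

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)

grid = [0.05, 0.08, 0.12, 0.2, 0.4]          # candidate bandwidths
h_cv = min(grid, key=lambda h: cv_score(x, y, h))
print("selected bandwidth:", h_cv)
```

The oversmoothing candidate $h = 0.4$ is penalized by its large bias, the smallest candidates by their variance, which is exactly the balance the criterion is meant to strike; `block > 0` mimics Abberger's leave-block-out idea for serially dependent residuals.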

8 Time series decomposition with locally weighted regression

As already mentioned in section 3, if $x_t$ is the time index itself or a polynomial in $t$, then we arrive at trend smoothing. In a simple trend model $z_t = m(t) + a_t$ the considerations at the beginning of section 5 deliver an estimator of the smooth trend function or its derivatives. Now the matrix $X_t$ has the rows $(1,\ s-t,\ \ldots,\ (s-t)^r)$ for $s = 1, \ldots, n$, and $W_t = \operatorname{diag}\bigl(K\bigl(\frac{s-t}{h}\bigr)\bigr)$. As an interesting fact one can easily see that in the interior of the time series, i.e. for $h+1 \le t \le n-h$, the weights given in (5.8),

$$w^{j}_{nt}(s) = e_{j+1}'\,(X_t' W_t X_t)^{-1}\,\bigl(1,\ s-t,\ \ldots,\ (s-t)^r\bigr)'\,K\!\left(\frac{s-t}{h}\right),$$

are shift invariant in the sense $w^{j}_{n,t+1}(s+1) = w^{j}_{nt}(s)$. This means that in the interior of the time series the local polynomial fit works like a moving average. But the big advantage over other trend smoothing techniques lies in the automatic boundary adaptation of the procedure. This property makes the idea of extending the local regression approach to so-called unobserved components models very appealing. Nonparametric estimation of trend-cyclical movements and of seasonal variations, and their separation by local regression, represents an interesting alternative to procedures based on parametric models like X-12 or TRAMO-SEATS. These involve extrapolation methods on either end of the time series in order to be able to estimate the components also in the boundary parts of a time series. This can lead to serious problems if unusual observations in the end parts of the time series yield grossly erroneous forecasts. The latter problem will not appear with a local regression approach. Note also that with a data driven parameter selection the procedure works in a fully automatic way. The decomposition of a time series into trend-cyclical and seasonal components by LOcally WEighted Scatterplot Smoothing (LOWESS) was suggested by Cleveland et al. (1990). The procedure discussed here differs from their procedure in essential features.
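The moving-average (shift invariance) property of the interior weights stated above can be verified directly (a small check of our own; the function name and settings are illustrative):

```python
import math
import numpy as np

def trend_weights(n, t, h, r=1, j=0):
    """Weights w_{nt}^j(s), s = 1..n, of the local polynomial trend estimator:
    j! * e_{j+1}' (X_t' W_t X_t)^{-1} X_t' W_t."""
    s = np.arange(1, n + 1)
    d = (s - t) / h
    k = np.where(np.abs(d) <= 1, 0.75 * (1 - d**2), 0.0)   # Epanechnikov
    X = np.vander((s - t).astype(float), N=r + 1, increasing=True)
    B = np.linalg.solve(X.T @ (k[:, None] * X), X.T * k)
    return math.factorial(j) * B[j]

n, h = 120, 10
w_t  = trend_weights(n, 60, h)      # weights for estimating T(60)
w_t1 = trend_weights(n, 61, h)      # weights for estimating T(61)

# In the interior, w_{n,t+1}(s+1) = w_{n,t}(s): the fit acts as a moving average.
print(bool(np.allclose(w_t1[1:], w_t[:-1])))
```

Near $t = 1$ or $t = n$ the same formula produces different, asymmetric weight systems, which is the automatic boundary adaptation that distinguishes this approach from a fixed moving average.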
We consider the additive (unobserved) components model

$$z_t = T(t) + S(t) + a_t, \qquad t = 1, 2, \ldots \tag{8.1}$$

For the sake of simplicity we assume that $\{a_t\}$ is a white noise sequence with mean zero and constant variance $\sigma^2$. $T(t)$ represents the trend-cyclical and $S(t)$ the seasonal component. The usual assumption with respect to $T$ is that it has certain smoothness properties, so that the considerations at the beginning of section 5 apply, leading to a local polynomial representation of order $r$. With respect to the seasonal variations the usual assumption is that they show a similar pattern from one seasonal period to the next, but are allowed to vary slowly in the course of time. Hence a natural assumption is that they can locally be approximated by a Fourier series containing the seasonal frequency and its harmonics,

$$S(s) = \sum_{j=1}^{q}\bigl[\alpha_j(t)\cos 2\pi j\lambda(s-t) + \beta_j(t)\sin 2\pi j\lambda(s-t)\bigr], \tag{8.2}$$

where $\lambda$ is the seasonal frequency, $\lambda = 1/P$, and $P$ is the period of the season. Of course $q\lambda \le 1/2$ (and for $q\lambda = 1/2$ the last sine term has to be omitted). Let

$$u_t(s) = \bigl(\cos 2\pi\lambda(s-t),\ \sin 2\pi\lambda(s-t),\ \ldots,\ \cos 2\pi q\lambda(s-t),\ \sin 2\pi q\lambda(s-t)\bigr)'$$

$$\gamma(t) = \bigl(\alpha_1(t),\ \beta_1(t),\ \ldots,\ \alpha_q(t),\ \beta_q(t)\bigr)'.$$

Then $S(s) = \gamma(t)' u_t(s)$. With the local polynomial representation for the trend-cyclical part

$$T(s) = \sum_{j=0}^{r} b_j(t)\,(s-t)^j = b(t)'\,x_t(s),$$

where $b(t) = (b_0(t), \ldots, b_r(t))'$ and $x_t(s) = (1,\ s-t,\ \ldots,\ (s-t)^r)'$, the local least squares criterion is

$$\sum_{s=1}^{n}\bigl[z_s - b(t)' x_t(s) - \gamma(t)' u_t(s)\bigr]^2\, K\!\left(\frac{s-t}{h}\right). \tag{8.3}$$

With the design matrices $X_{1t}$ with rows $x_t(s)'$ and $X_{2t}$ with rows $u_t(s)'$, $X_t = (X_{1t}\,\vdots\,X_{2t})$, the composed vector $\delta(t)' = (b(t)',\ \gamma(t)')$ and the weight matrix $W_t = \operatorname{diag}\bigl(K\bigl(\frac{s-t}{h}\bigr)\bigr)$, the solution is

$$\hat\delta(t) = (X_t' W_t X_t)^{-1} X_t' W_t\, y \tag{8.4}$$

$$\hat T(t) = e_1'\,(X_t' W_t X_t)^{-1} X_t' W_t\, y \tag{8.5}$$

$$\hat S(t) = (o',\ 0_s')\,(X_t' W_t X_t)^{-1} X_t' W_t\, y, \tag{8.6}$$

where $o'$ is a row of zeroes of length $r+1$ and $0_s'$ is a row vector of length $2q$ with entries $0_s' = (1, 0, 1, 0, \ldots, 1, 0)$. It picks out the $\hat\alpha_j(t)$ pertaining to the cosine terms in $\hat S(t)$. The estimator for the $j$th derivative $T^{(j)}$ of $T$ is

$$\hat T^{(j)}(t) = j!\,e_{j+1}'\,(X_t' W_t X_t)^{-1} X_t' W_t\, y. \tag{8.7}$$

All the above estimators work as moving averages in the interior part of the time series and have, for $r-j$ odd, the simple boundary adaptation property discussed in section 5. The decomposition $\hat m(t) = \hat T(t) + \hat S(t)$ is not unique, since the matrix $X_t' W_t X_t$ is not block diagonal. This could of course be achieved by an orthogonalization procedure, but that seems not to be compelling for practical purposes. We call the above decomposition a natural decomposition.

For parameter selection, first a decision has to be made about the degree of the trend polynomial $T$ and of the trigonometric polynomial $S$. Since the seasonal variations are involved in the local approach, the bandwidths should be such that at least three to five periods of the season are included. In order to achieve this, the modelization of $T$ should be rather flexible. Hence for the interior part of the time series the polynomial degree $r=3$ may be preferable to the choice $r=1$. A data driven choice for a joint selection of $r$ and bandwidth is a very difficult task, since the two parameters are highly correlated: a higher $r$ allows a larger bandwidth and vice versa. In our experience collected so far, a data driven procedure for the interior part always opted for the highest allowed degree $r_{\max}$ that was put beforehand, even if the MSE criterion included a penalty term for overparameterization.
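The natural decomposition (8.4)–(8.6) can be sketched in a few lines. The simulated series, the choices $r = 3$ and $q = P/2$, and all names below are our own illustrative assumptions:

```python
import numpy as np

def decompose_at(z, t, h, r=3, P=12):
    """Local estimate of (T(t), S(t)) as in (8.5)-(8.6): one weighted LS fit
    with polynomial trend columns and seasonal Fourier columns (q = P/2)."""
    n = len(z)
    s = np.arange(1, n + 1)
    d = (s - t) / h
    k = np.where(np.abs(d) <= 1, 0.75 * (1 - d**2), 0.0)   # Epanechnikov
    cols, is_cos = [], []
    for m in range(r + 1):                                  # trend columns
        cols.append(((s - t).astype(float)) ** m); is_cos.append(False)
    for jq in range(1, P // 2 + 1):                         # seasonal columns
        cols.append(np.cos(2 * np.pi * jq * (s - t) / P)); is_cos.append(True)
        if 2 * jq < P:                      # at the Nyquist harmonic: no sine
            cols.append(np.sin(2 * np.pi * jq * (s - t) / P)); is_cos.append(False)
    X = np.column_stack(cols)
    beta = np.linalg.solve(X.T @ (k[:, None] * X), X.T @ (k * z))
    # T_hat(t) is the local intercept; S_hat(t) sums the cosine coefficients,
    # because all sine columns vanish at s = t.
    return beta[0], beta[np.array(is_cos)].sum()

n, P = 144, 12
tt = np.arange(1, n + 1)
T_true = 5.0 + 0.05 * tt                          # linear trend
S_true = 2.0 * np.cos(2 * np.pi * tt / P)         # stable seasonal pattern
rng = np.random.default_rng(3)
z = T_true + S_true + 0.1 * rng.normal(size=n)

T60, S60 = decompose_at(z, 60, h=36)
print(T60, T_true[59], S60, S_true[59])           # estimates vs. true components
```

One weighted least squares fit per time point delivers both components at once; since the bandwidth $h = 36$ covers three seasonal periods, the estimates at $t = 60$ come out close to the true components.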
As far as the trigonometric polynomial is concerned, all harmonic terms should be included, unless an inspection of the periodogramme or of the estimated spectrum reveals that one or even more of the seasonal frequencies can be omitted.

After this preselection of parameters, a procedure for bandwidth selection is needed. Since for an equidistant time series the "design density" $f$ is constant, the procedure is somewhat simpler than in the general situation discussed in section 7. A variant of a double smoothing procedure is recommended. In the pilot stage a polynomial of degree $r+2$ is fitted and the bandwidth is selected with the Rice criterion with respect to $\hat m = \hat T + \hat S$. But due to the seasonal variations the difference based variance estimator (7.8) has to be altered. Heiler and Feng (1996) and Feng (1998) propose a seasonal difference based variance estimator of the form (7.9), where not only a local linear function but also a local periodic function is allowed for. An example for monthly data ($P = 12$) is the symmetric second seasonal difference

$$D = c^{-1}\,\{-1,\ 2,\ -1\}_P,$$

with nonzero entries $d_0 = -c^{-1}$, $d_P = 2c^{-1}$, $d_{2P} = -c^{-1}$ at lag spacing $P$, where $c$ is determined such that $\sum_{j=0}^{m} d_j^2 = 1$. $D$ annihilates a local linear trend and a local periodic function with periodicity $P = 12$. Similar sequences can easily be constructed. Let $\hat\sigma^2_G$ be the resulting estimator and let $h_g$ be the minimizer of the R-criterion (7.6). The resulting estimator with bandwidth $h_g$ is denoted by $\hat m_g = \hat T_g + \hat S_g$.

For an arbitrary $h$, the weights $w_t(s)$ for estimating $\hat T(t) + \hat S(t)$ are the components of the vector $(1, 0, \ldots, 0, 0_s')\,(X_t' W_t X_t)^{-1} X_t' W_t$, where for $W_t$ a kernel with bandwidth $h$ is taken. Using the pilot estimates $\hat m_g(t)$, the bias part of the MSE at $t$ for an estimator with bandwidth $h$ is estimated by

$$\widehat{\operatorname{Bias}}\bigl(\hat m_h(t)\bigr) = \sum_{s=1}^{n} w_t(s)\,\hat m_g(s) - \hat m_g(t),$$

which yields for the bias part of the mean averaged squared error $MASE(h)$

$$B(h) = n^{-1}\sum_{t=1}^{n}\widehat{\operatorname{Bias}}{}^2\bigl(\hat m_h(t)\bigr) = n^{-1}\sum_{t=1}^{n}\Bigl\{\sum_{s=1}^{n} w_t(s)\,\hat m_g(s) - \hat m_g(t)\Bigr\}^2. \tag{8.8}$$

The variance is estimated by

$$V(h) = n^{-1}\hat\sigma^2\sum_{t=1}^{n}\sum_{s=1}^{n} w_t(s)^2, \tag{8.9}$$

where $\hat\sigma^2$ should be a suitable root-$n$ consistent estimator of $\sigma^2$. After the first pilot step a minimizer $\tilde h$ of the criterion

$$MASE(h) = B(h) + V(h) \tag{8.10}$$

is evaluated over a grid, where in the second step the estimator $\hat\sigma^2_G$ is used in $V(h)$. This second step already leads to a considerable improvement over the simple R-criterion, but the estimator $\hat\sigma^2_G$ is still not very good. Hence an improved estimation with a lower polynomial degree and a bandwidth $h_{gv}$ larger than $h_g$ is proposed. For details see Feng and Heiler (1998). According to considerations therein, an estimator for $h_{gv}$ can easily be found by multiplying the minimizer $\tilde h$ of (8.10) with a correction factor which only depends on the kernel used and on the polynomial degree $r$: $\hat h_{gv} = CF_r\,\tilde h$. For instance, we get for the Epanechnikov kernel $CF_1 = 1.431$, $CF_3 = 1.291$, for the bisquare kernel $CF_1 = 1.451$, $CF_3 = 1.300$, and for the Gaussian kernel $CF_1 = 1.489$ and $CF_3 = 1.305$. See Table 5.1 in Müller (1988) or Table 1 in Feng and Heiler (1998).

Let now $\hat m_{gv} = \hat T_{gv} + \hat S_{gv}$ be an estimator with bandwidth $h_{gv}$. Then an improved variance estimator is obtained by taking the mean squared residuals

$$\hat\sigma^2_B = n^{-1}\sum_{t=1}^{n}\bigl[z_t - \hat m_{gv}(t)\bigr]^2. \tag{8.11}$$

In a third step this variance estimator is plugged into (8.9) for $\hat\sigma^2$, and with this again a minimizer of the MASE (8.10) is evaluated. In principle this procedure can be iterated several times, where in the next step with a polynomial of degree $r+2$ a new bias estimator is evaluated.

The above described procedure yields a bandwidth for the interior part of the time series, where after the selection of $h$ the interior is given by $[h+1,\ n-h]$. As described in section 5, the procedure automatically adapts towards the boundaries. But, as also described there, due to increasing variance the MSE will increase as well, particularly if $r=3$ is chosen, as was recommended at the beginning of this section.
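The seasonal difference based variance estimation used in the pilot step can be checked numerically. Below, $D$ is taken as the symmetric second seasonal difference $\{-1, 2, -1\}$ at lag spacing $P$, normalized so that $\sum_j d_j^2 = 1$ (our reading of the construction; the simulated series is our own):

```python
import numpy as np

P = 12
d = np.zeros(2 * P + 1)
d[0], d[P], d[2 * P] = -1.0, 2.0, -1.0
d /= np.sqrt(np.sum(d**2))           # normalize: sum of d_j^2 equals 1, cf. (7.9)

t = np.arange(600)
trend = 3.0 + 0.1 * t                                    # local linear trend
season = np.sin(2 * np.pi * t / P) + 0.5 * np.cos(4 * np.pi * t / P)

def apply_diff(y, d):
    m = len(d) - 1
    return np.array([np.dot(d, y[i:i + m + 1]) for i in range(len(y) - m)])

# D annihilates both a linear trend and any function with period P ...
resid = apply_diff(trend + season, d)
print(float(np.max(np.abs(resid))))                      # numerically zero

# ... so on noisy data it estimates the innovation variance as in (7.9).
rng = np.random.default_rng(4)
z = trend + season + 0.5 * rng.normal(size=t.size)
var_hat = float(np.mean(apply_diff(z, d) ** 2))
print(var_hat)                                           # close to sigma^2 = 0.25
```

The coefficients at lags $0$, $P$, $2P$ sum to zero at a single seasonal phase (killing any period-$P$ component) and are symmetric (killing a linear trend), which is exactly the pair of properties required of the sequence.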
One possibility to at least partly compensate for that is to switch to a nearest neighbour estimator in the boundary area, that is, to keep the total bandwidth $h_T = 2h+1$ constant at both ends of the time series. This means that for estimating from $t = n-h+1$ to $t = n$ the same local neighbourhood is used (and similarly for the left boundary). Instead, or in addition to that, a switch from a local polynomial of order 3 to a local linear

approach (for $T$) may be recommended whenever the MSE for $r=1$ becomes smaller than that for $r=3$. In order to do that, for the given bandwidth and the asymmetric neighbourhood situation at each time point in the boundary area, the MSE's for $r=3$ and $r=1$ have to be evaluated with the corresponding active weighting systems according to the procedure described above. As soon as $MSE_1 < MSE_3$, a local linear approach is chosen for $T$ and maintained to the end point. According to practical experience collected so far, such a switch happened to come into effect close to the end points in almost all cases.

In Figures 8.1 and 8.2 we present two examples where the discussed decomposition procedure is applied. The first time series is the quarterly series of the German GDP from 1968 onwards. In the top panel of Figure 8.1 the time series itself and the estimated trend-cyclical component are exhibited. In the middle the estimated seasonal component is shown, and in the bottom panel the first derivative of the trend-cyclical component is exhibited. This latter picture shows clearly the temporary boom after German reunification. The double smoothing procedure with bootstrap variance estimator selected $h = 11$ as bandwidth. The polynomial degree was two for estimating the first derivative and three for the other estimations.

The second example, presented in Figure 8.2, shows corresponding results for the monthly series of the German unemployment rates (in per cent) from January 1977 onwards. Here the selected bandwidth is $h = 21$. The polynomial degrees are the same as in the previous example.

Cleveland (1979) proposed an iterative robust locally weighted regression in a general regression context, and in Cleveland et al. (1990) this idea is also used in time series decomposition. It can easily be adapted to the procedure discussed here, although in their proposal the subseries of equal weeks, months, quarters etc. are treated separately.
The idea consists in looking at the residuals r_t = z_t − ^m(t) of a first, nonrobust procedure and in evaluating a robust scale measure for the residuals. Cleveland suggests taking the median of the |r_t|. Since in many time series the variability differs between the periods within the season, depending on the size of the seasonal component, it seems reasonable to evaluate different scale measures σ_i for the different periods of the season. For t = 1, ..., n let j = [(t − 1)/P] + 1 be the year index, j = 1, ..., J = [(n − 1)/P] + 1, where [·] denotes the integer part, and let i = t − P(j − 1) be the season index, i.e. z_t → z_ij. Then for all i = 1, ..., P a robust scale measure σ_i = median_j(|r_ij|) is evaluated. From this, so-called robustness weights are derived, which according to Cleveland's proposal are given by

Figure 8.1. Decomposition results for the time series of the German GDP from 1968 onwards: (a) the data and ^T, (b) ^S and (c) ^T'.

Figure 8.2. Decomposition results for the time series of the German unemployment rates (in %) from January 1977 onwards: (a) the data and ^T, (b) ^S and (c) ^T'.
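As an illustration of the per-period scale measures σ_i and Cleveland's bisquare robustness weights δ_ij = K(r_ij/(6σ_i)), the following numpy sketch (the function name and the 0-based season index are choices made here, not notation from the survey) computes the weights for a residual series of seasonal period P:

```python
import numpy as np

def robustness_weights(res, P):
    """Per-period robust scale sigma_i = median_j |r_ij| and bisquare
    robustness weights delta = K(r / (6 sigma_i)), where
    K(u) = (1 - u^2)^2 for |u| < 1 and 0 otherwise.
    The season index is 0-based here (i = t mod P), unlike the 1-based text."""
    res = np.asarray(res, dtype=float)
    season = np.arange(len(res)) % P              # season index of each observation
    sigma = np.array([np.median(np.abs(res[season == i])) for i in range(P)])
    sigma = np.maximum(sigma, 1e-12)              # guard against an all-zero period
    u = res / (6.0 * sigma[season])
    return np.clip(1.0 - u**2, 0.0, None) ** 2    # bisquare; 0 for |u| >= 1
```

The returned weights would then multiply the kernel weights k_st in the diagonal weight matrices W_t before the local fits are recomputed; a residual larger than six times its period's robust scale receives weight zero and is effectively excluded from the next fit.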

δ_ij = K( r_ij / (6 σ_i) ),

where K is a kernel function (the bisquare kernel is suggested). In a second step the local estimation procedure is repeated, where the neighbourhood weights k_st = K((s − t)/h) in the diagonal weight matrices W_t are multiplied with the corresponding robustness weights δ_ij, where i and j are the season and year indices corresponding to s. Of course, with the time dependent robustness weights the procedure is no longer shift invariant, so that the least squares solution has to be evaluated for each t explicitly. Starting with the new residuals, the procedure can be iterated until the estimates stabilize. Since the robustness weights will change the active kernels, different bandwidths should be used in each iteration step. Cleveland (1979) claimed that two robust iterations should be adequate for almost all situations. In Feng (1998), with a stability criterion, a higher number of iteration steps occurred in most cases.

References

[1] ABBERGER, K. (1996). Nichtparametrische Schätzung bedingter Quantile in Zeitreihen - mit Anwendungen auf Finanzmarktdaten. Hartung-Gorre Verlag, Konstanz.
[2] ABBERGER, K. (1997). Quantile Smoothing in Financial Time Series. Statistical Papers, 38, 125–148.
[3] BONGARD, J. (1960). Some Remarks on Moving Averages. In: O.E.C.D. (editor), Seasonal Adjustment on Electronic Computers. Proceedings of an international conference held in Paris.
[4] CHEN, R. (1996). A Nonparametric Multi-step Prediction Estimator in Markovian Structures. Statistica Sinica, 6.
[5] CHEN, R. and TSAY, R.S. (1993). Functional-coefficient Autoregressive Models. Journal Amer. Statist. Assoc., 88.
[6] CHEN, R. and TSAY, R.S. (1993). Nonlinear Additive ARX Models. Journal Amer. Statist. Assoc., 88.
[7] CHENG, B. and TONG, H. (1992). On Consistent Non-parametric Order Determination and Chaos (with discussion). Journal Royal Statist. Soc., Series B, 54.

[8] CLEVELAND, R.B., CLEVELAND, W.S., McRAE, J.E. and TERPENNING, I. (1990). STL: A Seasonal-Trend Decomposition Procedure Based on Loess (with discussion). Journal of Official Statistics, 6, 3–73.
[9] CLEVELAND, W.S. (1979). Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 74, 829–836.
[10] COLLOMB, G. (1980). Estimation Nonparamétrique de Probabilités Conditionnelles. Comptes Rendus à l'Académie des Sciences de Paris, 291, Série A, 427–430.
[11] COLLOMB, G. (1983). From Nonparametric Regression to Nonparametric Prediction: Survey of the Mean Square Error and Original Results on the Predictogram. Lecture Notes in Statistics, 16, 182–204.
[12] COLLOMB, G. (1984). Propriétés de Convergence Presque Complète du Prédicteur à Noyau. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66, 441–460.
[13] COLLOMB, G. (1985). Nonparametric Time Series Analysis and Prediction: Uniform Almost Sure Convergence of the k-NN Autoregression Estimates. Statistics, 16, 297–307.
[14] EUBANK, R.L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
[15] FAN, J. (1993). Local Linear Regression Smoothers and Their Minimax Efficiencies. Annals of Statistics, 21, 196–216.
[16] FAN, J. and GIJBELS, I. (1992). Variable Bandwidth and Local Linear Regression Smoothers. Annals of Statistics, 20, 2008–2036.
[17] FAN, J. and GIJBELS, I. (1995). Data-driven Bandwidth Selection in Local Polynomial Fitting: Variable Bandwidth and Spatial Adaptation. Journal of the Royal Statistical Society, Series B, 57, 371–394.
[18] FAN, J. and GIJBELS, I. (1996). Local Polynomial Modelling and its Applications. Chapman & Hall, London.
[19] FAN, J., HU, T-CH. and TRUONG, Y.K. (1994). Robust Non-parametric Function Estimation. Scandinavian Journal of Statistics, 21.
[20] FAN, J., YAO, Q. and TONG, H. (1996). Estimation of Conditional Densities and Sensitivity Measures in Nonlinear Dynamical Systems. Biometrika, 83.
[21] FENG, Y. (1998). Kernel- and Locally Weighted Regression with Application to Time Series Decomposition. Ph.D. Thesis, University of Konstanz.

[22] FENG, Y. and HEILER, S. (1998a). Locally Weighted Autoregression. In: R. Galata and H. Küchenhoff (editors), Econometrics in Theory and Practice. Festschrift for Hans Schneeweiß.
[23] FENG, Y. and HEILER, S. (1998b). Bandwidth Selection Based on Bootstrap. Discussion Paper, University of Konstanz.
[24] FISHER, A. (1937). A Brief Note on Seasonal Variations. Journal of Accountancy, 64, 174.
[25] FRIEDMAN, J.H. (1991). Multivariate Adaptive Regression Splines (with discussion). Annals of Statistics, 19, 1–141.
[26] GASSER, T., KNEIP, A. and KÖHLER, W. (1991). A Flexible and Fast Method for Automatic Smoothing. J. Amer. Statist. Assoc., 86, 643–652.
[27] GASSER, T. and MÜLLER, H.G. (1979). Kernel Estimation of Regression Functions. In: Gasser and Rosenblatt (editors), Smoothing Techniques for Curve Estimation, Springer-Verlag, Heidelberg, 23–68.
[28] GASSER, T. and MÜLLER, H.G. (1984). Estimating Regression Functions and Their Derivatives by the Kernel Method. Scandinavian Journal of Statistics, 11, 171–185.
[29] GASSER, T., MÜLLER, H.G. and MAMMITZSCH, V. (1985). Kernels for Nonparametric Curve Estimation. Journal of the Royal Statistical Society, Series B, 47, 238–252.
[30] GASSER, T., SROKA, L. and JENNEN-STEINMETZ, C. (1986). Residual Variance and Residual Pattern in Nonlinear Regression. Biometrika, 73, 625–633.
[31] GOURIEROUX, CH. and MONFORT, A. (1992). Qualitative Threshold ARCH Models. Journal of Econometrics, 52, 159–199.
[32] HÄRDLE, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.
[33] HÄRDLE, W., HALL, P. and MARRON, J.S. (1992). Regression Smoothing Parameters That Are Not Far from Their Optimum. J. Amer. Statist. Assoc., 87, 227–233.
[34] HÄRDLE, W. and GASSER, T. (1984). Robust Non-parametric Function Fitting. Journal Royal Statist. Soc., Series B, 46.
[35] HÄRDLE, W., LÜTKEPOHL, H. and CHEN, R. (1997). A Review of Nonparametric Time Series Analysis. International Statistical Review, 65.

[36] HÄRDLE, W. and TSYBAKOV, A.B. (1988). Robust Nonparametric Regression with Simultaneous Scale Curve Estimation. Annals of Statistics, 16, 120–135.
[37] HÄRDLE, W. and TSYBAKOV, A.B. (1998). Local Polynomial Estimators of the Volatility Function. To appear in Journal of Econometrics.
[38] HÄRDLE, W., TSYBAKOV, A.B. and YANG, L. (1997). Nonparametric Vector Autoregression. To appear in Journal of Statistical Planning and Inference.
[39] HÄRDLE, W. and YANG, L. (1996). Nonparametric Time Series Model Selection. Discussion Paper, Humboldt-Universität zu Berlin.
[40] HALL, P., KAY, J.W. and TITTERINGTON, D.M. (1990). Asymptotically Optimal Difference-based Estimation of Variance in Nonparametric Regression. Biometrika, 77.
[41] HAMPEL, F.R., RONCHETTI, E.M., ROUSSEEUW, P.J. and STAHEL, W.A. (1986). Robust Statistics: The Approach Based on the Influence Function. Wiley, New York.
[42] HART, J.D. (1996). Some Automated Methods of Smoothing Time-dependent Data. Journal of Nonparametric Statistics, 6.
[43] HASTIE, T.J. and TIBSHIRANI, R.J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability, 43, Chapman and Hall, London.
[44] HEILER, S. (1995). Zur Glättung Saisonaler Zeitreihen. In: Rinne, H., Rüger, B. and Strecker, H. (editors), Grundlagen der Statistik und Ihre Anwendungen. Festschrift für Kurt Weichselberger, Physica-Verlag, Heidelberg, 128–148.
[45] HEILER, S. and FENG, Y. (1996). Datengesteuerte Zerlegung Saisonaler Zeitreihen. ifo Studien, 41–73.
[46] HEILER, S. and FENG, Y. (1998). A Simple Root n Bandwidth Selector for Nonparametric Regression. Journal of Nonparametric Statistics, 9.
[47] HEILER, S. and FENG, Y. (1997). A Bootstrap Bandwidth Selector for Local Polynomial Fitting. Discussion Paper SFB 178, II-344, University of Konstanz.
[48] HEILER, S. and MICHELS, P. (1994). Deskriptive und Explorative Datenanalyse. Oldenbourg-Verlag, München.
[49] HORVÁTH, L. and YANDELL, B.S. (1988). Asymptotics of Conditional Empirical Processes. Journal of Multivariate Analysis, 26.
[50] HUBER, P.J. (1981). Robust Statistics. Wiley, New York.

[51] JONES, H.L. (1943). Fitting of Polynomial Trends to Seasonal Data by the Method of Least Squares. Journal Amer. Statist. Assoc., 38, 453.
[52] JONES, M.C. and HALL, P. (1990). Mean Squared Error Properties of Kernel Estimates of Regression Quantiles. Statistics & Probability Letters, 10, 283–289.
[53] KOENKER, R. and BASSETT, G. (1978). Regression Quantiles. Econometrica, 46, 33–50.
[54] KOENKER, R. and D'OREY, V. (1987). Computing Regression Quantiles. Applied Statistics, 36.
[55] KOENKER, R., PORTNOY, S. and NG, P. (1992). Nonparametric Estimation of Conditional Quantile Functions. In: L1-Statistical Analysis and Related Methods (ed. Y. Dodge), North-Holland, New York.
[56] MACAULAY, R.R. (1931). The Smoothing of Time Series. National Bureau of Economic Research, New York.
[57] MESSER, K. and GOLDSTEIN, L. (1993). A New Class of Kernels for Nonparametric Curve Estimation. Annals of Statistics, 21, 179–195.
[58] MICHELS, P. (1992). Nichtparametrische Analyse und Prognose von Zeitreihen. Physica-Verlag, Heidelberg.
[59] MÜLLER, H.-G. (1985). Empirical Bandwidth Choice for Nonparametric Kernel Regression by Means of Pilot Estimators. Statist. Decisions, Suppl. Issue 2, 193–206.
[60] MÜLLER, H.-G. (1988). Nonparametric Analysis of Longitudinal Data. Springer-Verlag, Berlin.
[61] NADARAYA, E.A. (1964). On Estimating Regression. Theory of Probability and Its Applications, 9, 141–142.
[62] PRIESTLEY, M.B. and CHAO, M.T. (1972). Nonparametric Function Fitting. Journal of the Royal Statistical Society, Series B, 34, 385–392.
[63] RICE, J. (1983). Methods for Bandwidth Choice in Nonparametric Kernel Regression. In: J.E. Gentle (editor), Computer Science and Statistics: The Interface. North-Holland, Amsterdam.
[64] RICE, J. (1984). Bandwidth Choice for Nonparametric Regression. Annals of Statistics, 12, 1215–1230.
[65] ROBINSON, P.M. (1983). Nonparametric Estimators for Time Series. Journal of Time Series Analysis, 4, 185–207.

[66] ROBINSON, P.M. (1986). On the Consistency and Finite-sample Properties of Nonparametric Kernel Time Series Regression, Autoregression and Density Estimators. Annals of the Institute of Statistical Mathematics, 38, A, 539–549.
[67] RUPPERT, D., SHEATHER, S.J. and WAND, M.P. (1995). An Effective Bandwidth Selector for Local Least Squares Regression. J. Amer. Statist. Assoc., 90, 1257–1270.
[68] RUPPERT, D. and WAND, M.P. (1994). Multivariate Locally Weighted Least Squares Regression. Annals of Statistics, 22, 1346–1370.
[69] SILVERMAN, B.W. (1984). Spline Smoothing: The Equivalent Variable Kernel Method. Annals of Statistics, 12, 898–916.
[70] SILVERMAN, B.W. (1985). Some Aspects of the Spline Smoothing Approach to Nonparametric Regression Curve Fitting (with discussion). Journal of the Royal Statistical Society, Series B, 47, 1–52.
[71] STONE, C.J. (1977). Consistent Nonparametric Regression (with discussion). Annals of Statistics, 5, 595–620.
[72] STUTE, W. (1984). Asymptotic Normality of Nearest Neighbor Regression Function Estimates. Annals of Statistics, 12, 917–926.
[73] STUTE, W. (1986). Conditional Empirical Processes. Annals of Statistics, 14, 638–647.
[74] TJØSTHEIM, D. and AUESTAD, B. (1994a). Nonparametric Identification of Nonlinear Time Series: Projections. J. Amer. Statist. Assoc., 89.
[75] TJØSTHEIM, D. and AUESTAD, B. (1994b). Nonparametric Identification of Nonlinear Time Series: Selecting Significant Lags. J. Amer. Statist. Assoc., 89.
[76] TSYBAKOV, A.B. (1986). Robust Reconstruction of Functions by the Local Approximation Method. Problems of Information Transmission, 22, 133–146.
[77] WAHBA, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
[78] WAND, M.P. and JONES, M.C. (1995). Kernel Smoothing. Chapman & Hall, London.
[79] WATSON, G.S. (1964). Smooth Regression Analysis. Sankhyā, Ser. A, 26, 359–372.
[80] YAKOWITZ, S. (1979a). Nonparametric Estimation of Markov Transition Functions. Annals of Statistics, 7.
[81] YAKOWITZ, S. (1979b). A Nonparametric Markov Model for Daily River Flow. Water Resources Research, 15.

[82] YAKOWITZ, S. (1985). Markov Flow Models and the Flood Warning Problem. Water Resources Research, 21, 81–88.
[83] YANG, L. and HÄRDLE, W. (1996). Nonparametric Autoregression with Multiplicative Volatility and Additive Mean. Submitted to Journal of Time Series Analysis.
[84] YANG, S. (1981). Linear Functions of Concomitants of Order Statistics with Application to Nonparametric Estimation of a Regression Function. Journal of the American Statistical Association, 76, 658–662.
[85] YU, K. and JONES, M.C. (1998). Local Linear Quantile Regression. Journal Amer. Statist. Assoc., 93.


More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

DEPARTMENT OF ECONOMICS HOUSEHOLD DEBT AND FINANCIAL ASSETS: EVIDENCE FROM GREAT BRITAIN, GERMANY AND THE UNITED STATES

DEPARTMENT OF ECONOMICS HOUSEHOLD DEBT AND FINANCIAL ASSETS: EVIDENCE FROM GREAT BRITAIN, GERMANY AND THE UNITED STATES DEPARTMENT OF ECONOMICS HOUSEHOLD DEBT AND FINANCIAL ASSETS: EVIDENCE FROM GREAT BRITAIN, GERMANY AND THE UNITED STATES Sara Brown, University of Leicester, UK Karl Taylor, University of Leicester, UK

More information

Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

More information

Smoothing and Non-Parametric Regression

Smoothing and Non-Parametric Regression Smoothing and Non-Parametric Regression Germán Rodríguez [email protected] Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest

More information

On closed-form solutions of a resource allocation problem in parallel funding of R&D projects

On closed-form solutions of a resource allocation problem in parallel funding of R&D projects Operations Research Letters 27 (2000) 229 234 www.elsevier.com/locate/dsw On closed-form solutions of a resource allocation problem in parallel funding of R&D proects Ulku Gurler, Mustafa. C. Pnar, Mohamed

More information

The modelling of business rules for dashboard reporting using mutual information

The modelling of business rules for dashboard reporting using mutual information 8 t World IMACS / MODSIM Congress, Cairns, Australia 3-7 July 2009 ttp://mssanz.org.au/modsim09 Te modelling of business rules for dasboard reporting using mutual information Gregory Calbert Command, Control,

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Model Quality Report in Business Statistics

Model Quality Report in Business Statistics Model Quality Report in Business Statistics Mats Bergdal, Ole Blac, Russell Bowater, Ray Cambers, Pam Davies, David Draper, Eva Elvers, Susan Full, David Holmes, Pär Lundqvist, Sixten Lundström, Lennart

More information

Finite cloud method: a true meshless technique based on a xed reproducing kernel approximation

Finite cloud method: a true meshless technique based on a xed reproducing kernel approximation INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 2001; 50:2373 2410 Finite cloud method: a true meshless technique based on a xed reproducing kernel approximation N.

More information

Writing Mathematics Papers

Writing Mathematics Papers Writing Matematics Papers Tis essay is intended to elp your senior conference paper. It is a somewat astily produced amalgam of advice I ave given to students in my PDCs (Mat 4 and Mat 9), so it s not

More information

Projective Geometry. Projective Geometry

Projective Geometry. Projective Geometry Euclidean versus Euclidean geometry describes sapes as tey are Properties of objects tat are uncanged by rigid motions» Lengts» Angles» Parallelism Projective geometry describes objects as tey appear Lengts,

More information

An Intuitive Framework for Real-Time Freeform Modeling

An Intuitive Framework for Real-Time Freeform Modeling An Intuitive Framework for Real-Time Freeform Modeling Mario Botsc Leif Kobbelt Computer Grapics Group RWTH Aacen University Abstract We present a freeform modeling framework for unstructured triangle

More information

Keskustelualoitteita #65 Joensuun yliopisto, Taloustieteet. Market effiency in Finnish harness horse racing. Niko Suhonen

Keskustelualoitteita #65 Joensuun yliopisto, Taloustieteet. Market effiency in Finnish harness horse racing. Niko Suhonen Keskustelualoitteita #65 Joensuun yliopisto, Taloustieteet Market effiency in Finnis arness orse racing Niko Suonen ISBN 978-952-219-283-7 ISSN 1795-7885 no 65 Market Efficiency in Finnis Harness Horse

More information

Rafał Weron * FORECASTING WHOLESALE ELECTRICITY PRICES: A REVIEW OF TIME SERIES MODELS. 1. Introduction

Rafał Weron * FORECASTING WHOLESALE ELECTRICITY PRICES: A REVIEW OF TIME SERIES MODELS. 1. Introduction To appear as: R. Weron (008) Forecasting wolesale electricity prices: A review of time series models, in "Financial Markets: Principles of Modelling, Forecasting and Decision-Making", eds. W. Milo, P.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

A Multigrid Tutorial part two

A Multigrid Tutorial part two A Multigrid Tutorial part two William L. Briggs Department of Matematics University of Colorado at Denver Van Emden Henson Center for Applied Scientific Computing Lawrence Livermore National Laboratory

More information

Free Shipping and Repeat Buying on the Internet: Theory and Evidence

Free Shipping and Repeat Buying on the Internet: Theory and Evidence Free Sipping and Repeat Buying on te Internet: eory and Evidence Yingui Yang, Skander Essegaier and David R. Bell 1 June 13, 2005 1 Graduate Scool of Management, University of California at Davis ([email protected])

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

13 PERIMETER AND AREA OF 2D SHAPES

13 PERIMETER AND AREA OF 2D SHAPES 13 PERIMETER AND AREA OF D SHAPES 13.1 You can find te perimeter of sapes Key Points Te perimeter of a two-dimensional (D) sape is te total distance around te edge of te sape. l To work out te perimeter

More information

Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

More information

MULTY BINARY TURBO CODED WOFDM PERFORMANCE IN FLAT RAYLEIGH FADING CHANNELS

MULTY BINARY TURBO CODED WOFDM PERFORMANCE IN FLAT RAYLEIGH FADING CHANNELS Volume 49, Number 3, 28 MULTY BINARY TURBO CODED WOFDM PERFORMANCE IN FLAT RAYLEIGH FADING CHANNELS Marius OLTEAN Maria KOVACI Horia BALTA Andrei CAMPEANU Faculty of, Timisoara, Romania Bd. V. Parvan,

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information