AUSTRIAN JOURNAL OF STATISTICS
Volume 32 (2003), Number 1&2

Adaptive Regression on the Real Line in Classes of Smooth Functions

L.M. Artiles and B.Y. Levit
Eurandom, Eindhoven, the Netherlands
Queen's University, Kingston, Canada

Abstract: Adaptive pointwise estimation of an unknown regression function $f(x)$, $x \in \mathbb{R}$, corrupted by additive Gaussian noise is considered in the equidistant design setting. The function $f$ is assumed to belong to the class $\mathcal{A}(\alpha)$ of functions whose Fourier transforms are rapidly decreasing in the weighted $L_2$-sense. The rate of decrease is described by a weight function that depends on the vector of parameters $\alpha$ which, in the adaptive setting, is typically unknown. For any of the classes $\mathcal{A}(\alpha)$, $\alpha$ fixed, we describe minimax estimators, to within a constant, as the bin-width goes to zero. Conditions under which an adaptive study is suitable are presented and a notion of adaptive asymptotic optimality is introduced based on distinguishing, among all possible functional scales, between the so-called non-parametric (NP) and pseudo-parametric (PP) scales. We propose adaptive estimators which tune up pointwise to the unknown smoothness of $f$. We prove them to be asymptotically adaptively minimax for large collections of NP functional scales, subject to being rate efficient for any of the PP functional scales.

Keywords: Non-parametric Statistics, Minimax Estimation, Adaptive Estimation, Fourier Transformations.

1 Introduction

During the last two decades adaptive estimation has become one of the most active areas of research in non-parametric statistics. The introduction of different models of adaptive estimation reflects the existing practical needs for more realistic models and flexible methods of estimation. The study of these models brought with it new challenging problems which required the creation of new statistical methods and approaches.

In this paper we study non-parametric adaptive regression in a fixed design model in which an unknown regression function $f(x)$ can be observed on an equidistant grid of the whole real line.
More precisely, for a given bin-width $h > 0$, we consider the additive model of observations given by
$$y_l = f(lh) + \xi_l, \qquad l = 0, \pm 1, \pm 2, \ldots \tag{1}$$
where $\xi_l$ are independent centered Gaussian random variables $N(0, \sigma^2)$, with a given variance $\sigma^2 > 0$. Often in the statistical literature more advanced results are obtained in the white noise model
$$dV(x) = f(x)\,dx + \varepsilon\,dW(x), \qquad -\infty < x < \infty, \tag{2}$$
which is just an approximation to the model (1), with $\varepsilon = \sigma\sqrt{h}$. Here $V$ is the noisy observation of an unknown regression function $f$, $\varepsilon$ is the resolving noise and $W(x)$ represents a standard Wiener process. There exists a huge literature on the equivalence between these two models, cf. e.g. Brown and Low (1996) and Nussbaum (1996), but this does not cover our main problem here, namely adaptive non-parametric estimation. Our approach is greatly influenced by a recent paper, Lepski and Levit (1998), which was a milestone in adaptive estimation of infinitely differentiable functions in the white noise model. Below we will explain the main differences between our approach and that of Lepski and Levit (1998).

In non-parametric statistics, classes of functions are in general described by smoothness parameters. In this paper we shall study classes of functions defined in terms of positive parameters $\gamma$, $\beta$ and $r$ whose interpretation will be explained below. We will study estimation of $f$ in (1), under the assumption that $f$ belongs to the functional class $\mathcal{A}(\gamma, \beta, r)$ which is the collection of all continuous functions such that
$$\|f\|^2_{\gamma,\beta,r} := \frac{\gamma}{2\pi\beta^2}\int_{-\infty}^{\infty} e^{2(\gamma|t|)^r}\,|F[f](t)|^2\,dt \le 1. \tag{3}$$
Here $F[f]$ represents the Fourier transform of $f$. The collection of all such classes will be called a functional scale.

Note that when the parameters are assumed known, we are dealing with the problem of non-parametric estimation much studied recently, especially since the publications Ibragimov and Has'minskii (1981, 1982, 1983) and Stone (1982). The situation in which none of these parameters is known a priori is much more realistic and complex. Real progress in this problem, which is usually referred to as adaptive estimation, has only been achieved in the last decade, most notably since the publication of Lepski (1990, 1991, 1992a, 1992b).
Further progress was achieved in Lepski and Levit (1998).

For all $\gamma, \beta, r$, the class $\mathcal{A}(\gamma, \beta, r)$ is a class of infinitely differentiable functions, and each of the parameters affects the smoothness and the accuracy of the best non-parametric estimators in its own way. The parameter $\gamma$ is some kind of scale parameter: one can verify that $f \in \mathcal{A}(1, \beta, r)$ if and only if $\frac{1}{\gamma} f(\cdot/\gamma) \in \mathcal{A}(\gamma, \beta, r)$. Therefore, of all parameters, it affects the smoothness of $f$ most dramatically. The bigger $\gamma$ is, the smoother are the functions of the class.

The parameter $\beta$ can be interpreted as a size parameter and represents the radius of the corresponding $L_2$-ellipsoid defined by (3). Note that $f \in \mathcal{A}(\gamma, 1, r)$ if and only if $\beta f \in \mathcal{A}(\gamma, \beta, r)$. Therefore the bigger $\beta$ is, the less smooth are the functions of the class.

Finally, $r$ can be best described as a parameter responsible for the type of smoothness. It is well known that for $r = 1$ all functions in the class $\mathcal{A}(\gamma, \beta, r)$ admit bounded analytic continuation into the strip $\{z = x + iy : |y| < \gamma\}$ of the complex plane (Paley-Wiener theorem), and therefore for all $r > 1$ the functions in $\mathcal{A}(\gamma, \beta, r)$ are entire functions, i.e. functions admitting analytic continuation into the whole complex plane. For $r < 1$ these functions are only infinitely differentiable, and their smoothness increases together with $r$.

In the Gaussian white noise model Lepski and Levit (1998) studied adaptive estimation for even broader classes of functions with rapidly vanishing Fourier transforms
$F[f](t)$. However, their main conclusions are readily interpretable in the special example of the functional classes
$$\big\{ f \text{ continuous} : |F[f](t)| \le \gamma \exp\{-(\gamma|t|)^r\} \big\},$$
which are quite similar to our classes $\mathcal{A}(\gamma, \gamma, r)$. Let us recall some of these conclusions here, as a starting point for outlining our main results. For simplicity, we will assume, after Lepski and Levit (1998), that $0 < r_- < r < r_+ < \infty$.

In adaptive estimation, when the parameters such as $\gamma, \beta, r$ are unknown, one is looking for statistical procedures which can adapt to the largest possible scope of these parameters. As the smoothness of the underlying functions is most notably affected by the scale parameter $\gamma$, we will mainly refer to the ensuing uncertainty in the value of this parameter. More specifically, the accuracy of the best methods of estimation will be determined by the effective noise $\varepsilon^2/\gamma$, where $\varepsilon$ is the average noise intensity in the observation model (2).

To realize the whole scope of the problem, it is useful to look at the extreme cases. On one hand, the situation could be so bad that no consistent estimation of the unknown function would be possible at all, even if the parameter $\gamma$ was completely known. On an intuitive level, it is quite clear that such a situation occurs when $\varepsilon^2/\gamma \not\to 0$. We can exclude this case from consideration on the ground that nothing can be done in such an extreme situation. Thus one can restrict attention to the case $\gamma \gg \varepsilon^2$. The situation deteriorates further in the adaptive setting, due to the uncertainty in the parameter $\gamma$. According to Lepski and Levit (1998), adaptive methods can only work efficiently if $\gamma \ge \varepsilon^{2-\tau}$, for some $0 < \tau < 2$.

On the other hand, if $\gamma$ becomes too big, the underlying functions become unrealistically smooth and can be estimated with accuracy $O(\varepsilon)$, i.e. with the same accuracy which could be achieved if all underlying functions were either constant, or just included a few unknown parameters. According to Lepski and Levit (1998), such an offbeat situation occurs only when $\gamma$ becomes of order $\log^{1/r} \varepsilon^{-1}$.
Therefore one can restrict attention to those $\gamma$ for which $\varepsilon^{2-\tau} \le \gamma \ll \log^{1/r} \varepsilon^{-1}$, which, in a sense, is the largest possible range for which adaptive procedures can exist. For all $\gamma$ in this range, an efficient adaptive non-parametric procedure has been proposed in Lepski and Levit (1998).

Note that this discussion led us, by the very nature of the statistical problem of adaptation, to a situation in which the unknown scale parameter $\gamma$ belonged to a region $\Gamma = \Gamma_\varepsilon$ depending on the index $\varepsilon$ of the model. In other words, our adaptive setting leads us to the natural assumption that the unknown scale parameter $\gamma$ may itself depend on the index $\varepsilon$.

Now, in the model we have just discussed the essential role was played by the noise intensity $\varepsilon$ and the scale parameter $\gamma$. Our model of discrete regression is more realistic and also contains more parameters: $\sigma, h, \gamma, \beta, r$. Since the white noise model is known to approximate the discrete regression model (1), one can expect some similarity between the ensuing results, namely that a similar procedure could lead to an efficient adaptive method of estimation in the discrete regression. Without aiming at precise definitions, one could speak in this case of a weak equivalence between the white noise and discrete time adaptive regression schemes.

However, just as the relation between the two parameters involved played an important role in the above discussion, a more complicated relation between all involved parameters affects the quality of the optimal adaptive procedure in the discrete models. In fact, such relations become more complex in the discrete case, not only because of additional parameters, all of which may be unknown and therefore vary together with $\varepsilon$, but also due to
the limitations to which the continuous time model captures the underlying properties of the discrete model (1). In particular, the obvious naive recipe of just replacing $\varepsilon$ in all the above restrictions by $\sigma\sqrt{h}$ does not provide a correct answer.

We comment next that classes similar to (3) are well known in statistics. Apparently they were first introduced (for $r = 1$) in Ibragimov and Has'minskii (1983), where optimal rates of convergence were found in estimating an unknown density function $f \in \mathcal{A}(\gamma, \beta, 1)$. Later Golubev and Levit (1996) showed (again for $r = 1$) that these non-parametric classes are quite unique, in the sense that not only optimal rates, but exact asymptotically minimax estimators, even pointwise, can be explicitly constructed for such classes. Asymptotically efficient non-parametric regression for the classes $\mathcal{A}(\gamma, \beta, 1)$ was studied in Golubev, Levit and Tsybakov (1996). Here we consider the more general classes $\mathcal{A}(\gamma, \beta, r)$, use kernel-type estimators different from those of Golubev, Levit and Tsybakov (1996) and, more significantly, consider the problem of adaptive estimation.

In the Gaussian white noise model Lepski and Levit (1998) considered still more general classes of infinitely differentiable functions with rapidly vanishing Fourier transforms. However, the restriction on the Fourier transform of $f$ in their paper was based on the $L_\infty$-norm, rather than on the $L_2$-norm as in our case. They have not only proposed asymptotically minimax estimators for all of the corresponding classes, but have also constructed asymptotically optimal adaptive estimators for the whole scale of such classes.

Since in most applications the information about an unknown function is typically conveyed by discrete measurements, our model can be viewed as a more realistic approximation than the classical white noise model. Therefore our model contains an additional discretization parameter, the bin-width.
Our goal is to study to what degree the adaptive procedure proposed in Lepski and Levit (1998) works in the discrete regression setting. More precisely, we are seeking natural conditions under which our equidistant regression model is weakly equivalent to the classical white noise model, in the sense that the asymptotically optimal adaptive estimators proposed for the latter model are still asymptotically optimal in the equidistant non-parametric regression models.

In the next section we introduce the model. In Section 3 we prove some auxiliary lemmas. In Section 4, the problem of asymptotic minimax regression is studied first under the assumption that the class of functions is completely determined by a fixed vector of parameters $\gamma, \beta, r$, these parameters being independent of the index of the model. At the end of that section we take the first steps towards the adaptive framework by allowing the parameters of the class to depend on the index of the model. In Section 5 we consider the functional scales, which are collections of functional classes, see (40). We define the optimality criteria based on the classification of the scales into pseudo-parametric (PP) and non-parametric (NP) scales. We then prove optimality of the adaptive procedure. We shall see that, compared to a given functional class $\mathcal{A}(\gamma, \beta, r)$, an additional logarithmic factor in the exact rate of convergence has to be paid as a price for the uncertainty about the actual class the regression function belongs to, see Theorem 3.
2 The Model

Let us formalize our model.

Definition 1 Let $\gamma, \beta, r > 0$ be given. We denote by $\mathcal{A}(\gamma, \beta, r)$ the class of continuous functions $f : \mathbb{R} \to \mathbb{R}$ whose Fourier transforms $F[f]$ satisfy
$$\|f\|^2_{\gamma,\beta,r} := \frac{\gamma}{2\pi\beta^2}\int_{-\infty}^{\infty} e^{2(\gamma|t|)^r}\,|F[f](t)|^2\,dt \le 1. \tag{4}$$

In this study we use the following definition of the Fourier transform,
$$F[f](t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx. \tag{5}$$
Note that the Fourier inversion formula
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx} F[f](t)\,dt \tag{6}$$
certainly holds under assumption (4). It is easy to see that for all $\gamma, \beta, r > 0$, functions in $\mathcal{A}(\gamma, \beta, r)$ are infinitely differentiable.

Now, let us consider the following observation model
$$y_l = f(lh) + \xi_l, \qquad l = 0, \pm 1, \pm 2, \ldots, \tag{7}$$
where $\xi_l$ are i.i.d. Gaussian random variables, $N(0, \sigma^2)$, $\sigma > 0$. We assume that the function $f$ belongs to the family $\mathcal{A}(\gamma, \beta, r)$, for some $\gamma, \beta, r > 0$. Our purpose is to estimate the unknown function $f(x)$ based on the vector of observations $y = (\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots)$.

We will choose our optimal estimator from the family of kernel type estimators
$$\hat f_{h,s}(x, y) = h \sum_{l=-\infty}^{\infty} k_s(x - lh)\, y_l, \tag{8}$$
where $k_s$, $s \ge 0$, is the so-called sinc-function
$$k_s(x) = \frac{\sin(sx)}{\pi x}, \tag{9}$$
and $k_s(0) = s/\pi$. This kernel has the property
$$F[k_s](t) = \mathbf{1}_{[-s,s]}(t) \tag{10}$$
and therefore, according to the convolution theorem,
$$F[f * k_s](t) = \mathbf{1}_{[-s,s]}(t)\, F[f](t), \tag{11}$$
where $*$ represents the convolution operator.

The kernel $k_s$ is just one of many possible, but its very tractable properties make it an attractive tool: it helps significantly in the search for the most general possible results and
clarifies the underlying ideas. For practical purposes some other kernels, such as the de la Vallée Poussin kernel (cf. Nikol'skiĭ, 1975, p. 301), may be more relevant and typically would work better.

The parameter $s$ is called the bandwidth. As we shall see in Section 4, for any fixed class there exists an optimum bandwidth $s_h$. The optimum bandwidth will depend on the parameters $\gamma, \beta, r, \sigma$ as well as on the index $h$ of the model, called the bin-width, which in our asymptotic study will tend to zero.

Denote by $\tilde f_h(x, y)$ an arbitrary estimator of $f(x)$ based on the observations $y$. To shorten the notation we will often write $\tilde f_h(x)$ instead of $\tilde f_h(x, y)$. Let $P_f$ be the distribution of the vector $y$ and let $E_f$ and $\mathrm{Var}_f$ denote the expectation and the variance with respect to this measure. When there is no possibility of confusion we will simply write $P$, $E$ and $\mathrm{Var}$ respectively.

Let $W$ be the class of loss functions $w(x)$, $x \in \mathbb{R}$, such that $w(x) = w(-x)$, $w(x) \ge w(y)$ for $|x| \ge |y|$, $x, y \in \mathbb{R}$, and for some $0 < \eta < \frac{1}{2}$
$$\int e^{-\eta x^2} w(x)\,dx < \infty.$$
With an appropriate normalizing factor $\sigma_h$ to be defined shortly, and $w \in W$, we will consider the maximum risk, over a fixed functional class $\mathcal{A}(\gamma, \beta, r)$, given by
$$\sup_{f \in \mathcal{A}(\gamma,\beta,r)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x, y) - f(x))\big)$$
as a global measure of the error of the estimator $\tilde f_h$ over the whole class $\mathcal{A}(\gamma, \beta, r)$. When the classes $\mathcal{A}(\gamma, \beta, r)$ are considered fixed, our main goal is to find an estimator such that the corresponding maximum risk is as small as possible, i.e. achieves asymptotically the minimax risk
$$\inf_{\tilde f_h} \sup_{f \in \mathcal{A}(\gamma,\beta,r)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x, y) - f(x))\big),$$
where $\tilde f_h$ is taken from the class of all possible estimators.

In the adaptive setting, we shall allow $\gamma, \beta, r$ to vary freely inside large scales $K$. Conditions under which an adaptive study is suitable are presented and a notion of adaptive asymptotic optimality is introduced based on distinguishing, among all possible functional scales, between the so-called non-parametric (NP) and pseudo-parametric (PP) scales.
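The kernel estimator (8) with the sinc kernel (9) can be sketched numerically as follows. This is only an illustration under stated assumptions: the infinite grid is truncated to a finite window (the paper works on the whole line), and the function names `sinc_kernel`, `kernel_estimate` and `simulate`, as well as the test function $1/(1+x^2)$ used in the usage note, are hypothetical choices that do not come from the paper.

```python
import numpy as np


def sinc_kernel(x, s):
    """Sinc kernel k_s(x) = sin(s x) / (pi x), with k_s(0) = s / pi.

    np.sinc(u) is sin(pi u) / (pi u), so (s / pi) * np.sinc(s x / pi)
    equals sin(s x) / (pi x) and handles x = 0 without a special case.
    """
    x = np.asarray(x, dtype=float)
    return (s / np.pi) * np.sinc(s * x / np.pi)


def kernel_estimate(x, y, grid, h, s):
    """Kernel estimator h * sum_l k_s(x - l h) y_l over a truncated grid.

    Assumption: `grid` holds the design points l*h for |l| <= L only,
    i.e. the infinite sum of the paper is cut off at a finite window.
    """
    return h * np.sum(sinc_kernel(x - grid, s) * y)


def simulate(f, h, sigma, half_width, rng):
    """Draw y_l = f(l h) + xi_l, xi_l ~ N(0, sigma^2), for |l h| <= half_width."""
    L = int(round(half_width / h))
    grid = h * np.arange(-L, L + 1)
    y = f(grid) + sigma * rng.standard_normal(grid.size)
    return grid, y
```

A possible usage: with `f = lambda t: 1.0 / (1.0 + t**2)`, whose Fourier transform under convention (5) is $\pi e^{-|t|}$, one can call `grid, y = simulate(f, 0.01, 0.5, 200.0, np.random.default_rng(0))` and then `kernel_estimate(0.3, y, grid, 0.01, 3.0)`. Increasing `s` reduces the bias but inflates the variance, which grows roughly like $h\sigma^2 s/\pi$.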
3 Auxiliary Results

In this section we present, for the reader's convenience, two auxiliary results which will be used in the subsequent sections. The aim of the first lemma is to approximate summation formulas by integrals, with a good approximation error in the case of very smooth integrands. This result is a version of the celebrated Poisson summation formula. It
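has been used in a similar situation in Golubev, Levit and Tsybakov (1996). Below $\mathcal{A}(\gamma, \beta, r)$, $\gamma, \beta, r > 0$, are the functional classes of infinitely differentiable functions previously defined and $k_s(x)$ is the kernel (9).

Lemma 1 The following properties hold:

(a) Let $f, g$ be continuous functions in $L_2(\mathbb{R})$ such that $F[f], F[g] \in L_1(\mathbb{R})$. Then
$$h\sum_{l=-\infty}^{\infty} g(x - lh)\, f(lh - y) = \frac{1}{2\pi}\int e^{-it(x-y)} F[g](t)\,F[f](t)\,dt + \frac{1}{2\pi}\sum_{l \ne 0} e^{i\frac{2\pi l}{h} y}\int e^{-it(x-y)} F[g](t)\,F[f]\Big(t + \frac{2\pi l}{h}\Big)dt$$
$$= \int g(x - z)\, f(z - y)\,dz + \frac{1}{2\pi}\sum_{l \ne 0} e^{i\frac{2\pi l}{h} y}\int e^{-it(x-y)} F[g](t)\,F[f]\Big(t + \frac{2\pi l}{h}\Big)dt.$$

(b) For arbitrary numbers $s_1, s_2$ ($0 \le s_1 \le s_2$) denote $\Delta(x) = k_{s_2}(x) - k_{s_1}(x)$; note that $\Delta(x) = k_s(x)$ for $s_1 = 0$ and $s_2 = s$. Then, uniformly in $\gamma, \beta, r$, $s_i \ge 0$, $i = 1, 2$, $x \in \mathbb{R}$ and $f \in \mathcal{A}(\gamma, \beta, r)$, as $h \to 0$,
$$h\sum_{l=-\infty}^{\infty} \Delta(x - lh)\, f(lh) = \frac{1}{2\pi}\int e^{-itx} F[\Delta](t)\,F[f](t)\,dt + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_{s_1}^{s_2} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2},$$
where $c_r = \max(1, 2^{r-1})$.

(c) Let $s_1, s_2$ and $\Delta(x)$ be as before. Then, uniformly in $s_1, s_2$ and $x \in \mathbb{R}$, for $h \to 0$,
$$h\sum_{l=-\infty}^{\infty} \Delta^2(x - lh) = \frac{s_2 - s_1}{\pi} + O(1)\, h\, s_2 (s_2 - s_1).$$

Proof. (a) The proof is based on the formula
$$\sum_{l=-\infty}^{\infty} e^{2\pi i l x} = \sum_{l=-\infty}^{\infty} \delta(x - l), \tag{12}$$
known in the theory of distributions (cf. e.g. Antonsik et al., 1973). Using the Fourier inversion formula, the distributional formula (12) and some algebra, one obtains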
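$$\begin{aligned}
h\sum_{l} g(x - lh)\, f(lh - y)
&= h\sum_{l} \frac{1}{2\pi}\int e^{-it(x - lh)} F[g](t)\,dt \;\frac{1}{2\pi}\int e^{-is(lh - y)} F[f](s)\,ds \\
&= \frac{1}{(2\pi)^2}\iint e^{-itx} F[g](t)\, e^{isy} F[f](s)\; h\sum_{l} e^{i(t-s)lh}\,dt\,ds \\
&= \frac{1}{2\pi}\iint e^{-itx} F[g](t)\, e^{isy} F[f](s) \sum_{l}\delta\Big(s - t - \frac{2\pi l}{h}\Big)\,ds\,dt \\
&= \frac{1}{2\pi}\int e^{-itx} F[g](t) \sum_{l} e^{i(t + \frac{2\pi l}{h})y}\, F[f]\Big(t + \frac{2\pi l}{h}\Big)dt \\
&= \frac{1}{2\pi}\int e^{-it(x-y)} F[g](t)\,F[f](t)\,dt + \frac{1}{2\pi}\sum_{l \ne 0} e^{i\frac{2\pi l}{h}y}\int e^{-it(x-y)} F[g](t)\,F[f]\Big(t + \frac{2\pi l}{h}\Big)dt \\
&= \int g(x - z)\, f(z - y)\,dz + \frac{1}{2\pi}\sum_{l \ne 0} e^{i\frac{2\pi l}{h}y}\int e^{-it(x-y)} F[g](t)\,F[f]\Big(t + \frac{2\pi l}{h}\Big)dt.
\end{aligned}$$

(b) If $f \in \mathcal{A}(\gamma, \beta, r)$ then $f$ belongs to $L_2(\mathbb{R})$ according to Parseval's formula. Also, $F[f] \in L_1(\mathbb{R})$ according to (4) and the Cauchy-Schwarz inequality. Thus we can apply the previous result in (a), using $g = \Delta$ and $y = 0$. Note that $F[\Delta](t) = \mathbf{1}_{(s_1, s_2]}(|t|)$. Applying the Fourier inversion formula, the Cauchy-Schwarz inequality and the $c_r$-inequality, we obtain after a few transformations
$$\Big| h\sum_{l} \Delta(x - lh)\, f(lh) - \frac{1}{2\pi}\int e^{-itx} F[\Delta](t)\,F[f](t)\,dt \Big| \le \frac{1}{2\pi}\sum_{l \ne 0}\Big| \int e^{-itx} F[\Delta](t)\,F[f]\Big(t + \frac{2\pi l}{h}\Big)dt \Big|$$
$$\le \frac{1}{2\pi}\sum_{l \ne 0}\Big(\int \frac{\gamma}{2\pi\beta^2}\, e^{2(\gamma|t|)^r} |F[f](t)|^2\,dt\Big)^{1/2}\Big(\int |F[\Delta](t)|^2\, \frac{2\pi\beta^2}{\gamma}\, e^{-2(\gamma|t + 2\pi l/h|)^r}\,dt\Big)^{1/2}$$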
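$$\le \frac{1}{2\pi}\sum_{l \ne 0}\Big(\int \mathbf{1}_{(s_1, s_2]}(|t|)\, \frac{2\pi\beta^2}{\gamma}\, e^{2(\gamma|t|)^r}\, e^{-2(2\pi|l|\gamma/h)^r/c_r}\,dt\Big)^{1/2}$$
$$= \frac{1}{2\pi}\sum_{l \ne 0} e^{-(2\pi|l|\gamma/h)^r/c_r}\Big(\int \mathbf{1}_{(s_1, s_2]}(|t|)\, \frac{2\pi\beta^2}{\gamma}\, e^{2(\gamma|t|)^r}\,dt\Big)^{1/2}$$
$$= \frac{2}{\sqrt{\pi}}\Big(\int_{s_1}^{s_2} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2}\sum_{l=1}^{\infty} e^{-(2\pi l\gamma/h)^r/c_r}$$
$$\le \frac{2}{\sqrt{\pi}}\Big(\int_{s_1}^{s_2} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2}\Big(e^{-(2\pi\gamma/h)^r/c_r} + \int_1^{\infty} e^{-(2\pi\gamma x/h)^r/c_r}\,dx\Big)$$
$$= O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_{s_1}^{s_2} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2}, \qquad h \to 0,$$
where the last asymptotic can be easily derived by partial integration.

(c) Applying (a) and taking $f = g = \Delta$ and $x = y$, we see that
$$h\sum_{l} \Delta^2(x - lh) = h\sum_{l} \Delta(x - lh)\,\Delta(lh - x) = \frac{1}{2\pi}\int F[\Delta]^2(t)\,dt + \frac{1}{2\pi}\sum_{l \ne 0} e^{i\frac{2\pi l}{h}x}\int F[\Delta](t)\,F[\Delta]\Big(t + \frac{2\pi l}{h}\Big)dt.$$
Therefore
$$\Big| h\sum_{l} \Delta^2(x - lh) - \frac{s_2 - s_1}{\pi}\Big| \le \frac{1}{\pi}\sum_{l=1}^{\infty}\int F[\Delta](t)\,F[\Delta]\Big(t + \frac{2\pi l}{h}\Big)dt = \frac{1}{\pi}\sum_{l=1}^{\infty}\int \mathbf{1}_{(s_1, s_2]}(|t|)\,\mathbf{1}_{(s_1, s_2]}\Big(\Big|t + \frac{2\pi l}{h}\Big|\Big)dt = O(1)\, h\, s_2 (s_2 - s_1),$$
since at most $h s_2/\pi$ of the summands are nonzero and each of them is at most $2(s_2 - s_1)$. This completes the proof of the lemma.

The following elementary properties will be used below. They will help in bounding the bias and the approximation errors.

Lemma 2 For any positive $\gamma$ and $r$ the following inequality holds
$$\int_s^{\infty} e^{-2(\gamma t)^r}\,dt \le \frac{s\, e^{-2(\gamma s)^r}}{r(\gamma s)^r} \tag{13}$$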
for all $s > t_0$, where $t_0$ satisfies $r(\gamma t_0)^r = 1$, and
$$\int_0^{s} e^{2(\gamma t)^r}\,dt = \frac{s\, e^{2(\gamma s)^r}}{2r(\gamma s)^r}\,(1 + o(1)) \tag{14}$$
uniformly in $r_- < r < r_+$ for $\gamma s \to \infty$, where $r_-, r_+ > 0$ are arbitrary fixed numbers.

For the first inequality see e.g. Lepski and Levit (1998), eqs. (2.8), (2.10). The second property can be easily proven by partial integration.

4 Minimax Regression in $\mathcal{A}(\gamma, \beta, r)$

4.1 Optimality in the Case of Fixed Classes

The first result we present in this section is obtained in the classical framework, i.e. in a situation where the function $f(x)$, although unknown, belongs to a given class. In other words, the parameter $\alpha = (\gamma, \beta, r)$ of the class is known and fixed. Denote for brevity $\mathcal{A}(\alpha) = \mathcal{A}(\gamma, \beta, r)$. We will prove that asymptotically minimax estimators can be found among kernel estimators using a specified bandwidth and we will also calculate, to within a constant, their maximal asymptotic risk, for a variety of loss functions.

Theorem 1 Let $\alpha = (\gamma, \beta, r)$ with $\gamma, \beta, r > 0$ and let $w \in W$. Then for any $x \in \mathbb{R}$, the kernel estimator $\hat f_h = \hat f_{h, s_h}$ in (8) with the bandwidth
$$s_h = s_h(\alpha, \sigma) = \frac{1}{\gamma}\Big(\frac{1}{2}\log\frac{\beta^2}{h\gamma\sigma^2}\Big)^{1/r} \tag{15}$$
satisfies
$$\lim_{h \to 0}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\hat f_h(x) - f(x))\big) = \lim_{h \to 0}\ \inf_{\tilde f_h}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x) - f(x))\big) = E\, w(\xi),$$
where $\sigma_h^2 = h\sigma^2 s_h/\pi$ is the normalizing factor (cf. (19) and (20) in the proof), $\tilde f_h$ is taken from the class of all possible estimators of $f$ and $\xi \sim N(0, 1)$.

Proof. Upper bound for the risk. Let us first study the sample properties of the family of estimators we use. According to the model for the observations (7) and the formula for the estimator (8) one can split the error term as follows,
$$\hat f_{h,s}(x) - f(x) = \Big(h\sum_{l} k_s(x - lh)\, f(lh) - f(x)\Big) + h\sum_{l} k_s(x - lh)\,\xi_l := b(f, x, s, h) + v(\sigma, x, s, h).$$
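For simplicity we shall write below $b_s = b(f, x, s, h)$, $v_s = v(\sigma, x, s, h)$. The mean square error can be decomposed as
$$E\big(\hat f_{h,s}(x) - f(x)\big)^2 = b_s^2 + \mathrm{Var}\, v_s, \tag{16}$$
where $b_s$ is the bias and $v_s$ is a normally distributed zero mean stochastic term.

First, let us consider the bias. In order to apply Lemma 1 we take $s_1 = 0$ and $s_2 = s$. In this case $\Delta = k_s$. Now, applying Lemma 1(b) and the Fourier inversion formula for $f(x)$ we see that, uniformly in $f \in \mathcal{A}(\alpha)$,
$$b_s = \frac{1}{2\pi}\int e^{-itx}\big(F[k_s](t) - 1\big)F[f](t)\,dt + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_0^{s} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2},$$
for $h \to 0$. Furthermore, applying the Cauchy-Schwarz inequality, property (10), and the definition of the class $\mathcal{A}(\gamma, \beta, r)$ we get
$$\begin{aligned}
|b_s| &\le \Big|\frac{1}{2\pi}\int e^{-itx}\big(F[k_s](t) - 1\big)F[f](t)\,dt\Big| + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_0^{s} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2} \\
&\le \frac{1}{2\pi}\Big(\int_{|t| > s} \frac{2\pi\beta^2}{\gamma}\, e^{-2(\gamma|t|)^r}\,dt\Big)^{1/2} + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_0^{s} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2} \\
&= \Big(\frac{1}{\pi}\int_s^{\infty} \frac{\beta^2}{\gamma}\, e^{-2(\gamma t)^r}\,dt\Big)^{1/2} + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_0^{s} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2}.
\end{aligned} \tag{17}$$

Second, let us consider the variance term. From Lemma 1(c), with $s_1 = 0$ and $s_2 = s$, we see that
$$\mathrm{Var}\, v_s = \sigma^2 h^2 \sum_{l} k_s^2(x - lh) = \frac{h\sigma^2 s}{\pi}\,\big(1 + O(1)\, h s\big), \tag{18}$$
when $h \to 0$. For any $s$ denote
$$\sigma_{h,s}^2 = \frac{h\sigma^2 s}{\pi} \tag{19}$$
and for the chosen bandwidth $s = s_h$ denote the resulting variance
$$\sigma_h^2 = \sigma_h^2(\alpha, \sigma) = \sigma_{h, s_h}^2. \tag{20}$$
From equations (16)-(18) we see that the mean square error of the estimator $\hat f_{h,s}$ satisfies
$$E\big(\hat f_{h,s}(x) - f(x)\big)^2 \le \sigma_{h,s}^2\big(1 + O(hs)\big) + \Big[\Big(\frac{1}{\pi}\int_s^{\infty} \frac{\beta^2}{\gamma}\, e^{-2(\gamma t)^r}\,dt\Big)^{1/2} + O\big(e^{-(2\pi\gamma/h)^r/c_r}\big)\Big(\int_0^{s} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt\Big)^{1/2}\Big]^2. \tag{21}$$
Now we shall verify that, taking $s = s_h$ as defined in (15), the right-hand side of the previous inequality is equal to $\sigma_h^2(1 + o(1))$. Before going into details, let us remark that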
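the bandwidth $s_h$ is precisely the bandwidth that balances the main terms of the bias and the variance in the mean square error, i.e. it minimizes
$$\sigma_{h,s}^2 + \frac{1}{\pi}\int_s^{\infty} \frac{\beta^2}{\gamma}\, e^{-2(\gamma t)^r}\,dt$$
with respect to $s$, since by (15)
$$e^{2(\gamma s_h)^r} = \frac{\beta^2}{h\gamma\sigma^2}. \tag{22}$$

Let us return to equation (21). Note first that
$$h s_h \to 0, \quad \text{when } h \to 0. \tag{23}$$
Second, applying the identity (22) and Lemma 2, we see that
$$\frac{1}{\sigma_h^2}\,\frac{1}{\pi}\int_{s_h}^{\infty} \frac{\beta^2}{\gamma}\, e^{-2(\gamma t)^r}\,dt = \frac{\beta^2}{h\gamma\sigma^2 s_h}\int_{s_h}^{\infty} e^{-2(\gamma t)^r}\,dt \le \frac{1}{r(\gamma s_h)^r} = \Big(\frac{r}{2}\log\frac{\beta^2}{h\gamma\sigma^2}\Big)^{-1} = o(1), \tag{24}$$
when $h \to 0$. Finally, applying the identity (22) and the trivial inequality $\int_0^{s} e^{2(\gamma t)^r}\,dt \le s\, e^{2(\gamma s)^r}$, we get
$$\frac{1}{\sigma_h^2}\, e^{-2(2\pi\gamma/h)^r/c_r}\int_0^{s_h} \frac{\beta^2}{\gamma}\, e^{2(\gamma t)^r}\,dt \le \frac{\pi\beta^2}{h\gamma\sigma^2}\, e^{-2(2\pi\gamma/h)^r/c_r + 2(\gamma s_h)^r} = \pi\Big(\frac{\beta^2}{h\gamma\sigma^2}\Big)^2 e^{-2(2\pi\gamma/h)^r/c_r} = o(1), \tag{25}$$
when $h \to 0$. Thus, from (21) and (23)-(25) we have that
$$E\big(\hat f_h(x) - f(x)\big)^2 = \sigma_h^2\,(1 + o(1)), \qquad h \to 0.$$
Note that when we normalize the error of our estimator by $\sigma_h$, the normalized error term $(\hat f_h(x) - f(x))/\sigma_h$ has a normal distribution, with mean of order $o(1)$ and variance equal to $1 + o(1)$, where the terms $o(1)$ are small uniformly in $f \in \mathcal{A}(\alpha)$ when $h$ goes to zero. Because the loss function $w$ has only countably many discontinuity points, applying the dominated convergence theorem we obtain
$$\lim_{h \to 0}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\hat f_h(x) - f(x))\big) = E\, w(\xi). \tag{26}$$

Lower bound for the risk. Consider the parametric family of functions
$$f_\theta(z) = \theta\, g(z), \qquad g(z) = \frac{\pi}{s_h}\, k_{s_h}(z - x).$$
These functions satisfy $f_\theta(x) = \theta$, and if we assume that $|\theta| \le \theta_h$, where
$$\theta_h^2 = \frac{s_h^2}{\pi}\Big(\int_0^{s_h} \frac{\gamma}{\beta^2}\, e^{2(\gamma t)^r}\,dt\Big)^{-1}, \tag{27}$$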
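then
$$\frac{\gamma}{2\pi\beta^2}\int e^{2(\gamma|t|)^r} |F[f_\theta](t)|^2\,dt = \theta^2\,\frac{\pi^2}{s_h^2}\,\frac{\gamma}{2\pi\beta^2}\int e^{2(\gamma|t|)^r} |F[k_{s_h}](t)|^2\,dt \le \theta_h^2\,\frac{\pi}{2 s_h^2}\int \frac{\gamma}{\beta^2}\, e^{2(\gamma|t|)^r}\,\mathbf{1}_{[-s_h, s_h]}(t)\,dt = 1.$$
Thus $f_\theta \in \mathcal{A}(\alpha)$ for all $\theta$ such that $|\theta| \le \theta_h$. Now, we can apply Kakutani's theorem, using the fact that $\sum_l g^2(lh) < \infty$ according to Lemma 1(c), and see that
$$\frac{dP_\theta}{dP_0}(y) = \exp\Big\{\frac{1}{\sigma^2}\Big(\theta\sum_{l} y_l\, g(lh) - \frac{\theta^2}{2}\sum_{l} g^2(lh)\Big)\Big\}, \tag{28}$$
where $P_\theta = P_{f_\theta}$ (cf. e.g. Hui-Hsiung, 1975, Sect. II). The statistic
$$T = \frac{\sum_l y_l\, g(lh)}{\sum_l g^2(lh)} \tag{29}$$
is sufficient for the parameter $\theta$ of the family of distributions $P_\theta$. Obviously $T$ is normally distributed. Given $f_\theta(lh) = \theta g(lh)$, we can easily verify that
$$T \sim N\Big(\theta,\ \frac{\sigma^2}{\sum_l g^2(lh)}\Big), \tag{30}$$
and applying Lemma 1(c), with $s_1 = 0$ and $s_2 = s_h$, we see that
$$\frac{1}{\sigma^2}\sum_{l} g^2(lh) = \frac{\pi^2}{\sigma^2 s_h^2}\sum_{l} k_{s_h}^2(x - lh) = \frac{\pi}{h\sigma^2 s_h}\,\big(1 + O(1)\, h s_h\big),$$
when $h$ goes to zero. Thus, $T$ can be represented as
$$T = \theta + \phi_h\,\xi, \qquad \text{where } \xi \sim N(0, 1), \tag{31}$$
and, according to the previous arguments,
$$\phi_h^2 = \frac{\sigma^2}{\sum_l g^2(lh)} = \sigma_h^2\,(1 + o(1)). \tag{32}$$

To derive the required lower bound, let us assume the unknown parameter $\theta$ has a prior density $\lambda(\theta)$; a convenient choice is
$$\lambda(\theta) = \frac{1}{\theta_h}\cos^2\Big(\frac{\pi\theta}{2\theta_h}\Big), \qquad |\theta| \le \theta_h.$$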
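We obtain then, due to the sufficiency of the statistic $T$,
$$\begin{aligned}
\inf_{\tilde f_h}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x) - f(x))\big)
&\ge \inf_{\tilde f_h}\ \sup_{|\theta| \le \theta_h} E_{f_\theta}\, w\big(\sigma_h^{-1}(\tilde f_h(x) - f_\theta(x))\big) \\
&= \inf_{\hat\theta}\ \sup_{|\theta| \le \theta_h} E_\theta\, w\Big(\frac{\hat\theta - \theta}{\sigma_h}\Big) \\
&\ge \inf_{\hat\theta}\ \int_{-\theta_h}^{\theta_h} E_\theta\, w\Big(\frac{\hat\theta - \theta}{\sigma_h}\Big)\lambda(\theta)\,d\theta \\
&= \inf_{\hat\theta(T)}\ \int_{-\theta_h}^{\theta_h} E_\theta\, w\Big(\frac{\hat\theta(T) - \theta}{\sigma_h}\Big)\lambda(\theta)\,d\theta \\
&= E\, w\Big(\xi\,\frac{\phi_h}{\sigma_h}\Big) - \frac{\pi^2\sigma_h^2}{2\theta_h^2}\,\frac{1}{\sqrt{2\pi}}\int (x^2 - 1)\, w(x)\, e^{-x^2/2}\,dx\ (1 + o(1)).
\end{aligned}$$
Here the last equality follows from the results of Levit. According to (32),
$$\frac{\phi_h^2}{\sigma_h^2} = 1 + o(1), \qquad h \to 0,$$
while applying the identity (22) and Lemma 2 we see that
$$\frac{\sigma_h^2}{\theta_h^2} = \frac{h\sigma^2}{s_h}\int_0^{s_h} \frac{\gamma}{\beta^2}\, e^{2(\gamma t)^r}\,dt = \frac{e^{-2(\gamma s_h)^r}}{\gamma s_h}\int_0^{\gamma s_h} e^{2 t^r}\,dt = \frac{1 + o(1)}{2r(\gamma s_h)^r} \to 0, \tag{33}$$
when $h \to 0$. Thus we have that, according to the dominated convergence theorem,
$$\liminf_{h \to 0}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\hat f_h(x) - f(x))\big) \ge \liminf_{h \to 0}\ \inf_{\tilde f_h}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x) - f(x))\big) \ge E\, w(\xi). \tag{34}$$
Together the relations (26) and (34) prove the theorem.

4.2 An Extension to Non-fixed Classes

Up till now we assumed that the classes $\mathcal{A}(\alpha)$ were fixed, i.e. not depending on the parameter $h$, though the function we wanted to estimate could vary freely within the given class $\mathcal{A}(\alpha)$ and, in particular, could depend on $h$. The possible dependency of $f$ on $h$ implies that the estimated function could be as bad as our model allowed it to be, which justified the minimax approach of Theorem 1.

To summarize, the assumption that our functional class $\mathcal{A}(\alpha)$ is fixed implies that the smoothness properties of the elements of the class are fixed. However, we might want to further relax this restriction by allowing the class itself to depend on $h$. Indeed, there is neither practical justification, nor a logical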
requirement, that the smoothness of the underlying function remains the same while the level of noise decreases and consequently the resolution of the available statistical procedures increases. This will become even more natural in the adaptive setting of Section 5, where the smoothness of the underlying function is not known beforehand.

Thus, as a first step towards introducing the adaptive framework, we let the parameters of the model $\gamma$, $\beta$ and $r$ depend on $h$. Even so, they will still be assumed to be known to the statistician; this assumption will be abolished later in the adaptive framework of Section 5. This approach will allow us to explore the limits of the model when its parameters are allowed to change freely.

Let $s_h$ be as defined in Theorem 1. Note that now the optimum bandwidth $s_h$ depends on $h$ also through the parameters $\gamma$, $\beta$ and $r$. Nevertheless the statement of Theorem 1 still holds, as we shall see, under corresponding assumptions.

Theorem 2 Let $w \in W$, and let the parameters $\beta = \beta_h$, $r = r_h$, $\gamma = \gamma_h$ and the noise level $\sigma = \sigma(h)$ be all positive and such that
$$0 < \liminf_{h \to 0} r \le \limsup_{h \to 0} r < \infty, \tag{35}$$
$$\liminf_{h \to 0} \frac{\beta^2}{h\gamma\sigma^2} = \infty, \tag{36}$$
$$\limsup_{h \to 0} \frac{h}{\gamma}\log^{1/r}\frac{\beta^2}{h\gamma\sigma^2} = 0. \tag{37}$$
Then
$$\lim_{h \to 0}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\hat f_h(x) - f(x))\big) = \lim_{h \to 0}\ \inf_{\tilde f_h}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\, w\big(\sigma_h^{-1}(\tilde f_h(x) - f(x))\big) = E\, w(\xi),$$
where $s_h$, $\sigma_h$, $\tilde f_h$ and $\hat f_h$ are the same as in Theorem 1.

Remark 1 Note that the conditions (35) and (37) imply $h s_h \to 0$ when $h \to 0$. As a direct consequence of this, we obtain consistency, provided $\sigma$ is bounded, since then $h\sigma^2 s_h \to 0$. However, our asymptotic optimality results do not require $\sigma$ to be bounded; in other words they apply even when there is no consistency!

Proof. We prove this theorem following the proof of Theorem 1. It is sufficient to see that relations (23)-(25) and (33) still hold for the class $\mathcal{A}(\gamma, \beta, r)$. The limit (23) follows from (35) and (37), and the limits (24) and (33) follow from (35) and (36). Finally (25) follows from the identity
$$\Big(\frac{\beta^2}{h\gamma\sigma^2}\Big)^2 e^{-2(2\pi\gamma/h)^r/c_r} = \exp\Big\{-2 c_r^{-1}\Big(\frac{2\pi\gamma}{h}\Big)^r\Big[1 - c_r\Big(\frac{h}{2\pi\gamma}\log^{1/r}\frac{\beta^2}{h\gamma\sigma^2}\Big)^r\Big]\Big\} \tag{38}$$
and conditions (35)-(37). Note that $h/\gamma \to 0$, by (36) and (37). The rest of the proof remains the same.
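The bandwidth of Theorem 1 and the resulting normalizing factor are easy to evaluate numerically. The sketch below assumes that (15) reads $s_h = \gamma^{-1}\big(\tfrac12\log(\beta^2/(h\gamma\sigma^2))\big)^{1/r}$ and that the normalizing factor is $\sigma_h^2 = h\sigma^2 s_h/\pi$, as reconstructed from the balance identity $e^{2(\gamma s_h)^r} = \beta^2/(h\gamma\sigma^2)$; the function names `optimal_bandwidth` and `rate_squared` are hypothetical.

```python
import math


def optimal_bandwidth(h, gamma, beta, r, sigma):
    """Bandwidth s_h = (1/gamma) * (0.5 * log(beta^2 / (h gamma sigma^2)))^(1/r).

    Assumed form of the bandwidth formula; it requires
    beta^2 / (h gamma sigma^2) > 1 so that the logarithm is positive.
    """
    ratio = beta ** 2 / (h * gamma * sigma ** 2)
    if ratio <= 1.0:
        raise ValueError("need beta^2 / (h * gamma * sigma^2) > 1")
    return (0.5 * math.log(ratio)) ** (1.0 / r) / gamma


def rate_squared(h, gamma, beta, r, sigma):
    """Squared normalizing factor sigma_h^2 = h * sigma^2 * s_h / pi (assumed form)."""
    return h * sigma ** 2 * optimal_bandwidth(h, gamma, beta, r, sigma) / math.pi
```

For instance, evaluating `optimal_bandwidth(h, 1.0, 1.0, 1.0, 1.0)` for decreasing `h` shows the bandwidth growing only logarithmically in $1/h$, so that the product $h\,s_h$ still tends to zero.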
The important conclusion which can be drawn from the last result is that in order to prove asymptotic optimality of our estimation procedure, we do not have to invoke the assumption (not always realistic) that the smoothness of the estimated function remains the same, even when the level of noise decreases and, as a consequence, the resolution of available statistical methods increases. Note that in this more general situation the corresponding optimal rate of convergence
$$\sigma_h^2(\alpha, \sigma) = \frac{h\sigma^2}{\pi\gamma}\Big(\frac{1}{2}\log\frac{\beta^2}{h\gamma\sigma^2}\Big)^{1/r} \tag{39}$$
can be of any order, with respect to any of the parameters $h$ or $\sigma$, varying from extremely fast, parametric rates, to extremely slow, non-parametric ones, and even all the way down to no consistency at all. The problem which we will face in the next section is that in practice we often do not know the real class at all.

5 Adaptive Minimax Regression

5.1 Adaptive Estimation in Functional Scales

As a transition from the classical minimax setting, studied in the previous sections, to the adaptive setting we introduce functional scales
$$\mathcal{A}_K = \{\mathcal{A}(\alpha) \mid \alpha \in K\}, \tag{40}$$
corresponding to a subset $K \subset \mathbb{R}^3_+$ in the underlying parameter space. As our scales $\mathcal{A}_K$ can be identified with the corresponding subsets $K$, we will sometimes speak about a scale $K$, instead of $\mathcal{A}_K$, when there is no risk of confusion. Sometimes we can think of the scale $\mathcal{A}_K$ as the collection of functions $\{f \in \mathcal{A}(\alpha) \mid \alpha \in K\}$. We will say that some limit exists uniformly in $\mathcal{A}_K$ to express that it exists uniformly in $f \in \mathcal{A}(\alpha)$ for every $\alpha$ and that the convergence is uniform in $\alpha \in K$.

Our goal is to estimate a function which belongs to $\mathcal{A}(\alpha)$ for some $\alpha \in K$. So, we must find an estimator which does not depend on $\alpha$ and which performs optimally well over the whole scale $K$. For this new setting a new definition of optimality is necessary. We use the following definition, which was used in Lepski and Levit (1998). From now on we will restrict ourselves to the loss functions $w(x) = |x|^p$, $p > 0$. Let $\mathcal{A}_K$ be a functional scale and $\mathcal{F}$ a class of estimators $\tilde f_h$.
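Definition 2 An estimator $\hat f_h \in \mathcal{F}$ is called $(p, K, \mathcal{F})$-adaptively minimax, at a point $x \in \mathbb{R}$, if for any other estimator $\tilde f_h \in \mathcal{F}$
$$\limsup_{h \to 0}\ \sup_{\alpha \in K}\ \frac{\sup_{f \in \mathcal{A}(\alpha)} E_f\, |\hat f_h(x) - f(x)|^p}{\sup_{f \in \mathcal{A}(\alpha)} E_f\, |\tilde f_h(x) - f(x)|^p} \le 1.$$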
The simplest example of a scale $\mathcal{A}_K$ can be obtained when $K$ is a fixed compact subset of $\mathbb{R}^3_+$. Our results below cover a much broader setting in which the set $K$ itself can depend on the parameter $h$. In our approach, such results serve two goals. First of all, they allow a better understanding of the true scope of adaptivity of statistical procedures, since they describe the extreme situation in which an adaptation is still possible. In fact all that is needed below is that the assumptions of our non-adaptive Theorem 2 hold uniformly on the scale $K$; below we formulate these assumptions more explicitly.

Definition 3 A functional scale $\mathcal{A}_K$ (or the corresponding scale $K$) is called a regular scale, or an R-scale, if the following conditions are satisfied:
$$0 < \liminf_{h \to 0}\ \inf_{\alpha \in K} r \le \limsup_{h \to 0}\ \sup_{\alpha \in K} r < \infty, \tag{41}$$
$$\liminf_{h \to 0}\ \inf_{\alpha \in K} \frac{\beta^2}{h\gamma\sigma^2} = \infty, \tag{42}$$
and for some $0 < \delta < 1$
$$\limsup_{h \to 0}\ \sup_{\alpha \in K} \frac{h^{1-\delta}}{\gamma}\log^{1/r}\frac{\beta^2}{h\gamma\sigma^2} = 0. \tag{43}$$

The second goal that can be achieved by considering more general scales $K$ is to introduce the notion of optimality in adaptive estimation, by specifying a natural set of estimators $\mathcal{F}$ in the above Definition 2. Note that within a large scale $\mathcal{A}_K$, unknown functions $f$ can vary from extremely smooth ones, allowing the parametric rate $\sigma^2 O(h)$, to much less smooth functions, allowing slower rates $\sigma^2 O(h^{\delta})$, $\delta < 1$, or even extremely slow rates $\sigma^2 O(\log^{-1}(1/h))$. The first possibility is not typical in non-parametric estimation and can only happen in some extreme cases. These ideas are made more precise by introducing the following terminology, classifying functional scales $\mathcal{A}_K$ into pseudo-parametric (PP) and non-parametric (NP) scales depending on their global rates of convergence.

Definition 4 A functional scale $\mathcal{A}_K$ (or the corresponding parameter scale $K$) is called

(a) pseudo-parametric, or a PP scale, if
$$\limsup_{h \to 0}\ \sup_{\alpha \in K} s_h(\alpha) < \infty,$$

(b) non-parametric, or an NP scale, if
$$\liminf_{h \to 0}\ \inf_{\alpha \in K} s_h(\alpha) = \infty.$$

We shall call regular pseudo-parametric and regular non-parametric scales respectively RPP and RNP scales.
Since pseudo-parametric scales are not typical in non-parametric estimation and can only happen in some extreme cases, we will only require our statistical procedure to
achieve the optimal rate $\sigma^2 O(h)$ for such scales; cf. the definition of the corresponding classes $\mathcal{F}_p$ below. Note that even with such procedures, a better rate will be achieved in estimating functions in any pseudo-parametric scale than in any of the non-parametric scales. Further, strong evidence suggests that there is hardly much more one can do than require rate optimality for any of the pseudo-parametric scales. On the other hand, such an approach allows one to develop natural optimality criteria for any adaptive procedure in the classes $\mathcal{F}$ in the case of non-parametric scales.

Let $\mathcal{F}_p = \mathcal{F}_p(x)$ be the class of all estimators $\tilde f_h$ that satisfy
$$\limsup_{h \to 0}\ \sup_{\alpha \in K}\ \sup_{f \in \mathcal{A}(\alpha)} E_f\,\big|(h\sigma^2)^{-1/2}(\tilde f_h(x) - f(x))\big|^p < \infty$$
for arbitrary RPP functional scales $\mathcal{A}_K$. Let $\mathcal{F}_p^0 = \mathcal{F}_p^0(x)$ denote the class of estimators such that
$$\limsup_{h \to 0} E_0\,\big|(h\sigma^2)^{-1/2}\tilde f_h(x)\big|^p < \infty.$$
It is easy to notice that $\mathcal{F}_p \subset \mathcal{F}_p^0$. In the next subsection we present an adaptive estimator $\hat f_h \in \mathcal{F}_p$ and prove it to be $(p, K, \mathcal{F}_p)$-adaptively minimax for arbitrary RNP functional scales.

5.2 The Adaptive Estimator: Upper Bound

Section 5.1 outlined the general adaptive setting, introduced a notion of optimal adaptive estimation and described regular non-parametric scales of infinitely differentiable functions. Our first result describes the accuracy which can be achieved for such scales. Its proof starts with the construction of an adaptive estimator achieving this accuracy. For this, Lepski's method will be used, with the recent modification of Lepski and Levit (1998). Note that the accuracy of our procedure loses a logarithmic factor compared to the non-adaptive case where the parameters of the underlying classes are known. In Section 5.3 we will see that this is an unavoidable payment for not knowing the smoothness a priori and we will prove optimality of the proposed procedure in the sense of Definition 2.

Remark 2 In principle, one could also study adaptation to the unknown parameter $\sigma$. This however leads to entirely different problems, and is not considered here.
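Therefore we always assume that $\sigma$ is known, although it can vary with $h$. Denote $\psi_h^2=\psi_h^2(\alpha)=p\,(\log s_\alpha)\,\sigma_\alpha^2$, where $s_\alpha$ and $\sigma_\alpha$ were defined in (15) and (20).

Theorem 3 For any $p>0$ there exists an adaptive estimator $\hat f_h$ such that for any $x\in\mathbb R$ and for any RNP functional scale $A_K$, $\hat f_h\in F_p$ and
$$\limsup_{h\to0}\ \sup_{\alpha\in K}\ \sup_{f\in A(\alpha)} E_f\left|\psi_h^{-1}\big(\hat f_h(x)-f(x)\big)\right|^p\le1.$$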
The adaptive estimator. First, let us choose parameters $1/2<l<1$, $1/2<\delta<1$, $p_1>0$, $l_1=\delta l$, and define the sequence of bandwidths $s_0=0$, $s_i=\exp(i^l)$ for $i=1,2,\dots$. For each $h$, we take a subsequence $S_h=\{s_0,s_1,\dots,s_{I_h}\}$, where
$$I_h=\arg\max_i\{\,hs_i\le\log^{-1}(1/h)\,\},\qquad h<1.\tag{44}$$
Our asymptotic study considers $h\to0$; thus, without loss of generality, we define $I_h$ just for $h<1$. Now, let us denote
$$\hat f_i(x)=\hat f_{h,s_i}(x),\quad \sigma_i^2=\operatorname{Var}\hat f_i(x),\quad b_i=E_f\hat f_i(x)-f(x),\quad \hat\sigma_i^2=\sigma^2hs_i,$$
$$\sigma_{i,j}^2=\operatorname{Var}\big(\hat f_j(x)-\hat f_i(x)\big),\quad \hat\sigma_{i,j}^2=\sigma^2h(s_j-s_i),$$
and define the thresholds
$$\lambda_j^2=p\log s_j+p_1\log^{\delta}s_j.$$
Finally we define
$$\hat i=\min\big\{1\le i\le I_h:\ |\hat f_j(x)-\hat f_i(x)|\le\lambda_j\hat\sigma_{i,j}\ \text{ for all } j,\ i\le j\le I_h\big\}.\tag{45}$$
We will prove below that the estimator $\hat f_h(x)=\hat f_{\hat i}(x)$ satisfies both the statements contained in Theorem 3.

Let us first get some insight into the algorithm. The sequence $S_h$ of bandwidths has several important properties. First, it is increasing, thus the variance of the corresponding estimators is also increasing. Second, according to the definition of R-scales, the bandwidths $s_\alpha$, see eq. (43), are such that $s_\alpha\le h^{-\delta}$ uniformly in $K$ for some $\delta<1$ and $h$ small enough. Thus $s_{I_h}$ is large enough, for $h$ small enough, so that for each $\alpha$ the optimum bandwidth $s_\alpha$ corresponding to $A(\alpha)$ can be sandwiched between two consecutive elements of the sequence $S_h$, i.e. there exists $i(\alpha)=i_h(\alpha)$ such that
$$s_{i(\alpha)-1}<s_\alpha\le s_{i(\alpha)}.$$
The sequence is also dense enough so that
$$\lim_{i\to\infty}\frac{s_{i+1}}{s_i}=1.$$
This guarantees that $s_\alpha$ and $s_{i(\alpha)}$ are asymptotically equivalent, since $s_\alpha\to\infty$ for $h\to0$ in NP scales.
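The selection rule (45) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`bandwidth_grid`, `threshold`, `lepski_index`) and the default values of $l$, $\delta$ and $p_1$ are hypothetical, the pilot estimates $\hat f_i(x)$ and the pairwise standard deviation $\hat\sigma_{i,j}$ are supplied by the caller, the element $s_0=0$ is left out of the grid, and indices are zero-based.

```python
import math


def bandwidth_grid(h, l=0.75):
    """Bandwidths s_i = exp(i**l), i = 1, 2, ..., kept while h * s_i <= 1 / log(1/h)."""
    if not 0.0 < h < 1.0:
        raise ValueError("h must lie in (0, 1)")
    bound = 1.0 / (h * math.log(1.0 / h))
    grid, i = [], 1
    while math.exp(i ** l) <= bound:
        grid.append(math.exp(i ** l))
        i += 1
    return grid


def threshold(s, p, p1=1.0, delta=0.75):
    """Threshold lambda with lambda**2 = p*log(s) + p1*log(s)**delta (assumed reading of the text)."""
    log_s = math.log(s)
    return math.sqrt(p * log_s + p1 * log_s ** delta)


def lepski_index(estimates, grid, pair_sd, p, p1=1.0, delta=0.75):
    """Smallest index i whose estimate agrees with every later one up to the threshold.

    estimates[i] is the pilot estimate at bandwidth grid[i]; pair_sd(s_i, s_j) is a
    caller-supplied standard deviation of the difference of two pilot estimates.
    """
    if not grid or len(estimates) != len(grid):
        raise ValueError("estimates and grid must be non-empty and of equal length")
    n = len(grid)
    for i in range(n):
        accepted = all(
            abs(estimates[j] - estimates[i])
            <= threshold(grid[j], p, p1, delta) * pair_sd(grid[i], grid[j])
            for j in range(i, n)
        )
        if accepted:
            return i
    return n - 1  # not reached: the last index is always accepted
```

The loop mirrors the pairwise comparisons described above: a bandwidth is rejected as soon as one later estimate differs from it by more than the threshold, and the first surviving bandwidth, which has the smallest variance, is returned.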
The sequence of thresholds $\lambda_j$ has been chosen in such a way that, for large $i,j$ ($i(\alpha)\le i\le j$), the probability of the event
$$|\hat f_j(x)-\hat f_i(x)|>\lambda_j\operatorname{Var}^{1/2}\big(\hat f_j(x)-\hat f_i(x)\big)\tag{46}$$
is very small since, except for an event of small probability, this can only occur if the bias difference $|b_j-b_i|$ is large compared with $\operatorname{Var}^{1/2}(\hat f_j(x)-\hat f_i(x))$, which is not the case for bandwidths greater than $s_\alpha$, as we will see. Therefore, for any given $i$ and $j>i$ we reject $s_i$ in favor of the subsequent elements of the sequence $S_h$ if the event (46) occurs. This pairwise comparison is performed for every $i$, and from all the accepted $s_i$ we select the smallest, i.e. we choose the estimator with the smallest variance. Note that, according to the previous argument, no bandwidth $s_i$, $i\ge i(\alpha)$, will be rejected, with high probability. However, it is possible that a bandwidth $s_i$, $i<i(\alpha)$, is chosen. In that case our procedure warrants that, cf. (45),
$$|\hat f_{\hat i}(x)-\hat f_{i(\alpha)}(x)|\le\lambda_{i(\alpha)}\operatorname{Var}^{1/2}\big(\hat f_{\hat i}(x)-\hat f_{i(\alpha)}(x)\big)(1+o(1)).$$
Thus in the worst case the accuracy of $\hat f_h$ decreases by a factor $1+\lambda_{i(\alpha)}$, which is of order $\sqrt{\log s_\alpha}$ asymptotically as $h\to0$. In the next subsection we prove that the accuracy of this algorithm is asymptotically optimal in the adaptive setting, for all RNP scales subject to certain mild additional assumptions; see Theorems 1 and 6.

Now, let us turn to the proof of the theorem. We start with an auxiliary result needed in the proof, where we use the same notation as in the description of the estimation procedure.

Lemma 3 For $h\to0$, uniformly with respect to $i,j$ ($1\le i,j\le I_h$) and with respect to $\alpha$ varying in a regular scale,
(a) $b_j^2=o(1)\hat\sigma_j^2$ for all $j$ such that $i(\alpha)\le j\le I_h$;
(b) $\sigma_j^2=\hat\sigma_j^2\big(1+O(\log^{-1}(1/h))\big)$;
(c) $(b_j-b_i)^2\le(1+o(1))\hat\sigma_{i,j}^2$ for all $i,j$ such that $i(\alpha)\le i\le j\le I_h$;
(d) $\sigma_{i,j}^2=\hat\sigma_{i,j}^2\big(1+O(\log^{-1}(1/h))\big)$.

Proof. (a) Using the bound for the bias given in (17), equation , and Lemma 2 we see, with some algebra, that

b j 1 σ s j s j γ e γtr dt + O e s j γ r /c r βγe γtr dt e γs j r γ σ rγs j + O e γ r /c r σ s j 0 jr γ σ eγs = ˆσ j e γs r γs j r rγs r + O e γ r /c r +γs r e γ r /c r +γs j r.
Now, given $s_j\ge s_\alpha$ and using condition (44) in the definition of the sequence of bandwidths $S_h$ and the conditions in the definition of R-scales, we obtain $b_j^2=o(1)\hat\sigma_j^2$ when $h\to0$, uniformly with respect to $j$ ($i(\alpha)\le j\le I_h$) and with respect to $\alpha$ in $K$.

(b) This is just a reformulation of the asymptotic relation (18) using the fact that, according to (44), $hs_j\le\log^{-1}(1/h)$.

(c) Applying Lemma 1(b), taking $s_1=s_i$ and $s_2=s_j$, and arguing as in (17) and in the proof of (a), we see that

b j b i 1 e itx F[ i,j ]tf[f]t dt + O e s j γ r /c r s i γ eγtr dt 1 sj s i γ e γtr dt + O σ s j s i = ˆσ i,j γ σ O e γs ir e γ r /c r s j + e γ r /c r σ s j s i s i γ eγtr dt γ σ eγs jr e γs r γs i r + O e γ r /c r+γs r e γ r /c r+γs j r = ˆσ i,j1 + o1, 0.

(d) It follows directly from Lemma 1(c), taking $s_1=s_i$ and $s_2=s_j$. Here, as in (18), we can verify that
$$\sigma_{i,j}^2=\sigma^2h^2\sum_l\big(k_{s_j}(x-lh)-k_{s_i}(x-lh)\big)^2=\sigma^2h(s_j-s_i)\big(1+O(1)\,h(s_j-s_i)\big),\tag{47}$$
and thus, using (44), this completes the proof of the lemma.

We now proceed with proving Theorem 3. For arbitrary $f$ in any R-functional scale $A_K$,
$$R_f:=E_f|\hat f_{\hat i}(x)-f(x)|^p=R_f^-+R_f^+,$$
where
$$R_f^-=E_f\big\{\mathbf 1_{\{\hat i\le i(\alpha)\}}|\hat f_{\hat i}(x)-f(x)|^p\big\}\quad\text{and}\quad R_f^+=E_f\big\{\mathbf 1_{\{\hat i>i(\alpha)\}}|\hat f_{\hat i}(x)-f(x)|^p\big\}.$$
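Let us examine $R_f^-$ first. We have
$$\{\hat i\le i(\alpha)\}\subset\big\{|\hat f_{\hat i}(x)-\hat f_{i(\alpha)}(x)|\le\lambda_{i(\alpha)}\hat\sigma_{\hat i,i(\alpha)}\big\},$$
therefore
$$R_f^-\le E_f\,\mathbf 1_{\{|\hat f_{\hat i}(x)-\hat f_{i(\alpha)}(x)|\le\lambda_{i(\alpha)}\hat\sigma_{i(\alpha)}\}}\big|\hat f_{\hat i}(x)-\hat f_{i(\alpha)}(x)+\hat f_{i(\alpha)}(x)-f(x)\big|^p\le E_f\big(\lambda_{i(\alpha)}\hat\sigma_{i(\alpha)}+|\hat f_{i(\alpha)}(x)-f(x)|\big)^p\le E\big(\lambda_{i(\alpha)}\hat\sigma_{i(\alpha)}+|b_{i(\alpha)}|+\sigma_{i(\alpha)}|\xi|\big)^p,$$
where $\xi\sim N(0,1)$. Now according to Lemma 3, (a) and (b), uniformly with respect to $\alpha$ in any regular scale,
$$\sigma_{i(\alpha)}^2=\hat\sigma_{i(\alpha)}^2(1+o(1))\quad\text{and}\quad b_{i(\alpha)}^2=o(1)\hat\sigma_{i(\alpha)}^2,\qquad h\to0.$$
It follows that, for $h\to0$, uniformly with respect to any RPP scale,
$$R_f^-=O(h^{p/2}),\tag{48}$$
while by the dominated convergence theorem, uniformly in any RNP scale,
$$R_f^-\le\psi_h^p(\alpha)(1+o(1)).\tag{49}$$
Now let us examine $R_f^+$. Consider the auxiliary events
$$A_i=\big\{\omega:\ |\hat f_i(x)-f(x)|^2\le2\lambda_i^2\hat\sigma_i^2\big\}.$$
Applying Hölder's inequality we obtain
$$R_f^+=E_f\,\mathbf 1_{\{\hat i>i(\alpha)\}}|\hat f_{\hat i}(x)-f(x)|^p=\sum_{i=i(\alpha)+1}^{I_h}E_f\,\mathbf 1_{\{\hat i=i\}}|\hat f_i(x)-f(x)|^p=\sum_{i=i(\alpha)+1}^{I_h}E_f|\hat f_i(x)-f(x)|^p\big(\mathbf 1_{\{\hat i=i\}}\mathbf 1_{A_i}+\mathbf 1_{\{\hat i=i\}}\mathbf 1_{A_i^c}\big)\le R_f^{+,1}+R_f^{+,2},$$
where
$$R_f^{+,1}=\sum_{i=i(\alpha)+1}^{I_h}\big(2\lambda_i^2\hat\sigma_i^2\big)^{p/2}P_f(\hat i=i)$$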
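and
$$R_f^{+,2}=\sum_{i=i(\alpha)+1}^{I_h}E_f^{1/2}|\hat f_i(x)-f(x)|^{2p}\,P_f^{1/2}(A_i^c).$$
We have
$$P_f(\hat i=i)\le\sum_{j\ge i+1}P_f\big(|\hat f_{j-1}(x)-\hat f_{i-1}(x)|>\hat\sigma_{i-1,j-1}\lambda_{j-1}\big).\tag{50}$$
By writing $\hat f_j(x)-\hat f_i(x)=\sigma_{i,j}\xi+b_j-b_i$, where $\xi\sim N(0,1)$, applying Lemma 3(d), and using the well-known bound on the tails of the normal distribution (cf. Feller, 1968, Lemma 2), we find for some $C>0$ and all $h$ small enough
$$P_f\big(|\hat f_j(x)-\hat f_i(x)|>\lambda_j\hat\sigma_{i,j}\big)\le P\Big(|\xi|>\lambda_j\frac{\hat\sigma_{i,j}}{\sigma_{i,j}}-\frac{|b_j-b_i|}{\sigma_{i,j}}\Big)\le\exp\Big\{-\frac12\Big(\lambda_j\frac{\hat\sigma_{i,j}}{\sigma_{i,j}}-C\Big)^2\Big\}\le\exp\Big\{-\frac12\lambda_j^2\frac{\hat\sigma_{i,j}^2}{\sigma_{i,j}^2}+C\lambda_j\frac{\hat\sigma_{i,j}}{\sigma_{i,j}}\Big\}.$$
Since, by Lemma 3 and (44), $\hat\sigma_{i,j}^2/\sigma_{i,j}^2=1+O(\log^{-1}(1/h))$ while $\lambda_j^2=O(\log(1/h))$, it follows from the last inequality that for some $C_1>0$ (and a possibly larger $C$)
$$P_f\big(|\hat f_j(x)-\hat f_i(x)|>\lambda_j\hat\sigma_{i,j}\big)\le C_1\exp\Big\{-\frac12\lambda_j^2+C\lambda_j\Big\}$$
for all $\alpha$, $j\ge i\ge i(\alpha)$ and all sufficiently small $h$. Returning to (50) we obtain that
$$P_f(\hat i=i)\le C_1\sum_{j\ge i+1}\exp\Big\{-\frac12\lambda_{j-1}^2+C\lambda_{j-1}\Big\}=C_1\sum_{j\ge i}\exp\Big\{-\frac12\big(pj^l+p_1j^{l_1}\big)+C\big(pj^l+p_1j^{l_1}\big)^{1/2}\Big\}\le C_1\sum_{j\ge i}\exp\Big\{-\frac{pj^l}{2}-\frac{p_1j^{l_1}}{3}\Big\}$$
$$\le\frac{2C_1}{pl}\,i^{1-l}\exp\Big\{-\frac{pi^l}{2}-\frac{p_1i^{l_1}}{3}\Big\}=\frac{2C_1}{pl}\,i^{1-l}s_i^{-p/2}\exp\Big\{-\frac{p_1i^{l_1}}{3}\Big\}\le C_2\,s_i^{-p/2}\exp\Big\{-\frac{p_1i^{l_1}}{4}\Big\}\tag{51}$$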
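for some $C_2>0$ and all $i\ge i(\alpha)$, when $h$ is sufficiently small. Therefore, uniformly in $A_K$,
$$R_f^{+,1}=O(h^{p/2})\sum_{i=1}^{\infty}i^{pl/2}\exp\{-p_1i^{l_1}/4\}=O(h^{p/2}),\qquad h\to0.$$
In order to obtain a bound on $R_f^{+,2}$ we write again $\hat f_i(x)-f(x)=b_i+\sigma_i\xi$, $\xi\sim N(0,1)$. Applying Lemma 3, (a) and (b), in the same way as before, we have
$$P_f(A_i^c)\le P\Big(|\xi|>\sqrt2\,\lambda_i\frac{\hat\sigma_i}{\sigma_i}-\frac{|b_i|}{\sigma_i}\Big)\le P\Big(|\xi|>\sqrt2\,\lambda_i\frac{\hat\sigma_i}{\sigma_i}-1\Big)\le\exp\Big\{-\frac12\Big(\sqrt2\,\lambda_i\frac{\hat\sigma_i}{\sigma_i}-1\Big)^2\Big\}$$
$$\le C_3\exp\{-\lambda_i^2+2\lambda_i\}\le C_3\exp\{-pi^l-p_1i^{l_1}/2\}=C_3\,s_i^{-p}\exp\{-p_1i^{l_1}/2\},$$
for some $C_3$, all $i\ge i(\alpha)$ and all $\alpha$, provided $h$ is small enough (the value of $C_3$ may change from one inequality to the next). Thus,
$$R_f^{+,2}=\sum_{i=i(\alpha)+1}^{I_h}E_f^{1/2}|\hat f_i(x)-f(x)|^{2p}\,P_f^{1/2}(A_i^c)\le\sum_{i=i(\alpha)+1}^{I_h}\hat\sigma_i^{p}\,E^{1/2}\big|o(1)+(1+o(1))\xi\big|^{2p}\,P_f^{1/2}(A_i^c)$$
$$=O\big((\sigma^2h)^{p/2}\big)\sum_{i=1}^{\infty}\exp\{-p_1i^{l_1}/4\}=O(h^{p/2}),\qquad h\to0,\tag{52}$$
uniformly in $A_K$. We can thus conclude that, uniformly in any RPP scale $K$, our estimator satisfies
$$\sup_{\alpha\in K}\ \sup_{f\in A(\alpha)}E_f\big|h^{-1/2}\big(\hat f_h(x)-f(x)\big)\big|^p=O(1),$$
while for any RNP scale $K$
$$\sup_{\alpha\in K}\ \sup_{f\in A(\alpha)}E_f\big|\psi_h^{-1}(\alpha)\big(\hat f_h(x)-f(x)\big)\big|^p\le1+o(1)$$
when $h\to0$.

5.3 Lower Bound: Optimality Results

In Section 5.2 we have established an upper bound for the risk of adaptive procedures, by evaluating the quality of a proposed adaptive estimator. In this section we will establish a lower bound for an arbitrary such estimator, which will allow us to establish optimality of the proposed procedure in the sense of the notion of optimality introduced in Section 5.1.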
Theorem 4 Let $p>0$. Let $A_K$ be an arbitrary RNP scale such that quantities $\tilde s=\tilde s_\alpha$, $\tilde s\le s_\alpha$, and $\varphi=\varphi_\alpha$ can be defined in such a way that for all sufficiently small $h$ and $\alpha\in K$
$$\varphi^2\le\min\{p\log\tilde s,\ r\gamma\tilde s^{\,r}/2\}\tag{53}$$
and
$$\liminf_{h\to0}\ \inf_{\alpha\in K}\varphi=\infty.\tag{54}$$
Denote $\psi_h=\psi_h(\alpha)=\sigma_{\tilde s}\,\varphi$. Then for any estimator $\tilde f_h\in F_p^0(x)$
$$\liminf_{h\to0}\ \inf_{\alpha\in K}\ \sup_{f\in A(\alpha)}E_f\big|\psi_h^{-1}\big(\tilde f_h(x)-f(x)\big)\big|^p\ge1.$$
Proof. Letting $\theta=\varphi-\sqrt{\varphi}$, consider the following pair of functions:
$$f_0(z)\equiv0,\qquad f_1(z)=\theta g(z),\qquad g(z)=\frac{\sigma^2h}{\sigma_{\tilde s}}\,k_{\tilde s}(x-z).\tag{55}$$
Note that $f_1$ satisfies $\sigma_{\tilde s}^{-1}f_1(x)=\theta$. Obviously $f_1$ is a continuous function and using (10), definition (15) of $s$, and Lemma 2, we get

γ e γt r F[f 1 ]t dt = θ σ s γ e γt r F[k s ]t dt = θ γσ R s 0 γe γtr dt γ s = θ e γs rr s 0 γe γtr dt γ s = θ rγ s r e γ s r γs r 1 + o o1 1, φ rγ s r e γ s r γs r 1 + o1 (56)

uniformly in $K$ for $h$ small enough. Thus $f_1\in A(\alpha)$ for all sufficiently small $h$ and every $\alpha\in K$.
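Let $\tilde f_h\in F_p^0(x)$ be an arbitrary estimator and denote $f^*(x)=\psi_h^{-1}\tilde f_h(x)$; then
$$\psi_h^{-1}\big(\tilde f_h(x)-f_1(x)\big)=f^*(x)-\psi_h^{-1}f_1(x)=f^*(x)-\varphi^{-1}\theta=f^*(x)-L,\qquad L=\varphi^{-1}\theta,\tag{57}$$
whereas
$$(\sigma^2h)^{-1/2}\big(\tilde f_h(x)-f_0(x)\big)=(\sigma^2h)^{-1/2}\tilde f_h(x)=(\sigma^2h)^{-1/2}\psi_hf^*(x)=\frac{\sigma_{\tilde s}}{(\sigma^2h)^{1/2}}\,\varphi f^*(x)=\tilde s^{\,1/2}\varphi f^*(x)=f^*(x)\exp\Big\{\frac{\log\tilde s}{2}+\log\varphi\Big\}.\tag{58}$$
Denote $q=\exp\{-\varphi\}$, so that by (54), $q\to0$ uniformly with respect to $\alpha$ for $h\to0$. Now, with the thus defined $f_1\in A(\alpha)$, for any $\tilde f_h\in F_p^0(x)$, uniformly in $\alpha\in K$ as $h\to0$, we have
$$R:=\sup_{f\in A(\alpha)}E_f\big|\psi_h^{-1}\big(\tilde f_h(x)-f(x)\big)\big|^p\ge E_1\big|\psi_h^{-1}\big(\tilde f_h(x)-f_1(x)\big)\big|^p\ge q\,E_0\big|(\sigma^2h)^{-1/2}\big(\tilde f_h(x)-f_0(x)\big)\big|^p+(1-q)E_1\big|\psi_h^{-1}\big(\tilde f_h(x)-f_1(x)\big)\big|^p+O(q).\tag{59}$$
According to (53) and (57)-(59),
$$R\ge q\exp\Big\{\frac{\varphi^2}{2}+p\log\varphi\Big\}E_0|f^*(x)|^p+(1-q)E_1|f^*(x)-L|^p+O(q)\ge(1-q)E_1\big(Z|f^*(x)|^p+|f^*(x)-L|^p\big)+O(q)\ge(1-q)E_1\inf_x\big(Z|x|^p+|x-L|^p\big)+O(q),\tag{60}$$
where
$$Z=q\exp\Big\{\frac{\varphi^2}{2}+p\log\varphi\Big\}\frac{dP_0}{dP_1}(y).$$
For each value of $Z$ consider the optimization problem of minimizing the function $g(x)=Z|x|^p+|L-x|^p$. As was shown in Lepski and Levit (1998),
$$\min_xg(x)=\begin{cases}\min(Z,1)\,L^p&\text{if }p\le1,\\[2pt]\big(1+Z^{-1/(p-1)}\big)^{1-p}L^p&\text{if }p>1.\end{cases}\tag{61}$$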
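The closed form (61) is easy to check numerically. The sketch below is only an illustration: the helper names `chi` and `grid_minimum` are hypothetical, the expression used for $p>1$ is our reading of (61), namely $(1+Z^{-1/(p-1)})^{1-p}$, and the brute-force search is restricted to $[0,L]$ because moving $x$ outside that interval only increases $g$.

```python
def chi(Z, p):
    """Factor chi in min_x g(x) = chi * L**p, where g(x) = Z*|x|**p + |L - x|**p and Z > 0."""
    if p <= 1.0:
        return min(Z, 1.0)
    return (1.0 + Z ** (-1.0 / (p - 1.0))) ** (1.0 - p)


def grid_minimum(Z, L, p, n=20001):
    """Brute-force minimum of g over an even grid of n points on [0, L]."""
    return min(
        Z * abs(x) ** p + abs(L - x) ** p
        for x in (L * k / (n - 1) for k in range(n))
    )
```

For instance, with $Z=1/2$, $L=1$ and $p=2$ the minimizer is $x=2/3$ and both functions give a value close to $1/3$; in all cases $0<\chi\le1$.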
Thus for any $p>0$ we can write
$$\min_xg(x)=\chi L^p,\tag{62}$$
where $\chi$ is defined by (61) and satisfies $0<\chi\le1$. Now, let us consider the likelihood ratio corresponding to $f_0$ and $f_1$. Using the same arguments that we used in 8 3 we can see that

dp 0 dp 1 dp 0 dp 1 y = exp = exp = exp 1 σ θξ θ s θ g l + θ y l gl θξ 1 } θ + O1 s } } k s x l

where $\xi\sim N(0,1)$ with respect to $P_1$. Using the definition of $\theta$, condition (53) and definition (55) we can see that
$$\frac{dP_0}{dP_1}(y)=(1+o(1))\exp\Big\{-\frac{\theta^2}{2}-\theta\xi\Big\},\qquad h\to0.$$
Note that by (54)
$$Z=(1+o(1))\exp\Big\{-\varphi+\frac{\varphi^2}{2}+p\log\varphi-(\varphi-\sqrt\varphi)\xi-\frac12(\varphi-\sqrt\varphi)^2\Big\}\to\infty\quad\text{in }P_1\text{-probability}$$
when $h\to0$, hence $\chi\to1$ in $P_1$-probability. Also $L=1+o(1)$, according to its definition. Therefore, according to equations (60)-(62), uniformly in $\alpha\in K$,
$$R\ge(1-q)L^pE_1\chi+O(q)=1+o(1),\qquad h\to0.$$

Corollary 1 Let $A_K$ be an arbitrary RNP scale such that
$$\liminf_{h\to0}\ \inf_{\alpha\in K}\frac{r\gamma s^r}{\log s}=\infty,\tag{63}$$
where $s$ is the optimum bandwidth defined in (15). Then for any $p>0$ and $x\in\mathbb R$, the estimator $\hat f_h$ of Theorem 3 is $(p,K,F_p(x))$-adaptively minimax at $x$.

Proof. This is a consequence of Theorems 3 and 4. In order to prove the lower bound, use the previous theorem taking $s$ in place of $\tilde s$.

Now, we prove a version of Theorem 4 under a weaker condition. It will be used below to provide easily verifiable conditions for adaptive optimality of the estimator proposed in Section 5.2.
Theorem 5 Let $A_K$ be an arbitrary RNP scale such that
$$\liminf_{h\to0}\ \inf_{\alpha\in K}\frac{r\gamma s^r}{\log\log s}=\infty,\tag{64}$$
where the optimum bandwidth $s$ was defined in (15). Then for any estimator $\tilde f_h\in F_p^0(x)$,
$$\liminf_{h\to0}\ \inf_{\alpha\in K}\ \sup_{f\in A(\alpha)}E_f\big|\psi_h^{-1}\big(\tilde f_h(x)-f(x)\big)\big|^p\ge1,$$
where $\psi_h^2=\psi_h^2(\alpha)=p\,(\log s)\,\sigma_s^2$.

Proof. We prove this theorem in the same way as Theorem 4, by choosing $\varphi^2=p\log\tilde s$ and subsequently defining $\tilde s$ in such a way that

p log s rγ s r eγ s r γs r 1 (65)

for $h$ small enough. The point here is that condition 65 was only needed in proving (56), which now becomes (65). We construct an appropriate $\tilde s$, asymptotically equivalent to $s$, that satisfies the previous inequality for $h$ small enough. Let us first, for fixed $\alpha$, define the auxiliary bandwidth $\bar s$ as the solution of the equation

γs r = γ s r + log rγ s r. (66)

We know that $\gamma s^r$ goes to infinity as $h$ goes to zero, uniformly in regular scales. Thus, from the previous equation, $\gamma\bar s^{\,r}$ goes to infinity too and we can see that

s r = 1 + log rγ s s γ s r = 1 + o1,

uniformly in $K$ according to (64). Thus the auxiliary bandwidth $\bar s$ is asymptotically equivalent to $s$. It also satisfies (64), since

lim inf 0 inf α K rγ s r = lim inf log log s 0 inf α K rγs r 1+o1 lim inf log log s 0 inf α K rγs r log log s =.

Now, let us define $\tilde s=\vartheta\bar s$, where $\vartheta$ ($0<\vartheta<1$) is the solution closest to 1 of the equation

rγ s r log log s ϑ r log ϑ 1 = 1.

We can see that $\vartheta\to1$ as $h\to0$, thus implying that $\tilde s$ is asymptotically equivalent to $\bar s$ and $s$. Now, after a few transformations,

γ s r = γ s r + s s rγt r t 1 dt
s = γs r log rγ s r + rγt r t 1 dt s s γs r + log rγ s r + rγ s r t 1 dt = γs r + log rγ s r + rγ s r ϑ r log ϑ 1 s

and we see that

= γs r + log rγ s r + log log s e γ s r e γs r rγ s r log s = e γs r p log s rγ s ϑr r γ s r /p e γs r p log s rγ s

for $h$ small enough. The rest of the proof is the same as for Theorem 4. Finally, given that $\tilde\psi_h^2:=p\,(\log\tilde s)\,\sigma_{\tilde s}^2$ is asymptotically equivalent to $\psi_h^2$, the proof of the theorem is complete.

Finally, we prove that the estimator we constructed in Theorem 3 is adaptively minimax for any RNP scale satisfying a condition just a little stronger than condition 4 used in the definition of a regular scale.

Theorem 6 Let $K$ be an RNP scale such that

lim inf 0 inf α K γσ C 1 δ

for some $\delta$ ($0<\delta<1$) and $C>0$. Then for any $p>0$ and $x\in\mathbb R$, the estimator $\hat f_h$ of Theorem 3 is $(p,K,F_p(x))$-adaptively minimax at $x$.

Proof. The upper bound result was proved in Theorem 3. To prove the lower bound we notice that

rγs r = r log γσ r log C δ

while according to the conditions for R-scales

log log s = log log 1 γ 1 log 1/r < log log 1 γσ

thus $r\gamma s^r/\log\log s$ goes to infinity when $h\to0$, uniformly with respect to the scale $K$. The desired lower bound follows now from Theorem 5.
References

P. Antosik, J. Mikusiński, and R. Sikorski. Theory of Distributions. The Sequential Approach. Elsevier, Amsterdam.
L.D. Brown and M.G. Low. Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist., 24.
W. Feller. An Introduction to Probability Theory and its Applications, volume I. Wiley, New York, 3rd edition, 1968.
G.K. Golubev and B.Y. Levit. Asymptotically efficient estimation for analytic distributions. Math. Meth. Statist., 5.
G.K. Golubev, B.Y. Levit, and A.B. Tsybakov. Asymptotically efficient estimation of analytic functions in Gaussian noise. Bernoulli, 2.
Kuo Hui-Hsiung. Gaussian Measures in Banach Spaces. Number 463 in Lect. Notes Math. Springer-Verlag, Berlin-Heidelberg-New York.
I.A. Ibragimov and R.I. Has'minskii. Statistical Estimation, Asymptotic Theory. Springer, New York.
I.A. Ibragimov and R.I. Has'minskii. Bounds for the risks of non-parametric regression estimates. Theory Probab. Appl., 27:84-99, 1982.
I.A. Ibragimov and R.I. Has'minskii. Estimation of distribution density. Journ. Sov. Math., 25:40-57.
O.V. Lepski. On a problem of adaptive estimation in Gaussian noise. Theory Probab. Appl., 35.
O.V. Lepski. Asymptotically minimax adaptive estimation. I: Upper bounds. Optimally adaptive estimates. Theory Probab. Appl., 36:682-697.
O.V. Lepski. Asymptotically minimax adaptive estimation. II: Schemes without optimal adaptation. Adaptive estimators. Theory Probab. Appl., 37, 1992a.
O.V. Lepski. On problems of adaptive estimation in white Gaussian noise. Adv. Sov. Math., 12:87-106, 1992b.
O.V. Lepski and B.Y. Levit. Adaptive minimax estimation of infinitely differentiable functions. Math. Meth. Statist., 7:123-156, 1998.
O.V. Lepski and B.Y. Levit. Adaptive non-parametric estimation of smooth multivariate functions. Math. Meth. Statist., 8.
B.Y. Levit. On the asymptotic minimax estimates of the second order. Theory Probab. Appl., 25:552-568, 1980.
S. Nikol'skiĭ. Approximation of Functions of Several Variables and Imbedding Theorems. Springer-Verlag, Berlin-Heidelberg-New York.
M. Nussbaum. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist., 24.
C.J. Stone. Optimal global rates of convergence for nonparametric regression. Ann. Statist., 10, 1982.

Authors' addresses:
L.M. Artiles, Eurandom, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
B.Y. Levit, Department of Mathematics & Statistics, Queen's University, Kingston, ON, K7L 3N6, Canada