SIEVE INFERENCE ON POSSIBLY MISSPECIFIED SEMI-NONPARAMETRIC TIME SERIES MODELS


By Xiaohong Chen, Zhipeng Liao and Yixiao Sun

Yale University, UC Los Angeles and UC San Diego

This paper provides a general theory on the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals of semi-nonparametric time series models. We show that, even when the sieve score process is not a martingale difference, the asymptotic variances of plug-in sieve M estimators of irregular (i.e., slower than root-T estimable) functionals are the same as those for independent data. Nevertheless, ignoring the temporal dependence in finite samples may not lead to accurate inference. We then propose an easy-to-compute and more accurate inference procedure based on a "pre-asymptotic" sieve variance estimator that captures temporal dependence of unknown forms. We construct a "pre-asymptotic" Wald statistic using an orthonormal series long run variance (OS-LRV) estimator. For sieve M estimators of both regular (i.e., root-T estimable) and irregular functionals, a scaled "pre-asymptotic" Wald statistic is asymptotically F distributed when the series number of terms in the OS-LRV estimator is held fixed. Simulations indicate that our scaled "pre-asymptotic" Wald test with F critical values has more accurate size in finite samples than the conventional Wald test with chi-square critical values.

1. Introduction. Many economic and financial time series are nonlinear and non-Gaussian; see, e.g., Granger. For policy analysis, it is important to uncover complicated nonlinear economic relations in structural models. Unfortunately, it is difficult to correctly parameterize all aspects of nonlinear dynamic functional relations. Due to the well-known problem of the "curse of dimensionality", it is also impractical to estimate a general nonlinear time series model fully nonparametrically.
These issues motivate the growing popularity of semiparametric and semi-nonparametric models and methods in economics and finance. The method of sieves (Grenander, 1981) is a general procedure for estimating semiparametric and nonparametric models, and has been widely used in statistics, economics, finance, biostatistics and other disciplines. In this paper, we focus on sieve M estimation, which optimizes a sample average of a random criterion over a sequence of approximating parameter spaces, sieves, that become dense in the original infinite dimensional parameter space as the complexity of the sieves grows to infinity with the sample size T. See Shen and Wong (1994), Chen (2007) and the references therein for many examples of sieve M estimation, including sieve quasi maximum likelihood, sieve (nonlinear) least squares, sieve generalized least squares, and sieve quantile regression.

Supported by the National Science Foundation SES and Cowles Foundation.
Supported by the National Science Foundation SES.
AMS 2000 subject classifications: Primary 62G10; secondary 62M10.
Keywords and phrases: Dynamic misspecification, sieve M estimation, sieve Riesz representer, irregular functional, pre-asymptotic variance, orthogonal series long run variance estimation, F distribution.

We consider inference on possibly misspecified semi-nonparametric time series models via the method of sieve M estimation. For general sieve M estimators with weakly dependent data, White and Wooldridge (1991) establish the consistency, and Chen and Shen (1998) establish the convergence rate and the asymptotic normality of plug-in sieve M estimators of regular (i.e., root-T estimable) functionals. To the best of our knowledge, there is no published work on the limiting distributions of plug-in sieve M estimators of irregular (i.e., slower than root-T estimable) functionals. There is also no published inferential result for general sieve M estimators of regular or irregular functionals for possibly misspecified semi-nonparametric time series models.

We first provide a general theory on the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals in semi-nonparametric time series models. The key insight is to examine the functional of interest on a sieve tangent space where a Riesz representer always exists regardless of whether the functional is regular or irregular. The asymptotic normality result is rate-adaptive in the sense that applied researchers do not need to know a priori whether the functional of interest is root-T estimable or not.

For possibly misspecified semi-nonparametric models with weakly dependent data, Chen and Shen (1998) establish that the asymptotic variance of a sieve M estimator of any regular functional depends on the temporal dependence and is equal to the long run variance (LRV) of a scaled score (or moment) process. In this paper, we show a new result: regardless of whether the score process is a martingale difference or not, the asymptotic variance of a sieve M estimator of an irregular functional for weakly dependent data is the same as that for independent data.
Our asymptotic theory suggests that, for weakly dependent time series data with a large sample size, temporal dependence could be ignored in making inference on irregular functionals via the method of sieves. However, simulation studies indicate that inference procedures based on asymptotic variance estimates ignoring autocorrelation do not perform well when the sample size is small relative to the degree of temporal dependence. See, e.g., Conley, Hansen and Liu (1997) and Pritsker (1998) for earlier discussion of this problem with kernel density estimation for interest rate data sets.

To deal with this problem, for inference on both regular and irregular functionals, we propose to use a "pre-asymptotic" sieve variance that captures temporal dependence of an unknown form. That is, we treat the underlying triangular array sieve score process as a generic time series and ignore the fact that it becomes less temporally dependent when the sieve number of terms in approximating unknown functions grows to infinity as T goes to infinity. This novel "pre-asymptotic" sieve approach enables us to develop a unified inference framework that can accommodate both regular and irregular functionals.

To derive a simple and more accurate asymptotic approximation under weak conditions, we compute a "pre-asymptotic" Wald statistic using an orthonormal series LRV (OS-LRV) estimator. For both regular and irregular functionals, we show that the "pre-asymptotic" t statistic and a scaled Wald statistic converge to the standard t distribution and F distribution respectively when the series number of terms in the OS-LRV estimator is held fixed; and that the t distribution and F distribution approach the standard normal and chi-square distributions respectively when the series number of terms in the OS-LRV estimator goes to infinity. Our "pre-asymptotic" t and F approximations achieve triple robustness in the following sense: they are asymptotically valid regardless of (1) whether the functional is regular or not; (2) whether there is temporal dependence of unknown form or not; and (3) whether the series number of terms in the OS-LRV estimator is held fixed or not.

The rest of the paper is organized as follows. Section 2 presents the plug-in sieve M estimator of functionals of interest and gives two illustrative examples. Section 3 establishes the asymptotic normality of the plug-in sieve M estimators of possibly irregular functionals. Section 4 shows that the asymptotic variances of plug-in sieve M estimators of irregular functionals for weakly dependent data are the same as if they were for i.i.d. data. Section 5 presents the "pre-asymptotic" OS-LRV estimator and F approximation. Section 6 describes a simple computation method and reports a simulation study using a partially linear regression model. The Appendix contains all the proofs.

Notation. We denote $f_A(a)$ ($F_A(a)$) as the marginal probability density (cdf) of a random variable $A$ evaluated at $a$, and $f_{AB}(a,b)$ ($F_{AB}(a,b)$) the joint density (cdf) of the random variables $A$ and $B$. We use $\equiv$ to introduce definitions. For any vector-valued $A$, we let $A'$ denote its transpose and $\|A\|_E \equiv \sqrt{A'A}$, although sometimes we also use $|A| = \sqrt{A'A}$ without confusion. Denote $L^p(\Omega, d\mu)$, $1 \le p < \infty$, as a space of measurable functions with $\|g\|_{L^p(\Omega,d\mu)} \equiv \{\int_\Omega |g(t)|^p d\mu(t)\}^{1/p} < \infty$, where $\Omega$ is the support of the sigma-finite positive measure $d\mu$ (sometimes $L^p(\Omega)$ and $\|g\|_{L^p(\Omega)}$ are used when $d\mu$ is the Lebesgue measure).
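For any (possibly random) positive sequences $\{a_T\}_{T=1}^\infty$ and $\{b_T\}_{T=1}^\infty$, $a_T = O_p(b_T)$ means that $\lim_{c\to\infty}\limsup_T \Pr(a_T/b_T > c) = 0$; $a_T = o_p(b_T)$ means that for all $\varepsilon > 0$, $\lim_{T\to\infty}\Pr(a_T/b_T > \varepsilon) = 0$; and $a_T \asymp b_T$ means that there exist two constants $0 < c_1 \le c_2 < \infty$ such that $c_1 a_T \le b_T \le c_2 a_T$. We use $\mathcal{A}_T \equiv \mathcal{A}_{k_T}$, $\mathcal{H}_T \equiv \mathcal{H}_{k_T}$ and $\mathcal{V}_T \equiv \mathcal{V}_{k_T}$ to denote various sieve spaces. For simplicity, we assume that $\dim(\mathcal{V}_T) = \dim(\mathcal{A}_T) \asymp \dim(\mathcal{H}_T) \asymp k_T$, all of which grow to infinity with the sample size $T$.

2. Sieve M Estimation. We assume that the data $\{Z_t = (Y_t', X_t')'\}_{t=1}^T$ is from a strictly stationary and weakly dependent process defined on an underlying complete probability space. Let $\mathcal{Z} \subseteq R^{d_z}$, $1 \le d_z < \infty$, $\mathcal{Y} \subseteq R^{d_y}$ and $\mathcal{X} \subseteq R^{d_x}$ be the supports of $Z_t$, $Y_t$ and $X_t$ respectively. Let $(\mathcal{A}, d)$ denote an infinite dimensional metric space. Let $\ell: \mathcal{Z} \times \mathcal{A} \to R$ be a measurable function and $E[\ell(Z,\alpha)]$ be a population criterion. For simplicity we assume that there is a unique $\alpha_0 \in (\mathcal{A}, d)$ such that $E[\ell(Z,\alpha_0)] > E[\ell(Z,\alpha)]$ for all $\alpha \in (\mathcal{A}, d)$ with $d(\alpha,\alpha_0) > 0$. Different models correspond to different choices of the criterion functions $E[\ell(Z,\alpha)]$ and the parameter spaces $(\mathcal{A}, d)$. A model does not need to be correctly specified and $\alpha_0$ could be a pseudo-true parameter. Let $f: (\mathcal{A}, d) \to R$ be a known measurable mapping. In this paper we are interested in estimation of and inference on $f(\alpha_0)$ via the method of sieves.

Let $\mathcal{A}_T$ be a sieve space for the whole parameter space $(\mathcal{A}, d)$. Then there is an element $\Pi_T\alpha_0 \in \mathcal{A}_T$ such that $d(\Pi_T\alpha_0, \alpha_0) \to 0$ as $\dim(\mathcal{A}_T) \to \infty$ (with $T$). An approximate sieve M estimator $\hat\alpha_T \in \mathcal{A}_T$ of $\alpha_0$ solves

$$\frac{1}{T}\sum_{t=1}^T \ell(Z_t, \hat\alpha_T) \ge \sup_{\alpha\in\mathcal{A}_T} \frac{1}{T}\sum_{t=1}^T \ell(Z_t, \alpha) - O_p(\varepsilon_T^2), \tag{2.1}$$

where the term $O_p(\varepsilon_T^2) = o_p(T^{-1})$ denotes the maximization error when $\hat\alpha_T$ fails to be the exact maximizer over the sieve space. We call $f(\hat\alpha_T)$ the plug-in sieve M estimator of $f(\alpha_0)$.

Under very mild conditions (see, e.g., Chen, 2007, Theorem 3.1 and White and Wooldridge, 1991), the sieve M estimator $\hat\alpha_T$ is consistent for $\alpha_0$: $d(\hat\alpha_T, \alpha_0) = O_p\{\max[d(\hat\alpha_T, \Pi_T\alpha_0), d(\Pi_T\alpha_0, \alpha_0)]\} = o_p(1)$.

Given the consistency, we can restrict our attention to a shrinking $d$-neighborhood of $\alpha_0$. We equip $\mathcal{A}$ with an inner product induced norm $\|\alpha - \alpha_0\|$ that is weaker than $d(\alpha,\alpha_0)$ (i.e., $\|\alpha - \alpha_0\| \le c\, d(\alpha,\alpha_0)$ for a constant $c > 0$), and is locally equivalent to $\sqrt{E[\ell(Z_t,\alpha_0) - \ell(Z_t,\alpha)]}$ in a shrinking $d$-neighborhood of $\alpha_0$. For strictly stationary weakly dependent data, Chen and Shen (1998) establish the convergence rate: $\|\hat\alpha_T - \alpha_0\| = O_p(\xi_T) = o_p(T^{-1/4})$, where $\xi_T = \max[\|\hat\alpha_T - \Pi_T\alpha_0\|, \|\Pi_T\alpha_0 - \alpha_0\|]$.

The method of sieve M estimation includes many special cases. Different choices of criterion functions $\ell(Z_t,\alpha)$ and different choices of sieves $\mathcal{A}_T$ lead to different examples of sieve M estimation. As an illustration, we provide two examples below. See, e.g., Shen and Wong (1994) and Chen (2007) for additional examples.

Example 2.1 (Partially additive ARX regression). Suppose that the time series data $\{Y_t\}_{t=1}^T$ is generated by

$$Y_t = X_t'\theta_0 + h_{01}(Y_{t-1}) + h_{02}(Y_{t-2}) + u_t, \quad\text{with } E[u_t | X_t, Y_{t-1}, Y_{t-2}] = 0, \tag{2.2}$$

where $X_t$ is a $d_x$ dimensional random vector, and could include finitely many lagged $Y_t$'s. Let $\theta_0 \in \Theta \subset R^{d_x}$ and $h_{0j} \in \mathcal{H}_j$ for $j = 1, 2$. Let $\alpha_0 = (\theta_0', h_{01}, h_{02})' \in \mathcal{A} = \Theta \times \mathcal{H}_1 \times \mathcal{H}_2$. Examples of functionals of interest could be $f(\alpha_0) = \lambda'\theta_0$ or $h_{0j}(\bar y_j)$, where $\lambda \in R^{d_x}$ and $\bar y_j \in \mathrm{int}(\mathcal{Y})$ for $j = 1, 2$.

For the sake of concreteness we assume that $\mathcal{Y}$ is a bounded interval of $R$ and $\mathcal{H}_j = \Lambda^{s_j}(\mathcal{Y})$ (a Hölder space) for $s_j > 0.5$, $j = 1, 2$, where

$$\Lambda^s(\mathcal{Y}) = \left\{ h \in C^{[s]}(\mathcal{Y}) : \sup_{k \le [s]} \sup_{y\in\mathcal{Y}} \left|h^{(k)}(y)\right| < \infty,\ \sup_{y,y'\in\mathcal{Y}} \frac{\left|h^{([s])}(y) - h^{([s])}(y')\right|}{|y - y'|^{s-[s]}} < \infty \right\},$$

where $[s]$ is the largest integer that is strictly smaller than $s$.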
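The Hölder space $\Lambda^s(\mathcal{Y})$ (with $s > 0.5$) is a smooth function space that is widely assumed in the semi-nonparametric literature. We can then approximate $\mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2$ by a sieve $\mathcal{H}_T = \mathcal{H}_{1,T} \times \mathcal{H}_{2,T}$, where for $j = 1, 2$,

$$\mathcal{H}_{j,T} = \left\{ h(\cdot) : h(\cdot) = \sum_{k=1}^{k_{j,T}} \beta_k p_{j,k}(\cdot) = \beta' P_{k_{j,T}}(\cdot),\ \beta \in R^{k_{j,T}} \right\}, \tag{2.3}$$

where the known sieve basis $P_{k_{j,T}}(\cdot)$ could be polynomial splines, B-splines, wavelets, Fourier series and others.

Let $\ell(Z_t,\alpha) = -[Y_t - X_t'\theta - h_1(Y_{t-1}) - h_2(Y_{t-2})]^2/4$ with $\alpha = (\theta', h_1, h_2)' \in \mathcal{A} = \Theta \times \mathcal{H}_1 \times \mathcal{H}_2$. Let $\mathcal{A}_T = \Theta \times \mathcal{H}_{1,T} \times \mathcal{H}_{2,T}$ be a sieve for $\mathcal{A}$. We can estimate $\alpha_0 \in \mathcal{A}$ by the sieve least squares (LS) estimator $\hat\alpha_T \equiv (\hat\theta_T', \hat h_{1,T}, \hat h_{2,T})' \in \mathcal{A}_T$:

$$\hat\alpha_T = \arg\max_{(\theta,h_1,h_2)\in\mathcal{A}_T} \sum_{t=1}^T \ell(Z_t, \theta, h_1, h_2).$$

A functional of interest $f(\alpha_0)$ (such as $\lambda'\theta_0$ or $h_{0j}(\bar y_j)$) is then estimated by the plug-in sieve LS estimator $f(\hat\alpha_T)$ (such as $\lambda'\hat\theta_T$ or $\hat h_{j,T}(\bar y_j)$).

This example is very similar to Example 2 in Chen and Shen (1998), except that we allow for dynamic misspecification in the sense that $E[u_t | X_t, Y_{t-1}, Y_{t-2}; Y_{t-j} \text{ for } j \ge 3]$ may not equal zero. One can slightly modify their proofs to get the convergence rate of $\hat\alpha_T$ and the root-T asymptotic normality of $\lambda'\hat\theta_T$. But that paper does not provide a variance estimator for $\lambda'\hat\theta_T$. The results in our paper immediately lead to the asymptotic normality of $f(\hat\alpha_T)$ for possibly irregular functionals $f(\alpha_0)$ and provide simple, robust inference on $f(\alpha_0)$.

Example 2.2 (Possibly misspecified copula-based time series model). Suppose that $\{Y_t\}_{t=1}^T$ is a sample of a strictly stationary first order Markov process generated from $(F_Y, C_0(\cdot,\cdot))$, where $F_Y$ is the true unknown continuous marginal distribution, and $C_0(\cdot,\cdot)$ is the true unknown copula for $(Y_{t-1}, Y_t)$ that captures all the temporal and tail dependence of $\{Y_t\}$. The $\tau$-th conditional quantile of $Y_t$ given $Y^{t-1} = (Y_{t-1}, \ldots, Y_1)$ is:

$$Q_\tau^Y(y) = F_Y^{-1}\left(C_{2|1}^{-1}[\tau | F_Y(y)]\right),$$

where $C_{2|1}[\cdot | u] \equiv \frac{\partial}{\partial u} C_0(u, \cdot)$ is the conditional distribution of $U_t \equiv F_Y(Y_t)$ given $U_{t-1} = u$, and $C_{2|1}^{-1}[\tau | u]$ is its $\tau$-th conditional quantile. The conditional density function of $Y_t$ given $Y^{t-1}$ is $p_0(\cdot | Y^{t-1}) = f_Y(\cdot)\, c_0(F_Y(Y_{t-1}), F_Y(\cdot))$, where $f_Y(\cdot)$ and $c_0(\cdot,\cdot)$ are the density functions of $F_Y(\cdot)$ and $C_0(\cdot,\cdot)$ respectively.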
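A researcher specifies a parametric form $\{c(\cdot,\cdot;\theta) : \theta \in \Theta\}$ for the copula density function, but it could be misspecified in the sense that $c_0(\cdot,\cdot) \notin \{c(\cdot,\cdot;\theta) : \theta \in \Theta\}$. Let $\theta_0$ be the pseudo true copula dependence parameter:

$$\theta_0 = \arg\max_{\theta\in\Theta} \int_0^1\!\!\int_0^1 c(u,v;\theta)\, c_0(u,v)\, du\, dv.$$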

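Let $(\theta_0', f_Y)'$ be the parameters of interest. Examples of functionals of interest could be $\lambda'\theta_0$, $f_Y(\bar y)$, $F_Y(\bar y)$ or $Q_\tau^Y(\bar y) = F_Y^{-1}(C_{2|1}^{-1}[\tau | F_Y(\bar y); \theta_0])$ for any $\lambda \in R^{d_\theta}$ and some $\bar y \in \mathrm{supp}(Y_t)$.

We could estimate $(\theta_0', f_Y)'$ by the method of sieve quasi ML using different parameterizations and different sieves for $f_Y$. For example, let $h_0 = \sqrt{f_Y}$ and $\alpha_0 = (\theta_0', h_0)'$ be the pseudo true unknown parameters. Then $f_Y(\cdot) = h_0^2(\cdot) / \int_{-\infty}^{\infty} h_0^2(y)\,dy$, and $h_0 \in L^2(R)$. For the identification of $h_0$, we can assume that $h_0 \in \mathcal{H}$:

$$\mathcal{H} = \left\{ h(\cdot) = p_0(\cdot) + \sum_{j=1}^{\infty} \beta_j p_j(\cdot) : \sum_{j=1}^{\infty} \beta_j^2 < \infty \right\}, \tag{2.5}$$

where $\{p_j\}_{j=0}^{\infty}$ is a complete orthonormal basis in $L^2(R)$, such as Hermite polynomials, wavelets and other orthonormal basis functions. Here we normalize the coefficient of the first basis function $p_0(\cdot)$ to be 1 in order to achieve the identification of $h_0(\cdot)$. Other normalizations could also be used. It is now obvious that $h_0 \in \mathcal{H}$ could be approximated by functions in the following sieve space:

$$\mathcal{H}_T = \left\{ h(\cdot) = p_0(\cdot) + \sum_{j=1}^{k_T} \beta_j p_j(\cdot) = p_0(\cdot) + \beta' P_{k_T}(\cdot) : \beta \in R^{k_T} \right\}. \tag{2.6}$$

Let $Z_t' = (Y_{t-1}, Y_t)$, $\alpha = (\theta', h)' \in \mathcal{A} = \Theta \times \mathcal{H}$ and

$$\ell(Z_t,\alpha) = \log\left\{\frac{h^2(Y_t)}{\int_{-\infty}^{\infty} h^2(y)\,dy}\right\} + \log\left\{ c\left( \int_{-\infty}^{Y_{t-1}} \frac{h^2(y)}{\int_{-\infty}^{\infty} h^2(x)\,dx}\,dy,\ \int_{-\infty}^{Y_t} \frac{h^2(y)}{\int_{-\infty}^{\infty} h^2(x)\,dx}\,dy;\ \theta \right) \right\}. \tag{2.7}$$

Then $\alpha_0 = (\theta_0', h_0)' \in \mathcal{A} = \Theta \times \mathcal{H}$ could be estimated by the sieve quasi MLE $\hat\alpha_T = (\hat\theta_T', \hat h_T)' \in \mathcal{A}_T = \Theta \times \mathcal{H}_T$ that solves:

$$\sup_{\alpha\in\Theta\times\mathcal{H}_T} \frac{1}{T}\left\{ \sum_{t=2}^{T} \ell(Z_t,\alpha) + \log\left\{\frac{h^2(Y_1)}{\int_{-\infty}^{\infty} h^2(y)\,dy}\right\} \right\} - O_p(\varepsilon_T^2).$$

A functional of interest $f(\alpha_0)$ (such as $\lambda'\theta_0$, $f_Y(\bar y) = h_0^2(\bar y)/\int_{-\infty}^{\infty} h_0^2(y)\,dy$, $F_Y(\bar y)$ or $Q_{0.01}^Y(\bar y)$) is then estimated by the plug-in sieve quasi MLE $f(\hat\alpha_T)$ (such as $\lambda'\hat\theta$, $\hat f_Y(\bar y) = \hat h^2(\bar y)/\int_{-\infty}^{\infty} \hat h^2(y)\,dy$, $\hat F_Y(\bar y) = \int_{-\infty}^{\bar y} \hat f_Y(y)\,dy$ or $\hat Q_\tau^Y(\bar y) = \hat F_Y^{-1}(C_{2|1}^{-1}[\tau | \hat F_Y(\bar y); \hat\theta])$).

Under correct specification, Chen, Wu and Yi (2009) establish the rate of convergence of the sieve MLE $\hat\alpha_T$ and provide a sieve likelihood-ratio inference for regular functionals including $f(\alpha_0) = \lambda'\theta_0$ or $F_Y(\bar y)$ or $Q_{0.01}^Y(\bar y)$. Under misspecified copulas, by applying Chen and Shen (1998), we can still derive the convergence rate of the sieve quasi MLE $\hat\alpha_T$ and the asymptotic normality of $f(\hat\alpha_T)$ for regular functionals.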
However, the sieve likelihood ratio inference given in Chen, Wu and Yi (2009) is no longer valid under misspecification. The results in this paper immediately lead to the asymptotic normality of $f(\hat\alpha_T)$ (such as $\hat f_Y(\bar y) = \hat h^2(\bar y)/\int_{-\infty}^{\infty} \hat h^2(y)\,dy$) for any possibly irregular functional $f(\alpha_0)$ (such as $f_Y(\bar y)$), as well as valid inferences under potential misspecification.
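To make the plug-in sieve LS estimator of Example 2.1 concrete, the following is a minimal sketch, not the authors' implementation. Everything specific in it is an assumption chosen for illustration: the simulated design (scalar $X_t$, $\theta_0 = 0.5$, $h_{01} = 0.4\sin$, $h_{02} = 0.3\cos$, uniform errors so that the support of $Y_t$ is bounded), the power series basis, and the number of sieve terms `k`. Maximizing the criterion $-[\cdot]^2/4$ over a linear sieve is ordinary least squares on the basis-expanded regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
T, burn = 2000, 100
theta0 = 0.5  # illustrative value, not from the paper

# Simulate Y_t = theta0 * X_t + h01(Y_{t-1}) + h02(Y_{t-2}) + u_t (hypothetical design).
n = T + burn
X = rng.uniform(-1.0, 1.0, size=n)
u = rng.uniform(-0.5, 0.5, size=n)
Y = np.zeros(n)
for t in range(2, n):
    Y[t] = theta0 * X[t] + 0.4 * np.sin(Y[t - 1]) + 0.3 * np.cos(Y[t - 2]) + u[t]
X, Y = X[burn:], Y[burn:]  # drop the burn-in period

def power_basis(y, k):
    """Sieve basis y, y^2, ..., y^k; the constant is absorbed by one common intercept."""
    return np.column_stack([y ** j for j in range(1, k + 1)])

k = 4  # number of sieve terms per unknown function (assumed)
y0, y1, y2, x = Y[2:], Y[1:-1], Y[:-2], X[2:]
design = np.column_stack([x, np.ones_like(x), power_basis(y1, k), power_basis(y2, k)])

# Sieve LS: least squares over the finite dimensional sieve space.
coef, *_ = np.linalg.lstsq(design, y0, rcond=None)
theta_hat = coef[0]                    # plug-in estimate of theta_0 (lambda = 1)
beta1_hat = coef[2:2 + k]
# Plug-in estimate of h01(1) - h01(0); the level of h01 is not separately identified here.
h1_hat_at_1 = (power_basis(np.array([1.0]), k) @ beta1_hat)[0]

print(theta_hat, h1_hat_at_1)
```

The first reported quantity is a regular (Euclidean) functional, the second an evaluation-type functional of the kind the paper calls irregular.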

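3. Asymptotic Normality of Sieve M Estimators. In this section, we establish the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals of semi-nonparametric time series models. We also give a closed-form expression for the sieve Riesz representer that appears in our asymptotic normality result.

3.1. Local Geometry. The convergence rate result of Chen and Shen (1998) implies that $\hat\alpha_T \in \mathcal{B}_T \subseteq \mathcal{B}_0$ with probability approaching one, where

$$\mathcal{B}_0 \equiv \{\alpha \in \mathcal{A} : \|\alpha - \alpha_0\| \le C\xi_T \log(\log(T))\}; \quad \mathcal{B}_T \equiv \mathcal{B}_0 \cap \mathcal{A}_T. \tag{3.1}$$

Hence, we now regard $\mathcal{B}_0$ as the effective parameter space and $\mathcal{B}_T$ as its sieve space. Let

$$\alpha_{0,T} \equiv \arg\min_{\alpha\in\mathcal{B}_T} \|\alpha - \alpha_0\|. \tag{3.2}$$

Let $\mathcal{V}_T \equiv \mathrm{clsp}(\mathcal{B}_T - \{\alpha_{0,T}\})$, where $\mathrm{clsp}(\cdot)$ denotes the closed linear span under $\|\cdot\|$. Then $\mathcal{V}_T$ is a finite dimensional Hilbert space under $\|\cdot\|$. Similarly the space $\mathcal{V} \equiv \mathrm{clsp}(\mathcal{B}_0 - \{\alpha_0\})$ is a Hilbert space under $\|\cdot\|$. Moreover, $\mathcal{V}_T$ is dense in $\mathcal{V}$ under $\|\cdot\|$. To simplify the presentation, we assume that $\dim(\mathcal{V}_T) = \dim(\mathcal{A}_T) \asymp k_T$, all of which grow to infinity with $T$. By definition we have $\langle \alpha_{0,T} - \alpha_0, v_T \rangle = 0$ for all $v_T \in \mathcal{V}_T$.

As demonstrated in Chen and Shen (1998), there is considerable freedom to choose such a norm $\|\alpha - \alpha_0\|$ that is locally equivalent to $\sqrt{E[\ell(Z,\alpha_0) - \ell(Z,\alpha)]}$. In some parts of this paper, for the sake of concreteness, we present results for a specific choice of the norm $\|\cdot\|$. We suppose that for all $\alpha$ in a shrinking $d$-neighborhood of $\alpha_0$, $\ell(Z,\alpha) - \ell(Z,\alpha_0)$ can be approximated by $\Delta(Z,\alpha_0)[\alpha - \alpha_0]$ such that $\Delta(Z,\alpha_0)[\alpha - \alpha_0]$ is linear in $\alpha - \alpha_0$. Denote the remainder of the approximation as:

$$r(Z,\alpha_0)[\alpha - \alpha_0, \alpha - \alpha_0] \equiv 2\{\ell(Z,\alpha) - \ell(Z,\alpha_0) - \Delta(Z,\alpha_0)[\alpha - \alpha_0]\}. \tag{3.3}$$

When $\lim_{\tau\to 0}[(\ell(Z,\alpha_0 + \tau[\alpha - \alpha_0]) - \ell(Z,\alpha_0))/\tau]$ is well defined, we could let $\Delta(Z,\alpha_0)[\alpha - \alpha_0] = \lim_{\tau\to 0}[(\ell(Z,\alpha_0 + \tau[\alpha - \alpha_0]) - \ell(Z,\alpha_0))/\tau]$, which is called the directional derivative of $\ell(Z,\alpha)$ at $\alpha_0$ in the direction $[\alpha - \alpha_0]$. Define

$$\|\alpha - \alpha_0\| = \sqrt{E\left(-r(Z,\alpha_0)[\alpha - \alpha_0, \alpha - \alpha_0]\right)} \tag{3.4}$$

with the corresponding inner product $\langle\cdot,\cdot\rangle$,

$$\langle \alpha_1 - \alpha_0, \alpha_2 - \alpha_0 \rangle = E\{-r(Z,\alpha_0)[\alpha_1 - \alpha_0, \alpha_2 - \alpha_0]\} \tag{3.5}$$

for any $\alpha_1, \alpha_2$ in the shrinking $d$-neighborhood of $\alpha_0$.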
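In general this norm defined in (3.4) is weaker than $d(\cdot,\cdot)$. Since $\alpha_0$ is the unique maximizer of $E[\ell(Z,\alpha)]$ on $\mathcal{A}$, under mild conditions $\|\alpha - \alpha_0\|$ defined in (3.4) is locally equivalent to $\sqrt{E[\ell(Z,\alpha_0) - \ell(Z,\alpha)]}$.

For any $v \in \mathcal{V}$, we define $\frac{\partial f(\alpha_0)}{\partial\alpha}[v]$ to be the pathwise (directional) derivative of the functional $f(\cdot)$ at $\alpha_0$ and in the direction of $v = \alpha - \alpha_0 \in \mathcal{V}$:

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v] = \left.\frac{\partial f(\alpha_0 + \tau v)}{\partial\tau}\right|_{\tau=0} \quad\text{for any } v \in \mathcal{V}. \tag{3.6}$$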

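For any $v_T = \alpha_T - \alpha_{0,T} \in \mathcal{V}_T$, we let

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v_T] = \frac{\partial f(\alpha_0)}{\partial\alpha}[\alpha_T - \alpha_0] - \frac{\partial f(\alpha_0)}{\partial\alpha}[\alpha_{0,T} - \alpha_0]. \tag{3.7}$$

So $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ is also a linear functional on $\mathcal{V}_T$.

Note that $\mathcal{V}_T$ is a finite dimensional Hilbert space. As any linear functional on a finite dimensional Hilbert space is bounded, we can invoke the Riesz representation theorem to deduce that there is a $v_T^* \in \mathcal{V}_T$ such that

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v] = \langle v_T^*, v \rangle \quad\text{for all } v \in \mathcal{V}_T \tag{3.8}$$

and that

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v_T^*] = \|v_T^*\|^2 = \sup_{v\in\mathcal{V}_T, v\ne 0} \frac{\left|\frac{\partial f(\alpha_0)}{\partial\alpha}[v]\right|^2}{\|v\|^2}. \tag{3.9}$$

We call $v_T^*$ the sieve Riesz representer of the functional $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ on $\mathcal{V}_T$. We emphasize that the sieve Riesz representation of the linear functional $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ on $\mathcal{V}_T$ always exists regardless of whether $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ is bounded on the infinite dimensional space $\mathcal{V}$ or not. This crucial observation enables us to develop a general and unified theory that is currently lacking in the literature.

If $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ is bounded on the infinite dimensional Hilbert space $\mathcal{V}$, i.e.

$$\|v^*\| \equiv \sup_{v\in\mathcal{V}, v\ne 0} \left\{ \left|\frac{\partial f(\alpha_0)}{\partial\alpha}[v]\right| / \|v\| \right\} < \infty, \tag{3.10}$$

then $\|v_T^*\| = O(1)$ (in fact $\|v_T^*\| \nearrow \|v^*\| < \infty$ and $\|v^* - v_T^*\| \to 0$ as $T \to \infty$); we say that $f(\cdot)$ is regular (at $\alpha = \alpha_0$). In this case, we have $\frac{\partial f(\alpha_0)}{\partial\alpha}[v] = \langle v^*, v \rangle$ for all $v \in \mathcal{V}$, and $v^*$ is the Riesz representer of the functional $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ on $\mathcal{V}$. See, e.g., Shen.

If $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ is unbounded on the infinite dimensional Hilbert space $\mathcal{V}$, i.e.

$$\sup_{v\in\mathcal{V}, v\ne 0} \left\{ \left|\frac{\partial f(\alpha_0)}{\partial\alpha}[v]\right| / \|v\| \right\} = \infty, \tag{3.11}$$

then $\|v_T^*\| \nearrow \infty$ as $T \to \infty$; and we say that $f(\cdot)$ is irregular (at $\alpha = \alpha_0$). As will become clear later, the convergence rate of $f(\hat\alpha_T) - f(\alpha_0)$ depends on the order of $\|v_T^*\|$.

3.2. Asymptotic Normality. To establish the asymptotic normality of $f(\hat\alpha_T)$ for possibly irregular nonlinear functionals, we assume: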

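Assumption 3.1 (local behavior of functional).
(i) $\sup_{\alpha\in\mathcal{B}_T} \left| f(\alpha) - f(\alpha_0) - \frac{\partial f(\alpha_0)}{\partial\alpha}[\alpha - \alpha_0] \right| = o(T^{-1/2}\|v_T^*\|)$;
(ii) $\left| \frac{\partial f(\alpha_0)}{\partial\alpha}[\alpha_{0,T} - \alpha_0] \right| = o(T^{-1/2}\|v_T^*\|)$.

Assumption 3.1.(i) controls the linear approximation error of a possibly nonlinear functional $f(\cdot)$. It is automatically satisfied when $f(\cdot)$ is a linear functional, but it may rule out some highly nonlinear functionals. Assumption 3.1.(ii) controls the bias part due to the finite dimensional sieve approximation of $\alpha_{0,T}$ to $\alpha_0$. It is a condition imposed on the growth rate of the sieve dimension $\dim(\mathcal{A}_T)$, and requires that the sieve approximation error rate is of smaller order than $T^{-1/2}\|v_T^*\|$. When $f(\cdot)$ is a regular functional, we have $\|v_T^*\| \asymp \|v^*\| < \infty$, and since $\langle \alpha_{0,T} - \alpha_0, v_T^* \rangle = 0$ by definition of $\alpha_{0,T}$, we have:

$$\left| \frac{\partial f(\alpha_0)}{\partial\alpha}[\alpha_{0,T} - \alpha_0] \right| = \left| \langle v^*, \alpha_{0,T} - \alpha_0 \rangle \right| = \left| \langle v^* - v_T^*, \alpha_{0,T} - \alpha_0 \rangle \right| \le \|v^* - v_T^*\| \times \|\alpha_{0,T} - \alpha_0\|,$$

thus Assumption 3.1.(ii) is satisfied if

$$\|v^* - v_T^*\| \times \|\alpha_{0,T} - \alpha_0\| = o(T^{-1/2}) \quad\text{when } f(\cdot) \text{ is regular}, \tag{3.12}$$

which is similar to condition 4.1(ii)(iii) imposed in Chen (2007) for regular functionals.

Next, we make an assumption on the relationship between $\|v_T^*\|$ and the asymptotic standard deviation of $f(\hat\alpha_T) - f(\alpha_{0,T})$. It will be shown that the asymptotic standard deviation is the limit of the "standard deviation" (sd) norm $\|v_T^*\|_{sd}$ of $v_T^*$, defined as

$$\|v_T^*\|_{sd}^2 \equiv Var\left( T^{-1/2} \sum_{t=1}^T \Delta(Z_t,\alpha_0)[v_T^*] \right). \tag{3.13}$$

Note that $\|v_T^*\|_{sd}^2$ is the finite dimensional sieve version of the long run variance of the score process $\Delta(Z_t,\alpha_0)[v_T^*]$, and $\|v_T^*\|_{sd}^2 = Var(\Delta(Z,\alpha_0)[v_T^*])$ if the score process $\{\Delta(Z_t,\alpha_0)[v_T^*]\}_{t\le T}$ is a martingale difference array.

Assumption 3.2 (sieve variance). $\|v_T^*\| / \|v_T^*\|_{sd} = O(1)$.

By definition of $\|v_T^*\|$ given in (3.9), $0 < \|v_T^*\|$ is non-decreasing in $\dim(\mathcal{V}_T)$, and hence is non-decreasing in $T$. Assumption 3.2 then implies that $\liminf_{T\to\infty} \|v_T^*\|_{sd} > 0$. Define

$$u_T^* \equiv v_T^* / \|v_T^*\|_{sd} \tag{3.14}$$

to be the normalized version of $v_T^*$. Then Assumption 3.2 implies that $\|u_T^*\| = O(1)$.

Let $\mu_T\{g(Z)\} \equiv T^{-1}\sum_{t=1}^T [g(Z_t) - E g(Z_t)]$ denote the centered empirical process indexed by the function $g$. Let $\varepsilon_T = o(T^{-1/2})$. For notational economy, we use the same $\varepsilon_T$ as that in (2.1).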

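Assumption 3.3 (local behavior of criterion).
(i) $\mu_T\{\Delta(Z,\alpha_0)[v]\}$ is linear in $v \in \mathcal{V}$;
(ii) $\sup_{\alpha\in\mathcal{B}_T} \mu_T\{\ell(Z,\alpha \pm \varepsilon_T u_T^*) - \ell(Z,\alpha) - \Delta(Z,\alpha_0)[\pm\varepsilon_T u_T^*]\} = O_p(\varepsilon_T^2)$;
(iii) $\sup_{\alpha\in\mathcal{B}_T} \left| E[\ell(Z_t,\alpha) - \ell(Z_t,\alpha \pm \varepsilon_T u_T^*)] - \frac{\|\alpha \pm \varepsilon_T u_T^* - \alpha_0\|^2 - \|\alpha - \alpha_0\|^2}{2} \right| = O(\varepsilon_T^2)$.

Assumptions 3.3.(ii) and (iii) are simplified versions of those in Chen and Shen (1998), and can be verified in the same way.

Assumption 3.4 (CLT). $\sqrt{T}\mu_T\{\Delta(Z,\alpha_0)[u_T^*]\} \to_d N(0,1)$, where $N(0,1)$ is a standard normal distribution.

Assumption 3.4 is a very mild one, and can be easily verified by applying any existing triangular array CLT for weakly dependent data (see, e.g., Hall and Heyde).

We are now ready to state the asymptotic normality theorem for the plug-in sieve M estimator.

Theorem 3.1. Let Assumptions 3.1.(i), 3.2 and 3.3 hold. Then

$$\sqrt{T}\,\frac{f(\hat\alpha_T) - f(\alpha_{0,T})}{\|v_T^*\|_{sd}} = \sqrt{T}\mu_T\{\Delta(Z,\alpha_0)[u_T^*]\} + o_p(1); \tag{3.15}$$

If further Assumptions 3.1.(ii) and 3.4 hold, then

$$\sqrt{T}\,\frac{f(\hat\alpha_T) - f(\alpha_0)}{\|v_T^*\|_{sd}} = \sqrt{T}\mu_T\{\Delta(Z,\alpha_0)[u_T^*]\} + o_p(1) \to_d N(0,1). \tag{3.16}$$

In light of Theorem 3.1, we call $\|v_T^*\|_{sd}^2$ defined in (3.13) the "pre-asymptotic" sieve variance of the estimator $f(\hat\alpha_T)$. When the functional $f(\alpha_0)$ is regular (i.e., $\|v_T^*\| = O(1)$), we have $\|v_T^*\|_{sd} \asymp \|v_T^*\| = O(1)$ typically; so $f(\hat\alpha_T)$ converges to $f(\alpha_0)$ at the parametric rate of $1/\sqrt{T}$. When the functional $f(\alpha_0)$ is irregular (i.e., $\|v_T^*\| \to \infty$), we have $\|v_T^*\|_{sd} \to \infty$ (under Assumption 3.2); so the convergence rate of $f(\hat\alpha_T)$ becomes slower than $1/\sqrt{T}$. Regardless of whether the "pre-asymptotic" sieve variance $\|v_T^*\|_{sd}^2$ stays bounded asymptotically (i.e., as $T \to \infty$) or not, it always captures whatever true temporal dependence exists in finite samples.

For regular functionals of semi-nonparametric time series models, Chen and Shen (1998) and Chen (2007, Theorem 4.3) establish that $\sqrt{T}(f(\hat\alpha_T) - f(\alpha_0)) \to_d N(0, \sigma_{v^*}^2)$ with

$$\sigma_{v^*}^2 = \lim_{T\to\infty} Var\left( T^{-1/2} \sum_{t=1}^T \Delta(Z_t,\alpha_0)[v^*] \right) = \lim_{T\to\infty} \|v_T^*\|_{sd}^2 \in (0,\infty). \tag{3.17}$$

Our Theorem 3.1 is a natural extension of their results to allow for irregular functionals.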

3.3. Sieve Riesz Representer. To apply the asymptotic normality Theorem 3.1 one needs to verify Assumptions 3.1 to 3.4. Once we compute the sieve Riesz representer $v_T^* \in \mathcal{V}_T$, Assumptions 3.1 and 3.2 can be easily checked, while Assumptions 3.3 and 3.4 are standard ones and can be verified in the same ways as those in Chen and Shen (1998) and Chen (2007) for regular functionals of semi-nonparametric models. Although it may be difficult to compute the Riesz representer $v^* \in \mathcal{V}$ in a closed form for a regular functional on the infinite dimensional space $\mathcal{V}$, we can always compute the sieve Riesz representer $v_T^* \in \mathcal{V}_T$ defined in (3.8) and (3.9) explicitly. Therefore, Theorem 3.1 is easily applicable to a large class of semi-nonparametric time series models, regardless of whether the functionals of interest are root-T estimable or not.

3.3.1. Sieve Riesz representers for general functionals. For the sake of concreteness, in this subsection we focus on a large class of semi-nonparametric models where the population criterion $E[\ell(Z_t,\theta,h(\cdot))]$ is maximized at $\alpha_0 = (\theta_0', h_0(\cdot))' \in \mathcal{A} = \Theta \times \mathcal{H}$, $\Theta$ is a compact subset in $R^{d_\theta}$, $\mathcal{H}$ is a class of real valued continuous functions (of a subset of $Z_t$) belonging to a Hölder, Sobolev or Besov space, and $\mathcal{A}_T = \Theta \times \mathcal{H}_T$ is a finite dimensional sieve space. The general cases with multiple unknown functions require only more complicated notation.

Let $\|\cdot\|$ be the norm defined in (3.4) and $\mathcal{V}_T = R^{d_\theta} \times \{v_h(\cdot) = P_{k_T}(\cdot)'\beta : \beta \in R^{k_T}\}$ be dense in the infinite dimensional Hilbert space $(\mathcal{V}, \|\cdot\|)$.
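By definition, the sieve Riesz representer $v_T^* = (v_{\theta,T}^{*\prime}, v_{h,T}^*(\cdot))' = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)' \in \mathcal{V}_T$ of $\frac{\partial f(\alpha_0)}{\partial\alpha}[\cdot]$ solves the following optimization problem:

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v_T^*] = \|v_T^*\|^2 = \sup_{v=(v_\theta', v_h)'\in\mathcal{V}_T, v\ne 0} \frac{\left| \frac{\partial f(\alpha_0)}{\partial\theta'}v_\theta + \frac{\partial f(\alpha_0)}{\partial h}[v_h(\cdot)] \right|^2}{E\left(-r(Z_t,\theta_0,h_0(\cdot))[v,v]\right)} = \sup_{\gamma=(v_\theta',\beta')'\in R^{d_\theta+k_T}, \gamma\ne 0} \frac{\gamma' F_{k_T} F_{k_T}' \gamma}{\gamma' R_{k_T} \gamma}, \tag{3.18}$$

where

$$F_{k_T} \equiv \left( \frac{\partial f(\alpha_0)}{\partial\theta'}, \frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)'] \right)' \tag{3.19}$$

is a $(d_\theta + k_T) \times 1$ vector,¹ and

$$\gamma' R_{k_T} \gamma \equiv E\left(-r(Z_t,\theta_0,h_0(\cdot))[v,v]\right) \quad\text{for all } v = (v_\theta', P_{k_T}(\cdot)'\beta)' \in \mathcal{V}_T, \tag{3.20}$$

with

$$R_{k_T} = \begin{pmatrix} I_{11} & I_{T,12} \\ I_{T,21} & I_{T,22} \end{pmatrix} \quad\text{and}\quad R_{k_T}^{-1} := \begin{pmatrix} I_T^{11} & I_T^{12} \\ I_T^{21} & I_T^{22} \end{pmatrix} \tag{3.21}$$

being $(d_\theta + k_T) \times (d_\theta + k_T)$ positive definite matrices.

¹ When $\frac{\partial f(\alpha_0)}{\partial h}[\cdot]$ applies to a vector (matrix), it stands for element-wise (column-wise) operations. We follow the same convention for other operators such as $\Delta(Z_t,\alpha_0)[\cdot]$ and $-r(Z_t,\alpha_0)[\cdot,\cdot]$ in the paper.

For example, if the criterion function $\ell(z,\theta,h(\cdot))$ is twice continuously pathwise differentiable with respect to $(\theta, h(\cdot))$, then we have $I_{11} = E\left[-\frac{\partial^2 \ell(Z_t,\theta_0,h_0(\cdot))}{\partial\theta\partial\theta'}\right]$, $I_{T,22} = E\left[-\frac{\partial^2 \ell(Z_t,\theta_0,h_0(\cdot))}{\partial h\partial h}[P_{k_T}(\cdot), P_{k_T}(\cdot)']\right]$, $I_{T,12} = E\left[-\frac{\partial^2 \ell(Z_t,\theta_0,h_0(\cdot))}{\partial\theta\partial h}[P_{k_T}(\cdot)']\right]$ and $I_{T,21} \equiv I_{T,12}'$.

The sieve Riesz representation (3.8) becomes: for all $v = (v_\theta', P_{k_T}(\cdot)'\beta)' \in \mathcal{V}_T$,

$$\frac{\partial f(\alpha_0)}{\partial\alpha}[v] = F_{k_T}'\gamma = \langle v_T^*, v \rangle = \gamma_T^{*\prime} R_{k_T} \gamma \quad\text{for all } \gamma = (v_\theta', \beta')' \in R^{d_\theta+k_T}. \tag{3.22}$$

It is obvious that the optimal solution of $\gamma$ in (3.18) or in (3.22) has a closed-form expression:

$$\gamma_T^* = (v_{\theta,T}^{*\prime}, \beta_T^{*\prime})' = R_{k_T}^{-1} F_{k_T}. \tag{3.23}$$

The sieve Riesz representer is then given by

$$v_T^* = (v_{\theta,T}^{*\prime}, v_{h,T}^*(\cdot))' = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)' \in \mathcal{V}_T.$$

Consequently,

$$\|v_T^*\|^2 = \gamma_T^{*\prime} R_{k_T} \gamma_T^* = F_{k_T}' R_{k_T}^{-1} F_{k_T},$$

which is finite for each sample size $T$ but may grow with $T$.

Finally the score process can be expressed as

$$\Delta(Z_t,\alpha_0)[v_T^*] = \left( \Delta_\theta(Z_t,\theta_0,h_0(\cdot))', \Delta_h(Z_t,\theta_0,h_0(\cdot))[P_{k_T}(\cdot)'] \right) \gamma_T^* \equiv S_{k_T}(Z_t)'\gamma_T^*.$$

Thus

$$Var(\Delta(Z_t,\alpha_0)[v_T^*]) = \gamma_T^{*\prime} E\left[S_{k_T}(Z_t) S_{k_T}(Z_t)'\right] \gamma_T^* \quad\text{and}\quad \|v_T^*\|_{sd}^2 = \gamma_T^{*\prime}\, Var\left( T^{-1/2} \sum_{t=1}^T S_{k_T}(Z_t) \right) \gamma_T^*. \tag{3.25}$$

To verify Assumptions 3.1 and 3.2 for irregular functionals, it is handy to know the exact speed of divergence of $\|v_T^*\|^2$. We assume

Assumption 3.5. The smallest and largest eigenvalues of $R_{k_T}$ defined in (3.20) are bounded and bounded away from zero uniformly for all $k_T$.

Assumption 3.5 imposes some regularity conditions on the sieve basis functions, which is a typical assumption in the linear sieve (or series) literature.

Remark 3.2. Assumption 3.5 implies that

$$\|v_T^*\|^2 \asymp \|\gamma_T^*\|_E^2 \asymp \|F_{k_T}\|_E^2 = \left\|\frac{\partial f(\alpha_0)}{\partial\theta}\right\|_E^2 + \left\|\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2.$$

Then: $f(\cdot)$ is regular at $\alpha = \alpha_0$ if $\lim_{k_T\to\infty} \left\|\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2 < \infty$; $f(\cdot)$ is irregular at $\alpha = \alpha_0$ if $\lim_{k_T\to\infty} \left\|\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2 = \infty$.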

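3.3.2. Examples. We first consider three typical linear functionals of semi-nonparametric models.

For the Euclidean parameter functional $f(\alpha) = \lambda'\theta$, we have $F_{k_T} = (\lambda', 0_{k_T}')'$ with $0_{k_T}' = [0, \ldots, 0]_{1\times k_T}$, and hence $v_T^* = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)' \in \mathcal{V}_T$ with $v_{\theta,T}^* = I_T^{11}\lambda$, $\beta_T^* = I_T^{21}\lambda$, and

$$\|v_T^*\|^2 = F_{k_T}' R_{k_T}^{-1} F_{k_T} = \lambda' I_T^{11} \lambda.$$

If the largest eigenvalue of $I_T^{11}$, $\lambda_{\max}(I_T^{11})$, is bounded above by a finite constant uniformly in $k_T$, then $\|v_T^*\|^2 \le \lambda_{\max}(I_T^{11})\,\lambda'\lambda < \infty$ uniformly in $T$, and the functional $f(\alpha) = \lambda'\theta$ is regular.

For the evaluation functional $f(\alpha) = h(\bar x)$ for $\bar x \in \mathcal{X}$, we have $F_{k_T} = (0_{d_\theta}', P_{k_T}(\bar x)')'$, and hence $v_T^* = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)' \in \mathcal{V}_T$ with $v_{\theta,T}^* = I_T^{12} P_{k_T}(\bar x)$, $\beta_T^* = I_T^{22} P_{k_T}(\bar x)$, and

$$\|v_T^*\|^2 = F_{k_T}' R_{k_T}^{-1} F_{k_T} = P_{k_T}(\bar x)' I_T^{22} P_{k_T}(\bar x).$$

So if the smallest eigenvalue of $I_T^{22}$, $\lambda_{\min}(I_T^{22})$, is bounded away from zero uniformly in $k_T$, then $\|v_T^*\|^2 \ge \lambda_{\min}(I_T^{22}) \|P_{k_T}(\bar x)\|_E^2 \to \infty$, and the functional $f(\alpha) = h(\bar x)$ is irregular.

For the weighted integration functional $f(\alpha) = \int_{\mathcal{X}} w(x)h(x)\,dx$ for a weighting function $w(x)$, we have $F_{k_T} = (0_{d_\theta}', \int_{\mathcal{X}} w(x)P_{k_T}(x)'\,dx)'$, and hence $v_T^* = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)'$ with $v_{\theta,T}^* = I_T^{12}\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx$, $\beta_T^* = I_T^{22}\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx$, and

$$\|v_T^*\|^2 = F_{k_T}' R_{k_T}^{-1} F_{k_T} = \left\{\int_{\mathcal{X}} w(x)P_{k_T}(x)'\,dx\right\} I_T^{22} \left\{\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx\right\}.$$

Suppose that the smallest and largest eigenvalues of $I_T^{22}$ are bounded and bounded away from zero uniformly for all $k_T$. Then $\|v_T^*\|^2 \asymp \left\|\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx\right\|_E^2$. Thus $f(\alpha) = \int_{\mathcal{X}} w(x)h(x)\,dx$ is regular if $\lim_{k_T\to\infty} \left\|\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx\right\|_E^2 < \infty$, and is irregular if $\lim_{k_T\to\infty} \left\|\int_{\mathcal{X}} w(x)P_{k_T}(x)\,dx\right\|_E^2 = \infty$.

We finally consider an example of nonlinear functionals that arises in Example 2.2 when the parameter of interest is $\alpha_0 = (\theta_0', h_0)'$ with $h_0^2 = f_Y$ being the true marginal density of $Y_t$. Consider the functional $f(\alpha) = h^2(\bar y)/\int_{-\infty}^{\infty} h^2(y)\,dy$. Note that $f(\alpha_0) = f_Y(\bar y) = h_0^2(\bar y)$ and $h_0(\cdot)$ is approximated by the linear sieve $\mathcal{H}_T$ given in (2.6). Then $F_{k_T} = \left(0_{d_\theta}', \frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)']\right)'$ with

$$\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)] = 2h_0(\bar y)\left( P_{k_T}(\bar y) - h_0(\bar y)\int_{-\infty}^{\infty} h_0(y)P_{k_T}(y)\,dy \right),$$

and hence $v_T^* = (v_{\theta,T}^{*\prime}, P_{k_T}(\cdot)'\beta_T^*)' \in \mathcal{V}_T$ with $v_{\theta,T}^* = I_T^{12}\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]$, $\beta_T^* = I_T^{22}\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]$, and

$$\|v_T^*\|^2 = F_{k_T}' R_{k_T}^{-1} F_{k_T} = \frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)']\, I_T^{22}\, \frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)].$$

So if the smallest eigenvalue of $I_T^{22}$ is bounded away from zero uniformly in $k_T$, then $\|v_T^*\|^2 \ge \text{const.} \times \left\|\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2 \to \infty$, and the functional $f(\alpha) = h^2(\bar y)/\int_{-\infty}^{\infty} h^2(y)\,dy$ is irregular at $\alpha = \alpha_0$.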
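The closed form $\|v_T^*\|^2 = F_{k_T}' R_{k_T}^{-1} F_{k_T}$ can be checked numerically in a stylized purely nonparametric setting. The sketch below is illustrative only and rests on assumptions not taken from the paper: there is no Euclidean part $\theta$, $X$ is uniform on $[0,1]$, the sieve basis is the cosine basis, and $R_{k}$ is taken to be $E[P_k(X)P_k(X)']$ (the least squares case up to a constant scale factor). It contrasts the evaluation functional $h(\bar x)$ at the arbitrary point $\bar x = 0.3$ with the integration functional $\int_0^1 h(x)\,dx$ (weight $w \equiv 1$).

```python
import numpy as np

def cosine_basis(x, k):
    """First k cosine basis functions on [0, 1], orthonormal in L2[0, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    j = np.arange(k)
    P = np.sqrt(2.0) * np.cos(np.pi * j[None, :] * x[:, None])
    P[:, 0] = 1.0  # the constant term p_0(x) = 1
    return P       # shape (len(x), k)

def riesz_norm_sq(F, R):
    """||v*||^2 = F' R^{-1} F, using gamma* = R^{-1} F."""
    gamma_star = np.linalg.solve(R, F)
    return float(F @ gamma_star)

# Midpoint grid standing in for expectations under X ~ Uniform[0, 1] (assumed design).
grid = (np.arange(20000) + 0.5) / 20000
x_bar = 0.3  # hypothetical evaluation point

eval_norms, int_norms = [], []
for k in (5, 10, 20, 40):
    P = cosine_basis(grid, k)
    R = P.T @ P / len(grid)             # approximates E[P_k(X) P_k(X)']
    F_eval = cosine_basis(x_bar, k)[0]  # F for f(h) = h(x_bar)
    F_int = P.mean(axis=0)              # F for f(h) = integral of h over [0, 1]
    eval_norms.append(riesz_norm_sq(F_eval, R))
    int_norms.append(riesz_norm_sq(F_int, R))

print("evaluation functional :", eval_norms)
print("integration functional:", int_norms)
```

In this design the first sequence grows with the number of sieve terms while the second stays flat, which is the irregular versus regular distinction drawn in Remark 3.2.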

4. Asymptotic Variances of Sieve Estimators of Irregular Functionals. In this section, we derive the asymptotic expression of the "pre-asymptotic" sieve variance $\|v_T^*\|_{sd}^2$ for irregular functionals. We provide general sufficient conditions under which the asymptotic variance does not depend on the temporal dependence.

4.1. Exact Form of the Asymptotic Variance. By definition of the "pre-asymptotic" sieve variance $\|v_T^*\|_{sd}^2$ and the strict stationarity of the data $\{Z_t\}_{t=1}^T$, we have:

$$\|v_T^*\|_{sd}^2 = Var(\Delta(Z,\alpha_0)[v_T^*]) \times \left[ 1 + 2\sum_{t=1}^{T-1}\left(1 - \frac{t}{T}\right)\rho_T^*(t) \right], \tag{4.1}$$

where $\{\rho_T^*(t)\}$ is the autocorrelation coefficient of the triangular array $\{\Delta(Z_t,\alpha_0)[v_T^*]\}_{t\le T}$:

$$\rho_T^*(t) \equiv \frac{E\left(\Delta(Z_1,\alpha_0)[v_T^*]\,\Delta(Z_{t+1},\alpha_0)[v_T^*]\right)}{Var(\Delta(Z,\alpha_0)[v_T^*])}. \tag{4.2}$$

Denote

$$C_T \equiv \sup_{t\in[1,T)} \left| E\{\Delta(Z_1,\alpha_0)[v_T^*]\,\Delta(Z_{t+1},\alpha_0)[v_T^*]\} \right|.$$

The following high-level assumption captures the essence of the problem.

Assumption 4.1.
(i) $\|v_T^*\| \to \infty$ as $T \to \infty$, and $\|v_T^*\|^2 / Var(\Delta(Z,\alpha_0)[v_T^*]) = O(1)$;
(ii) There is an increasing integer sequence $\{d_T \in [2,T)\}$ such that (a) $\frac{d_T C_T}{Var(\Delta(Z,\alpha_0)[v_T^*])} = o(1)$ and (b) $\left| \sum_{t=d_T}^{T-1}\left(1 - \frac{t}{T}\right)\rho_T^*(t) \right| = o(1)$.

Primitive sufficient conditions for Assumption 4.1 are given in the next subsection.

Theorem 4.1. Let Assumption 4.1 hold. Then: $\left| \frac{\|v_T^*\|_{sd}^2}{Var(\Delta(Z,\alpha_0)[v_T^*])} - 1 \right| = o(1)$; If further Assumptions 3.1, 3.3 and 3.4 hold, then

$$\frac{\sqrt{T}\,[f(\hat\alpha_T) - f(\alpha_0)]}{\sqrt{Var(\Delta(Z,\alpha_0)[v_T^*])}} \to_d N(0,1). \tag{4.3}$$

4.2. Sufficient Conditions for Assumption 4.1. In this subsection, we first provide sufficient conditions for Assumption 4.1 for sieve M estimation of irregular functionals of general semi-nonparametric models. We then present additional low-level sufficient conditions for sieve M estimation of real-valued functionals of purely nonparametric models. We show that these sufficient conditions are easily satisfied for sieve M estimation of the evaluation and the weighted integration functionals.
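The decomposition in (4.1) is a finite-sample identity for any stationary series, and it can be verified numerically. The sketch below is an illustration under assumptions of my own choosing: the score is given a hypothetical AR(1)-type autocorrelation $\rho(t) = \phi^t$ with $\phi = 0.5$, and the "weakly dependent" case simply scales those autocorrelations by 0.01 to mimic a score whose dependence has become negligible, as Theorem 4.1 describes for irregular functionals.

```python
import numpy as np

def variance_ratio(rho, T):
    """Bracketed term in (4.1): 1 + 2 * sum_{t=1}^{T-1} (1 - t/T) * rho(t)."""
    t = np.arange(1, T)
    return 1.0 + 2.0 * np.sum((1.0 - t / T) * rho(t))

T = 200
phi = 0.5  # hypothetical autocorrelation parameter

# Ratio ||v*||_sd^2 / Var(score) implied by (4.1) for rho(t) = phi**t.
ratio = variance_ratio(lambda t: phi ** t, T)

# Direct computation: Var(T^{-1/2} * sum of scores) / Var(score)
# equals (1/T) * sum over all pairs (s, t) of rho(|s - t|).
idx = np.arange(T)
direct = np.sum(phi ** np.abs(idx[:, None] - idx[None, :])) / T

# When the autocorrelations are uniformly small, the ratio is close to one,
# so the long run variance is close to the marginal variance.
ratio_weak = variance_ratio(lambda t: 0.01 * phi ** t, T)

print(ratio, direct, ratio_weak)
```

The first two numbers agree, and the gap between `ratio` and `ratio_weak` shows how much the "pre-asymptotic" variance can differ from the marginal variance when the autocorrelations are not yet negligible at the sample size in hand.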

4.2.1. Irregular functionals of general semi-nonparametric models. Given the closed-form expressions of $\|v_T^*\|$ and $Var(\Delta(Z,\alpha_0)[v_T^*])$ in Subsection 3.3, it is easy to see that the following assumption implies Assumption 4.1.(i).

Assumption 4.2.
(i) Assumption 3.5 holds and $\lim_{k_T\to\infty} \left\|\frac{\partial f(\alpha_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2 = \infty$;
(ii) The smallest eigenvalue of $E[S_{k_T}(Z_t) S_{k_T}(Z_t)']$ in (3.25) is bounded away from zero uniformly for all $k_T$.

Next, we provide some sufficient conditions for Assumption 4.1.(ii). Let $f_{Z_1,Z_t}(\cdot,\cdot)$ be the joint density of $(Z_1, Z_t)$ and $f_Z(\cdot)$ be the marginal density of $Z$. Let $p \in [1,\infty)$. Define

$$\|\Delta(Z,\alpha_0)[v_T^*]\|_p \equiv \left( E\{|\Delta(Z,\alpha_0)[v_T^*]|^p\} \right)^{1/p}. \tag{4.4}$$

By definition, $\|\Delta(Z,\alpha_0)[v_T^*]\|_2^2 = Var(\Delta(Z,\alpha_0)[v_T^*])$. The following assumption implies Assumption 4.1.(ii)(a).

Assumption 4.3.
(i) $\sup_{t\ge 2} \sup_{(z,z')\in\mathcal{Z}\times\mathcal{Z}} \left| f_{Z_1,Z_t}(z,z') / [f_{Z_1}(z) f_{Z_t}(z')] \right| \le C$ for some constant $C > 0$;
(ii) $\|\Delta(Z,\alpha_0)[v_T^*]\|_1 / \|\Delta(Z,\alpha_0)[v_T^*]\|_2 = o(1)$.

Assumption 4.3.(i) is mild. When $Z_t$ is a continuous random variable, it is equivalent to assuming that the copula density of $(Z_1, Z_t)$ is bounded uniformly in $t \ge 2$. For irregular functionals (i.e., $\|v_T^*\| \nearrow \infty$), the $L^2(f_Z)$ norm $\|\Delta(Z,\alpha_0)[v_T^*]\|_2$ diverges (under Assumption 4.1.(i) or Assumption 4.2); Assumption 4.3.(ii) requires that the $L^1(f_Z)$ norm $\|\Delta(Z,\alpha_0)[v_T^*]\|_1$ diverge at a slower rate than the $L^2(f_Z)$ norm $\|\Delta(Z,\alpha_0)[v_T^*]\|_2$ as $k_T \to \infty$. In many applications the $L^1(f_Z)$ norm $\|\Delta(Z,\alpha_0)[v_T^*]\|_1$ actually remains bounded as $k_T \to \infty$ and hence Assumption 4.3.(ii) is trivially satisfied.

The following assumption implies Assumption 4.1.(ii)(b).

Assumption 4.4.
(i) $\{Z_t\}_{t=1}^{\infty}$ is strictly stationary strong-mixing with mixing coefficients $\alpha(t)$ satisfying $\sum_{t=1}^{\infty} t^{\gamma}[\alpha(t)]^{\frac{\eta}{2+\eta}} < \infty$ for some $\eta > 0$ and $\gamma > 0$;
(ii) As $k_T \to \infty$,

$$\frac{\|\Delta(Z,\alpha_0)[v_T^*]\|_1^{\gamma}\, \|\Delta(Z,\alpha_0)[v_T^*]\|_{2+\eta}}{\|\Delta(Z,\alpha_0)[v_T^*]\|_2^{\gamma+1}} = o(1).$$

The $\alpha$-mixing condition in Assumption 4.4.(i) with $\gamma > \frac{\eta}{2+\eta}$ becomes Condition 1.(iii) in Fan and Yao (2003) for the pointwise asymptotic normality of their local polynomial estimator of a conditional mean function.
In the next subsection, we illustrate that $\gamma > \frac{\eta}{2+\eta}$ is also sufficient for sieve M estimation of evaluation functionals of nonparametric time series models to satisfy Assumption 4.4.(ii).

Proposition 4.2. Let Assumptions 4.2, 4.3 and 4.4 hold. Then: $\sum_{t=1}^{T-1}\left|\rho_T^*(t)\right| = o(1)$ and Assumption 4.1 holds.

Theorem 4.1 and Proposition 4.2 show that when the functional $f(\cdot)$ is irregular (i.e., $\|v_T^*\| \to \infty$), time series dependence does not affect the asymptotic variance of a general sieve M estimator $f(\hat\alpha_T)$. Similar results have been proved for nonparametric kernel and local polynomial estimators of evaluation functionals of conditional mean and density functions. See, for example, Robinson (1983), Fan and Yao (2003) and Gao. However, whether this is the case for general sieve M estimators of unknown functionals has been a long-standing question. Theorem 4.1 and Proposition 4.2 give a positive answer. This may seem surprising at first sight, as sieve estimators are often regarded as global estimators while kernel estimators are regarded as local estimators.

4.2.2. Irregular functionals of purely nonparametric models. In this subsection, we provide additional low-level sufficient conditions for Assumptions 4.1.(i), 4.3.(ii) and 4.4.(ii) for purely nonparametric models where the true unknown parameter is a real-valued function $h_0(\cdot)$ that solves $\sup_{h \in \mathcal{H}} E[\ell(Z_t, h(X_t))]$. This includes as a special case the nonparametric conditional mean model: $Y_t = h_0(X_t) + u_t$ with $E[u_t | X_t] = 0$. Our results can be easily generalized to more general settings with only some notational changes.

Let $\alpha_0 = h_0(\cdot) \in \mathcal{H}$ and let $f(\cdot): \mathcal{H} \to \mathbb{R}$ be any functional of interest. By the results in Subsection 3.3, $f(h_0)$ has its sieve Riesz representer given by:

$v_T^*(\cdot) = P_{k_T}(\cdot)'\beta_T^* \in \mathcal{V}_T$ with $\beta_T^* = R_{k_T}^{-1}\frac{\partial f(h_0)}{\partial h}[P_{k_T}(\cdot)]$,

where $R_{k_T}$ is such that

$\beta' R_{k_T}\beta = E\left\{-r(Z_t,h_0)[\beta' P_{k_T}, P_{k_T}'\beta]\right\} = \beta' E\left\{-r(Z_t,h_0(X_t))P_{k_T}(X_t)P_{k_T}(X_t)'\right\}\beta$ for all $\beta \in \mathbb{R}^{k_T}$.

Also, the score process can be expressed as

$\Delta(Z_t,h_0)[v_T^*] = \Delta(Z_t,h_0(X_t))\,v_T^*(X_t) = \Delta(Z_t,h_0(X_t))\,P_{k_T}(X_t)'\beta_T^*$.

Here the notations $\Delta(Z_t,h_0(X_t))$ and $r(Z_t,h_0(X_t))$ indicate the standard first-order and second-order derivatives of $\ell(Z_t,h(X_t))$ instead of functional pathwise derivatives (for example, we have $-r(Z_t,h_0(X_t)) = 1$ and $\Delta(Z_t,h_0(X_t)) = [Y_t - h_0(X_t)]/2$ in the nonparametric conditional mean model). Thus,

$\|v_T^*\|^2 = E\left\{E[-r(Z,h_0(X)) | X]\,(v_T^*(X))^2\right\} = \beta_T^{*\prime}R_{k_T}\beta_T^* = \frac{\partial f(h_0)}{\partial h}[P_{k_T}]'\,R_{k_T}^{-1}\,\frac{\partial f(h_0)}{\partial h}[P_{k_T}]$,

$\mathrm{Var}\left(\Delta(Z,h_0)[v_T^*]\right) = E\left\{E\left([\Delta(Z,h_0(X))]^2 | X\right)(v_T^*(X))^2\right\}$.
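The closed-form expressions above can be explored numerically in a stylized design. The sketch below is only an illustration under assumptions that are not taken from the paper: $X$ is uniform on $[0,1]$, the sieve is the cosine basis (orthonormal under the uniform density), and $E[-r(Z,h_0(X)) | X] = 1$, so that $R_{k_T}$ is the identity matrix. The functional is the evaluation functional $f(h) = h(\bar x)$, for which $\frac{\partial f(h_0)}{\partial h}[P_{k_T}] = P_{k_T}(\bar x)$. The function names `cosine_basis`, `representer` and `norm_ratio` are hypothetical. The code reports $\|v_T^*\|^2$ and the ratio $E|v_T^*(X)| / \sqrt{E[v_T^*(X)^2]}$, which is the kind of $L^1$-to-$L^2$ comparison that Assumption 4.3.(ii) is concerned with.

```python
import math


def cosine_basis(x, k):
    """First k cosine basis functions on [0, 1], orthonormal under the uniform density."""
    return [1.0] + [math.sqrt(2.0) * math.cos(j * math.pi * x) for j in range(1, k)]


def representer(x, x_bar, k):
    """v*(x) = P_k(x)' R_k^{-1} P_k(x_bar), with R_k = I under the assumed design."""
    return sum(p * q for p, q in zip(cosine_basis(x, k), cosine_basis(x_bar, k)))


def norm_ratio(x_bar, k, grid=4000):
    """Midpoint-rule approximation of E|v*(X)| / sqrt(E[v*(X)^2]) for X ~ U[0, 1]."""
    values = [representer((i + 0.5) / grid, x_bar, k) for i in range(grid)]
    l1 = sum(abs(v) for v in values) / grid
    l2 = math.sqrt(sum(v * v for v in values) / grid)
    return l1 / l2


if __name__ == "__main__":
    x_bar = 0.3  # hypothetical evaluation point
    for k in (4, 16, 64):
        sq_norm = representer(x_bar, x_bar, k)  # ||v*||^2 = P_k(x_bar)' P_k(x_bar)
        print(k, round(sq_norm, 3), round(norm_ratio(x_bar, k), 3))
```

In this toy design the squared norm grows roughly linearly in the number of sieve terms, while the norm ratio shrinks, which is the qualitative pattern one would expect for an irregular functional.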
It is then obvious that Assumption 4.1.(i) is implied by the following condition.

Assumption 4.5. (i) $\inf_{x \in \mathcal{X}} E[-r(Z,h_0(X)) | X = x] \ge c_1 > 0$; (ii) $\sup_{x \in \mathcal{X}} E[-r(Z,h_0(X)) | X = x] \le c_2 < \infty$; (iii) the smallest and largest eigenvalues of $E\left\{P_{k_T}(X)P_{k_T}(X)'\right\}$ are bounded and bounded away from zero uniformly for all $k_T$, and $\lim_{k_T \to \infty}\left\|\frac{\partial f(h_0)}{\partial h}[P_{k_T}(\cdot)]\right\|_E^2 = \infty$; (iv) $\inf_{x \in \mathcal{X}} E\left([\Delta(Z,h_0(X))]^2 | X = x\right) \ge c_3 > 0$.

It is easy to see that Assumptions 4.3.(ii) and 4.4.(ii) are implied by the following assumption.

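Assumption 4.6. (i) $E\left\{|v_T^*(X)|\right\} = O(1)$; (ii) $\sup_{x \in \mathcal{X}} E\left[|\Delta(Z,h_0(X))|^{2+\eta} | X = x\right] \le c_4 < \infty$; (iii) $\left(E\left\{|v_T^*(X)|^2\right\}\right)^{-(2+\eta)(\gamma+1)/2} E\left\{|v_T^*(X)|^{2+\eta}\right\} = o(1)$.

It actually suffices to use ess-$\inf_x$ (or ess-$\sup_x$) instead of $\inf_x$ (or $\sup_x$) in Assumptions 4.5 and 4.6. We immediately obtain the following results.

Remark. Let Assumptions 4.3.(i), 4.4.(i), 4.5 and 4.6 hold. Then: (1) $\sum_{t=1}^{T-1}\left|\rho_T^*(t)\right| = o(1)$ and $\left|\frac{\|v_T^*\|_{sd}^2}{\mathrm{Var}\left(\Delta(Z,\alpha_0)[v_T^*]\right)} - 1\right| = o(1)$. (2) Assumptions 4.5 and 4.6.(ii) imply that

$\mathrm{Var}\left(\Delta(Z,\alpha_0)[v_T^*]\right) \asymp E\left\{|v_T^*(X)|^2\right\} \asymp \|v_T^*\|^2 \asymp \|\beta_T^*\|_E^2 \asymp \left\|\frac{\partial f(h_0)}{\partial h}[P_{k_T}]\right\|_E^2$;

hence Assumption 4.6.(iii) is satisfied if $E\left\{|P_{k_T}(X)'\beta_T^*|^{2+\eta}\right\} / \|\beta_T^*\|_E^{(2+\eta)(\gamma+1)} = o(1)$.

Assumptions 4.3.(i), 4.4.(i), 4.5 and 4.6.(ii) are all very standard low-level sufficient conditions. Assumptions 4.6.(i) and (iii) are easily satisfied by two typical functionals of nonparametric models: the evaluation functional and the weighted integration functional.

Consider as an example the evaluation functional $f(h_0) = h_0(\bar x)$ with $\bar x \in \mathcal{X}$. We have $\frac{\partial f(h_0)}{\partial h}[P_{k_T}] = P_{k_T}(\bar x)$ and $v_T^*(\cdot) = P_{k_T}(\cdot)'\beta_T^* = P_{k_T}(\cdot)'R_{k_T}^{-1}P_{k_T}(\bar x)$. Then $\|v_T^*\|^2 = P_{k_T}(\bar x)'R_{k_T}^{-1}P_{k_T}(\bar x) = v_T^*(\bar x)$, and $\|v_T^*\|^2 \asymp \|P_{k_T}(\bar x)\|_E^2$ under Assumption 4.5.(i)(ii)(iii). Furthermore, we have, for any $v \in \mathcal{V}_T$:

(4.5) $v(\bar x) = E\left\{E[-r(Z,h_0(X)) | X]\,v(X)\,v_T^*(X)\right\} \equiv \int_{x \in \mathcal{X}} v(x)\,\delta_T(\bar x, x)\,dx$,

where

(4.6) $\delta_T(\bar x, x) = E[-r(Z,h_0(X)) | X = x]\,v_T^*(x)\,f_X(x) = E[-r(Z,h_0(X)) | X = x]\,P_{k_T}(x)'R_{k_T}^{-1}P_{k_T}(\bar x)\,f_X(x)$.

By equation (4.5), $\delta_T(\bar x, x)$ has the reproducing property on $\mathcal{V}_T$, so it behaves like the Dirac delta function $\delta(x - \bar x)$ on $\mathcal{V}_T$. Therefore $v_T^*(x)$ concentrates in a neighborhood around $x = \bar x$ and maintains the same positive sign in this neighborhood.

We first verify Assumption 4.6.(i). By equation (4.6), we have

$\int_{x \in \mathcal{X}} |v_T^*(x)|\,f_X(x)\,dx = \int_{x \in \mathcal{X}} \frac{\mathrm{sign}(v_T^*(x))}{E[-r(Z,h_0(X)) | X = x]}\,\delta_T(\bar x, x)\,dx \equiv \int_{x \in \mathcal{X}} b_T(x)\,\delta_T(\bar x, x)\,dx$,

where $\mathrm{sign}(v_T^*(x)) = 1$ if $v_T^*(x) > 0$ and $\mathrm{sign}(v_T^*(x)) = -1$ if $v_T^*(x) \le 0$, and $\sup_{x \in \mathcal{X}} |b_T(x)| \le c_1^{-1} < \infty$ under Assumption 4.5.(i). If $b_T(x) \in \mathcal{V}_T$, then by equation (4.5) we have:

$\int_{x \in \mathcal{X}} |v_T^*(x)|\,f_X(x)\,dx = b_T(\bar x) = \frac{\mathrm{sign}(v_T^*(\bar x))}{E[-r(Z,h_0(X)) | X = \bar x]} \le c_1^{-1} = O(1)$.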

If $b_T(x) \notin \mathcal{V}_T$ but can be approximated by a bounded function $\tilde v_T(x) \in \mathcal{V}_T$ such that

$\int_{x \in \mathcal{X}} [b_T(x) - \tilde v_T(x)]\,\delta_T(\bar x, x)\,dx = o(1)$,

then, also using equation (4.5), we obtain:

$\int_{x \in \mathcal{X}} |v_T^*(x)|\,f_X(x)\,dx = \int_{x \in \mathcal{X}} \tilde v_T(x)\,\delta_T(\bar x, x)\,dx + \int_{x \in \mathcal{X}} [b_T(x) - \tilde v_T(x)]\,\delta_T(\bar x, x)\,dx = \tilde v_T(\bar x) + o(1) = O(1)$.

Thus Assumption 4.6.(i) is satisfied. Similarly, we can show that under mild conditions:

$E\left\{|v_T^*(X)|^{2+\eta}\right\} = \frac{|v_T^*(\bar x)|^{1+\eta}}{E[-r(Z,h_0(X)) | X = \bar x]}\,(1 + o(1)) = O\left(|v_T^*(\bar x)|^{1+\eta}\right)$.

On the other hand,

$E\left\{|v_T^*(X)|^2\right\} = \int_{x \in \mathcal{X}} |v_T^*(x)|^2 f_X(x)\,dx = \int_{x \in \mathcal{X}} \frac{v_T^*(x)}{E[-r(Z,h_0(X)) | X = x]}\,\delta_T(\bar x, x)\,dx \asymp v_T^*(\bar x)$.

Therefore

$\left(E\left\{|v_T^*(X)|^2\right\}\right)^{-(2+\eta)(\gamma+1)/2} E\left\{|v_T^*(X)|^{2+\eta}\right\} = O\left(|v_T^*(\bar x)|^{1+\eta-(2+\eta)(\gamma+1)/2}\right) = o(1)$

if $1 + \eta - (2+\eta)(\gamma+1)/2 < 0$. Rearranging, this inequality reads $2(1+\eta) < (2+\eta)(\gamma+1)$, which is equivalent to $\gamma > \eta/(2+\eta)$. That is, when $\gamma > \eta/(2+\eta)$, Assumption 4.6.(iii) is satisfied.

One may conclude from Theorem 4.1 and Proposition 4.2 that the results and inference procedures for sieve estimators carry over from iid data to the time series case without modifications. However, this is true only when the sample size is large and the dependence is weak. Whether the sample size is large enough for the temporal dependence to be ignored depends on the functional of interest, the strength of the temporal dependence, and the sieve basis functions employed. So it is ultimately an empirical question. In any finite sample, the temporal dependence does affect the sampling distribution of the sieve estimator. In the next section, we design an inference procedure that is easy to use and at the same time captures the time series dependence in finite samples.

5. Autocorrelation Robust Inference. In order to apply the asymptotic normality Theorem 3.1, we need an estimator of the sieve variance $\|v_T^*\|_{sd}^2$. In this section we propose a simple estimator of $\|v_T^*\|_{sd}^2$ and establish the asymptotic distributions of the associated t statistic and Wald statistic.

The theoretical sieve Riesz representer $v_T^*$ is not known and has to be estimated. Let $\|\cdot\|_T$ denote the empirical norm induced by the following empirical inner product

(5.1) $\langle v_1, v_2\rangle_T = -\frac{1}{T}\sum_{t=1}^{T} r(Z_t,\hat\alpha_T)[v_1,v_2]$,

for any $v_1, v_2 \in \mathcal{V}_T$. We define an empirical sieve Riesz representer $\hat v_T^*$ of the functional $\frac{\partial f(\hat\alpha_T)}{\partial\alpha}[\cdot]$ with respect to the empirical norm $\|\cdot\|_T$, i.e.

(5.2) $\frac{\partial f(\hat\alpha_T)}{\partial\alpha}[\hat v_T^*] = \sup_{v \in \mathcal{V}_T, v \ne 0}\frac{\left|\frac{\partial f(\hat\alpha_T)}{\partial\alpha}[v]\right|^2}{\|v\|_T^2} < \infty$

and

(5.3) $\frac{\partial f(\hat\alpha_T)}{\partial\alpha}[v] = \langle v, \hat v_T^*\rangle_T$

for any $v \in \mathcal{V}_T$. We next show that the theoretical sieve Riesz representer $v_T^*$ can be consistently estimated by the empirical sieve Riesz representer $\hat v_T^*$ under the norm $\|\cdot\|$. In the following we denote $\mathcal{W}_T \equiv \{v \in \mathcal{V}_T : \|v\| = 1\}$.

Assumption 5.1. Let $\{\epsilon_T^*\}$ be a positive sequence such that $\epsilon_T^* = o(1)$. (i) $\sup_{\alpha \in \mathcal{B}_T, v_1,v_2 \in \mathcal{W}_T}\left|E\left\{r(Z,\alpha)[v_1,v_2] - r(Z,\alpha_0)[v_1,v_2]\right\}\right| = O(\epsilon_T^*)$; (ii) $\sup_{\alpha \in \mathcal{B}_T, v_1,v_2 \in \mathcal{W}_T}\left|\mu_T\left\{r(Z,\alpha)[v_1,v_2]\right\}\right| = O_p(\epsilon_T^*)$; (iii) $\sup_{\alpha \in \mathcal{B}_T, v \in \mathcal{W}_T}\left|\frac{\partial f(\alpha)}{\partial\alpha}[v] - \frac{\partial f(\alpha_0)}{\partial\alpha}[v]\right| = O(\epsilon_T^*)$.

Assumption 5.1.(i) is a smoothness condition on the second derivative of the criterion function with respect to $\alpha$. In the nonparametric LS regression model, we have $r(Z,\alpha)[v_1,v_2] = r(Z,\alpha_0)[v_1,v_2]$ for all $\alpha$ and $v_1, v_2$. Hence Assumption 5.1.(i) is trivially satisfied. Assumption 5.1.(ii) is a stochastic equicontinuity condition on the empirical process $T^{-1}\sum_{t=1}^{T} r(Z_t,\alpha)[v_1,v_2]$ indexed by $\alpha$ in the shrinking neighborhood $\mathcal{B}_T$ uniformly in $v_1, v_2 \in \mathcal{W}_T$. Assumption 5.1.(iii) puts some smoothness condition on the functional $\frac{\partial f(\alpha)}{\partial\alpha}[v]$ with respect to $\alpha$ in the shrinking neighborhood $\mathcal{B}_T$ uniformly in $v \in \mathcal{W}_T$.

Lemma 5.1. Let Assumption 5.1 hold. Then

(5.4) $\left|\frac{\|\hat v_T^*\|}{\|v_T^*\|} - 1\right| = O_p(\epsilon_T^*)$ and $\frac{\|\hat v_T^* - v_T^*\|}{\|v_T^*\|} = O_p(\epsilon_T^*)$.

With the empirical estimator $\hat v_T^*$ satisfying Lemma 5.1, we can now construct an estimate of $\|v_T^*\|_{sd}^2$, which is the LRV of the score process $\Delta(Z_t,\alpha_0)[v_T^*]$. Many nonparametric LRV estimators are available in the literature. To be consistent with our focus on the method of sieves and to derive a simple and robust asymptotic approximation, we use an orthonormal series LRV (OS-LRV) estimator in this paper. The OS-LRV estimator has already been used in constructing autocorrelation robust inference on regular functionals of parametric time series models; see, e.g., Phillips (2005) and Sun (2011a).
Let $\{\phi_m\}_{m=0}^{\infty}$ be a sequence of orthonormal basis functions in $L_2[0,1]$ with $\phi_0(\cdot) \equiv 1$. Define the orthogonal series projection

(5.5) $\hat\Lambda_m = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\phi_m\left(\frac{t}{T}\right)\Delta(Z_t,\hat\alpha_T)[\hat v_T^*]$

and construct the direct series estimator $\hat\Omega_m = \hat\Lambda_m^2$ for each $m = 1, 2, \ldots, M$, where $M \in \mathbb{Z}^+$. Taking a simple average of these direct estimators yields our OS-LRV estimator $\|\hat v_T^*\|_{sd,T}^2$ of $\|v_T^*\|_{sd}^2$:

(5.6) $\|\hat v_T^*\|_{sd,T}^2 \equiv \frac{1}{M}\sum_{m=1}^{M}\hat\Omega_m = \frac{1}{M}\sum_{m=1}^{M}\hat\Lambda_m^2$,

where $M$, the number of orthonormal basis functions used, is the smoothing parameter in the LRV estimation.

For irregular functionals, our asymptotic result in Section 4 suggests that we can ignore the temporal dependence and estimate $\|v_T^*\|_{sd}^2$ by $\hat\sigma_v^2 = T^{-1}\sum_{t=1}^{T}\left\{\Delta(Z_t,\hat\alpha_T)[\hat v_T^*]\right\}^2$. However, when the sample size is small, there may still be considerable autocorrelation in the sieve score process $\{\Delta(Z_t,\alpha_0)[v_T^*]\}$. To capture the possibly large but diminishing autocorrelation in a finite sample, we propose treating $\{\Delta(Z_t,\alpha_0)[v_T^*]\}$ as a generic time series and using the same formula as in (5.6) to estimate the asymptotic variance of $T^{-1/2}\sum_{t=1}^{T}\Delta(Z_t,\alpha_0)[v_T^*]$. We call the estimator the pre-asymptotic variance estimator. With a data-driven smoothing parameter choice of $M$, the pre-asymptotic variance estimator $\|\hat v_T^*\|_{sd,T}^2$ should be close to $\hat\sigma_v^2$ when the sample size is large. On the other hand, when the sample size is small, the pre-asymptotic variance estimator may provide a more accurate measure of the sampling variation of the plug-in sieve M estimator of irregular functionals.

An extra benefit of the pre-asymptotic idea is that it allows us to treat regular and irregular functionals in a unified framework. So we do not distinguish regular and irregular functionals in the rest of this section.

To make statistical inference on a scalar functional $f(\alpha_0)$, we construct a t statistic as follows:

(5.7) $t_T \equiv \frac{\sqrt{T}\left[f(\hat\alpha_T) - f(\alpha_0)\right]}{\|\hat v_T^*\|_{sd,T}}$.

We proceed to establish the asymptotic distribution of $t_T$ when $M$ is a fixed constant. To facilitate our development, we make the assumption below.

Assumption 5.2.
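Let $\sqrt{T}\epsilon_T^*\xi_T = o(1)$ and the following conditions hold: (i) $\sup_{v \in \mathcal{W}_T, \alpha \in \mathcal{B}_T}\left|T^{-1/2}\sum_{t=1}^{T}\phi_m(t/T)\left(\Delta(Z_t,\alpha)[v] - \Delta(Z_t,\alpha_0)[v] - E\{\Delta(Z_t,\alpha)[v]\}\right)\right| = o_p(1)$ for $m = 0, 1, \ldots, M$; (ii) $\sup_{v \in \mathcal{W}_T, \alpha \in \mathcal{B}_T}\left|E\left\{\Delta(Z_t,\alpha)[v] - \Delta(Z_t,\alpha_0)[v] - r(Z_t,\alpha_0)[v, \alpha - \alpha_0]\right\}\right| = O(\epsilon_T^*\xi_T)$; (iii) $\sup_{v \in \mathcal{W}_T}\left|T^{-1/2}\sum_{t=1}^{T}\phi_m(t/T)\,\Delta(Z_t,\alpha_0)[v]\right| = O_p(1)$ for $m = 0, 1, \ldots, M$; (iv) for $e_t \sim$ iid $N(0,1)$, we have for any $x = (x_1, \ldots, x_M)' \in \mathbb{R}^M$,

$P\left(T^{-1/2}\sum_{t=1}^{T}\phi_m(t/T)\,\Delta(Z_t,\alpha_0)[u_T^*] < x_m,\ m = 0, 1, \ldots, M\right) = P\left(T^{-1/2}\sum_{t=1}^{T}\phi_m(t/T)\,e_t < x_m,\ m = 0, 1, \ldots, M\right) + o(1)$.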
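The OS-LRV construction in (5.5) and (5.6) and the t statistic in (5.7) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the basis $\phi_m(x) = \sqrt{2}\cos(m\pi x)$ is one possible orthonormal choice in $L_2[0,1]$ and is assumed here, the input `scores` stands for an already computed series $\Delta(Z_t,\hat\alpha_T)[\hat v_T^*]$, $t = 1, \ldots, T$, and the function names `cosine_phi`, `os_lrv` and `t_statistic` are hypothetical.

```python
import math


def cosine_phi(m, x):
    """Assumed basis: phi_0 = 1, phi_m(x) = sqrt(2) * cos(m * pi * x) for m >= 1."""
    return 1.0 if m == 0 else math.sqrt(2.0) * math.cos(m * math.pi * x)


def os_lrv(scores, M):
    """Average of the squared projections Lambda_m over m = 1..M, as in (5.5)-(5.6)."""
    T = len(scores)
    total = 0.0
    for m in range(1, M + 1):
        # Lambda_m = T^{-1/2} * sum_t phi_m(t/T) * score_t, with t running from 1 to T
        lam = sum(cosine_phi(m, (t + 1) / T) * s for t, s in enumerate(scores)) / math.sqrt(T)
        total += lam * lam
    return total / M


def t_statistic(f_hat, f_null, scores, M):
    """t statistic as in (5.7): sqrt(T) * (f_hat - f_null) / sqrt(OS-LRV estimate)."""
    T = len(scores)
    return math.sqrt(T) * (f_hat - f_null) / math.sqrt(os_lrv(scores, M))
```

Only the basis functions with $m \ge 1$ enter the average, so the estimate is built from projections that are orthogonal to the constant function $\phi_0$. The smoothing parameter `M` is left to the caller; the sketch does not attempt any data-driven choice.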