SIEVE INFERENCE ON POSSIBLY MISSPECIFIED SEMI-NONPARAMETRIC TIME SERIES MODELS

By Xiaohong Chen, Zhipeng Liao and Yixiao Sun

Yale University, UC Los Angeles and UC San Diego

This paper provides a general theory on the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals of semi-nonparametric time series models. We show that, even when the sieve score process is not a martingale difference, the asymptotic variances of plug-in sieve M estimators of irregular (i.e., slower than root-n estimable) functionals are the same as those for independent data. Nevertheless, ignoring the temporal dependence in finite samples may not lead to accurate inference. We then propose an easy-to-compute and more accurate inference procedure based on a pre-asymptotic sieve variance estimator that captures temporal dependence of unknown forms. We construct a pre-asymptotic Wald statistic using an orthonormal series long run variance (OS-LRV) estimator. For sieve M estimators of both regular (i.e., root-n estimable) and irregular functionals, a scaled pre-asymptotic Wald statistic is asymptotically F distributed when the series number of terms in the OS-LRV estimator is held fixed. Simulations indicate that our scaled pre-asymptotic Wald test with F critical values has more accurate size in finite samples than the conventional Wald test with chi-square critical values.

1. Introduction. Many economic and financial time series are nonlinear and non-Gaussian; see, e.g., Granger (2003). For policy analysis, it is important to uncover complicated nonlinear economic relations in structural models. Unfortunately, it is difficult to correctly parameterize all aspects of nonlinear dynamic functional relations. Due to the well-known problem of the curse of dimensionality, it is also impractical to estimate a general nonlinear time series model fully nonparametrically.
These issues motivate the growing popularity of semiparametric and semi-nonparametric models and methods in economics and finance. The method of sieves (Grenander, 1981) is a general procedure for estimating semiparametric and nonparametric models, and has been widely used in statistics, economics, finance, biostatistics and other disciplines. In this paper, we focus on sieve M estimation, which optimizes a sample average of a random criterion over a sequence of approximating parameter spaces (sieves) that become dense in the original infinite dimensional parameter space as the complexity of the sieves grows to infinity with the sample size. See Shen and Wong (1994), Chen (2007) and the references therein for many examples of sieve

Supported by the National Science Foundation (SES-0838161) and the Cowles Foundation.
Supported by the National Science Foundation (SES-0752443).
AMS 2000 subject classifications: Primary 62G10; secondary 62M10.
Keywords and phrases: Dynamic misspecification, sieve M estimation, sieve Riesz representer, irregular functional, pre-asymptotic variance, orthogonal series long run variance estimation, F distribution.
M estimation, including sieve quasi maximum likelihood, sieve nonlinear least squares, sieve generalized least squares, and sieve quantile regression.

We consider inference on possibly misspecified semi-nonparametric time series models via the method of sieve M estimation. For general sieve M estimators with weakly dependent data, White and Wooldridge (1991) establish the consistency, and Chen and Shen (1998) establish the convergence rate and the asymptotic normality of plug-in sieve M estimators of regular (i.e., root-n estimable) functionals. To the best of our knowledge, there is no published work on the limiting distributions of plug-in sieve M estimators of irregular (i.e., slower than root-n estimable) functionals. There is also no published inferential result for general sieve M estimators of regular or irregular functionals of possibly misspecified semi-nonparametric time series models.

We first provide a general theory on the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals in semi-nonparametric time series models. The key insight is to examine the functional of interest on a sieve tangent space, where a Riesz representer always exists regardless of whether the functional is regular or irregular. The asymptotic normality result is rate-adaptive in the sense that applied researchers do not need to know a priori whether the functional of interest is root-n estimable or not. For possibly misspecified semi-nonparametric models with weakly dependent data, Chen and Shen (1998) establish that the asymptotic variance of a sieve M estimator of any regular functional depends on the temporal dependence and is equal to the long run variance (LRV) of a scaled score (or moment) process. In this paper, we show a new result that, regardless of whether the score process is a martingale difference or not, the asymptotic variance of a sieve M estimator of an irregular functional for weakly dependent data is the same as that for independent data.
Our asymptotic theory suggests that, for weakly dependent time series data with a large sample size, temporal dependence could be ignored in making inference on irregular functionals via the method of sieves. However, simulation studies indicate that inference procedures based on asymptotic variance estimates that ignore autocorrelation do not perform well when the sample size is small relative to the degree of temporal dependence. See, e.g., Conley, Hansen and Liu (1997) and Pritsker (1998) for earlier discussions of this problem with kernel density estimation for interest rate data sets. To deal with this problem, for inference on both regular and irregular functionals, we propose to use a pre-asymptotic sieve variance that captures temporal dependence of an unknown form. That is, we treat the underlying triangular array sieve score process as a generic time series and ignore the fact that it becomes less temporally dependent when the sieve number of terms in approximating unknown functions grows to infinity as the sample size n goes to infinity. This novel pre-asymptotic sieve approach enables us to develop a unified inference framework that can accommodate both regular and irregular functionals. To derive a simple and more accurate asymptotic approximation under weak conditions, we compute a pre-asymptotic Wald statistic using an orthonormal series LRV (OS-LRV) estimator. For both regular and irregular functionals, we show that the pre-asymptotic t statistic and a scaled Wald statistic converge to the standard t distribution and F distribution respectively when the series number of terms in the OS-LRV estimator is held fixed;
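The OS-LRV construction referred to above is developed in Section 5 (not shown here). As a hedged illustration of the general idea, the sketch below estimates the LRV of a scalar score series by projecting the demeaned series onto a few orthonormal basis functions and averaging the squared projection coefficients. The function name `os_lrv` and the specific cosine basis are our illustrative choices, not the paper's notation.

```python
import numpy as np

def os_lrv(v, K):
    """Orthonormal-series LRV estimate of a scalar time series (illustrative sketch).

    Projects the demeaned series onto phi_j(x) = sqrt(2) cos(j*pi*x), j = 1..K;
    each coefficient Lambda_j gives an approximately independent estimate of the
    LRV scale, and the estimator averages Lambda_j^2 over the K terms.
    """
    v = np.asarray(v, dtype=float)
    n = v.size
    u = v - v.mean()                      # demean; each phi_j integrates to zero
    x = (np.arange(1, n + 1) - 0.5) / n   # sample points mapped into (0, 1)
    lam = np.array([np.sqrt(2.0 / n) * np.sum(np.cos(j * np.pi * x) * u)
                    for j in range(1, K + 1)])
    return float(np.mean(lam ** 2))
```

With K held fixed, a t statistic studentized by such an estimate is compared against a t distribution with K degrees of freedom (equivalently, a scaled Wald statistic against an F distribution), which is the fixed-K approximation the paper advocates.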
and that the t distribution and F distribution approach the standard normal and chi-square distributions respectively when the series number of terms in the OS-LRV estimator goes to infinity. Our pre-asymptotic t and F approximations achieve triple robustness in the following sense: they are asymptotically valid regardless of (1) whether the functional is regular or not; (2) whether there is temporal dependence of unknown form or not; and (3) whether the series number of terms in the OS-LRV estimator is held fixed or not.

The rest of the paper is organized as follows. Section 2 presents the plug-in sieve M estimator of functionals of interest and gives two illustrative examples. Section 3 establishes the asymptotic normality of the plug-in sieve M estimators of possibly irregular functionals. Section 4 shows that the asymptotic variances of plug-in sieve M estimators of irregular functionals for weakly dependent data are the same as if they were for i.i.d. data. Section 5 presents the pre-asymptotic OS-LRV estimator and F approximation. Section 6 describes a simple computation method and reports a simulation study using a partially linear regression model. The Appendix contains all the proofs.

Notation. We denote f_A(a) (F_A(a)) as the marginal probability density (cdf) of a random variable A evaluated at a, and f_{AB}(a,b) (F_{AB}(a,b)) the joint density (cdf) of the random variables A and B. We use ≡ to introduce definitions. For any (vector-valued) A, we let A′ denote its transpose and ||A||²_E ≡ A′A, although sometimes we also use |A|² = A′A without confusion. Denote L^p(Ω, dμ), 1 ≤ p < ∞, as a space of measurable functions with ||g||_{L^p(Ω,dμ)} ≡ {∫_Ω |g(t)|^p dμ(t)}^{1/p} < ∞, where Ω is the support of the sigma-finite positive measure dμ (sometimes L^p(Ω) and ||g||_{L^p(Ω)} are used when dμ is the Lebesgue measure).
For any possibly random positive sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞, a_n = O_p(b_n) means that lim_{c→∞} lim sup_n Pr(a_n/b_n > c) = 0; a_n = o_p(b_n) means that for all ε > 0, lim_{n→∞} Pr(a_n/b_n > ε) = 0; and a_n ≍ b_n means that there exist two constants 0 < c_1 ≤ c_2 < ∞ such that c_1 a_n ≤ b_n ≤ c_2 a_n. We use A_n ≡ A_{k(n)}, H_n ≡ H_{k(n)} and V_n ≡ V_{k(n)} to denote various sieve spaces. For simplicity, we assume that dim(V_n) ≍ dim(A_n) ≍ dim(H_{k(n)}), all of which grow to infinity with the sample size n.

2. Sieve M Estimation. We assume that the data {Z_t = (Y_t′, X_t′)′}_{t=1}^n is from a strictly stationary and weakly dependent process defined on an underlying complete probability space. Let Z ⊆ R^{d_z}, 1 ≤ d_z < ∞, Y ⊆ R^{d_y} and X ⊆ R^{d_x} be the supports of Z_t, Y_t and X_t respectively. Let (A, d) denote an infinite dimensional metric space. Let l : Z × A → R be a measurable function and E[l(Z, α)] be a population criterion. For simplicity we assume that there is a unique α_0 ∈ (A, d) such that E[l(Z, α_0)] > E[l(Z, α)] for all α ∈ (A, d) with d(α, α_0) > 0. Different models correspond to different choices of the criterion functions E[l(Z, α)] and the parameter spaces (A, d). A model does not need to be correctly specified and α_0 could be a pseudo-true parameter. Let f : (A, d) → R be a known measurable mapping. In this paper we are interested in estimation of and inference on f(α_0) via the method of sieves. Let A_n be a sieve space for the whole parameter space (A, d). Then there is an element Π_n α_0 ∈ A_n such that d(Π_n α_0, α_0) → 0 as dim(A_n) → ∞ with n → ∞. An approximate sieve
M estimator α̂_n ∈ A_n of α_0 solves

(2.1) n^{-1} Σ_{t=1}^n l(Z_t, α̂_n) ≥ sup_{α∈A_n} n^{-1} Σ_{t=1}^n l(Z_t, α) − O_p(ε_n²),

where the term O_p(ε_n²) = o_p(1) denotes the maximization error when α̂_n fails to be the exact maximizer over the sieve space. We call f(α̂_n) the plug-in sieve M estimator of f(α_0). Under very mild conditions (see, e.g., Chen, 2007, Theorem 3.1, and White and Wooldridge, 1991), the sieve M estimator α̂_n is consistent for α_0: d(α̂_n, α_0) = O_p{max[d(α̂_n, Π_n α_0), d(Π_n α_0, α_0)]} = o_p(1). Given the consistency, we can restrict our attention to a shrinking d-neighborhood of α_0. We equip A with an inner product induced norm ||α − α_0|| that is weaker than d(α, α_0) (i.e., ||α − α_0|| ≤ c d(α, α_0) for a constant c > 0), and is locally equivalent to sqrt(E[l(Z_t, α_0) − l(Z_t, α)]) in a shrinking d-neighborhood of α_0. For strictly stationary weakly dependent data, Chen and Shen (1998) establish the convergence rate: ||α̂_n − α_0|| = O_p(ξ_n) = o_p(n^{-1/4}), where ξ_n ≡ max[||α̂_n − Π_n α_0||, ||Π_n α_0 − α_0||].

The method of sieve M estimation includes many special cases. Different choices of criterion functions l(Z_t, α) and different choices of sieves A_n lead to different examples of sieve M estimation. As an illustration, we provide two examples below. See, e.g., Shen and Wong (1994) and Chen (2007) for additional examples.

Example 2.1 (Partially additive ARX regression). Suppose that the time series data {Y_t, X_t′}_{t=1}^n is generated by

(2.2) Y_t = X_t′θ_0 + h_{01}(Y_{t−1}) + h_{02}(Y_{t−2}) + u_t, with E[u_t | X_t, Y_{t−1}, Y_{t−2}] = 0,

where X_t is a d_x-dimensional random vector that could include finitely many lagged Y_t's. Let θ_0 ∈ Θ ⊂ R^{d_x} and h_{0j} ∈ H_j for j = 1, 2. Let α_0 = (θ_0′, h_{01}, h_{02}) ∈ A = Θ × H_1 × H_2. Examples of functionals of interest could be f(α_0) = λ′θ_0 or h_{0j}(y_j), where λ ∈ R^{d_x} and y_j ∈ int(Y) for j = 1, 2. For the sake of concreteness we assume that Y is a bounded interval of R and H_j = Λ^{s_j}(Y) (a Hölder space) for s_j > 0.5, j = 1, 2, where

Λ^s(Y) = { h ∈ C^{[s]}(Y) : sup_{k≤[s]} sup_{y∈Y} |∇^k h(y)| < ∞, sup_{y,y′∈Y: y≠y′} |∇^{[s]}h(y) − ∇^{[s]}h(y′)| / |y − y′|^{s−[s]} < ∞ },

where [s] is the largest integer that is strictly smaller than s.
The Hölder space Λ^s(Y) with s > 0.5 is a smooth function space that is widely assumed in the semi-nonparametric
literature. We can then approximate H = H_1 × H_2 by a sieve H_n = H_{1,n} × H_{2,n}, where for j = 1, 2,

(2.3) H_{j,n} = { h : h(·) = Σ_{k=1}^{k_{j,n}} β_k p_{j,k}(·) = β′P^{k_{j,n}}(·), β ∈ R^{k_{j,n}} },

where the known sieve basis P^{k_{j,n}}(·) could consist of polynomial splines, B-splines, wavelets, Fourier series and others. Let l(Z_t, α) = −[Y_t − X_t′θ − h_1(Y_{t−1}) − h_2(Y_{t−2})]²/4 with α = (θ′, h_1, h_2) ∈ A = Θ × H_1 × H_2. Let A_n = Θ × H_{1,n} × H_{2,n} be a sieve for A. We can estimate α_0 ∈ A by the sieve least squares (LS) estimator α̂_n = (θ̂′, ĥ_{1,n}, ĥ_{2,n}) ∈ A_n:

(2.4) α̂_n = arg max_{(θ′,h_1,h_2)∈A_n} n^{-1} Σ_{t=1}^n l(Z_t, θ, h_1, h_2).

A functional of interest f(α_0), such as λ′θ_0 or h_{0j}(y_j), is then estimated by the plug-in sieve LS estimator f(α̂_n), such as λ′θ̂ or ĥ_{j,n}(y_j). This example is very similar to Example 2 in Chen and Shen (1998), except that we allow for dynamic misspecification in the sense that E[u_t | X_t, Y_{t−1}, Y_{t−2}; Y_{t−j} for j ≥ 3] may not equal zero. One can slightly modify their proofs to get the convergence rate of α̂_n and the √n-asymptotic normality of λ′θ̂. But that paper does not provide a variance estimator for λ′θ̂. The results in our paper immediately lead to the asymptotic normality of f(α̂_n) for possibly irregular functionals f(α_0) and provide simple, robust inference on f(α_0).

Example 2.2 (Possibly misspecified copula-based time series model). Suppose that {Y_t}_{t=1}^n is a sample of a strictly stationary first order Markov process generated from (F_Y, C_0(·,·)), where F_Y is the true unknown continuous marginal distribution and C_0(·,·) is the true unknown copula for (Y_{t−1}, Y_t) that captures all the temporal and tail dependence of {Y_t}. The τ-th conditional quantile of Y_t given Y^{t−1} ≡ (Y_{t−1},...,Y_1)′ is Q_Y^τ(y) = F_Y^{-1}( C_{2|1}^{-1}[τ | F_Y(y)] ), where C_{2|1}[·|u] ≡ ∂C_0(u,·)/∂u is the conditional distribution of U_t ≡ F_Y(Y_t) given U_{t−1} = u, and C_{2|1}^{-1}[τ|u] is its τ-th conditional quantile. The conditional density function of Y_t given Y^{t−1} is p_0(·|Y_{t−1}) = f_Y(·) × c_0(F_Y(Y_{t−1}), F_Y(·)), where f_Y(·) and c_0(·,·) are the density functions of F_Y(·) and C_0(·,·) respectively.
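With a linear sieve such as (2.3), the maximization (2.4) reduces to ordinary least squares on the stacked sieve regressors. The sketch below is our illustration, not the paper's code: the functions h_01, h_02, the power-series basis, and the tuning choices (k_n = 4, n = 3000) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate from (2.2) with theta_0 = 1.0, h_01(y) = 0.3*sin(y), h_02(y) = 0.2*cos(y)
# (illustrative choices; the paper does not fix a particular data generating process).
n = 3000
theta0 = 1.0
y = np.zeros(n + 2)
x = rng.standard_normal(n + 2)
for t in range(2, n + 2):
    y[t] = (x[t] * theta0 + 0.3 * np.sin(y[t - 1]) + 0.2 * np.cos(y[t - 2])
            + 0.5 * rng.standard_normal())

# Power-series sieve of dimension k_n for each unknown function, as in (2.3).
def basis(v, k):
    return np.column_stack([v ** j for j in range(1, k + 1)])

k_n = 4
Y = y[2:]                                 # Y_t
D = np.column_stack([x[2:],               # X_t enters linearly
                     np.ones(n),          # common intercept
                     basis(y[1:-1], k_n), # sieve for h_01(Y_{t-1})
                     basis(y[:-2], k_n)]) # sieve for h_02(Y_{t-2})

# Sieve LS (2.4) reduces to ordinary least squares on the sieve regressors.
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
theta_hat = coef[0]                       # plug-in estimate of lambda' theta_0 with lambda = 1
```

The plug-in estimate of an evaluation functional h_{0j}(y_j) would likewise be read off as the fitted sieve coefficients multiplied by the basis evaluated at y_j.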
A researcher specifies a parametric form {c(·,·; θ) : θ ∈ Θ} for the copula density function, but it could be misspecified in the sense that c_0(·,·) ∉ {c(·,·; θ) : θ ∈ Θ}. Let θ_0 be the pseudo true copula dependence parameter:

θ_0 ≡ arg max_{θ∈Θ} ∫_0^1 ∫_0^1 log[c(u, v; θ)] c_0(u, v) du dv.
Let (θ_0′, f_Y) be the parameters of interest. Examples of functionals of interest could be λ′θ_0, f_Y(y), F_Y(y) or Q_Y^{0.01}(y) = F_Y^{-1}( C_{2|1}^{-1}[0.01 | F_Y(y); θ_0] ) for any λ ∈ R^{d_θ} and some y ∈ supp(Y_t). We could estimate (θ_0′, f_Y) by the method of sieve quasi ML using different parameterizations and different sieves for f_Y. For example, let h_0 = √f_Y (up to a scale normalization) and α_0 = (θ_0′, h_0) be the pseudo true unknown parameters. Then f_Y = h_0² / ∫ h_0²(y)dy, and h_0 ∈ L²(R). For the identification of h_0, we can assume that h_0 ∈ H:

(2.5) H = { h = p_0 + Σ_{j=1}^∞ β_j p_j : Σ_{j=1}^∞ β_j² < ∞ },

where {p_j}_{j=0}^∞ is a complete orthonormal basis of L²(R), such as Hermite polynomials, wavelets and other orthonormal basis functions. Here we normalize the coefficient of the first basis function p_0 to be 1 in order to achieve the identification of h_0. Other normalizations could also be used. It is now obvious that h_0 ∈ H could be approximated by functions in the following sieve space:

(2.6) H_n = { h = p_0 + Σ_{j=1}^{k_n} β_j p_j = p_0 + β′P^{k_n} : β ∈ R^{k_n} }.

Let Z_t = (Y_{t−1}, Y_t)′, α = (θ′, h) ∈ A = Θ × H and

(2.7) l(Z_t, α) = log{ h²(Y_t) / ∫ h²(y)dy } + log c( ∫_{−∞}^{Y_{t−1}} h²(y)dy / ∫ h²(x)dx , ∫_{−∞}^{Y_t} h²(y)dy / ∫ h²(x)dx ; θ ).

Then α_0 = (θ_0′, h_0) ∈ A = Θ × H could be estimated by the sieve quasi MLE α̂_n = (θ̂′, ĥ)′ ∈ A_n = Θ × H_n that solves:

(2.8) Σ_{t=2}^n l(Z_t, α̂_n) + log{ ĥ²(Y_1) / ∫ ĥ²(y)dy } ≥ sup_{α∈Θ×H_n} { Σ_{t=2}^n l(Z_t, α) + log{ h²(Y_1) / ∫ h²(y)dy } } − O_p(ε_n²).

A functional of interest f(α_0), such as λ′θ_0, f_Y(y) = h_0²(y)/∫h_0²(v)dv, F_Y(y) or Q_Y^{0.01}(y), is then estimated by the plug-in sieve quasi MLE f(α̂_n), such as λ′θ̂, f̂_Y(y) = ĥ²(y)/∫ĥ²(v)dv, F̂_Y(y) = ∫_{−∞}^y f̂_Y(v)dv or Q̂_Y^{0.01}(y) = F̂_Y^{-1}( C_{2|1}^{-1}[0.01 | F̂_Y(y); θ̂] ). Under correct specification, Chen, Wu and Yi (2009) establish the rate of convergence of the sieve MLE α̂_n and provide a sieve likelihood-ratio inference for regular functionals, including f(α_0) = λ′θ_0, F_Y(y) or Q_Y^{0.01}(y). Under misspecified copulas, by applying Chen and Shen (1998), we can still derive the convergence rate of the sieve quasi MLE α̂_n and the asymptotic normality of f(α̂_n) for regular functionals.
However, the sieve likelihood ratio inference given in Chen, Wu and Yi (2009) is no longer valid under misspecification. The results in this paper immediately lead to the asymptotic normality of f(α̂_n), such as f̂_Y(y) = ĥ²(y)/∫ĥ²(v)dv, for any possibly irregular functional f(α_0), such as f_Y(y), as well as valid inference under potential misspecification.
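As an illustration of the sieve space (2.6) and of the plug-in density functional above, the following sketch (our construction, with arbitrary illustrative coefficients β) builds h = p_0 + Σ β_j p_j from orthonormal Hermite functions and forms f_Y(y) = h²(y)/∫h²(v)dv; the normalization yields a bona fide density whatever β is.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_fn(j, x):
    """Orthonormal Hermite function psi_j(x) = H_j(x) e^{-x^2/2} / sqrt(2^j j! sqrt(pi))."""
    c = np.zeros(j + 1)
    c[j] = 1.0
    norm = math.sqrt(2.0 ** j * math.factorial(j) * math.sqrt(math.pi))
    return hermval(x, c) * np.exp(-x ** 2 / 2) / norm

grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]

def integral(vals):
    # trapezoid rule on the uniform grid; very accurate for rapidly decaying integrands
    return (np.sum(vals) - 0.5 * (vals[0] + vals[-1])) * dx

# A sieve element h = p_0 + sum_j beta_j p_j as in (2.6); beta is an arbitrary illustration.
beta = np.array([0.4, -0.2, 0.1])
h = hermite_fn(0, grid) + sum(b * hermite_fn(j + 1, grid) for j, b in enumerate(beta))

# Plug-in density f_Y(y) = h^2(y) / int h^2(v)dv; by orthonormality, int h^2 = 1 + sum(beta^2).
f = h ** 2 / integral(h ** 2)
```

Note that, because the leading coefficient is pinned to 1 as in (2.5), rescaling β does change the density, so the normalization in (2.5) indeed identifies h_0.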
3. Asymptotic Normality of Sieve M Estimators. In this section, we establish the asymptotic normality of plug-in sieve M estimators of possibly irregular functionals of semi-nonparametric time series models. We also give a closed-form expression for the sieve Riesz representer that appears in our asymptotic normality result.

3.1. Local Geometry. The convergence rate result of Chen and Shen (1998) implies that α̂_n ∈ B_n ⊆ B_0 with probability approaching one, where

(3.1) B_0 ≡ { α ∈ A : ||α − α_0|| ≤ C ξ_n log(log n) }; B_n ≡ B_0 ∩ A_n.

Hence, we now regard B_0 as the effective parameter space and B_n as its sieve space. Let

(3.2) α_{0,n} ≡ arg min_{α∈B_n} ||α − α_0||.

Let V_n ≡ clsp(B_n) − {α_{0,n}}, where clsp(B_n) denotes the closed linear span of B_n under ||·||. Then V_n is a finite dimensional Hilbert space under ||·||. Similarly, the space V ≡ clsp(B_0) − {α_0} is a Hilbert space under ||·||. Moreover, V_n is dense in V under ||·||. To simplify the presentation, we assume that dim(V_n) = dim(A_n) ≡ k(n), all of which grow to infinity with n. By definition we have ⟨α_{0,n} − α_0, v_n⟩ = 0 for all v_n ∈ V_n.

As demonstrated in Chen and Shen (1998), there is lots of freedom to choose a norm ||α − α_0|| that is locally equivalent to sqrt(E[l(Z, α_0) − l(Z, α)]). In some parts of this paper, for the sake of concreteness, we present results for a specific choice of the norm. We suppose that for all α in a shrinking d-neighborhood of α_0, l(Z, α) − l(Z, α_0) can be approximated by Δ(Z, α_0)[α − α_0] such that Δ(Z, α_0)[α − α_0] is linear in α − α_0. Denote the remainder of the approximation as:

(3.3) r(Z, α_0)[α − α_0, α − α_0] ≡ −2{ l(Z, α) − l(Z, α_0) − Δ(Z, α_0)[α − α_0] }.

When lim_{τ→0}[l(Z, α_0 + τ[α − α_0]) − l(Z, α_0)]/τ is well defined, we could let Δ(Z, α_0)[α − α_0] = lim_{τ→0}[l(Z, α_0 + τ[α − α_0]) − l(Z, α_0)]/τ, which is called the directional derivative of l(Z, α) at α_0 in the direction [α − α_0]. Define

(3.4) ||α − α_0|| ≡ sqrt( E{ r(Z, α_0)[α − α_0, α − α_0] } )

with the corresponding inner product

(3.5) ⟨α_1 − α_0, α_2 − α_0⟩ ≡ E{ r(Z, α_0)[α_1 − α_0, α_2 − α_0] }

for any α_1, α_2 in the shrinking d-neighborhood of α_0.
In general the norm ||α − α_0|| defined in (3.4) is weaker than d(·,·). Since α_0 is the unique maximizer of E[l(Z, α)] on A, under mild conditions ||α − α_0|| defined in (3.4) is locally equivalent to sqrt(E[l(Z, α_0) − l(Z, α)]). For any v ∈ V, we define ∂f(α_0)/∂α[v] to be the pathwise directional derivative of the functional f(·) at α_0 in the direction of v = α − α_0 ∈ V:

(3.6) ∂f(α_0)/∂α[v] ≡ ∂f(α_0 + τv)/∂τ |_{τ=0} for any v ∈ V.
For any v = α − α_{0,n} ∈ V_n, we let

(3.7) ∂f(α_0)/∂α[v] ≡ ∂f(α_0)/∂α[α − α_0] − ∂f(α_0)/∂α[α_{0,n} − α_0].

So ∂f(α_0)/∂α[·] is also a linear functional on V_n. Note that V_n is a finite dimensional Hilbert space. As any linear functional on a finite dimensional Hilbert space is bounded, we can invoke the Riesz representation theorem to deduce that there is a v*_n ∈ V_n such that

(3.8) ∂f(α_0)/∂α[v] = ⟨v*_n, v⟩ for all v ∈ V_n

and that

(3.9) ||v*_n||² = sup_{v∈V_n, v≠0} |∂f(α_0)/∂α[v]|² / ||v||².

We call v*_n the sieve Riesz representer of the functional ∂f(α_0)/∂α[·] on V_n. We emphasize that the sieve Riesz representation (3.8)–(3.9) of the linear functional ∂f(α_0)/∂α[·] on V_n always exists regardless of whether ∂f(α_0)/∂α[·] is bounded on the infinite dimensional space V or not. This crucial observation enables us to develop a general and unified theory that is currently lacking in the literature.

If ∂f(α_0)/∂α[·] is bounded on the infinite dimensional Hilbert space V, i.e.,

(3.10) ||v*|| ≡ sup_{v∈V, v≠0} { |∂f(α_0)/∂α[v]| / ||v|| } < ∞,

then ||v*_n|| = O(1) (in fact ||v*_n|| ↗ ||v*|| < ∞ and ||v*_n − v*|| → 0 as n → ∞); we say that f(·) is regular at α = α_0. In this case, we have ∂f(α_0)/∂α[v] = ⟨v*, v⟩ for all v ∈ V, and v* is the Riesz representer of the functional ∂f(α_0)/∂α[·] on V. See, e.g., Shen (1997).

If ∂f(α_0)/∂α[·] is unbounded on the infinite dimensional Hilbert space V, i.e.,

(3.11) sup_{v∈V, v≠0} { |∂f(α_0)/∂α[v]| / ||v|| } = ∞,

then ||v*_n|| → ∞ as n → ∞; and we say that f(·) is irregular at α = α_0. As it will become clear later, the convergence rate of f(α̂_n) − f(α_0) depends on the order of ||v*_n||.

3.2. Asymptotic Normality. To establish the asymptotic normality of f(α̂_n) for possibly irregular nonlinear functionals, we assume:
Assumption 3.1 (local behavior of functional). (i) sup_{α∈B_n} | f(α) − f(α_0) − ∂f(α_0)/∂α[α − α_0] | = o( n^{-1/2} ||v*_n|| ); (ii) | ∂f(α_0)/∂α[α_{0,n} − α_0] | = o( n^{-1/2} ||v*_n|| ).

Assumption 3.1.(i) controls the linear approximation error of the possibly nonlinear functional f(·). It is automatically satisfied when f(·) is a linear functional, but it may rule out some highly nonlinear functionals. Assumption 3.1.(ii) controls the bias part due to the finite dimensional sieve approximation of α_{0,n} to α_0. It is a condition imposed on the growth rate of the sieve dimension dim(A_n), and requires that the sieve approximation error rate be of smaller order than n^{-1/2}||v*_n||. When f(·) is a regular functional, we have ||v*_n|| ↗ ||v*|| < ∞, and since ⟨α_{0,n} − α_0, v*_n⟩ = 0 by the definition of α_{0,n}, we have:

| ∂f(α_0)/∂α[α_{0,n} − α_0] | = | ⟨v*, α_{0,n} − α_0⟩ | = | ⟨v* − v*_n, α_{0,n} − α_0⟩ | ≤ ||v* − v*_n|| × ||α_{0,n} − α_0||;

thus Assumption 3.1.(ii) is satisfied if

(3.12) ||v* − v*_n|| × ||α_{0,n} − α_0|| = o(n^{-1/2}) when f(·) is regular,

which is similar to conditions 4.1(ii)–(iii) imposed in Chen (2007, p. 5612) for regular functionals.

Next, we make an assumption on the relationship between ||v*_n|| and the asymptotic standard deviation of f(α̂_n) − f(α_{0,n}). It will be shown that the asymptotic standard deviation of √n[f(α̂_n) − f(α_{0,n})] is the limit of the standard deviation (sd) norm ||v*_n||_{sd} of v*_n, defined as

(3.13) ||v*_n||²_{sd} ≡ Var( n^{-1/2} Σ_{t=1}^n Δ(Z_t, α_0)[v*_n] ).

Note that ||v*_n||²_{sd} is the finite dimensional sieve version of the long run variance of the score process Δ(Z_t, α_0)[v*_n], and ||v*_n||²_{sd} = Var( Δ(Z, α_0)[v*_n] ) if the score process {Δ(Z_t, α_0)[v*_n]}_t is a martingale difference array.

Assumption 3.2 (sieve variance). ||v*_n|| / ||v*_n||_{sd} = O(1).

By the definition of v*_n given in (3.9), 0 < ||v*_n|| is non-decreasing in dim(V_n), and hence is non-decreasing in n. Assumption 3.2 then implies that lim inf_n ||v*_n||_{sd} > 0. Define

(3.14) u*_n ≡ v*_n / ||v*_n||_{sd}

to be the normalized version of v*_n. Then Assumption 3.2 implies that ||u*_n|| = O(1).

Let μ_n{g(Z)} ≡ n^{-1} Σ_{t=1}^n [g(Z_t) − E g(Z_t)] denote the centered empirical process indexed by the function g. Let ε_n = o(n^{-1/2}). For notational economy, we use the same ε_n as that in (2.1).
Assumption 3.3 (local behavior of criterion). (i) μ_n{Δ(Z, α_0)[v]} is linear in v ∈ V_n; (ii) sup_{α∈B_n} | μ_n{ l(Z, α ± ε_n u*_n) − l(Z, α) − Δ(Z, α_0)[±ε_n u*_n] } | = O_p(ε_n²); (iii) sup_{α∈B_n} | E[ l(Z_t, α) − l(Z_t, α ± ε_n u*_n) ] − ( ||α ± ε_n u*_n − α_0||² − ||α − α_0||² )/2 | = O(ε_n²).

Assumptions 3.3.(ii) and (iii) are simplified versions of those in Chen and Shen (1998), and can be verified in the same way.

Assumption 3.4 (CLT). √n μ_n{Δ(Z, α_0)[u*_n]} →_d N(0, 1), where N(0, 1) is a standard normal distribution.

Assumption 3.4 is a very mild one, and can be easily verified by applying any existing triangular array CLT for weakly dependent data (see, e.g., Hall and Heyde, 1980). We are now ready to state the asymptotic normality theorem for the plug-in sieve M estimator.

Theorem 3.1. Let Assumptions 3.1.(i), 3.2 and 3.3 hold. Then

(3.15) √n [f(α̂_n) − f(α_{0,n})] / ||v*_n||_{sd} = √n μ_n{Δ(Z, α_0)[u*_n]} + o_p(1).

If further Assumptions 3.1.(ii) and 3.4 hold, then

(3.16) √n [f(α̂_n) − f(α_0)] / ||v*_n||_{sd} = √n μ_n{Δ(Z, α_0)[u*_n]} + o_p(1) →_d N(0, 1).

In light of Theorem 3.1, we call ||v*_n||²_{sd} defined in (3.13) the pre-asymptotic sieve variance of the estimator f(α̂_n). When the functional f(·) is regular (i.e., ||v*_n|| = O(1)), we typically have ||v*_n||_{sd} ≍ ||v*_n|| = O(1); so f(α̂_n) converges to f(α_0) at the parametric rate of 1/√n. When the functional f(·) is irregular (i.e., ||v*_n|| → ∞), we have ||v*_n||_{sd} → ∞ under Assumption 3.2; so the convergence rate of f(α̂_n) becomes slower than 1/√n. Regardless of whether the pre-asymptotic sieve variance ||v*_n||²_{sd} stays bounded asymptotically (i.e., as n → ∞) or not, it always captures whatever true temporal dependence exists in finite samples.

For regular functionals of semi-nonparametric time series models, Chen and Shen (1998) and Chen (2007, Theorem 4.3) establish that √n [f(α̂_n) − f(α_0)] →_d N(0, σ_v²) with

(3.17) σ_v² ≡ lim_{n→∞} Var( n^{-1/2} Σ_{t=1}^n Δ(Z_t, α_0)[v*] ) = lim_{n→∞} ||v*_n||²_{sd} ∈ (0, ∞).

Our Theorem 3.1 is a natural extension of their results to allow for irregular functionals.
3.3. Sieve Riesz Representer. To apply the asymptotic normality Theorem 3.1, one needs to verify Assumptions 3.1–3.4. Once we compute the sieve Riesz representer v*_n ∈ V_n, Assumptions 3.1 and 3.2 can be easily checked, while Assumptions 3.3 and 3.4 are standard ones and can be verified in the same ways as those in Chen and Shen (1998) and Chen (2007) for regular functionals of semi-nonparametric models. Although it may be difficult to compute the Riesz representer v* ∈ V in closed form for a regular functional on the infinite dimensional space V, we can always compute the sieve Riesz representer v*_n ∈ V_n defined in (3.8) and (3.9) explicitly. Therefore, Theorem 3.1 is easily applicable to a large class of semi-nonparametric time series models, regardless of whether the functionals of interest are √n estimable or not.

3.3.1. Sieve Riesz representers for general functionals. For the sake of concreteness, in this subsection we focus on a large class of semi-nonparametric models where the population criterion E[l(Z_t, θ, h(·))] is maximized at α_0 = (θ_0′, h_0(·)) ∈ A = Θ × H, Θ is a compact subset of R^{d_θ}, H is a class of real valued continuous functions (of a subset of Z_t) belonging to a Hölder, Sobolev or Besov space, and A_n = Θ × H_n is a finite dimensional sieve space. The general cases with multiple unknown functions require only more complicated notation.

Let ||·|| be the norm defined in (3.4) and V_n = R^{d_θ} × { v_h(·) = P^{k(n)}(·)′β : β ∈ R^{k(n)} } be dense in the infinite dimensional Hilbert space (V, ||·||).
By definition, the sieve Riesz representer v*_n = (v*_{θ,n}′, v*_{h,n}(·))′ = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ ∈ V_n of ∂f(α_0)/∂α[·] solves the following optimization problem:

(3.18) ||v*_n||² = sup_{v=(v_θ′,v_h)′∈V_n, v≠0} | ∂f(α_0)/∂θ′ v_θ + ∂f(α_0)/∂h[v_h] |² / E( r(Z_t, θ_0, h_0)[v, v] ) = sup_{γ=(v_θ′,β′)′∈R^{d_θ+k(n)}, γ≠0} ( γ′F_{k(n)} )² / ( γ′R_{k(n)}γ ),

where

(3.19) F_{k(n)} ≡ ( ∂f(α_0)/∂θ′ , ∂f(α_0)/∂h[P^{k(n)}(·)′] )′ is a (d_θ + k(n)) × 1 vector,¹ and

(3.20) γ′R_{k(n)}γ ≡ E( r(Z_t, θ_0, h_0)[v, v] ) for all v = (v_θ′, P^{k(n)}(·)′β)′ ∈ V_n,

with

(3.21) R_{k(n)} = [ I_{11}, I_{n,12} ; I_{n,21}, I_{n,22} ] and R_{k(n)}^{-1} ≡ [ I^{11}, I^{12} ; I^{21}, I^{22} ]

¹ When ∂f(α_0)/∂h[·] applies to a vector (matrix), it stands for element-wise (column-wise) operations. We follow the same convention for other operators such as Δ(Z_t, α_0)[·] and r(Z_t, α_0)[·,·] in the paper.
being (d_θ + k(n)) × (d_θ + k(n)) positive definite matrices. For example, if the criterion function l(Z, θ, h) is twice continuously pathwise differentiable with respect to (θ′, h), then we have I_{11} = −E[ ∂²l(Z_t, θ_0, h_0)/∂θ∂θ′ ], I_{n,22} = −E[ ∂²l(Z_t, θ_0, h_0)/∂h∂h [P^{k(n)}, P^{k(n)′}] ], I_{n,12} = −E[ ∂²l(Z_t, θ_0, h_0)/∂θ∂h [P^{k(n)′}] ] and I_{n,21} ≡ I_{n,12}′. The sieve Riesz representation (3.8) becomes: for all v = (v_θ′, P^{k(n)}(·)′β)′ ∈ V_n,

(3.22) ∂f(α_0)/∂α[v] = F_{k(n)}′γ = ⟨v*_n, v⟩ = (γ*_n)′R_{k(n)}γ for all γ = (v_θ′, β′)′ ∈ R^{d_θ+k(n)}.

It is obvious that the optimal solution of γ in (3.18) or in (3.22) has a closed-form expression:

(3.23) γ*_n ≡ ( v*_{θ,n}′ , β*_n′ )′ = R_{k(n)}^{-1} F_{k(n)}.

The sieve Riesz representer is then given by

v*_n = (v*_{θ,n}′, v*_{h,n}(·))′ = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ ∈ V_n.

Consequently,

(3.24) ||v*_n||² = (γ*_n)′R_{k(n)}γ*_n = F_{k(n)}′R_{k(n)}^{-1}F_{k(n)},

which is finite for each sample size n but may grow with n. Finally, the score process can be expressed as

Δ(Z_t, α_0)[v*_n] = ( Δ_θ(Z_t, θ_0, h_0)′ , Δ_h(Z_t, θ_0, h_0)[P^{k(n)′}] ) γ*_n ≡ S_{k(n)}(Z_t)′γ*_n.

Thus

(3.25) Var( Δ(Z_t, α_0)[v*_n] ) = (γ*_n)′ E[ S_{k(n)}(Z_t) S_{k(n)}(Z_t)′ ] γ*_n and ||v*_n||²_{sd} = (γ*_n)′ Var( n^{-1/2} Σ_{t=1}^n S_{k(n)}(Z_t) ) γ*_n.

To verify Assumptions 3.1 and 3.2 for irregular functionals, it is handy to know the exact speed of divergence of ||v*_n||². We assume:

Assumption 3.5. The smallest and largest eigenvalues of R_{k(n)} defined in (3.20) are bounded and bounded away from zero uniformly for all k(n).

Assumption 3.5 imposes some regularity conditions on the sieve basis functions, which is a typical assumption in the linear sieve (or series) literature.

Remark 3.2. Assumption 3.5 implies that ||v*_n||² ≍ ||γ*_n||²_E ≍ ||F_{k(n)}||²_E = ||∂f(α_0)/∂θ||²_E + ||∂f(α_0)/∂h[P^{k(n)}]||²_E. Then: f(·) is regular at α = α_0 if lim_{k(n)→∞} ||∂f(α_0)/∂h[P^{k(n)}]||²_E < ∞; f(·) is irregular at α = α_0 if lim_{k(n)→∞} ||∂f(α_0)/∂h[P^{k(n)}]||²_E = ∞.
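The closed form (3.23)–(3.24) can be checked numerically: for any positive definite R_{k(n)} and any vector F_{k(n)}, γ*_n = R_{k(n)}^{-1}F_{k(n)} attains the supremum in (3.18), and the supremum equals F′R^{-1}F. In the sketch below, the matrix R and vector F are random stand-ins, not derived from any particular model:

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 6                                   # plays the role of d_theta + k(n)

# A random positive definite R and vector F as stand-ins for R_{k(n)} and F_{k(n)}.
A = rng.standard_normal((dim, dim))
R = A @ A.T + dim * np.eye(dim)
F = rng.standard_normal(dim)

gamma_star = np.linalg.solve(R, F)        # closed form (3.23): gamma* = R^{-1} F
v_norm_sq = float(F @ gamma_star)         # (3.24): ||v*||^2 = F' R^{-1} F

def ratio(g):
    """The objective (gamma'F)^2 / (gamma'R gamma) from (3.18)."""
    return float((F @ g) ** 2 / (g @ R @ g))
```

That no other γ beats γ*_n is a Cauchy-Schwarz inequality in the R-weighted inner product, which is exactly why the sieve Riesz representer always exists in finite dimensions.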
3.3.2. Examples. We first consider three typical linear functionals of semi-nonparametric models.

For the Euclidean parameter functional f(α) = λ′θ, we have F_{k(n)} = (λ′, 0_{k(n)}′)′ with 0_{k(n)} = [0, ..., 0]′ a k(n) × 1 vector, and hence v*_n = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ ∈ V_n with v*_{θ,n} = I^{11}λ, β*_n = I^{21}λ, and ||v*_n||² = F_{k(n)}′R_{k(n)}^{-1}F_{k(n)} = λ′I^{11}λ. If the largest eigenvalue of I^{11}, λ_max(I^{11}), is bounded above by a finite constant uniformly in k(n), then ||v*_n||² ≤ λ_max(I^{11}) λ′λ < ∞ uniformly in n, and the functional f(α) = λ′θ is regular.

For the evaluation functional f(α) = h(x) for x ∈ X, we have F_{k(n)} = (0_{d_θ}′, P^{k(n)}(x)′)′, and hence v*_n = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ ∈ V_n with v*_{θ,n} = I^{12}P^{k(n)}(x), β*_n = I^{22}P^{k(n)}(x), and ||v*_n||² = F_{k(n)}′R_{k(n)}^{-1}F_{k(n)} = P^{k(n)}(x)′I^{22}P^{k(n)}(x). So if the smallest eigenvalue of I^{22}, λ_min(I^{22}), is bounded away from zero uniformly in k(n), then ||v*_n||² ≥ λ_min(I^{22}) ||P^{k(n)}(x)||²_E → ∞, and the functional f(α) = h(x) is irregular.

For the weighted integration functional f(α) = ∫_X w(x)h(x)dx for a weighting function w(x), we have F_{k(n)} = (0_{d_θ}′, ∫_X w(x)P^{k(n)}(x)′dx)′, and hence v*_n = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ with v*_{θ,n} = I^{12}∫_X w(x)P^{k(n)}(x)dx, β*_n = I^{22}∫_X w(x)P^{k(n)}(x)dx, and

||v*_n||² = F_{k(n)}′R_{k(n)}^{-1}F_{k(n)} = { ∫_X w(x)P^{k(n)}(x)′dx } I^{22} { ∫_X w(x)P^{k(n)}(x)dx }.

Suppose that the smallest and largest eigenvalues of I^{22} are bounded and bounded away from zero uniformly for all k(n). Then ||v*_n||² ≍ ||∫_X w(x)P^{k(n)}(x)dx||²_E. Thus f(α) = ∫_X w(x)h(x)dx is regular if lim_{k(n)→∞} ||∫_X w(x)P^{k(n)}(x)dx||²_E < ∞; it is irregular if lim_{k(n)→∞} ||∫_X w(x)P^{k(n)}(x)dx||²_E = ∞.

We finally consider an example of nonlinear functionals that arises in Example 2.2 when the parameter of interest is α_0 = (θ_0′, h_0) with h_0² = f_Y being the true marginal density of Y_t. Consider the functional f(α) = h²(y)/∫h²(v)dv. Note that f(α_0) = f_Y(y) = h_0²(y) and h_0 is approximated by the linear sieve H_n given in (2.6). Then F_{k(n)} = (0_{d_θ}′, ∂f(α_0)/∂h[P^{k(n)′}])′ with

∂f(α_0)/∂h[P^{k(n)}] = 2h_0(y)[ P^{k(n)}(y) − h_0(y) ∫ h_0(v)P^{k(n)}(v)dv ],

and hence v*_n = (v*_{θ,n}′, P^{k(n)}(·)′β*_n)′ ∈ V_n with v*_{θ,n} = I^{12}∂f(α_0)/∂h[P^{k(n)}], β*_n = I^{22}∂f(α_0)/∂h[P^{k(n)}], and

||v*_n||² = F_{k(n)}′R_{k(n)}^{-1}F_{k(n)} = ∂f(α_0)/∂h[P^{k(n)′}] I^{22} ∂f(α_0)/∂h[P^{k(n)}].

So if the smallest eigenvalue of I^{22} is bounded away from zero uniformly in k(n), then ||v*_n||² ≥ const. ×
||∂f(α_0)/∂h[P^{k(n)}]||²_E, and the functional f(α) = h²(y)/∫h²(v)dv is irregular at α = α_0.
4. Asymptotic Variances of Sieve Estimators of Irregular Functionals. In this section, we derive the asymptotic expression of the pre-asymptotic sieve variance ||v*_n||²_{sd} for irregular functionals. We provide general sufficient conditions under which the asymptotic variance does not depend on the temporal dependence.

4.1. Exact Form of the Asymptotic Variance. By the definition of the pre-asymptotic sieve variance ||v*_n||²_{sd} and the strict stationarity of the data {Z_t}, we have:

(4.1) ||v*_n||²_{sd} = Var( Δ(Z, α_0)[v*_n] ) [ 1 + 2 Σ_{t=1}^{n−1} (1 − t/n) ρ_{n,t} ],

where {ρ_{n,t}} is the autocorrelation coefficient of the triangular array {Δ(Z_t, α_0)[v*_n]}_t:

(4.2) ρ_{n,t} ≡ E( Δ(Z_1, α_0)[v*_n] Δ(Z_{t+1}, α_0)[v*_n] ) / Var( Δ(Z, α_0)[v*_n] ).

Denote C_n ≡ sup_{t∈[1,n)} | E{ Δ(Z_1, α_0)[v*_n] Δ(Z_{t+1}, α_0)[v*_n] } |. The following high-level assumption captures the essence of the problem.

Assumption 4.1. (i) ||v*_n|| → ∞ as n → ∞, and ||v*_n||² / Var( Δ(Z, α_0)[v*_n] ) = O(1); (ii) there is an increasing integer sequence {d_n ∈ [2, n]} such that (a) d_n C_n / Var( Δ(Z, α_0)[v*_n] ) = o(1) and (b) Σ_{t=d_n}^{n−1} (1 − t/n) |ρ_{n,t}| = o(1).

Primitive sufficient conditions for Assumption 4.1 are given in the next subsection.

Theorem 4.1. Let Assumption 4.1 hold. Then: ||v*_n||²_{sd} / Var( Δ(Z, α_0)[v*_n] ) − 1 = o(1). If further Assumptions 3.1, 3.3 and 3.4 hold, then

(4.3) √n [f(α̂_n) − f(α_0)] / sqrt( Var( Δ(Z, α_0)[v*_n] ) ) →_d N(0, 1).

4.2. Sufficient Conditions for Assumption 4.1. In this subsection, we first provide sufficient conditions for Assumption 4.1 for sieve M estimation of irregular functionals of general semi-nonparametric models. We then present additional low-level sufficient conditions for sieve M estimation of real-valued functionals of purely nonparametric models. We show that these sufficient conditions are easily satisfied for sieve M estimation of the evaluation and the weighted integration functionals.
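Equation (4.1) is an exact algebraic identity for any stationary autocovariance sequence. The sketch below verifies it numerically for AR(1)-type autocovariances γ_t = φ^t γ_0 (an illustrative choice of ours), comparing Var(n^{-1/2} Σ_t v_t) computed from the full n × n covariance matrix against the right-hand side of (4.1).

```python
import numpy as np

n, phi, gamma0 = 50, 0.6, 2.0
t = np.arange(n)
gamma = gamma0 * phi ** t                 # AR(1)-type autocovariances gamma_t = phi^t * gamma_0

# Left-hand side: Var(n^{-1/2} sum_t v_t) computed from the full covariance matrix.
cov = gamma[np.abs(t[:, None] - t[None, :])]
lhs = cov.sum() / n

# Right-hand side of (4.1): Var(Delta) * [1 + 2 sum_{t=1}^{n-1} (1 - t/n) rho_t].
rho = gamma[1:] / gamma0
rhs = gamma0 * (1.0 + 2.0 * np.sum((1.0 - t[1:] / n) * rho))
```

The point of Section 4 is then that, for irregular functionals, the bracketed correction factor converges to one, so only the contemporaneous variance Var(Δ(Z, α_0)[v*_n]) survives in the limit.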
4.2.1. Irregular functionals of general semi-nonparametric models. Given the closed-form expressions of ||v*_n||² and Var( Δ(Z, α_0)[v*_n] ) in Subsection 3.3, it is easy to see that the following assumption implies Assumption 4.1.(i).

Assumption 4.2. (i) Assumption 3.5 holds and lim_{k(n)→∞} ||∂f(α_0)/∂h[P^{k(n)}]||²_E = ∞; (ii) the smallest eigenvalue of E[ S_{k(n)}(Z_t) S_{k(n)}(Z_t)′ ] in (3.25) is bounded away from zero uniformly for all k(n).

Next, we provide some sufficient conditions for Assumption 4.1.(ii). Let f_{Z_1,Z_t}(·,·) be the joint density of (Z_1, Z_t) and f_Z(·) be the marginal density of Z. Let p ∈ [1, ∞). Define

(4.4) ||Δ(Z, α_0)[v*_n]||_p ≡ ( E{ |Δ(Z, α_0)[v*_n]|^p } )^{1/p}.

By definition, ||Δ(Z, α_0)[v*_n]||²_2 = Var( Δ(Z, α_0)[v*_n] ). The following assumption implies Assumption 4.1.(ii)(a).

Assumption 4.3. (i) sup_{t≥2} sup_{(z,z̃)∈Z×Z} f_{Z_1,Z_t}(z, z̃) / [ f_{Z_1}(z) f_{Z_t}(z̃) ] ≤ C for some constant C > 0; (ii) ||Δ(Z, α_0)[v*_n]||_1 / ||Δ(Z, α_0)[v*_n]||_2 = o(1).

Assumption 4.3.(i) is mild. When Z_t is a continuous random variable, it is equivalent to assuming that the copula density of (Z_1, Z_t) is bounded uniformly in t ≥ 2. For irregular functionals (i.e., ||v*_n|| → ∞), the L²(f_Z) norm ||Δ(Z, α_0)[v*_n]||_2 diverges under Assumption 4.1.(i) (or Assumption 4.2); Assumption 4.3.(ii) requires that the L¹(f_Z) norm ||Δ(Z, α_0)[v*_n]||_1 diverge at a slower rate than the L²(f_Z) norm ||Δ(Z, α_0)[v*_n]||_2 as k(n) → ∞. In many applications the L¹(f_Z) norm ||Δ(Z, α_0)[v*_n]||_1 actually remains bounded as k(n) → ∞, and hence Assumption 4.3.(ii) is trivially satisfied.

The following assumption implies Assumption 4.1.(ii)(b).

Assumption 4.4. (i) {Z_t} is strictly stationary strong-mixing with mixing coefficients α(t) satisfying Σ_t t^γ [α(t)]^{η/(2+η)} < ∞ for some η > 0 and γ > 0; (ii) as k(n) → ∞,

( ||Δ(Z, α_0)[v*_n]||_1^γ × ||Δ(Z, α_0)[v*_n]||_{2+η} ) / ||Δ(Z, α_0)[v*_n]||_2^{γ+1} = o(1).

The α-mixing condition in Assumption 4.4.(i) with γ > (2+η)/η becomes Condition 1.(iii) in Section 6.6.2 of Fan and Yao (2003) for the pointwise asymptotic normality of their local polynomial estimator of a conditional mean function.
In the next subsection, we illustrate that γ > η/(2+η) is also sufficient for sieve M estimation of evaluation functionals of nonparametric time series models to satisfy Assumption 4.4.(ii).

Proposition 4.2. Let Assumptions 4.2, 4.3 and 4.4 hold. Then: Σ_{t=1}^{T−1} |ρ_{T,t}| = o(1), and Assumption 4.1 holds.

Theorem 4.1 and Proposition 4.2 show that when the functional f(·) is irregular (i.e., ‖v*‖ → ∞), time series dependence does not affect the asymptotic variance of a general
sieve M estimator f(α̂). Similar results have been proved for nonparametric kernel and local polynomial estimators of evaluation functionals of conditional mean and density functions; see, for example, Robinson (1983), Fan and Yao (2003) and Gao (2007). However, whether this is the case for general sieve M estimators of unknown functionals has been a long-standing question. Theorem 4.1 and Proposition 4.2 give a positive answer. This may seem surprising at first sight, as sieve estimators are often regarded as global estimators while kernel estimators are regarded as local estimators.

4.2.2. Irregular functionals of purely nonparametric models. In this subsection, we provide additional low-level sufficient conditions for Assumptions 4.1.(i), 4.3.(ii) and 4.4.(ii) for purely nonparametric models where the true unknown parameter is a real-valued function h_0 that solves sup_{h∈H} E[l(Z_t, h(X_t))]. This includes as a special case the nonparametric conditional mean model: Y_t = h_0(X_t) + u_t with E[u_t|X_t] = 0. Our results can be easily generalized to more general settings with only some notational changes.

Let α_0 = h_0 ∈ H and let f: H → R be any functional of interest. By the results in Subsection 3.3, f(h_0) has its sieve Riesz representer given by v*(·) = P^k(·)'β* ∈ V_k, where β* = R_k^{−1} (∂f(h_0)/∂h)[P^k] is such that

β'R_k β = E(−r(Z_t, h_0)[β'P^k, P^k'β]) = β' E{−r(Z_t, h_0(X_t)) P^k(X_t) P^k(X_t)'} β for all β ∈ R^k.

Also, the score process can be expressed as

Δ(Z_t, h_0)[v*] = Δ(Z_t, h_0(X_t)) v*(X_t) = Δ(Z_t, h_0(X_t)) P^k(X_t)'β*.

Here the notations Δ(Z_t, h_0(X_t)) and r(Z_t, h_0(X_t)) indicate the standard first-order and second-order derivatives of l(Z_t, h(X_t)) instead of functional pathwise derivatives; for example, with l(Z_t, h(X_t)) = −[Y_t − h(X_t)]²/2 in the nonparametric conditional mean model, we have r(Z_t, h_0(X_t)) = −1 and Δ(Z_t, h_0(X_t)) = Y_t − h_0(X_t). Thus,

‖v*‖² = E{E[−r(Z, h_0(X))|X] |v*(X)|²} = β*'R_k β* = (∂f(h_0)/∂h)[P^k]' R_k^{−1} (∂f(h_0)/∂h)[P^k],

Var(Δ(Z, h_0)[v*]) = E{E[|Δ(Z, h_0(X))|² | X] |v*(X)|²}.
It is then obvious that Assumption 4.1.(i) is implied by the following condition.

Assumption 4.5. (i) inf_{x∈X} E[−r(Z, h_0(X))|X = x] ≥ c₁ > 0; (ii) sup_{x∈X} E[−r(Z, h_0(X))|X = x] ≤ c₂ < ∞; (iii) the smallest and largest eigenvalues of E{P^k(X) P^k(X)'} are bounded and bounded away from zero uniformly for all k, and lim_k ‖(∂f(h_0)/∂h)[P^k]‖²_E = ∞; (iv) inf_{x∈X} E[|Δ(Z, h_0(X))|² | X = x] ≥ c₃ > 0.

It is easy to see that Assumptions 4.3.(ii) and 4.4.(ii) are implied by the following assumption.
Assumption 4.6. (i) E{|v*(X)|} = O(1); (ii) sup_{x∈X} E[|Δ(Z, h_0(X))|^{2+η} | X = x] ≤ c₄ < ∞; (iii) E{|v*(X)|^{2+η}} / (E{|v*(X)|²})^{(2+η)(γ+1)/2} = o(1).

It actually suffices to use ess-inf_x or ess-sup_x instead of inf_x or sup_x in Assumptions 4.5 and 4.6. We immediately obtain the following results.

Remark 4.3. (1) Let Assumptions 4.3.(i), 4.4.(i), 4.5 and 4.6 hold. Then: Σ_{t=1}^{T−1} |ρ_{T,t}| = o(1) and ‖v*‖²_sd / Var(Δ(Z, α_0)[v*]) − 1 = o(1). (2) Assumptions 4.5 and 4.6.(ii) imply that

Var(Δ(Z, α_0)[v*]) ≍ E{|v*(X)|²} ≍ ‖v*‖² ≍ ‖β*‖²_E ≍ ‖(∂f(h_0)/∂h)[P^k]‖²_E;

hence Assumption 4.6.(iii) is satisfied if E{|P^k(X)'β*|^{2+η}} / ‖β*‖_E^{(2+η)(γ+1)} = o(1).

Assumptions 4.3.(i), 4.4.(i), 4.5 and 4.6.(ii) are all very standard low-level sufficient conditions. Assumptions 4.6.(i) and (iii) are easily satisfied by two typical functionals of nonparametric models: the evaluation functional and the weighted integration functional.

Consider as an example the evaluation functional f(h_0) = h_0(x̄) with x̄ ∈ X. We have (∂f(h_0)/∂h)[P^k] = P^k(x̄) and v*(·) = P^k(·)'β* = P^k(·)'R_k^{−1}P^k(x̄). Then ‖v*‖² = P^k(x̄)'R_k^{−1}P^k(x̄) = v*(x̄), and ‖v*‖² ≍ ‖P^k(x̄)‖²_E under Assumption 4.5.(i)(ii)(iii). Furthermore, we have, for any v ∈ V_k:

(4.5)  v(x̄) = E{E[−r(Z, h_0(X))|X] v*(X) v(X)} = ∫_{x∈X} v(x) δ_k(x, x̄) dx,

where

(4.6)  δ_k(x, x̄) ≡ E[−r(Z, h_0(X))|X = x] v*(x) f_X(x) = E[−r(Z, h_0(X))|X = x] P^k(x)'R_k^{−1}P^k(x̄) f_X(x).

By equation (4.5), δ_k(x, x̄) has the reproducing property on V_k, so it behaves like the Dirac delta function δ(x − x̄) on V_k. Therefore v*(x) concentrates in a neighborhood around x = x̄ and maintains the same positive sign in this neighborhood.

We first verify Assumption 4.6.(i). By equation (4.6), we have

∫_{x∈X} |v*(x)| f_X(x) dx = ∫_{x∈X} [sign(v*(x)) / E[−r(Z, h_0(X))|X = x]] δ_k(x, x̄) dx ≡ ∫_{x∈X} b(x) δ_k(x, x̄) dx,

where sign(v*(x)) = 1 if v*(x) > 0 and sign(v*(x)) = −1 if v*(x) ≤ 0, and sup_{x∈X} |b(x)| ≤ c₁^{−1} < ∞ under Assumption 4.5.(i). If b(x) ∈ V_k, then by equation (4.5) we have:

∫_{x∈X} |v*(x)| f_X(x) dx = b(x̄) = sign(v*(x̄)) / E[−r(Z, h_0(X))|X = x̄] ≤ c₁^{−1} = O(1).
If b(x) ∉ V_k but can be approximated by a bounded function ṽ(x) ∈ V_k such that ∫_{x∈X} [b(x) − ṽ(x)] δ_k(x, x̄) dx = o(1), then, also using equation (4.5), we obtain:

∫_{x∈X} |v*(x)| f_X(x) dx = ∫_{x∈X} ṽ(x) δ_k(x, x̄) dx + ∫_{x∈X} [b(x) − ṽ(x)] δ_k(x, x̄) dx = ṽ(x̄) + o(1) = O(1).

Thus Assumption 4.6.(i) is satisfied. Similarly we can show that under mild conditions:

E{|v*(X)|^{2+η}} = ∫_{x∈X} [|v*(x)|^{1+η} sign(v*(x)) / E[−r(Z, h_0(X))|X = x]] δ_k(x, x̄) dx [1 + o(1)] = O(|v*(x̄)|^{1+η}).

On the other hand,

E{|v*(X)|²} = ∫_{x∈X} |v*(x)|² f_X(x) dx = ∫_{x∈X} [v*(x) / E[−r(Z, h_0(X))|X = x]] δ_k(x, x̄) dx ≍ v*(x̄).

Therefore

E{|v*(X)|^{2+η}} / (E{|v*(X)|²})^{(2+η)(γ+1)/2} ≍ |v*(x̄)|^{(1+η) − (2+η)(γ+1)/2} = o(1)

if (1+η) − (2+η)(γ+1)/2 < 0, which is equivalent to γ > η/(2+η). That is, when γ > η/(2+η), Assumption 4.6.(iii) is satisfied.

One may conclude from Theorem 4.1 and Proposition 4.2 that the results and inference procedures for sieve estimators carry over from iid data to the time series case without modifications. However, this is true only when the sample size is large and the dependence is weak. Whether the sample size is large enough so that one can ignore the temporal dependence depends on the functional of interest, the strength of the temporal dependence, and the sieve basis functions employed. So it is ultimately an empirical question. In any finite sample, the temporal dependence does affect the sampling distribution of the sieve estimator. In the next section, we design an inference procedure that is easy to use and at the same time captures the time series dependence in finite samples.

5. Autocorrelation Robust Inference. In order to apply the asymptotic normality Theorem 3.1, we need an estimator of the sieve variance ‖v*‖²_sd. In this section we propose a simple estimator of ‖v*‖²_sd and establish the asymptotic distributions of the associated t statistic and Wald statistic.

The theoretical sieve Riesz representer v* is not known and has to be estimated. Let ‖·‖_T denote the empirical norm induced by the following empirical inner product:

(5.1)  ⟨v₁, v₂⟩_T = −T^{−1} Σ_{t=1}^T r(Z_t, α̂)[v₁, v₂],
for any v₁, v₂ ∈ V_k. We define an empirical sieve Riesz representer v̂* of the functional (∂f(α̂)/∂α)[·] with respect to the empirical norm ‖·‖_T, i.e.,

(5.2)  ‖v̂*‖²_T = sup_{v∈V_k, v≠0} |(∂f(α̂)/∂α)[v]|² / ‖v‖²_T < ∞

and

(5.3)  (∂f(α̂)/∂α)[v] = ⟨v, v̂*⟩_T for any v ∈ V_k.

We next show that the theoretical sieve Riesz representer v* can be consistently estimated by the empirical sieve Riesz representer v̂* under the norm ‖·‖. In the following we denote W_k ≡ {v ∈ V_k : ‖v‖ = 1}.

Assumption 5.1. Let {ε_T} be a positive sequence such that ε_T = o(1). (i) sup_{α∈B_T, v₁,v₂∈W_k} |E{r(Z, α)[v₁, v₂] − r(Z, α_0)[v₁, v₂]}| = O(ε_T); (ii) sup_{α∈B_T, v₁,v₂∈W_k} |μ_T{r(Z, α)[v₁, v₂]}| = O_p(ε_T); (iii) sup_{α∈B_T, v∈W_k} |(∂f(α)/∂α)[v] − (∂f(α_0)/∂α)[v]| = O(ε_T).

Assumption 5.1.(i) is a smoothness condition on the second derivative of the criterion function with respect to α. In the nonparametric LS regression model, we have r(Z, α)[v₁, v₂] = r(Z, α_0)[v₁, v₂] for all α and v₁, v₂. Hence Assumption 5.1.(i) is trivially satisfied. Assumption 5.1.(ii) is a stochastic equicontinuity condition on the empirical process T^{−1} Σ_{t=1}^T r(Z_t, α)[v₁, v₂], indexed by α in the shrinking neighborhood B_T, uniformly in v₁, v₂ ∈ W_k. Assumption 5.1.(iii) puts some smoothness condition on the functional (∂f(α)/∂α)[v] with respect to α in the shrinking neighborhood B_T, uniformly in v ∈ W_k.

Lemma 5.1. Let Assumption 5.1 hold; then

(5.4)  |‖v̂*‖_T / ‖v*‖ − 1| = O_p(ε_T) and ‖v̂* − v*‖ / ‖v*‖ = O_p(ε_T).

With the empirical estimator v̂* satisfying Lemma 5.1, we can now construct an estimate of ‖v*‖²_sd, which is the LRV of the score process {Δ(Z_t, α_0)[v*]}. Many nonparametric LRV estimators are available in the literature. To be consistent with our focus on the method of sieves and to derive a simple and robust asymptotic approximation, we use an orthonormal series LRV (OS-LRV) estimator in this paper. The OS-LRV estimator has already been used in constructing autocorrelation robust inference on regular functionals of parametric time series models; see, e.g., Phillips (2005) and Sun (2011a).
Let {φ_m}_{m=0}^∞ be a sequence of orthonormal basis functions in L²[0, 1] with φ_0 ≡ 1. Define the orthogonal series projection

(5.5)  Λ̂_m = T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α̂)[v̂*]
and construct the direct series estimator Ω̂_m = Λ̂²_m for each m = 1, 2, ..., M, where M ∈ Z₊. Taking a simple average of these direct estimators yields our OS-LRV estimator ‖v̂*‖²_{sd,T} of ‖v*‖²_sd:

(5.6)  ‖v̂*‖²_{sd,T} ≡ (1/M) Σ_{m=1}^M Ω̂_m = (1/M) Σ_{m=1}^M Λ̂²_m,

where M, the number of orthonormal basis functions used, is the smoothing parameter in the LRV estimation.

For irregular functionals, our asymptotic result in Section 4 suggests that we can ignore the temporal dependence and estimate ‖v*‖²_sd by σ̂²_v = T^{−1} Σ_{t=1}^T {Δ(Z_t, α̂)[v̂*]}². However, when the sample size is small, there may still be considerable autocorrelation in the sieve score process {Δ(Z_t, α_0)[v*]}. To capture the possibly large but diminishing autocorrelation in a finite sample, we propose treating {Δ(Z_t, α_0)[v*]} as a generic time series and using the same formula as in (5.6) to estimate the asymptotic variance of T^{−1/2} Σ_{t=1}^T Δ(Z_t, α_0)[v*]. We call the estimator the pre-asymptotic variance estimator. With a data-driven smoothing parameter choice of M, the pre-asymptotic variance estimator ‖v̂*‖²_{sd,T} should be close to σ̂²_v when the sample size is large. On the other hand, when the sample size is small, the pre-asymptotic variance estimator may provide a more accurate measure of the sampling variation of the plug-in sieve M estimator of irregular functionals.

An extra benefit of the pre-asymptotic idea is that it allows us to treat regular and irregular functionals in a unified framework. So we do not distinguish regular and irregular functionals in the rest of this section. To make statistical inference on a scalar functional f(α_0), we construct a t statistic as follows:

(5.7)  t_T ≡ √T [f(α̂) − f(α_0)] / ‖v̂*‖_{sd,T}.

We proceed to establish the asymptotic distribution of t_T when M is a fixed constant. To facilitate our development, we make the assumption below.

Assumption 5.2.
Let {ξ_T} be a positive sequence such that √T ε_T ξ_T = o(1), and let the following conditions hold: (i) sup_{v∈W_k, α∈B_T} |T^{−1/2} Σ_{t=1}^T φ_m(t/T) {Δ(Z_t, α)[v] − Δ(Z_t, α_0)[v] − E[Δ(Z_t, α)[v]]}| = o_p(1) for m = 0, 1, ..., M; (ii) sup_{v∈W_k, α∈B_T} |E{Δ(Z, α)[v] − Δ(Z, α_0)[v] − r(Z, α_0)[v, α − α_0]}| = O(ε_T ξ_T); (iii) sup_{v∈W_k} |T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v]| = O_p(1) for m = 0, 1, ..., M; (iv) for e_t ~ iid N(0, 1), we have, for any x = (x_1, ..., x_M)' ∈ R^M,

P(T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[u*] < x_m, m = 0, 1, ..., M)
= P(T^{−1/2} Σ_{t=1}^T φ_m(t/T) e_t < x_m, m = 0, 1, ..., M) + o(1).
Assumption 5.2.(iv) is a slightly stronger version of Assumption 3.4. It is equivalent to assuming that T^{−1/2} Σ_{t=1}^T [φ_0(t/T), ..., φ_M(t/T)]' Δ(Z_t, α_0)[u*] satisfies a multivariate CLT. When φ_m(x) is continuously differentiable in x, Assumption 5.2.(iv) is weaker than a FCLT of the form:

T^{−1/2} Σ_{t=1}^{[Tτ]} Δ(Z_t, α_0)[u*] →_d W(τ),

where W(τ) is the standard Brownian motion process. A FCLT of the above type is often assumed in parametric time series analysis. When Assumption 5.2.(iv) holds, we write

T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[u*] ~^a T^{−1/2} Σ_{t=1}^T φ_m(t/T) e_t,

where "~^a" signifies that the two sides are asymptotically equivalent in distribution.

Theorem 5.1. Let {φ_m}_{m=0}^M be a sequence of orthonormal basis functions in L²[0, 1]. Under Assumptions 3.2, 3.3, 5.1 and 5.2, we have, for m = 1, ..., M: ‖v*‖^{−1}_sd Λ̂_m ~^a iid N(0, 1). If further Assumption 3.1 holds, then t_T ≡ √T [f(α̂) − f(α_0)] / ‖v̂*‖_{sd,T} ~^a t_M, where t_M is the t distribution with M degrees of freedom.

Theorem 5.1 shows that when M is fixed, the t statistic converges weakly to a standard t distribution. This result is very handy, as critical values from the t distribution can be easily obtained from statistical tables or standard software packages. This is an advantage of using the OS-LRV estimator. When M → ∞, t_M approaches the standard normal distribution, so critical values from t_M can be justified even if M = M(T) → ∞ slowly with the sample size. Theorem 5.1 extends the result of Sun (2011a) on robust OS-LRV estimation for parametric trend regressions to the case of general semi-nonparametric models.

In some applications, we may be interested in a vector of functionals f = (f_1, ..., f_q)' for some fixed finite q ∈ Z₊. If each f_j satisfies Assumptions 3.1-3.3 and their Riesz representers v* = (v*_1, ..., v*_q)' satisfy the multivariate version of Assumption 3.4:

√T (‖v*‖²_sd)^{−1/2} μ_T{Δ(Z, α_0)[v*]} →_d N(0, I_q),

then

(5.8)  √T (‖v*‖²_sd)^{−1/2} [f(α̂) − f(α_0)] →_d N(0, I_q),

where ‖v*‖²_sd ≡ Var(√T μ_T{Δ(Z, α_0)[v*]}) is a q × q matrix. A direct implication is that

(5.9)  T [f(α̂) − f(α_0)]' (‖v*‖²_sd)^{−1} [f(α̂) − f(α_0)] →_d χ²_q.
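As a concrete illustration of the scalar construction in (5.5)-(5.7), the sketch below (ours, not the authors' code; it takes an estimated score series Δ(Z_t, α̂)[v̂*] as given) computes the projections Λ̂_m, the OS-LRV estimator and the t statistic, using the paired sine/cosine basis employed in Section 6:

```python
import numpy as np

def os_lrv(scores, M):
    """OS-LRV estimator (5.6): average of the squared projections (5.5).

    scores : length-T array of the estimated sieve score process
             Delta(Z_t, alpha_hat)[v_hat*], t = 1, ..., T.
    M      : number of basis functions, held fixed as in Theorem 5.1.

    Basis (Section 6): phi_{2m-1}(x) = sqrt(2) cos(2 m pi x),
                       phi_{2m}(x)   = sqrt(2) sin(2 m pi x).
    """
    T = len(scores)
    x = np.arange(1, T + 1) / T
    lam_sq = []
    for m in range(1, M + 1):
        j = (m + 1) // 2
        phi = np.sqrt(2) * (np.cos if m % 2 else np.sin)(2 * j * np.pi * x)
        lam = phi @ scores / np.sqrt(T)      # Lambda_hat_m in (5.5)
        lam_sq.append(lam ** 2)              # direct estimator Omega_hat_m
    return float(np.mean(lam_sq))

def t_stat(f_hat, f_null, scores, M):
    """t statistic (5.7); compare with t_M critical values (Theorem 5.1)."""
    T = len(scores)
    return np.sqrt(T) * (f_hat - f_null) / np.sqrt(os_lrv(scores, M))
```

For white-noise scores the estimator is close to the sample variance; under strong positive autocorrelation it is larger, which is exactly the finite-sample correction the pre-asymptotic approach is after.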
To estimate ‖v*‖²_sd, we define the orthogonal series projection Λ̂_m = (Λ̂_m^1, ..., Λ̂_m^q)' with

Λ̂_m^j = T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α̂)[v̂*_j],

where v̂*_j denotes the empirical sieve Riesz representer of the functional (∂f_j(α̂)/∂α)[·], j = 1, ..., q. The OS-LRV estimator ‖v̂*‖²_{sd,T} of the sieve variance ‖v*‖²_sd is

‖v̂*‖²_{sd,T} = (1/M) Σ_{m=1}^M Λ̂_m Λ̂_m'.

To make statistical inference on f(α_0), we construct the F-test version of the Wald statistic as follows:

(5.10)  F_T ≡ T [f(α̂) − f(α_0)]' (‖v̂*‖²_{sd,T})^{−1} [f(α̂) − f(α_0)] / q.

We maintain Assumption 5.2 but replace Assumption 5.2.(iv) by its multivariate version: for e_t ~ iid N(0, I_q), we have

P(T^{−1/2} Σ_{t=1}^T φ_m(t/T) (‖v*‖²_sd)^{−1/2} Δ(Z_t, α_0)[v*] < x_m, m = 0, 1, ..., M)
= P(T^{−1/2} Σ_{t=1}^T φ_m(t/T) e_t < x_m, m = 0, 1, ..., M) + o(1)

for x_m ∈ R^q. Using a proof similar to that for Theorem 5.1, we can prove the theorem below.

Theorem 5.2. Let {φ_m}_{m=0}^M be a sequence of orthonormal basis functions in L²[0, 1]. Let Assumptions 3.1, 3.2, 3.3, 5.1 and the multivariate version of Assumption 5.2 hold. Then, for a fixed finite integer M:

[(M − q + 1)/M] F_T →_d F_{q,M−q+1},

where F_{q,M−q+1} is the F distribution with degrees of freedom (q, M − q + 1).

The weak convergence of the F_T statistic can be rewritten as

F_T →_d (χ²_q/q) / [χ²_{M−q+1}/(M − q + 1)] × M/(M − q + 1) =_d F_{q,M−q+1} × M/(M − q + 1).

As M → ∞, both χ²_{M−q+1}/(M − q + 1) and M/(M − q + 1) converge to one, and hence F_T →_d χ²_q/q. When M is not very large or the number of restrictions q is large, the asymptotic distribution χ²_q/q is likely to produce a large approximation error. This explains why the F approximation is more accurate, especially when M is relatively small and q is relatively large.
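To see the difference between the two approximations numerically, the helpers below (our illustration, not from the paper) compute the fixed-M critical value implied by Theorem 5.2 — reject when F_T exceeds M/(M − q + 1) times the (1 − τ) quantile of F_{q,M−q+1} — and the conventional χ²_q/q critical value:

```python
from scipy.stats import chi2, f

def f_test_cv(tau, q, M):
    """Fixed-M critical value for F_T in (5.10), using Theorem 5.2:
    [(M - q + 1)/M] F_T -> F_{q, M-q+1}, so scale the F quantile back up."""
    return M / (M - q + 1) * f.ppf(1 - tau, q, M - q + 1)

def chi2_cv(tau, q):
    """Conventional critical value from the chi^2_q / q approximation."""
    return chi2.ppf(1 - tau, q) / q

# The fixed-M critical value is strictly larger, and the gap widens as q
# grows or M shrinks -- consistent with the size comparisons in Table 1.
print(f_test_cv(0.05, q=4, M=12), chi2_cv(0.05, q=4))
```

As M grows with q fixed, the two critical values coincide, matching the F_T →_d χ²_q/q limit above.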
6. Computation and Simulation.

6.1. Computation. To compute the OS-LRV estimator in the previous section, we have to first find the empirical Riesz representer v̂*, which is not very appealing to applied researchers. In this subsection we show that in finite samples we can directly apply the formula of the OS-LRV estimation derived under parametric assumptions and ignore the semiparametric/nonparametric nature of the model.

For simplicity, let the sieve space be A_k = Θ × H_k with Θ a compact subset of R^{d_θ} and H_k = {h(·) = P^k(·)'β : β ∈ R^k}. Let α_{0,T} = (θ_0', P^k(·)'β_{0,T}) ∈ int(Θ) × H_k. For α ∈ A_k = Θ × H_k, we write l(Z_t, α) = l(Z_t, θ, h) = l(Z_t, θ, P^k(·)'β) and define l(Z_t, γ) = l(Z_t, θ, P^k(·)'β) as a function of γ = (θ', β')' ∈ R^{d_γ}, where d_γ = d_θ + d_β and d_β = k. For any given Z_t, we view l(Z_t, α) as a functional of α on the infinite-dimensional function space A, but l(Z_t, γ) as a function of γ on the Euclidean space R^{d_γ}, whose dimension d_γ grows with the sample size but could be regarded as fixed in finite samples. By definition, for any α_j = (θ_j', P^k(·)'β_j)' (j = 1, 2), we have

(6.1)  [∂l(Z_t, γ_1)/∂γ]'(γ_2 − γ_1) = Δl(Z_t, α_1)[α_2 − α_1],

where the left-hand side is the regular derivative and the right-hand side is the pathwise functional derivative. By the consistency of the sieve M estimator α̂ = (θ̂', P^k(·)'β̂)' for α_{0,T} = (θ_0', P^k(·)'β_{0,T})', we have that γ̂ ≡ (θ̂', β̂')' is a consistent estimator of γ_{0,T} ≡ (θ_0', β_{0,T}')'; then the first-order conditions for the sieve M estimation can be represented as

(6.2)  T^{−1} Σ_{t=1}^T ∂l(Z_t, γ̂)/∂γ ≈ 0.

These first-order conditions are exactly the same as what we would get for parametric models with a d_γ-dimensional parameter space. Next, we pretend that l(Z_t, γ) is a parametric criterion function on a finite-dimensional space R^{d_γ}. Using the OS-LRV estimator for the parametric M estimator based on the sample criterion function T^{−1} Σ_{t=1}^T l(Z_t, γ), we obtain the asymptotic variance estimator for γ̂ − γ_{0,T} as follows:

Σ̂ = R̂^{−1} B̂ R̂^{−1}, where

R̂ = −T^{−1} Σ_{t=1}^T ∂²l(Z_t, γ̂)/∂γ∂γ',

B̂ = M^{−1} Σ_{m=1}^M [T^{−1/2} Σ_{t=1}^T φ_m(t/T) ∂l(Z_t, γ̂)/∂γ][T^{−1/2} Σ_{t=1}^T φ_m(t/T) ∂l(Z_t, γ̂)/∂γ]'.
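For the leading least-squares case, the parametric-style computation just described can be sketched as follows. This is our sketch under the assumption l(Z_t, γ) = −[Y_t − X̃'_t γ]²/2, so the score is X̃_t û_t and minus the average Hessian is X̃'X̃/T:

```python
import numpy as np

def sandwich_oslrv(Xt, resid, M):
    """Sigma_hat = R^{-1} B R^{-1} from Section 6.1, for the LS criterion.

    Xt    : T x d_gamma design matrix treated 'parametrically' ([X1, P^k]).
    resid : length-T residuals u_hat_t, so the score at t is Xt[t] * resid[t].
    M     : number of basis functions in the OS-LRV middle matrix B_hat.
    """
    T, d = Xt.shape
    scores = Xt * resid[:, None]                 # d l(Z_t, gamma)/d gamma
    R = Xt.T @ Xt / T                            # minus average Hessian for LS
    grid = np.arange(1, T + 1) / T
    B = np.zeros((d, d))
    for m in range(1, M + 1):
        j = (m + 1) // 2
        phi = np.sqrt(2) * (np.cos if m % 2 else np.sin)(2 * j * np.pi * grid)
        lam = phi @ scores / np.sqrt(T)          # d-vector projection
        B += np.outer(lam, lam)
    B /= M
    Rinv = np.linalg.inv(R)
    return Rinv @ B @ Rinv   # variance estimate for sqrt(T)(gamma_hat - gamma_0T)
```

Applying the Delta method with F̂_k to this Σ̂ then reproduces the sieve variance estimator, which is the numerical-equivalence point of this subsection.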
Now suppose we are interested in a real-valued functional f_{0,T} = f(α_{0,T}) = f(θ_0, P^k(·)'β_{0,T}), which is estimated by the plug-in sieve M estimator f̂ = f(α̂) = f(θ̂, P^k(·)'β̂). We
compute the asymptotic variance of f̂ mechanically via the Delta method. We can then estimate the asymptotic variance of f̂ − f_{0,T} by Var(f̂) = F̂_k' Σ̂ F̂_k, with

F̂_k = (∂f(α̂)/∂θ', (∂f(α̂)/∂h)[P^k]')'.

It is easy to verify that, for any sample size, Var(f̂) is numerically identical to ‖v̂*‖²_{sd,T}, our asymptotic variance estimator given in (5.6). The numerical equivalence in variance estimators and point estimators (i.e., γ̂) implies that the corresponding test statistics are also numerically identical. Hence, we can use standard statistical packages designed for misspecified parametric models to compute test statistics for semi-nonparametric models.

6.2. Simulation. To examine the accuracy of our inference procedures in Section 5, we consider a partially linear regression model in our simulation study:

Y_t = X'_{1t}θ_0 + h_0(X_{2t}) + u_t, E[u_t|X_{1t}, X_{2t}] = 0, t = 1, ..., T,

where X_{2t} and u_t are scalar processes, and X_{1t} = (X¹_{1t}, ..., X^d_{1t})' is a d-dimensional vector process with independent components X^j_{1t} for j = 1, ..., d. Let d = 4 and

X^j_{1t} = ρ X^j_{1,t−1} + √(1 − ρ²) ε^j_{1t},  X_{2t} = (X¹_{1t} + ... + X^d_{1t})/√(2d) + e_t/√2,
e_t = ρ e_{t−1} + √(1 − ρ²) ε_{et},  u_t = ρ u_{t−1} + √(1 − ρ²) ε_{ut},

where (ε¹_{1t}, ..., ε^d_{1t}, ε_{et}, ε_{ut})' ~ iid N(0, I_{d+2}). Here we have normalized X^j_{1t}, X_{2t}, and u_t to have zero mean and unit variance. We take ρ ∈ {0, 0.25, 0.5, 0.75}. Without loss of generality, we set θ_0 = 0. We consider h_0(X_{2t}) = sin(X_{2t}) and cos(X_{2t}). Such choices are qualitatively similar to those in Härdle, Liang and Gao (2000, pages 52 and 139), who employ sin(π X_{2t}). We focus on h_0(X_{2t}) = cos(X_{2t}) below, as it is harder to approximate by a linear function around the center of the distribution of X_{2t}, but the qualitative results are the same for h_0(X_{2t}) = sin(X_{2t}).

To estimate the model using the method of sieves on the unit interval [0, 1], we first transform X_{2t} into [0, 1] via X̃_{2t} = exp(X_{2t})/[1 + exp(X_{2t})], so that X_{2t} = log[X̃_{2t}/(1 − X̃_{2t})]. Then h_0(X_{2t}) = cos(log[X̃_{2t}(1 − X̃_{2t})^{−1}]) ≡ h̃_0(X̃_{2t}).
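The data-generating process above can be sketched directly. This is a sketch under our reading of the design; the function name and `seed` argument are ours:

```python
import numpy as np

def simulate(T, rho, d=4, seed=0):
    """Partially linear DGP of Section 6.2: AR(1) regressors and errors,
    each normalized to unit stationary variance, theta_0 = 0, and
    Y_t = cos(X_2t) + u_t."""
    rng = np.random.default_rng(seed)
    c = np.sqrt(1.0 - rho ** 2)

    def ar1(eps):                      # AR(1) with unit stationary variance
        out = np.empty_like(eps)
        out[0] = eps[0]
        for t in range(1, len(eps)):
            out[t] = rho * out[t - 1] + c * eps[t]
        return out

    X1 = np.column_stack([ar1(rng.standard_normal(T)) for _ in range(d)])
    e = ar1(rng.standard_normal(T))
    u = ar1(rng.standard_normal(T))
    X2 = X1.sum(axis=1) / np.sqrt(2 * d) + e / np.sqrt(2)  # unit variance
    Y = np.cos(X2) + u                                     # theta_0 = 0
    return Y, X1, X2
```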
Let P^k(x₂) = [p_1(x₂), ..., p_k(x₂)]' be a k × 1 vector, where {p_j(x₂) : j ≥ 1} is a set of basis functions on [0, 1]. We approximate h̃_0(X̃_{2t}) by P^k(X̃_{2t})'β for some β = (β_1, ..., β_k)' ∈ R^k. Let X̃_t = (X'_{1t}, P^k(X̃_{2t})') (a 1 × (d + k) vector) and X̃ = (X̃'_1, ..., X̃'_T)' (a T × (d + k) matrix). Let Y = (Y_1, ..., Y_T)', U = (u_1, ..., u_T)' and γ = (θ', β')'. Then the sieve LS estimator of γ is γ̂ = (X̃'X̃)^{−1}X̃'Y. In our simulation experiment, we use AIC and BIC to select k.

We employ our asymptotic theory to construct confidence regions for θ_{1:j} = (θ_{01}, ..., θ_{0j})'. Equivalently, we test the null of H_{0j}: θ_{1:j} = 0 against the alternative H_{1j}: at least one element of θ_{1:j} is not zero. Depending on the value of j, the number of joint hypotheses under consideration ranges from 1 to d. Let R_{θ_j} be the first j rows of the identity matrix I_{d+k}; then the sieve estimator of θ_{1:j} = R_{θ_j}γ is θ̂_{1:j} = R_{θ_j}γ̂, and so

√T (θ̂_{1:j} − θ_{1:j}) = T^{−1/2} R_{θ_j} (X̃'X̃/T)^{−1} Σ_{t=1}^T X̃'_t u_t + o_p(1).
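The sieve LS step γ̂ = (X̃'X̃)^{−1}X̃'Y can be sketched as below. The power-series basis is a stand-in (an assumption of this sketch) for the paper's spline or sine/cosine sieves; the logistic transform matches the one used above:

```python
import numpy as np

def sieve_ls(Y, X1, X2, k):
    """Sieve least squares for the partially linear model: regress Y on
    [X1, P^k(X2_tilde)] with X2_tilde = exp(X2)/(1 + exp(X2)) in (0, 1).
    Returns (theta_hat, beta_hat, design matrix)."""
    x2t = 1.0 / (1.0 + np.exp(-X2))                     # logistic map to (0, 1)
    Pk = np.column_stack([x2t ** j for j in range(k)])  # power basis, incl. constant
    Xt = np.column_stack([X1, Pk])                      # T x (d + k) design
    gamma, *_ = np.linalg.lstsq(Xt, Y, rcond=None)
    d = X1.shape[1]
    return gamma[:d], gamma[d:], Xt
```

In practice k would be chosen by AIC or BIC as in the paper; here it is simply an argument.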
Let (û_1, ..., û_T)' = Û = Y − X̃γ̂, Δ̂_{θt} = R_{θ_j}(X̃'X̃/T)^{−1}X̃'_t û_t, and let

Ω̂_{θM} = (1/M) Σ_{m=1}^M [T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ̂_{θt}][T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ̂_{θt}]'

be the OS-LRV estimator of the asymptotic variance Ω of √T(θ̂_{1:j} − θ_{1:j}). We can construct the F-test version of the Wald statistic as:

F_{θ_j} = T (R_{θ_j}γ̂)' Ω̂⁻¹_{θM} (R_{θ_j}γ̂) / j.

We refer to the test using critical values from the χ²_j/j distribution as the chi-square test. We refer to the test using critical value M(M − j + 1)^{−1} F^τ_{j,M−j+1} as the F test, where F^τ_{j,M−j+1} is the (1 − τ) quantile of the F distribution F_{j,M−j+1}. Throughout the simulation, we use

φ_{2m−1}(x) = √2 cos(2mπx), φ_{2m}(x) = √2 sin(2mπx), m = 1, ..., M/2,

as the orthonormal basis functions for the OS-LRV estimation.

To perform either the chi-square test or the F test, we need to choose M. Here we choose M to minimize the coverage probability error (CPE) of the confidence region based on the conventional chi-square test. The CPE-optimal M can be derived in the same way as in Sun (2011b) for parametric models, with his kernel bandwidth b = M^{−1}, q = 2, c_1 = 0, c_2 = 1, p = j. We obtain:

M_CPE = ⌈ [3j(X^τ_j + j − 1) / (4 |tr(BΩ^{−1})|)]^{1/3} T^{2/3} ⌉,

where B is the asymptotic bias of Ω̂_{θM}, X^τ_j is the (1 − τ) quantile of the χ²_j distribution, and ⌈·⌉ is the ceiling function. The parameters B and Ω in M_CPE are unknown but can be estimated by a standard plug-in procedure as in Andrews (1991). We fit an approximating VAR(1) model to the vector process Δ̂_{θt} and use the fitted model to estimate Ω and B. We have also implemented choosing M based on the mean square criterion, and the simulation results are qualitatively similar.

We are also interested in making inference on h̃_0(x̄). For each given x̄, let R_x = [0_{1×d}, P^k(x̄)']. Then the sieve estimator of h̃_0(x̄) ≈ R_x γ is ĥ(x̄) = R_x γ̂. We test H_0: h(x̄) = h_0(x̄) against H_1: h(x̄) ≠ h_0(x̄) for x̄ = [1 + exp(−x̄₂)]^{−1} and x̄₂ ∈ {−2, 0.1, 2}. Since X_{2t} is standard normal, this range of x̄₂ largely covers the support of X_{2t}.
Let Δ̂_{xt} = R_x(X̃'X̃/T)^{−1}X̃'_t û_t and

Ω̂_{xM} = (1/M) Σ_{m=1}^M [T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ̂_{xt}][T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ̂_{xt}]

be the OS-LRV estimator of the asymptotic variance Ω of √T[R_x γ̂ − h_0(x̄)]. Using the numerical equivalence result in Section 6.1, we can construct the F-test version of the Wald statistic as:

(6.3)  F_{x̄} = T [R_x γ̂ − h_0(x̄)] Ω̂⁻¹_{xM} [R_x γ̂ − h_0(x̄)].
As in the inference for the parametric part, we select the smoothing parameter M based on the CPE criterion. It is important to point out that the approximating model, and hence the data-driven smoothing parameter M, are different for different hypotheses under consideration.

In Section 4, we have shown that, for evaluation functionals, the asymptotic variance does not depend on the time series dependence. So from an asymptotic point of view, we could also use Ω̃_{xM} = T^{−1} Σ_{t=1}^T Δ̂²_{xt} as the estimator for the asymptotic variance of √T[R_x γ̂ − h_0(x̄)] and construct the F̃_{x̄} statistic accordingly. Here F̃_{x̄} is the same as F_{x̄} given in (6.3) but with Ω̂_{xM} replaced by Ω̃_{xM}. For the nonparametric part, we thus have three different inference procedures. The first two are both based on the F_{x̄} statistic with the pre-asymptotic variance estimator, except that one uses the χ²_1 approximation and the other uses the F_{1,M} approximation. The third one is based on the F̃_{x̄} statistic and uses the χ²_1 approximation. For ease of reference, we call the first two tests the pre-asymptotic χ² test and the pre-asymptotic F test, respectively. We call the test based on F̃_{x̄} and the χ²_1 approximation the asymptotic χ² test.

Table 1 gives the empirical null rejection probabilities for testing θ_{1:j} = 0 for j = 1, 2, 3, 4 and ρ ≥ 0. The number of simulation replications is 10,000. We consider two types of sieve basis functions to approximate h̃: the sine/cosine bases and the cubic spline bases with evenly spaced knots. The nominal rejection probability is τ = 5% and k is selected by AIC. Results for BIC are qualitatively similar. Several patterns emerge from the table. First, the F test has a more accurate size than the chi-square test. This is especially true when the processes are persistent and the number of joint hypotheses being tested is large. Second, the size properties of the tests are not sensitive to the different sieve basis functions used for h̃.
Finally, as the sample size increases, the size distortion of both the F test and the chi-square test decreases. It is encouraging that the size advantage of the F test remains even when T = 500.

Figure 1 presents the empirical rejection probabilities for testing H_0: h(x̄) = h_0(x̄) against H_1: h(x̄) ≠ h_0(x̄) for x̄ = [1 + exp(−x̄₂)]^{−1} and x̄₂ ∈ {−2, 0.1, 2}. It is clear that the asymptotic χ² test that ignores the time series dependence has a large size distortion when the process is persistent. To save space, this figure only reports the case with T = 100 and the spline sieve basis, but the pattern remains the same for both sample sizes and for both sieve bases. Compared to the pre-asymptotic χ² test, the pre-asymptotic F test has a more accurate size when the sample size is not large and the processes are persistent. This, combined with the evidence for parametric inference, suggests that the pre-asymptotic F test is preferred for both parametric and nonparametric inference in practical situations.
Table 1
Empirical Null Rejection Probabilities for the 5% F Test and Chi-square Test

              j = 1             j = 2             j = 3             j = 4
          F test  χ² test   F test  χ² test   F test  χ² test   F test  χ² test
T = 100, Cosine and Sine Basis
ρ = 0     0.0633  0.0860    0.0685  0.0935    0.0825  0.1287    0.1115  0.2008
ρ = 0.25  0.0621  0.1069    0.0677  0.1225    0.0806  0.1696    0.0973  0.2922
ρ = 0.50  0.0588  0.1307    0.0635  0.1494    0.0815  0.2225    0.0997  0.3955
ρ = 0.75  0.0521  0.1549    0.0640  0.1764    0.0874  0.2767    0.1016  0.4922
T = 100, Spline Basis
ρ = 0     0.0597  0.0848    0.0649  0.0900    0.0760  0.1187    0.0992  0.1896
ρ = 0.25  0.0570  0.1028    0.0648  0.1138    0.0752  0.1611    0.0886  0.2786
ρ = 0.50  0.0539  0.1240    0.0621  0.1383    0.0706  0.2093    0.0850  0.3778
ρ = 0.75  0.0440  0.1472    0.0574  0.1716    0.0794  0.2647    0.0904  0.4738
T = 500, Cosine and Sine Basis
ρ = 0     0.0517  0.0566    0.0521  0.0576    0.0500  0.0641    0.0607  0.0901
ρ = 0.25  0.0531  0.0633    0.0522  0.0650    0.0513  0.0786    0.0578  0.1171
ρ = 0.50  0.0545  0.0678    0.0527  0.0713    0.0498  0.0932    0.0512  0.1402
ρ = 0.75  0.0511  0.0676    0.0499  0.0749    0.0447  0.1003    0.0431  0.1636
T = 500, Spline Basis
ρ = 0     0.0487  0.0544    0.0470  0.0547    0.0475  0.0606    0.0562  0.0844
ρ = 0.25  0.0527  0.0608    0.0479  0.0629    0.0494  0.0745    0.0528  0.1108
ρ = 0.50  0.0518  0.0656    0.0498  0.0703    0.0472  0.0900    0.0474  0.1339
ρ = 0.75  0.0491  0.0637    0.0465  0.0688    0.0425  0.0959    0.0410  0.1555
Note: j is the number of joint hypotheses.

[Figure 1 (four panels, ρ = 0, 0.25, 0.5, 0.75, each plotting the empirical rejection probabilities of the pre-asymptotic F test, the pre-asymptotic χ² test, and the asymptotic χ² test against x̄₂ ∈ [−2, 2]) appears here.]

Fig 1. Empirical Rejection Probabilities Against the Value of x̄₂ with Spline Basis and T = 100
7. Appendix A: Mathematical Proofs.

Proof of Theorem 3.1. For any α ∈ B_T, denote α_{u,T} = α ± ε_T u* as a local alternative of α, for some ε_T = o(T^{−1/2}). It is clear that if α ∈ B_T, then α_{u,T} ∈ B_T. Since α̂ ∈ B_T with probability approaching one (wpa1), we have that α̂_{u,T} = α̂ ± ε_T u* ∈ B_T wpa1. By the definition of α̂, we have

(7.1)  O_p(ε²_T) ≥ T^{−1} Σ_{t=1}^T l(Z_t, α̂) − T^{−1} Σ_{t=1}^T l(Z_t, α̂_{u,T})
= E[l(Z_t, α̂) − l(Z_t, α̂_{u,T})] + μ_T{Δ(Z, α_0)[α̂ − α̂_{u,T}]} + μ_T{l(Z, α̂) − l(Z, α̂_{u,T}) − Δ(Z, α_0)[α̂ − α̂_{u,T}]}
= E[l(Z_t, α̂) − l(Z_t, α̂_{u,T})] ∓ ε_T μ_T{Δ(Z, α_0)[u*]} + O_p(ε²_T)

by Assumption 3.3.(iii). Next, by Assumptions 3.2 and 3.3.(iii) we have:

E[l(Z_t, α̂) − l(Z_t, α̂_{u,T})] = (‖α̂ ± ε_T u* − α_0‖² − ‖α̂ − α_0‖²)/2 + O_p(ε²_T) = ±ε_T ⟨α̂ − α_0, u*⟩ + O_p(ε²_T).

Combining these with the definition of α̂_{u,T} and the inequality in (7.1), we deduce that

O_p(ε²_T) ≥ ±ε_T ⟨α̂ − α_0, u*⟩ ∓ ε_T μ_T{Δ(Z, α_0)[u*]} + O_p(ε²_T),

which further implies that

(7.2)  ⟨α̂ − α_0, u*⟩ − μ_T{Δ(Z, α_0)[u*]} = O_p(ε_T) = o_p(T^{−1/2}).

By the definition of α_{0,T}, we have ⟨α_{0,T} − α_0, v⟩ = 0 for any v ∈ V_k. Thus ⟨α_{0,T} − α_0, u*⟩ = 0, and

(7.3)  ⟨α̂ − α_{0,T}, u*⟩ − μ_T{Δ(Z, α_0)[u*]} = o_p(T^{−1/2}).

By Assumptions 3.1.(i) and 3.2, and the Riesz representation theorem,

[f(α̂) − f(α_{0,T})]/‖v*‖_sd
= {f(α̂) − f(α_0) − (∂f(α_0)/∂α)[α̂ − α_0]}/‖v*‖_sd − {f(α_{0,T}) − f(α_0) − (∂f(α_0)/∂α)[α_{0,T} − α_0]}/‖v*‖_sd + {(∂f(α_0)/∂α)[α̂ − α_0] − (∂f(α_0)/∂α)[α_{0,T} − α_0]}/‖v*‖_sd

(7.4)  = ⟨α̂ − α_{0,T}, u*⟩ + o_p(T^{−1/2}).
It follows from (7.3) and (7.4) that

(7.5)  √T [f(α̂) − f(α_{0,T})]/‖v*‖_sd − √T μ_T{Δ(Z, α_0)[u*]} = o_p(1),

which establishes the first result of the theorem. The second result follows immediately from (7.5) and Assumption 3.4.

Proof of Theorem 4.1. By Assumption 4.1.(i), we have: 0 < Var(Δ(Z, α_0)[v*]). By equation (4.1) and the definition of ρ_{T,t}, we have:

‖v*‖²_sd / Var(Δ(Z, α_0)[v*]) − 1 = 2[J_{1,T} + J_{2,T}],

where

J_{1,T} = Σ_{t=1}^{d_T−1} (1 − t/T) E{Δ(Z_1, α_0)[v*] Δ(Z_{t+1}, α_0)[v*]} / Var{Δ(Z, α_0)[v*]},
J_{2,T} = Σ_{t=d_T}^{T−1} (1 − t/T) ρ_{T,t}.

By Assumption 4.1.(ii)(a), we have:

(7.6)  |J_{1,T}| ≤ d_T C_T / Var{Δ(Z, α_0)[v*]} = o(1).

Assumption 4.1.(ii)(b) immediately gives J_{2,T} = o(1). Thus

(7.7)  |‖v*‖²_sd / Var(Δ(Z, α_0)[v*]) − 1| ≤ 2[|J_{1,T}| + |J_{2,T}|] = o(1),

which establishes the first claim. This, Assumption 4.1.(i) and Theorem 3.1 together imply the asymptotic normality result in (4.3).

Proof of Proposition 4.2. For Assumption 4.1.(i), we note that Assumption 4.2.(i) implies ‖v*‖ → ∞ by Remark 3.2. Also under Assumption 4.2, we have:

‖v*‖² / Var{Δ(Z, α_0)[v*]} = (γ'R_k γ) / (γ'E[S_k(Z)S_k(Z)']γ) ≤ λ_max(R_k) / λ_min(E[S_k(Z)S_k(Z)']) = O(1),

where λ_max(A) and λ_min(A) denote the largest and the smallest eigenvalues of a matrix A. Hence ‖v*‖²/Var{Δ(Z, h_0)[v*]} = O(1).

For Assumption 4.1.(ii)(a), we have, under Assumption 4.3.(i),

|E{Δ(Z_1, α_0)[v*] Δ(Z_t, α_0)[v*]}|
= |∫_{z_1∈Z} ∫_{z_t∈Z} Δ(z_1, α_0)[v*] Δ(z_t, α_0)[v*] [f_{Z_1,Z_t}(z_1, z_t)/(f_Z(z_1) f_Z(z_t))] f_Z(z_1) f_Z(z_t) dz_1 dz_t|
≤ C (∫_{z_1∈Z} |Δ(z_1, α_0)[v*]| f_Z(z_1) dz_1)² = C ‖Δ(Z, α_0)[v*]‖²_1,
which implies that C_T ≤ C ‖Δ(Z, α_0)[v*]‖²_1. This and Assumption 4.3.(ii) imply the existence of a growing d_T such that d_T C_T / ‖Δ(Z, α_0)[v*]‖²_2 → 0; thus Assumption 4.1.(ii)(a) is satisfied. Under Assumption 4.4.(ii), we could further choose d_T to satisfy

‖Δ(Z, α_0)[v*]‖²_1 d_T / ‖Δ(Z, α_0)[v*]‖²_2 = o(1) and d_T^γ ≥ ‖Δ(Z, α_0)[v*]‖²_{2+η} / ‖Δ(Z, α_0)[v*]‖²_2 for some γ > 0.

It remains to verify that such a choice of d_T and Assumption 4.4.(i) together imply Assumption 4.1.(ii)(b). Under Assumption 4.4.(i), {Z_t} is a strictly stationary and strong-mixing process, so {Δ(Z_t, α_0)[v*] : t ≥ 1} forms a triangular array of strong-mixing processes with the same decay rate. We can then apply Davydov's lemma (Hall and Heyde, 1980, Corollary A2) and obtain:

|E{Δ(Z_1, α_0)[v*] Δ(Z_{t+1}, α_0)[v*]}| ≤ 8 [α(t)]^{η/(2+η)} ‖Δ(Z, α_0)[v*]‖²_{2+η}.

Then:

Σ_{t=d_T}^{T−1} |E{Δ(Z_1, α_0)[v*] Δ(Z_{t+1}, α_0)[v*]}| / ‖Δ(Z, α_0)[v*]‖²_2 ≤ 8 [‖Δ(Z, α_0)[v*]‖²_{2+η} / (‖Δ(Z, α_0)[v*]‖²_2 d_T^γ)] Σ_{t=d_T}^{T−1} t^γ [α(t)]^{η/(2+η)} = o(1),

provided that ‖Δ(Z, α_0)[v*]‖²_{2+η} / (‖Δ(Z, α_0)[v*]‖²_2 d_T^γ) = O(1) and Σ_t t^γ [α(t)]^{η/(2+η)} < ∞ for some γ > 0, which verifies Assumption 4.1.(ii)(b). Actually, we have established the stronger result: Σ_{t=1}^{T−1} |ρ_{T,t}| = o(1).

Proof of Lemma 5.1. First, using Assumptions 5.1.(i)-(ii) and the triangle inequality, we have

sup_{α∈B_T} sup_{v₁,v₂∈V_k} |T^{−1} Σ_{t=1}^T r(Z_t, α)[v₁, v₂] − E{r(Z_t, α_0)[v₁, v₂]}| / (‖v₁‖ ‖v₂‖)
≤ sup_{α∈B_T} sup_{v₁,v₂∈W_k} |T^{−1} Σ_{t=1}^T r(Z_t, α)[v₁, v₂] − E{r(Z_t, α)[v₁, v₂]}|
(7.8)  + sup_{α∈B_T} sup_{v₁,v₂∈W_k} |E{r(Z, α)[v₁, v₂] − r(Z, α_0)[v₁, v₂]}| = O_p(ε_T).

Let α = α̂, v₁ = v̂* and v₂ = v. Then it follows from (7.8) and the definitions of ⟨·,·⟩_T and ⟨·,·⟩ that

(7.9)  |T^{−1} Σ_{t=1}^T r(Z_t, α̂)[v̂*, v] − E{r(Z_t, α_0)[v̂*, v]}| / (‖v̂*‖ ‖v‖) = |⟨v̂*, v⟩_T − ⟨v̂*, v⟩| / (‖v̂*‖ ‖v‖) = O_p(ε_T).
Combining this result with Assumption 5.1.(iii) and using (∂f(α̂)/∂α)[v] = ⟨v, v̂*⟩_T and (∂f(α_0)/∂α)[v] = ⟨v, v*⟩, we can deduce that

(7.10)  O_p(ε_T) = sup_{v∈V_k} |(∂f(α̂)/∂α)[v] − (∂f(α_0)/∂α)[v]| / ‖v‖ = sup_{v∈V_k} |⟨v, v̂*⟩_T − ⟨v, v*⟩| / ‖v‖ = sup_{v∈V_k} |⟨v, v̂* − v*⟩| / ‖v‖ + O_p(ε_T ‖v̂*‖).

This implies that

(7.11)  sup_{v∈V_k} |⟨v, v̂* − v*⟩| / ‖v‖ = O_p(ε_T ‖v̂*‖).

Letting v = v̂* − v* in (7.11), we get

(7.12)  ‖v̂* − v*‖ = O_p(ε_T ‖v̂*‖).

It follows from this result that

(7.13)  |‖v̂*‖/‖v*‖ − 1| ≤ ‖v̂* − v*‖/‖v*‖ = O_p(ε_T) ‖v̂*‖/‖v*‖ = O_p(ε_T) [1 + O_p(ε_T) ‖v̂*‖/‖v*‖],

from which we deduce that

(7.14)  ‖v̂*‖/‖v*‖ − 1 = O_p(ε_T).

Combining the results in (7.12), (7.13), and (7.14), we get ‖v̂* − v*‖/‖v*‖ = O_p(ε_T), as desired.

Proof of Theorem 5.1. Part (i). For m = 1, 2, ..., M, we write Λ̂_m as

Λ̂_m = T^{−1/2} Σ_{t=1}^T φ_m(t/T) {Δ(Z_t, α̂)[v̂*] − E[Δ(Z_t, α̂)[v̂*]] − Δ(Z_t, α_0)[v̂*] + E[Δ(Z_t, α_0)[v̂*]]}
+ T^{−1/2} Σ_{t=1}^T φ_m(t/T) {E[Δ(Z_t, α̂)[v̂*]] − E[Δ(Z_t, α_0)[v̂*]] − E[r(Z_t, α_0)[v̂*, α̂ − α_0]]}
+ T^{−1/2} Σ_{t=1}^T φ_m(t/T) E[r(Z_t, α_0)[v̂*, α̂ − α_0]]
+ T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v̂*]
≡ Î_{m,1} + Î_{m,2} + Î_{m,3} + Î_{m,4}.
Using Assumption 5.2.(i)-(ii), we have Î_{m,1} = o_p(‖v̂*‖) and Î_{m,2} = O_p(√T ε_T ξ_T ‖v̂*‖). So

(7.15)  Λ̂_m = [T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v*] + T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v̂* − v*]]
− [T^{−1/2} Σ_{t=1}^T φ_m(t/T)] [⟨v*, α̂ − α_0⟩ + ⟨v̂* − v*, α̂ − α_0⟩] + o_p(‖v̂*‖) + O_p(√T ε_T ξ_T ‖v̂*‖).

Under Assumptions 3.2 and 3.3, we can invoke equation (7.2) in the proof of Theorem 3.1 to deduce that

(7.16)  ‖v*‖^{−1}_sd ⟨v*, α̂ − α_0⟩ = T^{−1} Σ_{t=1}^T ‖v*‖^{−1}_sd Δ(Z_t, α_0)[v*] + o_p(T^{−1/2}).

Using Lemma 5.1 and the Hölder inequality, we get

(7.17)  |⟨v̂* − v*, α̂ − α_0⟩| ≤ ‖v̂* − v*‖ ‖α̂ − α_0‖ = O_p(‖v*‖ ε_T ξ_T).

Next, by Assumption 5.2.(iii) and Lemma 5.1,

(7.18)  |T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v̂* − v*]| ≤ ‖v̂* − v*‖ sup_{v∈W_k} |T^{−1/2} Σ_{t=1}^T φ_m(t/T) Δ(Z_t, α_0)[v]| = O_p(‖v*‖ ε_T).

Now, using Lemma 5.1, (7.15)-(7.18), Assumption 3.2 (‖v*‖ = O(‖v*‖_sd)), Assumption 5.2.(iv) and √T ε_T ξ_T = o(1), we can deduce that

(7.19)  ‖v*‖^{−1}_sd Λ̂_m = T^{−1/2} Σ_{t=1}^T [φ_m(t/T) − T^{−1} Σ_{s=1}^T φ_m(s/T)] ‖v*‖^{−1}_sd Δ(Z_t, α_0)[v*] + o_p(1)
~^a T^{−1/2} Σ_{t=1}^T [φ_m(t/T) − T^{−1} Σ_{s=1}^T φ_m(s/T)] e_t ≡ ζ_{m,T}.

Since {φ_m, m = 0, 1, ..., M} is a set of orthonormal functions and φ_0 ≡ 1, we have ζ_{m,T} ~^a iid N(0, 1) for m = 1, ..., M, and hence ‖v*‖^{−1}_sd Λ̂_m ~^a iid N(0, 1) for m = 1, ..., M.

Part (ii). It follows from part (i) that

(7.20)  ‖v*‖^{−1}_sd ‖v̂*‖²_{sd,T} ‖v*‖^{−1}_sd = (1/M) Σ_{m=1}^M (‖v*‖^{−1}_sd Λ̂_m)² ~^a (1/M) Σ_{m=1}^M ζ²_{m,T},
which, combined with Theorem 3.1, further implies that
$$\hat t_T = \frac{\sqrt{T}\,[f(\hat\alpha) - f(\alpha_0)]}{\|\hat v_T^*\|_{sd,T}} = \frac{\sqrt{T}\,[f(\hat\alpha) - f(\alpha_0)]\,/\,\|v_T^*\|_{sd}}{\sqrt{M^{-1}\sum_{m=1}^{M}\big(\|v_T^*\|_{sd}^{-1}\,\hat\Lambda_{m,T}\big)^2}} \stackrel{a}{=} \frac{\zeta_{0,T}}{\sqrt{M^{-1}\sum_{m=1}^{M}\zeta_{m,T}^2}}, \tag{7.21}$$
where $\zeta_{0,T} = T^{-1/2}\sum_{t=1}^{T} e_t$. Since both $\zeta_{0,T}$ and $\zeta_{m,T}$ are approximately standard normal and $\mathrm{cov}(\zeta_{0,T}, \zeta_{m,T}) = o(1)$ (because $T^{-1}\sum_{t=1}^{T}\phi_m(t/T) = o(1)$), $\zeta_{0,T}$ is asymptotically independent of $\zeta_{m,T}$ for $m = 1, \dots, M$. This implies that $\hat t_T \stackrel{a}{\sim} t_M$.

Proof of Theorem 5.2. Using similar arguments as in proving Theorem 5.1, we can show that
$$\|v_T^*\|_{sd}^{-1}\,\hat\Lambda_{m,T} \stackrel{a}{=} T^{-1/2}\sum_{t=1}^{T}\Big[\phi_m(t/T) - T^{-1}\sum_{s=1}^{T}\phi_m(s/T)\Big]\,e_t \equiv \zeta_{m,T} \quad\text{and}\quad \zeta_{m,T} \stackrel{a}{\sim} \text{iid } N(0, I_q). \tag{7.22}$$
It then follows that
$$\|v_T^*\|_{sd}^{-1}\,\|\hat v_T^*\|_{sd,T}^2\,\|v_T^*\|_{sd}^{-1} \stackrel{a}{=} \frac{1}{M}\sum_{m=1}^{M}\zeta_{m,T}\,\zeta_{m,T}'. \tag{7.23}$$
Using the results in (5.8) and (7.23), we have
$$\hat F_T = \frac{T\,[f(\hat\alpha) - f(\alpha_0)]'\,\big(\|\hat v_T^*\|_{sd,T}^2\big)^{-1}\,[f(\hat\alpha) - f(\alpha_0)]}{q} \stackrel{a}{=} \Big(T^{-1/2}\sum_{t=1}^{T} e_t\Big)'\Big\{\frac{1}{M}\sum_{m=1}^{M}\zeta_{m,T}\,\zeta_{m,T}'\Big\}^{-1}\Big(T^{-1/2}\sum_{t=1}^{T} e_t\Big)\Big/\,q = \zeta_{0,T}'\,\Big\{\frac{1}{M}\sum_{m=1}^{M}\zeta_{m,T}\,\zeta_{m,T}'\Big\}^{-1}\zeta_{0,T}\Big/\,q, \tag{7.24}$$
where $\zeta_{0,T} \equiv T^{-1/2}\sum_{t=1}^{T} e_t$. Since $\phi_m$, $m = 1, 2, \dots, M$, are orthonormal and integrate to zero, we have
$$q\,\hat F_T \stackrel{a}{=} \xi_0'\,\Big(\frac{1}{M}\sum_{m=1}^{M}\xi_m\,\xi_m'\Big)^{-1}\xi_0,$$
where $\xi_m \sim \text{iid } N(0, I_q)$ for $m = 0, \dots, M$. This is exactly Hotelling's (1931) $T^2$ distribution. Using the well-known relationship between the $T^2$ distribution and the F distribution, we have $[(M - q + 1)/M]\,\hat F_T \stackrel{a}{\sim} F_{q,\,M-q+1}$, as desired.
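The final $T^2$-to-F step can be checked by simulation. A minimal Monte Carlo sketch (our own illustration, not from the paper) for the scalar case $q = 1$, where the scaling factor $(M - q + 1)/M$ equals 1 and the limit $F_{1,M}$ is the square of a $t_M$ variate:

```python
import random

def scaled_f_draw(M, rng):
    """One draw of [(M - q + 1)/M] * F_hat for q = 1: a squared N(0,1) numerator
    over an independent average of M squared N(0,1) terms, i.e. an F(1, M) variate."""
    xi0 = rng.gauss(0.0, 1.0)
    denom = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(M)) / M
    return xi0 ** 2 / denom

rng = random.Random(1)
M = 12
draws = [scaled_f_draw(M, rng) for _ in range(20_000)]
mc_mean = sum(draws) / len(draws)
# The mean of F(1, M) is M/(M - 2) = 1.2 for M = 12; the Monte Carlo mean
# should be close, unlike the chi-squared_1 mean of 1 used by the conventional
# Wald approximation, which understates the spread for small M.
print(mc_mean)
```

The gap between $F_{q,M-q+1}$ and $\chi^2_q/q$ critical values shrinks as $M \to \infty$, which is why fixed-$M$ (fixed-smoothing) asymptotics give the more accurate finite-sample size reported in the simulations.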
References.

[1] Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59, 817-854.
[2] Chen, X. (2007): "Large Sample Sieve Estimation of Semi-Nonparametric Models," in: James J. Heckman and Edward E. Leamer (eds.), Handbook of Econometrics, Volume 6, Part 2, 5549-5632.
[3] Chen, X. and X. Shen (1998): "Sieve Extremum Estimates for Weakly Dependent Data," Econometrica, 66, 289-314.
[4] Chen, X., W. Wu and Y. Yi (2009): "Efficient Estimation of Copula-based Semiparametric Markov Models," Annals of Statistics, 37(6B), 4214-4253.
[5] Conley, T. G., L. P. Hansen and W. Liu (1997): "Bootstrapping the Long Run," Macroeconomic Dynamics, 1, 279-311.
[6] Fan, J. and Q. Yao (2003): Nonlinear Time Series: Nonparametric and Parametric Methods, New York: Springer.
[7] Gao, J. (2007): Nonlinear Time Series: Semiparametric and Nonparametric Methods, London: Chapman & Hall/CRC.
[8] Granger, C. W. J. (2003): "Time Series Concepts for Conditional Distributions," Oxford Bulletin of Economics and Statistics, 65 (supplement), 689-701.
[9] Grenander, U. (1981): Abstract Inference, New York: Wiley.
[10] Hall, P. and C. C. Heyde (1980): Martingale Limit Theory and Its Applications, New York: Academic Press.
[11] Härdle, W., H. Liang and J. Gao (2000): Partially Linear Models, Springer.
[12] Hotelling, H. (1931): "The Generalization of Student's Ratio," Annals of Mathematical Statistics, 2, 360-378.
[13] Pritsker, M. (1998): "Nonparametric Density Estimation and Tests of Continuous Time Interest Rate Models," Review of Financial Studies, 11, 449-487.
[14] Phillips, P. C. B. (2005): "HAC Estimation by Automated Regression," Econometric Theory, 21, 116-142.
[15] Robinson, P. (1983): "Nonparametric Estimators for Time Series," Journal of Time Series Analysis, 4, 185-207.
[16] Shen, X. and W. Wong (1994): "Convergence Rate of Sieve Estimates," The Annals of Statistics, 22, 580-615.
[17] Sun, Y.
(2011a): "Robust Trend Inference with Series Variance Estimator and Testing-optimal Smoothing Parameter," Journal of Econometrics, 164(2), 345-366.
[18] Sun, Y. (2011b): "Asymptotic F Test in a GMM Framework," Working paper, Department of Economics, UC San Diego.
[19] White, H. and J. Wooldridge (1991): "Some Results on Sieve Estimation with Dependent Observations," in: W. A. Barnett, J. Powell and G. Tauchen (eds.), Non-parametric and Semi-parametric Methods in Econometrics and Statistics, 459-493, Cambridge: Cambridge University Press.

Cowles Foundation for Research in Economics
Yale University
30 Hillhouse Ave., Box 208281
New Haven, CT 06520
E-mail: xiaohong.chen@yale.edu

Department of Economics
UC San Diego
9500 Gilman Drive
La Jolla, CA 92093
E-mail: yisun@ucsd.edu

Department of Economics
UC Los Angeles
8379 Bunche Hall, Mail Stop: 147703
Los Angeles, CA 90095
E-mail: zhipeng.liao@econ.ucla.edu