MAXIMUM LIKELIHOOD ESTIMATION OF DISCRETELY SAMPLED DIFFUSIONS: A CLOSED-FORM APPROXIMATION APPROACH. By Yacine Aït-Sahalia^1

Econometrica, Vol. 70, No. 1 (January, 2002)

When a continuous-time diffusion is observed only at discrete dates, in most cases the transition distribution and hence the likelihood function of the observations is not explicitly computable. Using Hermite polynomials, I construct an explicit sequence of closed-form functions and show that it converges to the true (but unknown) likelihood function. I document that the approximation is very accurate and prove that maximizing the sequence results in an estimator that converges to the true maximum likelihood estimator and shares its asymptotic properties. Monte Carlo evidence reveals that this method outperforms other approximation schemes in situations relevant for financial models.

Keywords: Maximum-likelihood estimation, continuous-time diffusion, discrete sampling, transition density, Hermite expansion.

1. INTRODUCTION

Consider a continuous-time parametric diffusion

(1.1)  $dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t$

where $X_t$ is the state variable, $W_t$ a standard Brownian motion, $\mu(\cdot;\theta)$ and $\sigma(\cdot;\theta)$ are known functions, and $\theta$ an unknown parameter vector in an open bounded set $\Theta \subset R^K$. Diffusion processes are widely used in financial models, for instance to represent the stochastic dynamics of asset returns, exchange rates, interest rates, macroeconomic factors, etc. While the model is written in continuous time, the available data are always sampled discretely in time. Ignoring the difference can result in inconsistent estimators (see, e.g., Merton (1980) and Melino (1994)). A number of econometric methods have been recently developed to estimate the parameters of (1.1), without requiring that a continuous record of observations be available.
Some of these methods are based on simulations (Gouriéroux, Monfort, and Renault (1993), Gallant and Tauchen (1996)), others on the generalized method of moments (Hansen and Scheinkman (1995), Duffie and Glynn (1997), Kessler and Sørensen (1999)), nonparametric density-matching (Aït-Sahalia (1996a, 1996b)), nonparametric regression for approximate moments (Stanton (1997)), or are Bayesian (Eraker (1997) and Jones (1997)). As in most contexts, provided one trusts the parametric specification (1.1), maximum-likelihood is the method of choice. The major caveat in the present context is that the likelihood function for discrete observations generated by (1.1) cannot be determined explicitly for most models. Let $p_X(\Delta, x \mid x_0;\theta)$ denote the conditional density of $X_{t+\Delta} = x$ given $X_t = x_0$ induced by the model (1.1), also called the transition function. Assume that we observe the process at dates $\{t = i\Delta \mid i = 0,\dots,n\}$, where $\Delta > 0$ is fixed.^2 Bayes' rule combined with the Markovian nature of (1.1), which the discrete data inherit, implies that the log-likelihood function has the simple form

(1.2)  $\ell_n(\theta) \equiv \sum_{i=1}^{n} \ln\{p_X(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta)\}$

For some of the rare exceptions where $p_X$ is available in closed-form, see Wong (1964); in finance, the models of Black and Scholes (1973), Vasicek (1977), Cox, Ingersoll, and Ross (1985), and Cox (1975) all rely on the known closed-form expressions.

If sampling of the process were continuous, the situation would be simpler. First, the likelihood function for a continuous record can be obtained by means of a classical absolutely continuous change of measure (see, e.g., Basawa and Prakasa Rao (1980)).^3 Second, when the sampling interval goes to zero, expansions for the transition function "in small time" are available in the statistical literature (see, e.g., Azencott (1981)). Dacunha-Castelle and Florens-Zmirou (1986) calculate expressions for the transition function in terms of functionals of a Brownian Bridge. With discrete-time sampling, the available methods to compute the likelihood function involve either solving numerically the Fokker-Planck-Kolmogorov partial differential equation (see, e.g., Lo (1988)) or simulating a large number of sample paths along which the process is sampled very finely (see Pedersen (1995) and Santa-Clara (1995)). Neither method produces a closed-form expression to be maximized over $\theta$: the criterion function takes either the form of an implicit solution to a partial differential equation, or a sum over the outcome of the simulations.

By contrast, I construct a closed-form sequence $p_X^{(J)}$ of approximations to the transition density, hence from (1.2) a sequence $\ell_n^{(J)}$ of approximations to the log-likelihood function $\ell_n$. I also provide empirical evidence that $J = 2$ or $3$ is amply adequate for models that are relevant in finance.^4 Since the expression $\ell_n^{(J)}$ to be maximized is explicit, the effort involved is minimal, identical to a standard maximum-likelihood problem with a known likelihood function. Examples are contained in a companion paper (Aït-Sahalia (1999)), which provides, for different models, the corresponding expression of $p_X^{(J)}$. Besides making maximum-likelihood estimation feasible, these closed-form approximations have other applications in financial econometrics. For instance, they could be used for derivative pricing, for indirect inference (see Gouriéroux, Monfort, and Renault (1993)), which in its simplest version uses an Euler approximation as instrumental model, or for Bayesian inference, basically whenever an expression for the transition density is required.

Figure 1. Accuracy and speed of different approximation methods for $p_X$.
Notes: This figure reports the average uniform absolute error of various density approximation techniques applied to the Vasicek, Cox-Ingersoll-Ross and Black-Scholes models. "Euler" refers to the discrete-time, continuous-state, first-order Gaussian approximation scheme for the transition density given in equation (5.4); "Binomial Tree" refers to the discrete-time, discrete-state (two) approximation; "Simulations" refers to an implementation of Pedersen (1995)'s simulated-likelihood method; "PDE" refers to the numerical solution of the Fokker-Planck-Kolmogorov partial differential equation satisfied by the transition density, using the Crank-Nicolson algorithm. For implementation details on the different methods considered, see Jensen and Poulsen (1999).

^1 I am grateful to David Bates, René Carmona, Freddy Delbaen, Ron Gallant, Lars Hansen, Bjarke Jensen, Per Mykland, Peter C. B. Phillips, Rolf Poulsen, Peter Robinson, Chris Rogers, Angel Serrat, Chris Sims, George Tauchen, and in particular a co-editor and three anonymous referees for very helpful comments and suggestions. Robert Kimmel and Ernst Schaumburg provided excellent research assistance. This research was supported by an Alfred P. Sloan Research Fellowship and by the NSF under Grant SBR. Mathematica code to calculate the closed-form density sequence can be found at yacine.

^2 See Section 3.1 for extensions to the cases where the sampling interval is time-varying and even possibly random.

^3 Note that the continuous-observation likelihood is only defined if the diffusion function is known.

^4 In addition, Jensen and Poulsen (1999) have recently completed a comparison of the method of this paper against four alternatives: a discrete Euler approximation of the continuous-time model (1.1), a binomial tree approximation, the numerical solution of the PDE, and simulation-based methods, all in the context of various specifications and parameter values that are relevant for interest rate and stock return models. To give an idea of the relative accuracy and speed of these approximations, Figure 1 summarizes their main results. As is clear from the figure, the approximation of the transition function derived here provides a degree of accuracy and speed that is unmatched by any of the other methods.

The paper is organized as follows. Section 2 describes the sequence of density approximations and proves its convergence. Section 3 studies the properties of the resulting maximum-likelihood estimator. In Section 4, I show how to calculate in closed-form the coefficients of the approximation; readers primarily interested in applying these results to a specific model can go there directly.

Section 5 gives the results of Monte Carlo simulations. Section 6 concludes. All proofs are in the Appendix.

2. A SEQUENCE OF EXPANSIONS OF THE TRANSITION FUNCTION

To understand the construction of the sequence of approximations to $p_X$, the following analogy may be helpful. Consider a standardized sum of random variables to which the Central Limit Theorem (CLT) applies. Often, one is willing to approximate the actual sample size $n$ by infinity and use the $N(0,1)$ limiting distribution for the properly standardized transformation of the data. If not, higher order terms of the limiting distribution (for example the classical Edgeworth expansion based on Hermite polynomials) can be calculated to improve the small sample performance of the approximation. The basic idea of this paper is to create an analogy between this situation and that of approximating the transition density of a diffusion. Think of the sampling interval $\Delta$ as playing the role of the sample size $n$ in the CLT. If we properly standardize the data, then we can find out the limiting distribution of the standardized data as $\Delta$ tends to $0$ (by analogy with what happens in the CLT when $n$ tends to infinity). Properly standardizing the data in the CLT means summing them and dividing by $n^{1/2}$; here it will involve transforming the original diffusion $X$ into another one, which I call $Z$ below. In both cases, the appropriate standardization makes $N(0,1)$ the leading term. I will then refine this $N(0,1)$ approximation by correcting for the fact that $\Delta$ is not $0$ (just as in practical applications of the CLT $n$ is not infinity), i.e., by computing the higher order terms. As in the CLT case, it is natural to consider higher order terms based on Hermite polynomials, which are orthogonal with respect to the leading $N(0,1)$ term.

But in what sense does such an expansion converge? In the CLT case, the convergence is understood to mean that the series with a fixed number of corrective terms (i.e., fixed $J$) converges when the sample size $n$ goes to infinity.
In fact, for a fixed sample size $n$, the Edgeworth expansion will typically diverge as more and more corrective terms are added, unless the density of each of these random variables was "close to" a Normal density to start with. I will make this statement precise later, using the criterion of Cramér (1925): the density $p(z)$ to be expanded around a $N(0,1)$ must have tails sufficiently thin for $\exp(z^2/2)\,p(z)^2$ to be integrable. The point however is that the density $p_X$ cannot in general be approximated for fixed $\Delta$ around a Normal density, because the distribution of the diffusion $X$ is in general too far from that of a Normal. For instance, if $X$ follows a geometric Brownian motion, the right tail of the corresponding log-normal density $p_X$ is too large for its Hermite expansion to converge. Indeed, that tail is of order $x^{-1}\exp(-\ln^2 x)$ as $x$ tends to $+\infty$. Similarly, the expansion of any $N(0,v)$ density around a $N(0,1)$ diverges if $v > 2$, and hence the class of transition densities $p_X$ for which straight Hermite expansions converge in the sense of adding more terms ($J$ increases with $\Delta$ fixed) is quite limited.
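The $N(0,v)$ example can be checked numerically. The sketch below is an illustration, not part of the original argument. It relies on two standard facts: the Hermite coefficients of a $N(0,v)$ density around a $N(0,1)$ are nonzero only at even orders $j = 2k$, where they equal $((v-1)/2)^k/k!$, and Parseval's identity gives $\sum_j j!\,\eta_j^2 = \int p(z)^2/\phi(z)\,dz$, which is $\sqrt{2\pi}$ times Cramér's integral. The function names are hypothetical.

```python
import math

def hermite_energy_partial_sums(v, n_terms):
    """Partial sums of sum_j j! * eta_j**2 for a N(0, v) density expanded around N(0, 1).

    Only even orders j = 2k contribute, with eta_{2k} = ((v - 1) / 2)**k / k!,
    so the k-th term of the sum equals C(2k, k) * ((v - 1)**2 / 4)**k.
    """
    x = (v - 1.0) ** 2 / 4.0
    term, total, sums = 1.0, 0.0, []
    for k in range(n_terms):
        total += term
        sums.append(total)
        # Ratio of consecutive terms: C(2k+2, k+1) / C(2k, k) = 2 (2k + 1) / (k + 1).
        term *= 2.0 * (2 * k + 1) / (k + 1) * x
    return sums

def weighted_l2_norm(v):
    """Closed form of the integral of p(z)**2 / phi(z) for p = N(0, v); finite only if v < 2."""
    return 1.0 / math.sqrt(v * (2.0 - v)) if 0.0 < v < 2.0 else math.inf

# v = 1.5 satisfies the criterion, v = 2.5 violates it.
converging = hermite_energy_partial_sums(1.5, 200)
diverging = hermite_energy_partial_sums(2.5, 60)
print(converging[-1], weighted_l2_norm(1.5))
print(diverging[9], diverging[29], diverging[59])
```

For $v = 1.5$ the partial sums settle at $1/\sqrt{v(2-v)}$, while for $v = 2.5$ they grow without bound, which is the divergence described above.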

To obtain nevertheless an expansion that converges as more correction terms are added while $\Delta$ remains fixed, I will show that the transformation of the diffusion process $X$ into $Z$ in fact guarantees (unlike the CLT situation) that the resulting variable $Z$ has a density $p_Z$ that belongs to the class of densities for which the Hermite series converges as more polynomial terms are added. I will then construct a convergent Hermite series for $p_Z$. Since $Z$ is a known transformation of $X$, I will be able to revert the transformation from $X$ to $Z$ and by the Jacobian formula obtain an expansion for the density of $X$. As a result of transforming $Z$ back into $X$, which in general is a nonlinear transformation (unless $\sigma(x;\theta)$ is independent of the state variable $x$), the leading term of the expansion for the density $p_X$ will be a deformed, or stretched, Normal density rather than the $N(0,1)$ leading term of the expansion for $p_Z$. The rest of this section makes this basic intuition rigorous. In particular, Theorem 1 will prove that such an expansion converges uniformly to the unknown $p_X$.

2.1. Assumptions and First Transformation: $X \to Y$

I start by making fairly general assumptions on the functions $\mu$ and $\sigma$. In particular, I do not assume that $\mu$ and $\sigma$ satisfy the typical growth conditions at infinity, nor do I restrict attention to stationary diffusions only. Let $D_X = (\underline{x}, \bar{x})$ denote the domain of the diffusion $X$. I will consider the two cases where $D_X = (-\infty, +\infty)$ and $D_X = (0, +\infty)$. The latter case is often relevant in finance, when considering models for asset prices or nominal interest rates. In addition, the function $\sigma$ is often specified in financial models in such a way that $\lim_{x \to 0^+} \sigma(x;\theta) = 0$ and $\mu$ and/or $\sigma$ violate the linear growth conditions near the boundaries. For these reasons, I will devise a set of assumptions where growth conditions (without constraint on the sign of the drift function near the boundaries) are replaced by assumptions on the sign of the drift near the boundaries (without restriction on the growth of the coefficients).
The assumptions are:

Assumption 1 (Smoothness of the Coefficients): The functions $\mu(x;\theta)$ and $\sigma(x;\theta)$ are infinitely differentiable in $x$, and three times continuously differentiable in $\theta$, for all $x \in D_X$ and $\theta \in \Theta$.

Assumption 2 (Non-Degeneracy of the Diffusion):
1. If $D_X = (-\infty, +\infty)$, there exists a constant $c$ such that $\sigma(x;\theta) > c > 0$ for all $x \in D_X$ and $\theta \in \Theta$.
2. If $D_X = (0, +\infty)$, $\lim_{x \to 0^+} \sigma(x;\theta) = 0$ is possible, but then there exist constants $\xi_0 > 0$, $\omega > 0$, $\rho \ge 0$ such that $\sigma(x;\theta) \ge \omega x^{\rho}$ for all $0 < x \le \xi_0$ and $\theta \in \Theta$. Whether or not $\lim_{x \to 0^+} \sigma(x;\theta) = 0$, $\sigma$ is nondegenerate on $(0, +\infty)$, that is: for each $\xi > 0$, there exists a constant $c_\xi$ such that $\sigma(x;\theta) \ge c_\xi > 0$ for all $x \in [\xi, +\infty)$ and $\theta \in \Theta$.

The first step I employ towards constructing the sequence of approximations to $p_X$ consists in standardizing the diffusion function of $X$, i.e., transforming $X$ into $Y$ defined as^5

(2.1)  $Y \equiv \gamma(X;\theta) = \int^{X} du/\sigma(u;\theta)$

where any primitive of the function $1/\sigma$ may be selected, i.e., the constant of integration is irrelevant. Because $\sigma > 0$ on $D_X$, the function $\gamma$ is increasing and invertible for all $\theta \in \Theta$. It maps $D_X$ into $D_Y = (\underline{y}, \bar{y})$, the domain of $Y$, where $\underline{y} \equiv \lim_{x \to \underline{x}^+} \gamma(x;\theta)$ and $\bar{y} \equiv \lim_{x \to \bar{x}^-} \gamma(x;\theta)$. For example, if $D_X = (0, +\infty)$ and $\sigma(x;\theta) = x^{\rho}$, then $Y = X^{1-\rho}/(1-\rho)$ if $0 < \rho < 1$ (so $D_Y = (0, +\infty)$), $Y = \ln(X)$ if $\rho = 1$ (so $D_Y = (-\infty, +\infty)$), and $Y = -X^{-(\rho-1)}/(\rho-1)$ if $\rho > 1$ (so $D_Y = (-\infty, 0)$). For a given model under consideration, assume that the parameter space $\Theta$ is restricted in such a way that $D_Y$ is independent of $\theta$ in $\Theta$. This restriction on $\Theta$ is inessential, but it helps keep the notation simple. By applying Itô's Lemma, $Y$ has unit diffusion, that is

(2.2)  $dY_t = \mu_Y(Y_t;\theta)\,dt + dW_t$

where

$\mu_Y(y;\theta) = \dfrac{\mu(\gamma^{-1}(y;\theta);\theta)}{\sigma(\gamma^{-1}(y;\theta);\theta)} - \dfrac{1}{2}\,\dfrac{\partial\sigma}{\partial x}(\gamma^{-1}(y;\theta);\theta)$

^5 The same transformation, sometimes referred to as the Lamperti transform, has been used, for instance, by Florens (1999).

Assumption 3 (Boundary Behavior): For all $\theta \in \Theta$, $\mu_Y(y;\theta)$ and its derivatives with respect to $y$ and $\theta$ have at most polynomial growth^6 near the boundaries and $\lim_{y \to \underline{y}^+ \text{ or } \bar{y}^-} \lambda_Y(y;\theta) < +\infty$ where $\lambda_Y$ is the potential, i.e., $\lambda_Y(y;\theta) \equiv -\left(\mu_Y^2(y;\theta) + \partial\mu_Y(y;\theta)/\partial y\right)/2$.
1. Left Boundary: If $\underline{y} = 0^+$, there exist constants $\varepsilon_0, \kappa, \alpha$ such that for all $0 < y \le \varepsilon_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \ge \kappa y^{-\alpha}$ where either $\alpha > 1$ and $\kappa > 0$, or $\alpha = 1$ and $\kappa \ge 1$. If $\underline{y} = -\infty$, there exist constants $E_0 > 0$ and $K > 0$ such that for all $y \le -E_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \ge K y$.
2. Right Boundary: If $\bar{y} = +\infty$, there exist constants $E_0 > 0$ and $K > 0$ such that for all $y \ge E_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \le K y$. If $\bar{y} = 0^-$, there exist constants $\varepsilon_0, \kappa, \alpha$ such that for all $0 > y \ge -\varepsilon_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \le -\kappa |y|^{-\alpha}$ where either $\alpha > 1$ and $\kappa > 0$ or $\alpha = 1$ and $\kappa \ge 1/2$.

^6 Define an infinitely differentiable function $f$ as having at most polynomial growth if there exists an integer $p \ge 0$ such that $|y|^{-p}|f(y)|$ is bounded above in a neighborhood of infinity. If $p = 1$, $f$ is said to have at most linear growth, and if $p = 2$ at most quadratic growth. Near $0$, polynomial growth means that $|y|^{+p}|f(y)|$ is bounded.

Note that $\lambda_Y$ is not restricted from going to $-\infty$ near the boundaries. Assumption 3 is formulated in terms of the function $\mu_Y$ for reasons of convenience, but the restriction it imposes on the original functions $\mu$ and $\sigma$ follows from (2.1). Assumption 3 only restricts how large $\mu_Y$ can grow if it has the "wrong" sign, meaning that $\mu_Y$ is positive near $\bar{y}$ and negative near $\underline{y}$: then linear growth is the maximum possible rate. But if $\mu_Y$ has the "right" sign, the process is being pulled back away from the boundaries and I do not restrict how fast mean-reversion occurs (up to an arbitrary large polynomial rate for technical reasons). The constraints on the behavior of the function $\mu_Y$ are essentially the best possible for the following reasons. If $\mu_Y$ has the wrong sign near an infinity boundary, and grows faster than linearly, then $Y$ explodes (i.e., can reach the infinity boundary) in finite time. Near a zero boundary, say $\underline{y} = 0^+$, if there exist $\kappa > 0$ and $\alpha < 1$ such that $\mu_Y(y;\theta) \le \kappa y^{-\alpha}$ in a neighborhood of $0^+$, then $0$ becomes attainable. The behavior of the diffusion $Y$ implied by the assumptions made is fully characterized by the following proposition, where $T_Y \equiv \inf\{t \ge 0 \mid Y_t \notin D_Y = (\underline{y}, \bar{y})\}$ denotes the exit time from $D_Y$:

Proposition 1: Under Assumptions 1-3, (2.2) admits a weak solution $\{Y_t \mid t \ge 0\}$, unique in probability law, for every distribution of its initial value $Y_0$.^7 The boundaries of $D_Y$ are unattainable, in the sense that $\mathrm{Prob}(T_Y = \infty) = 1$. Finally, if $+\infty$ is a right boundary, then it is natural if, near $+\infty$, $|\mu_Y(y;\theta)| \le K y$ and entrance if $\mu_Y(y;\theta) \le -K y^{\beta}$ for some $\beta > 1$. If $-\infty$ is a left boundary, then it is natural if, near $-\infty$, $|\mu_Y(y;\theta)| \le K |y|$ and entrance if $\mu_Y(y;\theta) \ge K |y|^{\beta}$ for some $\beta > 1$. If $0$ is a boundary (either right or left), then it is entrance.^8

^7 A weak solution to (2.2) in the interval $D_Y$ is a pair $(Y, W)$, a probability space and a filtration, such that $(Y, W)$ satisfies the stochastic integral equation that underlies the stochastic differential equation (2.2). For a formal definition, see, e.g., Karatzas and Shreve (1991, Definition 5.5.20). Uniqueness in law means that two solutions would have identical finite-dimensional distributions, i.e., in particular the same observable implications for any discrete-time data. From the perspective of statistical inference from discrete observations, this is therefore the appropriate concept of uniqueness.

^8 Natural boundaries can neither be reached in finite time, nor can the diffusion be started or escape from there. Entrance boundaries, such as $0$, cannot be reached starting from an interior point in $D_Y = (0, +\infty)$, but it is possible for $Y$ to begin there. In that case, the process moves quickly away from $0$ and never returns there. Typically, economic considerations require the boundaries to be unattainable; however, they say little about how the process would behave if it were to start at the boundary, or whether that is even possible, and hence it is sensible to allow both types of boundary behavior.

Note also that Assumption 3 neither requires nor implies that the process is stationary. When both boundaries of the domain $D_Y$ are entrance boundaries, then the process $Y$ is necessarily stationary with common unconditional (marginal) density for all $Y_t$

(2.3)  $\pi_Y(y;\theta) \equiv \exp\left\{2\int^{y} \mu_Y(u;\theta)\,du\right\} \Big/ \int_{\underline{y}}^{\bar{y}} \exp\left\{2\int^{v} \mu_Y(u;\theta)\,du\right\} dv$

provided that the initial random variable $Y_0$ is itself distributed with density (2.3) (see, e.g., Karlin and Taylor (1981)). When at least one of the boundaries is natural, stationarity is neither precluded nor implied in that the (only) possible candidate for stationary density, $\pi_Y$, may or may not be integrable near the boundaries.^9

^9 For instance, both an Ornstein-Uhlenbeck process, where $\mu_Y(y;\theta) = -\theta y$, and a Brownian motion, where $\mu_Y(y;\theta) = 0$, satisfy the assumptions made, and both have natural boundaries at $-\infty$ and $+\infty$. Yet the former process is stationary, due to mean-reversion, while the latter (null recurrent) is not.

Next, I show that the diffusion $Y$ admits a smooth transition density:

Proposition 2: Under Assumptions 1-3, $Y$ admits a transition density $p_Y(\Delta, y \mid y_0;\theta)$ that is continuously differentiable in $\Delta > 0$, infinitely differentiable in $y \in D_Y$ and $y_0 \in D_Y$, and three times continuously differentiable in $\theta \in \Theta$. Furthermore, there exists $\bar{\Delta} > 0$ such that for every $\Delta \in (0, \bar{\Delta})$, there exist positive constants $C_i$, $i = 0, \dots, 4$, and $D_0$ such that for every $\theta \in \Theta$ and $(y, y_0) \in D_Y^2$:

(2.4)  $0 < p_Y(\Delta, y \mid y_0;\theta) \le C_0\,\Delta^{-1/2}\, e^{-3(y-y_0)^2/8\Delta}\, e^{C_1|y-y_0||y_0| + C_2|y-y_0| + C_3|y_0| + C_4 y_0^2}$

(2.5)  $\left|\partial p_Y(\Delta, y \mid y_0;\theta)/\partial y\right| \le D_0\,\Delta^{-1/2}\, e^{-3(y-y_0)^2/8\Delta}\, P(|y|, |y_0|)\, e^{C_1|y-y_0||y_0| + C_2|y-y_0| + C_3|y_0| + C_4 y_0^2}$

where $P$ is a polynomial of finite order in $(|y|, |y_0|)$, with coefficients uniformly bounded in $\theta \in \Theta$. Finally, if $\mu_Y \le 0$ near the right boundary $+\infty$ and $\mu_Y \ge 0$ near the left boundary (either $0^+$ or $-\infty$), then $\bar{\Delta} = +\infty$.

The next result shows that these properties essentially extend to the diffusion $X$ of original interest.

Corollary 1: Under Assumptions 1-3, (1.1) admits a weak solution $\{X_t \mid t \ge 0\}$, unique in probability law, for every distribution of its initial value $X_0$. The boundaries of $D_X$ are unattainable, in the sense that $\mathrm{Prob}(T_X = \infty) = 1$ where $T_X \equiv \inf\{t \ge 0 \mid X_t \notin D_X\}$. In addition, $X$ admits a transition density $p_X(\Delta, x \mid x_0;\theta)$ which is continuously differentiable in $\Delta > 0$, infinitely differentiable in $x \in D_X$ and $x_0 \in D_X$, and three times continuously differentiable in $\theta \in \Theta$.

2.2. Second Transformation: $Y \to Z$

The bound (2.4) implies that the tails of $p_Y$ have a Gaussian-like upper bound. In light of the discussion at the beginning of Section 2 about the requirements for convergence of a Hermite series, this is a big step forward. However, while $Y$, thanks to its unit diffusion $\sigma_Y = 1$, is "closer" to a Normal variable than $X$ is, it is not practical to expand $p_Y$. This is due to the fact that $p_Y$ gets peaked around the conditioning value $y_0$ when $\Delta$ gets small. And a Dirac mass is not a particularly appealing leading term for an expansion. For that reason, I perform a further transformation. For given $\Delta > 0$, $\theta \in \Theta$ and $y_0 \in R$, define the "pseudo-normalized" increment of $Y$ as

(2.6)  $Z \equiv \Delta^{-1/2}(Y - y_0)$
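The two transformations can be illustrated on a model where everything is known in closed form. The sketch below assumes a geometric Brownian motion, $\mu(x;\theta) = \mu x$ and $\sigma(x;\theta) = \sigma x$, for which (2.1) gives $Y = \ln(X)/\sigma$ and (2.2) gives the constant drift $\mu_Y = \mu/\sigma - \sigma/2$, so that $Z$ in (2.6) is exactly $N(\mu_Y \Delta^{1/2}, 1)$. The model choice, parameter values and function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def gamma(x, sigma):
    """Transformation (2.1) when sigma(x) = sigma * x: Y = ln(x) / sigma."""
    return math.log(x) / sigma

def mu_y(mu, sigma):
    """Drift of Y in (2.2) for geometric Brownian motion: mu / sigma - sigma / 2."""
    return mu / sigma - 0.5 * sigma

def simulate_z(x0, mu, sigma, delta, n_draws, seed=0):
    """Draw X_{t+delta} exactly given X_t = x0, then form Z as in (2.6)."""
    rng = random.Random(seed)
    y0 = gamma(x0, sigma)
    zs = []
    for _ in range(n_draws):
        w = rng.gauss(0.0, math.sqrt(delta))  # Brownian increment over delta
        x = x0 * math.exp((mu - 0.5 * sigma ** 2) * delta + sigma * w)
        zs.append((gamma(x, sigma) - y0) / math.sqrt(delta))
    return zs

delta = 1.0 / 12.0
zs = simulate_z(x0=100.0, mu=0.1, sigma=0.3, delta=delta, n_draws=20000)
z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / (len(zs) - 1)
print(z_mean, mu_y(0.1, 0.3) * math.sqrt(delta), z_var)
```

In this special case $Z$ is exactly Normal with unit variance, so the $N(0,1)$ leading term is off only by a small mean shift of order $\Delta^{1/2}$; in general the Hermite correction terms also have to absorb non-Gaussian features.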

Of course, since I do not require that $\Delta \to 0$, I make no claim regarding the degree of accuracy of this standardization device, hence the term pseudo. However, I will show that for fixed $\Delta$, $Z$ defined in (2.6) happens to be "close enough" to a $N(0,1)$ variable to make it possible to create a convergent series of expansions for its density $p_Z$ around a $N(0,1)$. In other words, $Z$ turns out to be the appropriate transformation of $X$ if we are going to start an expansion with a $N(0,1)$ term. Expansions starting with a different leading term could be considered (with matching orthogonal polynomials) but, should $\Delta$ in fact be small, they would have the drawback of starting with an inadequate leading term and therefore requiring additional correction.^10

Let $p_Y(\Delta, y \mid y_0;\theta)$ denote the conditional density of $Y_{t+\Delta} \mid Y_t$, and define the density function of $Z$

(2.7)  $p_Z(\Delta, z \mid y_0;\theta) \equiv \Delta^{1/2}\, p_Y(\Delta, \Delta^{1/2} z + y_0 \mid y_0;\theta)$

Once I have obtained a sequence of approximations to the function $(z, y_0) \mapsto p_Z(\Delta, z \mid y_0;\theta)$, I will backtrack and infer a sequence of approximations to the function $(y, y_0) \mapsto p_Y(\Delta, y \mid y_0;\theta)$ by inverting (2.7):

(2.8)  $p_Y(\Delta, y \mid y_0;\theta) \equiv \Delta^{-1/2}\, p_Z(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0;\theta)$

and then back to the object of interest $(x, x_0) \mapsto p_X(\Delta, x \mid x_0;\theta)$, by applying again the Jacobian formula for the change of density:

(2.9)  $p_X(\Delta, x \mid x_0;\theta) = \sigma(x;\theta)^{-1}\, p_Y(\Delta, \gamma(x;\theta) \mid \gamma(x_0;\theta);\theta)$

^10 This is because the limiting form of the density for a diffusion, which is driven by a Brownian motion, is Gaussian. However a different leading term would be appropriate for processes of a different kind (for example driven by a non-Brownian Lévy process).

2.3. Approximation of the Transition Function of the Transformed Data

So this leaves us with the need to approximate the density function $p_Z$. For that purpose, I construct a Hermite series expansion for the conditional density of the variable $Z_t$, which has been constructed precisely so that it is close enough to a $N(0,1)$ variable for an expansion around a $N(0,1)$ density to converge. The classical Hermite polynomials are

(2.10)  $H_j(z) \equiv e^{z^2/2}\, \dfrac{d^j}{dz^j}\left[e^{-z^2/2}\right], \quad j \ge 0$

and let $\phi(z) \equiv e^{-z^2/2}/\sqrt{2\pi}$ denote the $N(0,1)$ density function. Also, define

(2.11)  $p_Z^{(J)}(\Delta, z \mid y_0;\theta) \equiv \phi(z) \sum_{j=0}^{J} \eta_Z^{(j)}(\Delta, y_0;\theta)\, H_j(z)$

as the Hermite expansion of the density function $z \mapsto p_Z(\Delta, z \mid y_0;\theta)$ (for fixed $\Delta$, $y_0$ and $\theta$).^11 By orthonormality of the Hermite polynomials, divided by $(j!)^{1/2}$, with respect to the $L^2$ scalar product weighted by the Normal density, the coefficients $\eta_Z^{(j)}$ are given by

(2.12)  $\eta_Z^{(j)}(\Delta, y_0;\theta) \equiv (1/j!) \int_{-\infty}^{+\infty} H_j(z)\, p_Z(\Delta, z \mid y_0;\theta)\, dz$

Section 4 will indicate how to approximate these coefficients in closed-form, yielding a fully explicit sequence of approximations to $p_Z$. By analogy with (2.8), I can then form the sequence of approximations to $p_Y$ as

(2.13)  $p_Y^{(J)}(\Delta, y \mid y_0;\theta) \equiv \Delta^{-1/2}\, p_Z^{(J)}(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0;\theta)$

and finally approximate $p_X$ by mimicking (2.9), i.e.,

(2.14)  $p_X^{(J)}(\Delta, x \mid x_0;\theta) \equiv \sigma(x;\theta)^{-1}\, p_Y^{(J)}(\Delta, \gamma(x;\theta) \mid \gamma(x_0;\theta);\theta)$

^11 Hence the boundary behavior of the transition density approximation is designed to match that of the true density as the forward variable (not the backward variable) nears the boundaries of the support: under the assumptions made, $p_Z \to 0$ near the boundaries.

The following theorem proves that the expansion (2.14) converges uniformly as more terms are added, and that the limit is indeed the true (but unknown) density function $p_X$.

Theorem 1: Under Assumptions 1-3, there exists $\bar{\Delta} > 0$ (given in Proposition 2) such that for every $\Delta \in (0, \bar{\Delta})$, $\theta \in \Theta$ and $(x, x_0) \in D_X^2$:

(2.15)  $p_X^{(J)}(\Delta, x \mid x_0;\theta) \to p_X(\Delta, x \mid x_0;\theta)$ as $J \to \infty$

In addition, the convergence is uniform in $\theta$ over $\Theta$ and in $x_0$ over compact subsets of $D_X$. If $\sigma(x;\theta) > c > 0$ on $D_X$, then the convergence is further uniform in $x$ over the entire domain $D_X$. If $D_X = (0, +\infty)$ and $\lim_{x \to 0^+} \sigma(x;\theta) = 0$, then the convergence is uniform in $x$ in each interval of the form $[\varepsilon, +\infty)$, $\varepsilon > 0$.

3. A SEQUENCE OF APPROXIMATIONS TO THE MAXIMUM-LIKELIHOOD ESTIMATOR

I now study the properties of the sequence of maximum-likelihood estimators $\hat{\theta}_n^{(J)}$ derived from maximizing over $\theta$ in $\Theta$ the approximate likelihood function computed from $p_X^{(J)}$, i.e., (1.2) with $p_X$ replaced by $p_X^{(J)}$.^12 I will show that $\hat{\theta}_n^{(J)}$ converges as $J \to \infty$ to the true (but uncomputable in practice) maximum-likelihood estimator $\hat{\theta}_n$. I further prove that when the sample size gets larger ($n \to \infty$), one can find $J_n \to \infty$ such that $\hat{\theta}_n^{(J_n)}$ converges to the true parameter value $\theta_0$.^13

^12 The roots of the Hermite polynomials are such that $p_X^{(J)} > 0$ on an interval $[-c_J, c_J]$ with $c_J \to \infty$ as $J \to \infty$. Let $a_J$ be a positive sequence converging to $0$ as $J \to \infty$. Define $\tau_J$ as a (smooth) version of the trimming index taking value $1$ if $p_X^{(J)} > a_J$ and $a_J$ otherwise. Before taking the logarithm, replace $p_X^{(J)}$ by $\tau_J\, p_X^{(J)}$. It is shown in the Appendix that such trimming is asymptotically irrelevant.
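The construction (2.10)-(2.13) and the estimator obtained by maximizing (1.2) with the approximate density can be tried on a toy model. The sketch below assumes the unit-diffusion model $dY_t = m\,dt + dW_t$, which is not an example from the paper. For this model $Z \sim N(m\Delta^{1/2}, 1)$, the coefficients (2.12) are exactly $\eta_Z^{(j)} = (-m\Delta^{1/2})^j/j!$, and the exact MLE is the mean increment divided by $\Delta$. A crude floor on the series stands in for the smooth trimming of footnote 12. All function names and parameter values are hypothetical.

```python
import math
import random

def hermite_paper(j_max, z):
    """H_0..H_{j_max} in the convention of (2.10): H_{j+1} = -z H_j - j H_{j-1}."""
    h = [1.0, -z]
    for j in range(1, j_max):
        h.append(-z * h[j] - j * h[j - 1])
    return h[: j_max + 1]

def make_loglik(zs, delta, J, floor=1e-8):
    """Approximate log-likelihood: (1.2) with the density replaced by (2.13)."""
    hs = [hermite_paper(J, z) for z in zs]
    # Sum of ln(phi(z_i) / sqrt(delta)), which does not depend on m.
    const = sum(-0.5 * z * z for z in zs) - 0.5 * len(zs) * math.log(2.0 * math.pi * delta)

    def loglik(m):
        a = m * math.sqrt(delta)
        eta = [(-a) ** j / math.factorial(j) for j in range(J + 1)]  # exact (2.12) here
        total = const
        for h in hs:
            series = sum(e * hj for e, hj in zip(eta, h))
            total += math.log(max(series, floor))  # crude stand-in for trimming
        return total

    return loglik

def maximize(f, lo, hi, n_grid=301, tol=1e-8):
    """Grid search followed by golden-section refinement around the best grid point."""
    step = (hi - lo) / (n_grid - 1)
    best = max((lo + i * step for i in range(n_grid)), key=f)
    a, b = max(lo, best - step), min(hi, best + step)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:        # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return 0.5 * (a + b)

delta, m_true, n = 0.25, 0.4, 300
rng = random.Random(0)
# Standardized increments z_i = delta**-0.5 * (Y_i - Y_{i-1}), drawn exactly.
zs = [m_true * math.sqrt(delta) + rng.gauss(0.0, 1.0) for _ in range(n)]
m_exact = sum(zs) / (n * math.sqrt(delta))  # exact MLE of the drift
m_J2 = maximize(make_loglik(zs, delta, 2), -1.0, 2.0)
m_J6 = maximize(make_loglik(zs, delta, 6), -1.0, 2.0)
print(m_exact, m_J2, m_J6)
```

Under these assumptions the approximate estimators can be compared directly with the exact MLE. In realistic models the coefficients are not known exactly and are instead replaced by the closed-form expansions of Section 4.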
^13 This setup is different from either the pseudo-maximum likelihood one (see White (1982) and Gouriéroux, Monfort, and Trognon (1984)), or the semi-nonparametric case (Gallant and Nychka (1987)). We are in a somewhat atypical situation in the sense that the pseudo-likelihood does approximate the true likelihood function, and we wish to exploit this fact. In particular, the choice of $J$ is independent of $n$ and $J$ can always be chosen sufficiently large to make the resulting estimator arbitrarily close to the true MLE. This paper is not concerned with the potential misspecification of the true likelihood function, i.e., it accepts (1.1) as the data generating process, but then does not require that the densities belong to specific classes such as the linear exponential family.

3.1. Likelihood Function: Initial Observation and Random Sampling Extension

When defining the log-likelihood function in (1.2), I ignored the unconditional density of the first observation, $\ln \pi(X_0;\theta)$, because it is dominated by the sum of the conditional density terms $\ln p_X(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta)$ as $n \to \infty$. The sample contains only one observation on the unconditional density $\pi$ and $n$ on the transition function, so that the information on $\pi$ contained in the sample does not increase with $n$. All the distributional properties below will be asymptotic, so the definition (1.2) is appropriate for the log-likelihood function (see Billingsley (1961)). In any case, re-introducing the term $\ln \pi(X_0;\theta)$ back into the log-likelihood poses no difficulty.

Note also that I have assumed for convenience that $\Delta$ is identical across pairs of successive observations. If instead $\Delta$ varies deterministically, say $\Delta_i$ is the time interval between the $(i-1)$th and $i$th observations, then it is clear from (1.2) that it suffices to replace $\Delta$ by its actual value $\Delta_i$ when evaluating the transition density for the $i$th pair of observations. If the sampling interval is random, then one can write down the joint likelihood function of the pair of observations and $\Delta_i$ and utilize Bayes' Rule to express it as the product of the conditional density of the $i$th observation $X_i$ given the $(i-1)$th and $\Delta_i$, times the marginal density $q$ of $\Delta_i$: that is $p_X(\Delta_i, X_i \mid X_{i-1};\theta) \times q(\Delta_i;\psi)$ where $\psi$ is a parameter vector parameterizing the sampling density.^14 If the sampling process is independent of $X$ and $\theta$, then the marginal density is irrelevant for the likelihood maximization and the conditional density is the same function $p_X$ as before, evaluated at the realization $\Delta_i$. Hence for the purpose of estimating $\theta$, the criterion function (1.2) is unchanged and as in the deterministic case it suffices to replace $\Delta$ by the realization $\Delta_i$ corresponding to the time interval between the $(i-1)$th and $i$th observations. By contrast, when the sampling interval is random and informative about the parameters of the underlying process (for example, if more rapid arrivals of trade signal an increase of price volatility), then the joint density cannot be integrated out as simply. I now return to the base case of fixed sampling at interval $\Delta$.

^14 To insure that Theorem 1 remains applicable when $\Delta$ is not constant, assume that the distribution of $\Delta$ has a support contained in an interval of the form $[\Delta_{\min}, \Delta_{\max}]$ where $0 < \Delta_{\min} < \Delta_{\max} < \bar{\Delta}$. In this case, the convergence in Theorem 1 is uniform in $\Delta$.

3.2. Properties of the Maximum-Likelihood Estimator

To analyze the properties of the estimators $\hat{\theta}_n$ and $\hat{\theta}_n^{(J)}$, I introduce the following notation. Define the $K \times K$ identity matrix as $Id$ and $L_i(\theta) \equiv \ln p_X(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta)$. $\dot{L}_i$ (and additional dots) denotes differentiation with respect to $\theta$, and $T$ denotes transposition. The score vector $V_n(\theta) \equiv \sum_{i=1}^{n} \dot{L}_i(\theta)$ is a martingale. Corollary 1 proved that $p_X$ admits three continuous derivatives with respect to $\theta$ in $\Theta$; the same holds for $p_X^{(J)}$ by direct inspection of its expression given in Section 2.3. Next define

(3.1)  $i_n(\theta) \equiv \sum_{i=1}^{n} E_\theta\left[\dot{L}_i(\theta)\dot{L}_i(\theta)^T\right], \quad H_n(\theta) \equiv -\sum_{i=1}^{n} \ddot{L}_i(\theta), \quad I_n(\theta) \equiv \mathrm{diag}\; i_n(\theta), \quad T_n(\theta) \equiv -\sum_{i=1}^{n} \dddot{L}_i(\theta)$

The finiteness of $i_n(\theta)$ for every $n$ is proved as part of Proposition 3 below. Note that if the process is not stationary, $E_\theta[\dot{L}_i(\theta)\dot{L}_i(\theta)^T]$ varies with the time index $i$ because it depends on the joint distribution of $(X_{i\Delta}, X_{(i-1)\Delta})$. The square root of the diagonal element $[i_n(\theta)]_{kk}$ will determine the appropriate speed of convergence for the corresponding component of $\hat{\theta}_n$, and I define the local $I_n^{-1/2}(\theta)$-neighborhoods of the true parameter as $N_n(\theta) \equiv \{\tilde{\theta} \in \Theta \,/\, \|I_n^{1/2}(\theta)(\tilde{\theta} - \theta)\| \le \varrho\}$, where $\|\cdot\|$ denotes the Euclidean norm on $R^K$. Recall that $E_\theta[H_n(\theta)] = i_n(\theta)$.^15 To identify the parameters, we make the following assumption.

^15 The order of differentiation with respect to $\theta$ and integration with respect to the conditional density $p_X$ (i.e., computation of conditional expectations) can be interchanged due to the smoothness of the log-likelihood resulting from Corollary 1.

Assumption 4 (Identification): The true parameter vector $\theta_0$ belongs to $\Theta$, $I_n(\theta)$ is invertible,

(3.2)  $I_n^{-1}(\theta) \to 0$ a.s. as $n \to \infty$

uniformly in $\theta \in \Theta$, and $R_n(\theta, \tilde{\theta}) \equiv I_n^{-1/2}(\theta)\, T_n(\tilde{\theta})\, I_n^{-1/2}(\theta)$ is uniformly bounded in probability for all $\tilde{\theta}$ in an $I_n^{-1/2}(\theta)$-neighborhood of $\theta$.

If $X$ is a stationary diffusion, a sufficient condition that guarantees (3.2) is that for all $k = 1, \dots, K$, $\theta \in \Theta$ and $(x, x_0) \in D_X^2$,

(3.3)  $0 < I_{kk}(\theta) = \int_{\underline{x}}^{\bar{x}} \int_{\underline{x}}^{\bar{x}} \left(\partial \ln p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k\right)^2 p_X(\Delta, x, x_0;\theta)\, dx\, dx_0 < +\infty$

uniformly in $\theta$ (where $p_X(\Delta, x, x_0;\theta) = p_X(\Delta, x \mid x_0;\theta)\,\pi(x_0;\theta)$ denotes the joint density of observations sampled $\Delta$ units of time apart) since in that case $I_n^{-1}(\theta) = n^{-1} I^{-1}(\theta) \to 0$ a.s. For the upper bound, it is sufficient that $\partial \ln p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k$ remain bounded as $x$ varies in $D_X$, but not necessary. For the lower bound, it is sufficient that $\partial p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k$ not be zero in a region $(x, x_0)$ where the joint density has positive mass, i.e., the transition function $p_X$ must not be uniformly flat in the direction of any one of the parameters $\theta_k$. Otherwise $\partial p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k \equiv 0$ for all $(x, x_0)$ and the parameter vector cannot be identified. Furthermore, a sufficient condition for (3.3) is that $\mu(x;\theta) = \mu(x;\tilde{\theta})$ and $\sigma^2(x;\theta) = \sigma^2(x;\tilde{\theta})$ for $\pi$-almost all $x$ imply $\theta = \tilde{\theta}$. I show in the proof of Proposition 3 that the boundedness condition on $R_n$ in Assumption 4 is automatically satisfied in the stationary case. A nonstationary example is provided in Section 5.

The strategy I employ to study the asymptotic properties of $\hat{\theta}_n^{(J)}$ is to first determine those of $\hat{\theta}_n$ (see Proposition 3) and then show that $\hat{\theta}_n^{(J)}$ and $\hat{\theta}_n$ share the same asymptotic properties provided one lets $J$ go to infinity with $n$ (Theorem 2). In Proposition 3, I show that general results pertaining to time series asymptotics (see, e.g., Basawa and Scott (1983) and Jeganathan (1995)) can be applied to the present context. These properties follow from first establishing that the likelihood ratio has the locally asymptotically quadratic (LAQ) structure, i.e.,

(3.4)  $\ell_n(\theta + I_n^{-1/2}(\theta) h_n) - \ell_n(\theta) = h_n^T S_n(\theta) - h_n^T G_n(\theta) h_n/2 + o_p(1)$

for every bounded sequence $h_n$ such that $\theta + I_n^{-1/2}(\theta) h_n \in \Theta$, where $S_n(\theta) \equiv I_n^{-1/2}(\theta) V_n(\theta)$ and $G_n(\theta) \equiv I_n^{-1/2}(\theta) H_n(\theta) I_n^{-1/2}(\theta)$. Then, depending upon the joint distribution of $(S_n, G_n)$, different cases arise:

Proposition 3: Under Assumptions 1-4, and for $\Delta \in (0, \bar{\Delta})$, the likelihood ratio satisfies the LAQ structure (3.4), the MLE $\hat{\theta}_n$ is consistent and has the following properties:

i. (Locally Asymptotically Mixed Normal Structure): If

(3.5)  $(S_n(\theta_0), G_n(\theta_0)) \xrightarrow{d} (G^{1/2}(\theta_0)\, Z,\; G(\theta_0))$

where $Z$ is a $N(0, Id)$ variable independent of the possibly random but almost surely finite and positive definite matrix $G(\theta_0)$, then

(3.6)  $I_n^{1/2}(\theta_0)(\hat{\theta}_n - \theta_0) \xrightarrow{d} G^{-1/2}(\theta_0) \times N(0, Id)$

Suppose that $\tilde{\theta}_n$ is an alternative estimator such that for any $h \in R^K$ and $\theta \in \Theta$,

(3.7)  $I_n^{1/2}(\theta)\left(\tilde{\theta}_n - \theta - I_n^{-1/2}(\theta) h\right) \xrightarrow{d} F(\theta)$ under $P_{\theta + I_n^{-1/2}(\theta) h}$

where $F(\theta)$ is a proper law, not necessarily Normal. Then $\hat{\theta}_n$ has maximum concentration in that class, i.e., is closer to $\theta_0$ than $\tilde{\theta}_n$ is, in the sense that for any $\alpha > 0$

(3.8)  $\lim_{n \to \infty} \mathrm{Prob}\left(I_n^{1/2}(\theta_0)(\hat{\theta}_n - \theta_0) \in C_\alpha\right) \ge \lim_{n \to \infty} \mathrm{Prob}\left(I_n^{1/2}(\theta_0)(\tilde{\theta}_n - \theta_0) \in C_\alpha\right)$

where $C_\alpha \equiv [-\alpha, +\alpha]^K$. Further, if $\tilde{\theta}_n$ has the distribution $I_n^{1/2}(\theta_0)(\tilde{\theta}_n - \theta_0) \xrightarrow{d} G^{-1/2}(\theta_0) \times N(0, \tilde{V})$ under $P_{\theta_0}$, then $\tilde{V} - Id$ is non-negative definite.

14 236 yacie aït-sahalia ii. (Locally Asymptotically Normal Structure): If X is a statioary diffusio, the a special case of the LAMN structure arises where (3.3) is a sufficiet coditio for Assumptio 4, i E L 1 L 1 T is Fisher s iformatio matrix, i = i I diag i I = I G = I 1/2 i I 1/2 is a oradom matrix ad (3.6) reduces to (3.9) d 1/2 ˆ N i ( 1) The efficiecy result simplifies to the Fisher-Rao form: i 1 is the smallest possible asymptotic variace amog that of all cosistet ad asymptotically Normal estimators of. iii. (Locally Asymptotically Browia Fuctioal structure): If ( S G d 1 1 ) (3.1) M dw M M T d where M W is a Gaussia process such that W is a stadard Browia motio, the (3.11) ( I 1/2 ˆ d 1 ) 1 M M T d 1 M dw If M ad W are idepedet, the LABF is a special case of LAMN, but ot otherwise. If oe had ormed the differece ˆ by the stochastic factor diag H 1/2 rather tha by the determiistic factor I 1/2, the the asymptotic distributio of the estimator would have bee N Id rather tha G 1/2 N Id (see the example i Sectio 5). I other words, the stochastic ormig, while itrisically more complicated, may be useful if the distributio of G is itractable, sice i that case, the distributio of I 1/2 eed ot be asymptotically Normal (ad depeds o ) whereas that of the stochastically ormed differece would simply be N Id. Noe of these difficulties are preset i the statioary case, where G is oradom. 16 Sufficiet coditios ca be give that isure that the LAMN structure holds: p for istace, if G G uiformly i over compact subsets of the (3.5) ecessarily holds by applyig Theorem 1 i Basawa ad Scott (1983, page 34). Note also that whe the parameter vector is multidimesioal, the K diagoal terms of i 1/2 do ot ecessarily go to ifiity at the same rate, ulike the commo rate 1/2 i the statioary case (see agai the example i Sectio 5). Propositio 3 is ot a ed i itself sice i our cotext ˆ caot be computed explicitly. 
It becomes useful however when one proves that the approximate maximum-likelihood estimator $\hat\theta_n^{(J)}$ is a good substitute for $\hat\theta_n$, in the sense

¹⁶ In the terminology of Basawa and Scott (1983), when $G(\theta_0)$ is deterministic (resp. random), the model is called ergodic (resp. nonergodic). But the LAMN situation where $G(\theta_0)$ is random is only one particularly tractable form of nonergodicity.

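that the asymptotic properties of $\hat\theta_n$ identified in Proposition 3 carry over to $\hat\theta_n^{(J)}$. For technical reasons, a minor refinement of Assumption 2 is needed:

Assumption 5 (Strengthening of Assumption 2 in the limiting case where $\alpha = 1$ and the diffusion is degenerate at 0): Recall the constant $\rho$ in Assumption 2(2), and the constants $\kappa$ and $\alpha$ in Assumption 3(1). If $\alpha = 1$, then either $\rho \geq 1$ with no restriction on $\kappa$, or $\kappa \geq 2\rho/(1 - \rho)$ if $0 \leq \rho < 1$. If $\alpha > 1$, no restriction is required.

The following theorem shows that $\hat\theta_n^{(J)}$ inherits the asymptotic properties of the (uncomputable) true MLE $\hat\theta_n$:

Theorem 2: Under Assumptions 1-5, and for $\Delta \in (0, \bar\Delta)$:
i. Fix the sample size n. Then as $J \to \infty$, $\hat\theta_n^{(J)} \to_p \hat\theta_n$ under $P_{\theta_0}$.
ii. As $n \to \infty$, a sequence $J_n \to \infty$ can be chosen sufficiently large to deliver any rate of convergence of $\hat\theta_n^{(J_n)}$ to $\hat\theta_n$. In particular, there exists a sequence $J_n \to \infty$ such that $\hat\theta_n^{(J_n)} - \hat\theta_n = o_p(I_n^{-1/2}(\theta_0))$ under $P_{\theta_0}$, which then makes $\hat\theta_n^{(J_n)}$ and $\hat\theta_n$ share the same asymptotic distribution described in Proposition 3.

4. Explicit Expressions for the Density Expansion

I now turn to the explicit computation of the terms in the density expansion. Theorem 1 showed that

(4.1)  $p_Z(\Delta, z \mid y_0; \theta) = \phi(z) \sum_{j=0}^{\infty} \eta_Z^{(j)}(\Delta, y_0; \theta)\, H_j(z)$.

Recall that $p_Z^{(J)}(\Delta, z \mid y_0; \theta)$ denotes the partial sum in (4.1) up to j = J. From (2.12), we have

(4.2)  $\eta_Z^{(j)}(\Delta, y_0; \theta) = (1/j!) \int_{-\infty}^{+\infty} H_j(z)\, p_Z(\Delta, z \mid y_0; \theta)\, dz$
$\qquad = (1/j!) \int_{-\infty}^{+\infty} H_j(z)\, \Delta^{1/2}\, p_Y(\Delta, \Delta^{1/2} z + y_0 \mid y_0; \theta)\, dz$
$\qquad = (1/j!) \int_{-\infty}^{+\infty} H_j(\Delta^{-1/2}(y - y_0))\, p_Y(\Delta, y \mid y_0; \theta)\, dy$
$\qquad = (1/j!)\, E\left[ H_j(\Delta^{-1/2}(Y_{t+\Delta} - y_0)) \mid Y_t = y_0; \theta \right]$

so that the coefficients $\eta_Z^{(j)}$ are specific conditional moments of the process Y. As such, they can be computed in a number of ways, including for instance Monte Carlo integration. A particularly attractive alternative is to calculate explicitly a Taylor series expansion in $\Delta$ for the coefficients $\eta_Z^{(j)}$. Let $f(y, y_0)$ be a polynomial.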

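Taylor's Theorem applied to the function $s \mapsto E[f(Y_{t+s}, y_0) \mid Y_t = y_0]$ yields

(4.3)  $E[f(Y_{t+\Delta}, y_0) \mid Y_t = y_0] = \sum_{k=0}^{K} A^k(\theta) \cdot f(y_0, y_0)\, \frac{\Delta^k}{k!} + E\left[ A^{K+1}(\theta) \cdot f(Y_{t+\delta}, y_0) \mid Y_t = y_0 \right] \frac{\Delta^{K+1}}{(K+1)!}$

for some $\delta \in [0, \Delta]$, where $A(\theta)$ is the infinitesimal generator of the diffusion Y, defined as the operator $A(\theta) \cdot f \mapsto \mu_Y(\cdot\,; \theta)\, \partial f/\partial y + (1/2)\, \partial^2 f/\partial y^2$. The following proposition provides sufficient conditions under which the series (4.3) is convergent:

Proposition 4: Under Assumptions 1-3, suppose that for the relevant boundaries of $D_Y = (\underline{y}, \bar y)$: near $\bar y = +\infty$, $\mu_Y(y; \theta) \leq -K y^{\beta}$ for some $\beta > 1$; near $\underline{y} = -\infty$, $\mu_Y(y; \theta) \geq K |y|^{\beta}$ for some $\beta > 1$; near $\underline{y} = 0^+$, $\mu_Y(y; \theta) \geq \kappa y^{-\alpha}$ for some $\alpha > 1$ and $\kappa > 0$; and near $\bar y = 0^-$, $\mu_Y(y; \theta) \leq -\kappa |y|^{-\alpha}$ for some $\alpha > 1$ and $\kappa > 0$. Then the diffusion Y is stationary with unconditional density $\pi_Y$ and the series (4.3) converges in $L^2(\pi_Y)$ for fixed $\Delta > 0$.

Now let $p_Z^{(J,K)}$ denote the Taylor series up to order K in $\Delta$ of $p_Z^{(J)}$. The series for the first seven Hermite coefficients (j = 0, ..., 6) are given by $\eta_Z^{(0)} = 1$, and to order K = 3 by:

(4.4)-(4.9)  [expressions for $\eta_Z^{(1),3}, \dots, \eta_Z^{(6),3}$ as polynomials in $\mu_Y$ and its derivatives, with terms in powers of $\Delta^{1/2}$; illegible in this transcription]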

where I have used the more compact notation $\mu_Y^{[k]m}$ for $(\partial^k \mu_Y(y_0; \theta)/\partial y_0^k)^m$. Different ways of gathering the terms are available (as in the CLT, where for example both the Edgeworth and Gram-Charlier expansions are based on a Hermite expansion). Here, if we gather all the terms according to increasing powers of $\Delta$ instead of increasing order of the Hermite polynomials, and let $\tilde p_Z^{(K)} \equiv p_Z^{(\infty, K)}$ (and similarly for Y, so that $\tilde p_Y^{(K)}(\Delta, y \mid y_0; \theta) = \Delta^{-1/2}\, \tilde p_Z^{(K)}(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0; \theta)$, and then for X), we obtain an explicit representation of $\tilde p_Y^{(K)}$, given by

(4.10)  $\tilde p_Y^{(K)}(\Delta, y \mid y_0; \theta) = \Delta^{-1/2}\, \phi\!\left( \frac{y - y_0}{\Delta^{1/2}} \right) \exp\!\left( \int_{y_0}^{y} \mu_Y(w; \theta)\, dw \right) \sum_{k=0}^{K} c_k(y \mid y_0; \theta)\, \frac{\Delta^k}{k!}$

where $c_0(y \mid y_0; \theta) = 1$ and for all $j \geq 1$:

(4.11)  $c_j(y \mid y_0; \theta) = j\, (y - y_0)^{-j} \int_{y_0}^{y} (w - y_0)^{j-1} \left\{ \lambda_Y(w; \theta)\, c_{j-1}(w \mid y_0; \theta) + \left( \partial^2 c_{j-1}(w \mid y_0; \theta)/\partial w^2 \right)/2 \right\} dw$.

Finally, note that in general, conditional moments of the process need not be analytic in time,¹⁷ in which case (4.3) and (4.10) must be interpreted strictly as Taylor series. Even when that is the case, their relevance for empirical work lies in the fact that including a small number of terms (one, two, or three) makes the approximation very accurate for the values these variables typically take in financial econometrics, as we shall now see.¹⁸

5. Accuracy of the Approximations and Monte Carlo Evidence

While Figure 1 shows that the approximation of $p_X$ was extremely accurate as a function of the state variables, it does not necessarily imply that the resulting parameter estimates would in practice be close to the true MLE, as was proved theoretically in Theorem 2. To answer that question, I perform Monte Carlo experiments. Consider first the Ornstein-Uhlenbeck specification, $dX_t = -\kappa X_t\, dt + \sigma\, dW_t$, where $\theta \equiv (\kappa, \sigma^2)$ and $D_X = (-\infty, +\infty)$. The process X has a Gaussian transition density with mean $x_0 e^{-\kappa\Delta}$ and variance $(1 - e^{-2\kappa\Delta})\, \sigma^2/(2\kappa)$. In this case, $Y = \gamma(X; \theta) = \sigma^{-1} X$, $\mu_Y(y; \theta) = -\kappa y$, and the additional terms in

¹⁷ Note however that as a result of Theorem 1 the transition function is analytic in the forward state variable. The expansion is designed to deliver an approximation of the density function $y \mapsto p_Y(\Delta, y \mid y_0; \theta)$ for a fixed value of the backward (conditioning) variable $y_0$. Therefore, except in the limit where $\Delta$ becomes infinitely small, it is not designed to reproduce the limiting behavior of $p_Y$ in the limit where $y_0$ tends to the boundaries. The expansion delivers the correct behavior for y tending to the boundaries, except in the limiting situation of a drift of order $y^{-1}$ in Assumption 3.1, where it is only appropriate if $\Delta$ becomes infinitesimally small.

¹⁸ See also the companion paper (Aït-Sahalia (1999)) for examples and an application to the estimation of interest rate models.

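the approximation $p_Z^{(J)}$ need only correct for the inadequacy of the conditional moments in the leading term $p_Z^{(0)}$, not for any non-Gaussianity. In other words, the transformation from X to Y being linear, there is no deformation or stretching of the Gaussian leading term when going from the approximation of $p_Y$ to that of $p_X$. By specializing Proposition 3 to this model, one obtains the following asymptotic distributions for the MLE:¹⁹

Corollary 2 (Asymptotic Distribution of the MLE for the Ornstein-Uhlenbeck Model):

i. If $\kappa > 0$ (LAN, stationary case):

(5.1)  $\sqrt{n} \left( \begin{pmatrix} \hat\kappa_n \\ \hat\sigma_n^2 \end{pmatrix} - \begin{pmatrix} \kappa \\ \sigma^2 \end{pmatrix} \right) \to_d N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \dfrac{e^{2\kappa\Delta} - 1}{\Delta^2} & \dfrac{\sigma^2 (e^{2\kappa\Delta} - 1 - 2\kappa\Delta)}{\kappa\Delta^2} \\[2ex] \dfrac{\sigma^2 (e^{2\kappa\Delta} - 1 - 2\kappa\Delta)}{\kappa\Delta^2} & \dfrac{\sigma^4 \left( (e^{2\kappa\Delta} - 1)^2 + 2\kappa^2\Delta^2 (e^{2\kappa\Delta} + 1) + 4\kappa\Delta (1 - e^{2\kappa\Delta}) \right)}{\kappa^2\Delta^2 (e^{2\kappa\Delta} - 1)} \end{pmatrix} \right)$

ii. If $\kappa < 0$ (LAMN, explosive case), assume $X_0 = 0$; then

(5.2)  $\dfrac{\Delta\, e^{-(n+1)\kappa\Delta}}{e^{-2\kappa\Delta} - 1}\, (\hat\kappa_n - \kappa) \to_d G^{-1/2} \times N(0, 1)$ and $\sqrt{n}\, (\hat\sigma_n^2 - \sigma^2) \to_d N(0, 2\sigma^4)$

where G has a $\chi^2[1]$ distribution independent of the $N(0, 1)$. $G^{-1/2} \times N(0, 1)$ is a Cauchy distribution.

iii. If $\kappa = 0$ (LAQ, unit root case), assume $X_0 = 0$; then

(5.3)  $n\Delta\, \hat\kappa_n \to_d \dfrac{1 - W_1^2}{2 \int_0^1 W_t^2\, dt}$ and $\sqrt{n}\, (\hat\sigma_n^2 - \sigma^2) \to_d N(0, 2\sigma^4)$

where $W_t$ denotes a standard Brownian motion.

In Monte Carlo experiments, I study the behavior of the true MLE of $\theta$ (which is computable in this example since the transition function is known in closed form), the Euler estimator, and the estimators of this paper corresponding to one and two orders in $\Delta$ respectively. The Euler approximation corresponds to a simple discretization of the continuous-time stochastic differential equation, where the differential equation (1.1) is replaced by the difference equation $X_{t+\Delta} - X_t = \mu(X_t; \theta)\Delta + \sigma(X_t; \theta)\, \Delta^{1/2} \varepsilon_{t+\Delta}$ with $\varepsilon_{t+\Delta} \sim N(0, 1)$, so that

(5.4)  $p_X^{\mathrm{Euler}}(\Delta, x \mid x_0; \theta) = \left( 2\pi\Delta\, \sigma^2(x_0; \theta) \right)^{-1/2} \exp\left\{ -\left( x - x_0 - \mu(x_0; \theta)\Delta \right)^2 / \left( 2\Delta\, \sigma^2(x_0; \theta) \right) \right\}$.

¹⁹ Since there is no confusion possible in what follows, the subscript 0 is omitted when denoting the true parameter values.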
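To make the contrast between the exact Ornstein-Uhlenbeck transition density and the Euler density (5.4) concrete, the following Python sketch evaluates both for the model $dX_t = -\kappa X_t\, dt + \sigma\, dW_t$. It is only an illustration under stated assumptions: the function names, the parameter values, and the sampling intervals in the demo are choices made here and are not taken from the paper.

```python
import math


def ou_exact_density(x, x0, kappa, sigma2, delta):
    """Exact Gaussian transition density of dX = -kappa*X dt + sigma dW over a step delta."""
    mean = x0 * math.exp(-kappa * delta)
    if kappa == 0.0:
        var = sigma2 * delta  # limit of the variance formula as kappa -> 0
    else:
        var = sigma2 * (1.0 - math.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ou_euler_density(x, x0, kappa, sigma2, delta):
    """Euler density (5.4) with mu(x0) = -kappa*x0 and sigma^2(x0) = sigma2."""
    mean = x0 - kappa * x0 * delta
    var = sigma2 * delta
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the paper).
    kappa, sigma2, x0 = 5.0, 1.0, 1.0
    for delta in (1.0 / 12.0, 1.0 / 252.0):
        x = x0 * math.exp(-kappa * delta)  # evaluate at the exact conditional mean
        exact = ou_exact_density(x, x0, kappa, sigma2, delta)
        euler = ou_euler_density(x, x0, kappa, sigma2, delta)
        print(f"delta={delta:.4f}  exact={exact:.4f}  euler={euler:.4f}")
```

The two densities differ in both conditional mean and conditional variance, and the gap shrinks as the sampling interval goes to zero, which is why the Euler scheme is best viewed as a small-$\Delta$ approximation.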

I set the true value of $\sigma^2$ at 1.0, and examine the behavior of the various estimators of $\kappa$ and $\sigma^2$ for the different cases of Corollary 2, by setting $\kappa$ = 10, 5, and 1 (stationary root LAN), $\kappa$ = 0 (unit root LABF), and $\kappa$ = -1 (explosive root LAMN). For each value of the parameters, I perform M = 5,000 Monte Carlo simulations of the sample paths generated by the model, each containing n = 1,000 observations.

These Monte Carlo experiments answer four separate questions. Firstly, how accurate are the various asymptotic distributions in Corollary 2? This question is answered in Figures 2 and 3, where I plot the finite sample distributions of the estimators (histograms) and the corresponding asymptotic distribution (solid line). The asymptotic distribution of $\hat\kappa$ reported in Panels A-C of Figure 2 is from (5.1). Not surprisingly, as the drift parameter makes the process closer and closer to a unit root ($\kappa$ decreasing from 10 to 1), the quality of the asymptotic approximation (5.1) deteriorates and the small sample distribution starts to resemble (5.3), which is strongly skewed. This only affects the drift parameter; the estimator of $\sigma^2$ behaves in small samples as predicted by the asymptotic distribution, which is compatible with the fact that the distribution for estimating $\sigma^2$ is continuous when going through the $\kappa = 0$ boundary. Panel A of Figure 3 reports results for the unit root case, with the asymptotic distribution given in (5.3). In the explosive case $\kappa < 0$, Panel B of Figure 3 is based on the Cauchy distribution (5.2), while Panel C exploits the possibility of random norming to obtain a Gaussian asymptotic distribution of the drift coefficient (see (A.69) in the Appendix). The diffusion estimator is identical in both Panels B and C, and is therefore not repeated in Panel C. Since the rate of convergence in nonstationary cases varies, both Panels B and C report the distribution of the drift estimator scaled by the relevant rate of convergence, rather than the raw distribution of $\hat\kappa - \kappa$ as in all other panels.
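As a rough illustration of the experimental design just described, the Python sketch below simulates stationary Ornstein-Uhlenbeck paths from the exact Gaussian transition and computes the exact conditional MLE and the Euler estimator of $\kappa$. Several elements are assumptions made here rather than details from the paper: the function names, the sampling interval $\Delta = 1/12$, the stationary draw for $X_0$, the reduced number of replications, and the use of the delta-method variance $(e^{2\kappa\Delta} - 1)/\Delta^2$ for $\hat\kappa$. It relies on the fact that, conditional on $X_0$, the Gaussian AR(1) likelihood is maximized at the least-squares slope $\hat\rho$, so that $\hat\kappa = -\log(\hat\rho)/\Delta$ for the exact likelihood and $\hat\kappa^{\mathrm{Euler}} = (1 - \hat\rho)/\Delta$ for the Euler likelihood.

```python
import math
import random
import statistics


def simulate_ou(n, kappa, sigma2, delta, rng):
    """Draw X_0, ..., X_n from the exact Gaussian transition (requires kappa > 0)."""
    rho = math.exp(-kappa * delta)
    step_sd = math.sqrt(sigma2 * (1.0 - rho * rho) / (2.0 * kappa))
    x = rng.gauss(0.0, math.sqrt(sigma2 / (2.0 * kappa)))  # stationary start (assumption)
    path = [x]
    for _ in range(n):
        x = rho * x + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def estimate_kappa(path, delta):
    """Return (exact conditional MLE, Euler estimator) of kappa."""
    sxy = sum(a * b for a, b in zip(path[:-1], path[1:]))
    sxx = sum(a * a for a in path[:-1])
    rho_hat = sxy / sxx  # least-squares slope of X_i on X_{i-1}
    kappa_euler = (1.0 - rho_hat) / delta
    kappa_mle = -math.log(rho_hat) / delta if rho_hat > 0.0 else float("nan")
    return kappa_mle, kappa_euler


def monte_carlo(m, n, kappa, sigma2, delta, seed=0):
    """Run m replications and return the list of (MLE, Euler) pairs."""
    rng = random.Random(seed)
    return [estimate_kappa(simulate_ou(n, kappa, sigma2, delta, rng), delta)
            for _ in range(m)]


if __name__ == "__main__":
    kappa, sigma2, delta, n, m = 5.0, 1.0, 1.0 / 12.0, 1000, 200  # m kept small for speed
    draws = [d for d in monte_carlo(m, n, kappa, sigma2, delta) if not math.isnan(d[0])]
    mle = [d[0] for d in draws]
    gap = [d[1] - d[0] for d in draws]
    asy_sd = math.sqrt((math.exp(2.0 * kappa * delta) - 1.0) / (n * delta ** 2))
    print("sample sd of MLE:", statistics.stdev(mle), " asymptotic sd:", asy_sd)
    print("mean of (Euler - MLE):", statistics.mean(gap))
```

Because $1 - \rho \leq -\log \rho$ for every $\rho > 0$, the Euler estimator of $\kappa$ in this model never exceeds the exact MLE computed from the same path, so the discretization bias has a known sign here.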
The simulations show that in both nonstationary cases, and in the stationary case when sufficiently far away from a unit root, the asymptotic distribution of the drift estimator is an accurate guide to its small sample distribution.

The second question these experiments address is: what is the dispersion of the MLE around the true value? Tables I and II report the first four moments of the finite sample and asymptotic distributions. For each of the parameter values and the M samples, I also report in these tables the first two moments of the differences between the true MLE estimators of $\kappa$ and $\sigma^2$, their Euler versions and the estimators from using the method of this paper with one and two terms. This makes it possible, thirdly, to compare the MLE dispersion, or sampling noise, to the distance between the MLE and the various approximations under consideration. In particular, when selecting the order of approximation, it is unnecessary to select a value larger than what is required to make the distance between $\hat\theta_n^{(J)}$ and $\hat\theta_n$ an order of magnitude smaller than the distance between $\hat\theta_n$ and the true value $\theta_0$ (as measured by the exact MLE sampling distribution). These simulations show that the parameter estimates obtained with one and even more so two terms are several orders of magnitude closer to the exact MLE than

Figure 2. Small sample and asymptotic distributions of the MLE for the Ornstein-Uhlenbeck process: stationary processes.

the MLE is to the parameter, so that the approximate estimates can be used in place of the exact MLE in practice. Finally, these Monte Carlo experiments make it possible to compare the relative accuracy of the three estimators based on the Euler discretization approximation, and those of this paper. The results of the bottom part of Tables I and II,