Econometrica, Vol. 70, No. 1 (January, 2002), 223–262

MAXIMUM LIKELIHOOD ESTIMATION OF DISCRETELY SAMPLED DIFFUSIONS: A CLOSED-FORM APPROXIMATION APPROACH

By Yacine Aït-Sahalia¹

When a continuous-time diffusion is observed only at discrete dates, in most cases the transition distribution and hence the likelihood function of the observations is not explicitly computable. Using Hermite polynomials, I construct an explicit sequence of closed-form functions and show that it converges to the true (but unknown) likelihood function. I document that the approximation is very accurate and prove that maximizing the sequence results in an estimator that converges to the true maximum likelihood estimator and shares its asymptotic properties. Monte Carlo evidence reveals that this method outperforms other approximation schemes in situations relevant for financial models.

Keywords: Maximum-likelihood estimation, continuous-time diffusion, discrete sampling, transition density, Hermite expansion.

1. INTRODUCTION

Consider a continuous-time parametric diffusion

$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t \tag{1.1}$$

where $X_t$ is the state variable, $W_t$ a standard Brownian motion, $\mu(\cdot;\theta)$ and $\sigma(\cdot;\theta)$ are known functions, and $\theta$ an unknown parameter vector in an open bounded set $\Theta \subset R^K$. Diffusion processes are widely used in financial models, for instance to represent the stochastic dynamics of asset returns, exchange rates, interest rates, macroeconomic factors, etc. While the model is written in continuous time, the available data are always sampled discretely in time. Ignoring the difference can result in inconsistent estimators (see, e.g., Merton (1980) and Melino (1994)). A number of econometric methods have been recently developed to estimate the parameters of (1.1), without requiring that a continuous record of observations be available.
Some of these methods are based on simulations (Gouriéroux, Monfort, and Renault (1993), Gallant and Tauchen (1996)), others on the generalized method of moments (Hansen and Scheinkman (1995), Duffie and Glynn (1997), Kessler and Sorensen

¹ I am grateful to David Bates, René Carmona, Freddy Delbaen, Ron Gallant, Lars Hansen, Bjarke Jensen, Per Mykland, Peter C. B. Phillips, Rolf Poulsen, Peter Robinson, Chris Rogers, Angel Serrat, Chris Sims, George Tauchen, and in particular a co-editor and three anonymous referees for very helpful comments and suggestions. Robert Kimmel and Ernst Schaumburg provided excellent research assistance. This research was supported by an Alfred P. Sloan Research Fellowship and by the NSF under Grant SBR-9996023. Mathematica code to calculate the closed-form density sequence can be found at http://www.princeton.edu/~yacine.
(1999)), nonparametric density-matching (Aït-Sahalia (1996a, 1996b)), nonparametric regression for approximate moments (Stanton (1997)), or are Bayesian (Eraker (1997) and Jones (1997)). As in most contexts, provided one trusts the parametric specification (1.1), maximum-likelihood is the method of choice. The major caveat in the present context is that the likelihood function for discrete observations generated by (1.1) cannot be determined explicitly for most models. Let $p_X(\Delta, x \mid x_0;\theta)$ denote the conditional density of $X_{t+\Delta} = x$ given $X_t = x_0$ induced by the model (1.1), also called the transition function. Assume that we observe the process at dates $\{t = i\Delta \mid i = 0, \dots, n\}$, where $\Delta > 0$ is fixed.² Bayes' rule combined with the Markovian nature of (1.1), which the discrete data inherit, imply that the log-likelihood function has the simple form

$$\ell_n(\theta) \equiv \sum_{i=1}^{n} \ln\left\{ p_X\left(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta\right) \right\} \tag{1.2}$$

For some of the rare exceptions where $p_X$ is available in closed-form, see Wong (1964); in finance, the models of Black and Scholes (1973), Vasicek (1977), Cox, Ingersoll, and Ross (1985), and Cox (1975) all rely on the known closed-form expressions.

If sampling of the process were continuous, the situation would be simpler. First, the likelihood function for a continuous record can be obtained by means of a classical absolutely continuous change of measure (see, e.g., Basawa and Prakasa Rao (1980)).³ Second, when the sampling interval goes to zero, expansions for the transition function "in small time" are available in the statistical literature (see, e.g., Azencott (1981)). Dacunha-Castelle and Florens-Zmirou (1986) calculate expressions for the transition function in terms of functionals of a Brownian Bridge. With discrete-time sampling, the available methods to compute the likelihood function involve either solving numerically the Fokker-Planck-Kolmogorov partial differential equation (see, e.g., Lo (1988)) or simulating a large number of sample paths along which the process is sampled very finely (see Pedersen (1995) and Santa-Clara (1995)).
Neither method produces a closed-form expression to be maximized over $\theta$: the criterion function takes either the form of an implicit solution to a partial differential equation, or a sum over the outcome of the simulations.

By contrast, I construct a closed-form sequence $p_X^{(J)}$ of approximations to the transition density, hence from (1.2) a sequence $\ell_n^{(J)}$ of approximations to the log-likelihood function $\ell_n$. I also provide empirical evidence that $J = 2$ or $3$ is amply adequate for models that are relevant in finance.⁴ Since the expression

² See Section 3.1 for extensions to the cases where the sampling interval is time-varying and even possibly random.

³ Note that the continuous-observation likelihood is only defined if the diffusion function is known.

⁴ In addition, Jensen and Poulsen (1999) have recently completed a comparison of the method of this paper against four alternatives: a discrete Euler approximation of the continuous-time model
Figure 1. Accuracy and speed of different approximation methods for $p_X$.

Notes: This figure reports the average uniform absolute error of various density approximation techniques applied to the Vasicek, Cox-Ingersoll-Ross and Black-Scholes models. "Euler" refers to the discrete-time, continuous-state, first-order Gaussian approximation scheme for the transition density given in equation (5.4); "Binomial Tree" refers to the discrete-time, discrete-state (two) approximation; "Simulations" refers to an implementation of Pedersen (1995)'s simulated-likelihood method; "PDE" refers to the numerical solution of the Fokker-Planck-Kolmogorov partial differential equation satisfied by the transition density, using the Crank-Nicolson algorithm. For implementation details on the different methods considered, see Jensen and Poulsen (1999).

$\ell_n^{(J)}$ to be maximized is explicit, the effort involved is minimal, identical to a standard maximum-likelihood problem with a known likelihood function. Examples are contained in a companion paper (Aït-Sahalia (1999)), which provides, for different models, the corresponding expression of $p_X^{(J)}$. Besides making maximum-likelihood estimation feasible, these closed-form approximations have other applications in financial econometrics. For instance, they could be used for derivative pricing, for indirect inference (see Gouriéroux, Monfort, and Renault (1993)), which in its simplest version uses an Euler approximation as instrumental model, or for Bayesian inference: basically, whenever an expression for the transition density is required.

The paper is organized as follows. Section 2 describes the sequence of density approximations and proves its convergence. Section 3 studies the properties of the resulting maximum-likelihood estimator. In Section 4, I show how to calculate in closed-form the coefficients of the approximation, and readers primarily interested in applying these results to a specific model can go there directly.
(1.1), a binomial tree approximation, the numerical solution of the PDE, and simulation-based methods, all in the context of various specifications and parameter values that are relevant for interest rate and stock return models. To give an idea of the relative accuracy and speed of these approximations, Figure 1 summarizes their main results. As is clear from the figure, the approximation of the transition function derived here provides a degree of accuracy and speed that is unmatched by any of the other methods.
Section 5 gives the results of Monte Carlo simulations. Section 6 concludes. All proofs are in the Appendix.

2. A SEQUENCE OF EXPANSIONS OF THE TRANSITION FUNCTION

To understand the construction of the sequence of approximations to $p_X$, the following analogy may be helpful. Consider a standardized sum of random variables to which the Central Limit Theorem (CLT) applies. Often, one is willing to approximate the actual sample size $n$ by infinity and use the $N(0,1)$ limiting distribution for the properly standardized transformation of the data. If not, higher order terms of the limiting distribution (for example the classical Edgeworth expansion based on Hermite polynomials) can be calculated to improve the small sample performance of the approximation.

The basic idea of this paper is to create an analogy between this situation and that of approximating the transition density of a diffusion. Think of the sampling interval $\Delta$ as playing the role of the sample size $n$ in the CLT. If we properly standardize the data, then we can find out the limiting distribution of the standardized data as $\Delta$ tends to $0$ (by analogy with what happens in the CLT when $n$ tends to infinity). Properly standardizing the data in the CLT means summing them and dividing by $n^{1/2}$; here it will involve transforming the original diffusion $X$ into another one, which I call $Z$ below. In both cases, the appropriate standardization makes $N(0,1)$ the leading term. I will then refine this $N(0,1)$ approximation by correcting for the fact that $\Delta$ is not $0$ (just as in practical applications of the CLT $n$ is not infinity), i.e., by computing the higher order terms. As in the CLT case, it is natural to consider higher order terms based on Hermite polynomials, which are orthogonal with respect to the leading $N(0,1)$ term.

But in what sense does such an expansion converge? In the CLT case, the convergence is understood to mean that the series with a fixed number of corrective terms (i.e., fixed $J$) converges when the sample size $n$ goes to infinity.
In fact, for a fixed $n$, the Edgeworth expansion will typically diverge as more and more corrective terms are added, unless the density of each of these random variables was close to a Normal density to start with. I will make this statement precise later, using the criterion of Cramér (1925): the density $p(z)$ to be expanded around a $N(0,1)$ must have tails sufficiently thin for $\exp(z^2/2)\,p(z)^2$ to be integrable.

The point however is that the density $p_X$ cannot in general be approximated for fixed $\Delta$ around a Normal density, because the distribution of the diffusion $X$ is in general too far from that of a Normal. For instance, if $X$ follows a geometric Brownian motion, the right tail of the corresponding log-normal density $p_X$ is too large for its Hermite expansion to converge. Indeed, that tail is of order $x^{-1}\exp(-\ln^2 x)$ as $x$ tends to $+\infty$. Similarly, the expansion of any $N(0,v)$ density around a $N(0,1)$ diverges if $v > 2$, and hence the class of transition densities $p_X$ for which straight Hermite expansions converge in the sense of adding more terms ($J$ increases with $\Delta$ fixed) is quite limited.
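The divergence claim for a $N(0,v)$ density can be illustrated numerically. The sketch below is not from the paper: it evaluates the Hermite partial sums of a $N(0,v)$ density at the single point $z = 0$, where (under the convention $H_j = \phi^{-1} d^j\phi/dz^j$) the $2k$-th term reduces to $\binom{2k}{k}\left(-(v-1)/4\right)^k$ and the odd terms vanish. The function names are hypothetical, and checking one point is only an illustration, not a proof of Cramér's criterion.

```python
import math

def hermite_partial_sum_at_zero(v, K):
    """Hermite partial sum for the N(0, v) density at z = 0, terms j = 0, 2, ..., 2K.

    Uses eta_{2k} * H_{2k}(0) = C(2k, k) * (-(v - 1) / 4)**k; odd terms vanish at 0.
    """
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)
    x = -(v - 1.0) / 4.0
    return phi0 * sum(math.comb(2 * k, k) * x ** k for k in range(K + 1))

def exact_density_at_zero(v):
    """Value at 0 of the N(0, v) density."""
    return 1.0 / math.sqrt(2.0 * math.pi * v)

# Illustrative values: v = 1.5 (inside the convergence region) and v = 3 (outside).
for v in (1.5, 3.0):
    errors = [abs(hermite_partial_sum_at_zero(v, K) - exact_density_at_zero(v))
              for K in (5, 10, 20)]
    print(v, errors)
```

For $v = 1.5$ the errors shrink quickly as terms are added, while for $v = 3$ they grow without bound, which is the behavior the paragraph above describes.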
To obtain nevertheless an expansion that converges as more correction terms are added while $\Delta$ remains fixed, I will show that the transformation of the diffusion process $X$ into $Z$ in fact guarantees (unlike the CLT situation) that the resulting variable $Z$ has a density $p_Z$ that belongs to the class of densities for which the Hermite series converges as more polynomial terms are added. I will then construct a convergent Hermite series for $p_Z$. Since $Z$ is a known transformation of $X$, I will be able to revert the transformation from $X$ to $Z$ and by the Jacobian formula obtain an expansion for the density of $X$. As a result of transforming $Z$ back into $X$, which in general is a nonlinear transformation (unless $\sigma(x;\theta)$ is independent of the state variable $x$), the leading term of the expansion for the density $p_X$ will be a deformed, or stretched, Normal density rather than the $N(0,1)$ leading term of the expansion for $p_Z$.

The rest of this section makes this basic intuition rigorous. In particular, Theorem 1 will prove that such an expansion converges uniformly to the unknown $p_X$.

2.1. Assumptions and First Transformation $X \to Y$

I start by making fairly general assumptions on the functions $\mu$ and $\sigma$. In particular, I do not assume that $\mu$ and $\sigma$ satisfy the typical growth conditions at infinity, nor do I restrict attention to stationary diffusions only. Let $D_X = (\underline{x}, \bar{x})$ denote the domain of the diffusion $X$. I will consider the two cases where $D_X = (-\infty, +\infty)$ and $D_X = (0, +\infty)$. The latter case is often relevant in finance, when considering models for asset prices or nominal interest rates. In addition, the function $\sigma$ is often specified in financial models in such a way that $\lim_{x \to 0^+} \sigma(x;\theta) = 0$ and $\mu$ and/or $\sigma$ violate the linear growth conditions near the boundaries. For these reasons, I will devise a set of assumptions where growth conditions (without constraint on the sign of the drift function near the boundaries) are replaced by assumptions on the sign of the drift near the boundaries (without restriction on the growth of the coefficients).
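The assumptions are:

Assumption 1 (Smoothness of the Coefficients): The functions $\mu(x;\theta)$ and $\sigma(x;\theta)$ are infinitely differentiable in $x$, and three times continuously differentiable in $\theta$, for all $x \in D_X$ and $\theta \in \Theta$.

Assumption 2 (Non-Degeneracy of the Diffusion):
1. If $D_X = (-\infty, +\infty)$, there exists a constant $c$ such that $\sigma(x;\theta) > c > 0$ for all $x \in D_X$ and $\theta \in \Theta$.
2. If $D_X = (0, +\infty)$, $\lim_{x \to 0^+} \sigma(x;\theta) = 0$ is possible, but then there exist constants $\xi_0 > 0$, $\omega > 0$, $\rho \ge 0$ such that $\sigma(x;\theta) \ge \omega x^{\rho}$ for all $0 < x \le \xi_0$ and $\theta \in \Theta$. Whether or not $\lim_{x \to 0^+} \sigma(x;\theta) = 0$, $\sigma$ is nondegenerate on $(0, +\infty)$, that is: for each $\xi > 0$, there exists a constant $c_\xi$ such that $\sigma(x;\theta) \ge c_\xi > 0$ for all $x \in [\xi, +\infty)$ and $\theta \in \Theta$.

The first step I employ towards constructing the sequence of approximations to $p_X$ consists in standardizing the diffusion function of $X$, i.e., transforming $X$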
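into $Y$ defined as⁵

$$Y \equiv \gamma(X;\theta) = \int^{X} du/\sigma(u;\theta) \tag{2.1}$$

where any primitive of the function $1/\sigma$ may be selected, i.e., the constant of integration is irrelevant. Because $\sigma > 0$ on $D_X$, the function $\gamma$ is increasing and invertible for all $\theta \in \Theta$. It maps $D_X$ into $D_Y = (\underline{y}, \bar{y})$, the domain of $Y$, where $\underline{y} \equiv \lim_{x \to \underline{x}^+} \gamma(x;\theta)$ and $\bar{y} \equiv \lim_{x \to \bar{x}^-} \gamma(x;\theta)$. For example, if $D_X = (0,+\infty)$ and $\sigma(x;\theta) = x^{\rho}$, then $Y = (1-\rho)^{-1} X^{1-\rho}$ if $0 < \rho < 1$ (so $D_Y = (0,+\infty)$), $Y = \ln X$ if $\rho = 1$ (so $D_Y = (-\infty,+\infty)$), and $Y = -(\rho-1)^{-1} X^{-(\rho-1)}$ if $\rho > 1$ (so $D_Y = (-\infty, 0)$). For a given model under consideration, assume that the parameter space $\Theta$ is restricted in such a way that $D_Y$ is independent of $\theta$ in $\Theta$. This restriction on $\Theta$ is inessential, but it helps keep the notation simple. By applying Itô's Lemma, $Y$ has unit diffusion, that is

$$dY_t = \mu_Y(Y_t;\theta)\,dt + dW_t \tag{2.2}$$

where

$$\mu_Y(y;\theta) = \frac{\mu\left(\gamma^{-1}(y;\theta);\theta\right)}{\sigma\left(\gamma^{-1}(y;\theta);\theta\right)} - \frac{1}{2}\,\frac{\partial \sigma}{\partial x}\left(\gamma^{-1}(y;\theta);\theta\right).$$

Assumption 3 (Boundary Behavior): For all $\theta \in \Theta$, $\mu_Y(y;\theta)$ and its derivatives with respect to $y$ and $\theta$ have at most polynomial growth⁶ near the boundaries and $\lim_{y \to \underline{y}^+ \text{ or } y \to \bar{y}^-} \lambda_Y(y;\theta) < +\infty$, where $\lambda_Y$ is the potential, i.e., $\lambda_Y(y;\theta) \equiv -\left(\mu_Y^2(y;\theta) + \partial\mu_Y(y;\theta)/\partial y\right)/2$.
1. Left Boundary: If $\underline{y} = 0^+$, there exist constants $\epsilon_0, \kappa, \alpha$ such that for all $0 < y \le \epsilon_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \ge \kappa y^{-\alpha}$ where either $\alpha > 1$ and $\kappa > 0$, or $\alpha = 1$ and $\kappa \ge 1$. If $\underline{y} = -\infty$, there exist constants $E_0 > 0$ and $K > 0$ such that for all $y \le -E_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \ge Ky$.
2. Right Boundary: If $\bar{y} = +\infty$, there exist constants $E_0 > 0$ and $K > 0$ such that for all $y \ge E_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \le Ky$. If $\bar{y} = 0^-$, there exist constants $\epsilon_0, \kappa, \alpha$ such that for all $0 > y \ge -\epsilon_0$ and $\theta \in \Theta$, $\mu_Y(y;\theta) \le -\kappa |y|^{-\alpha}$ where either $\alpha > 1$ and $\kappa > 0$, or $\alpha = 1$ and $\kappa \ge 1/2$.

Note that $\lambda_Y$ is not restricted from going to $-\infty$ near the boundaries. Assumption 3 is formulated in terms of the function $\mu_Y$ for reasons of convenience, but the restriction it imposes on the original functions $\mu$ and $\sigma$ follows from (2.1). Assumption 3 only restricts how large $\mu_Y$ can grow if it has the "wrong" sign, meaning that $\mu_Y$ is positive near $\bar{y}$ and negative near $\underline{y}$: then linear growth is the maximum possible rate. But if $\mu_Y$ has the "right" sign, the process is being pulled

⁵ The same transformation, sometimes referred to as the Lamperti transform, has been used, for instance, by Florens (1999).

⁶ Define an infinitely differentiable function $f$ as having at most polynomial growth if there exists an integer $p \ge 0$ such that $|y|^{-p}|f(y)|$ is bounded above in a neighborhood of infinity.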
If $p = 1$, $f$ is said to have at most linear growth, and if $p = 2$ at most quadratic growth. Near $0$, polynomial growth means that $|y|^{+p}|f(y)|$ is bounded.
back away from the boundaries and I do not restrict how fast mean-reversion occurs (up to an arbitrary large polynomial rate for technical reasons). The constraints on the behavior of the function $\mu_Y$ are essentially the best possible for the following reasons. If $\mu_Y$ has the "wrong" sign near an infinity boundary, and grows faster than linearly, then $Y$ explodes (i.e., can reach the infinity boundary) in finite time. Near a zero boundary, say $\underline{y} = 0^+$, if there exist $k > 0$ and $\alpha < 1$ such that $\mu_Y(y;\theta) \le k y^{-\alpha}$ in a neighborhood of $0^+$, then $0$ becomes attainable. The behavior of the diffusion $Y$ implied by the assumptions made is fully characterized by the following proposition, where $T_Y \equiv \inf\{t \ge 0 \mid Y_t \notin D_Y = (\underline{y}, \bar{y})\}$ denotes the exit time from $D_Y$:

Proposition 1: Under Assumptions 1–3, (2.2) admits a weak solution $\{Y_t \mid t \ge 0\}$, unique in probability law, for every distribution of its initial value $Y_0$.⁷ The boundaries of $D_Y$ are unattainable, in the sense that $\text{Prob}(T_Y = \infty) = 1$. Finally, if $+\infty$ is a right boundary, then it is natural if, near $+\infty$, $|\mu_Y(y;\theta)| \le Ky$ and entrance if $\mu_Y(y;\theta) \le -Ky^{\beta}$ for some $\beta > 1$. If $-\infty$ is a left boundary, then it is natural if, near $-\infty$, $|\mu_Y(y;\theta)| \le K|y|$ and entrance if $\mu_Y(y;\theta) \ge K|y|^{\beta}$ for some $\beta > 1$. If $0$ is a boundary (either right or left), then it is entrance.⁸

Note also that Assumption 3 neither requires nor implies that the process is stationary. When both boundaries of the domain $D_Y$ are entrance boundaries, then the process is necessarily stationary with common unconditional (marginal) density for all $Y_t$

$$\pi_Y(y;\theta) \equiv \exp\left\{2\int^{y} \mu_Y(u;\theta)\,du\right\} \Big/ \int_{\underline{y}}^{\bar{y}} \exp\left\{2\int^{v} \mu_Y(u;\theta)\,du\right\} dv \tag{2.3}$$

provided that the initial random variable $Y_0$ is itself distributed with density (2.3) (see, e.g., Karlin and Taylor (1981)). When at least one of the boundaries is natural, stationarity is neither precluded nor implied in that the (only) possible candidate for stationary density, $\pi_Y$, may or may not be integrable near

⁷ A weak solution to (2.2) in the interval $D_Y$ is a pair $(Y, W)$, a probability space and a filtration, such that $(Y, W)$ satisfies the stochastic integral equation that underlies the stochastic differential equation (2.2). For a formal definition, see, e.g., Karatzas and Shreve (1991, Definition 5.5.20).
Uniqueness in law means that two solutions would have identical finite-dimensional distributions, i.e., in particular the same observable implications for any discrete-time data. From the perspective of statistical inference from discrete observations, this is therefore the appropriate concept of uniqueness.

⁸ Natural boundaries can neither be reached in finite time, nor can the diffusion be started or escape from there. Entrance boundaries, such as $0$, cannot be reached starting from an interior point in $D_Y = (0, +\infty)$, but it is possible for $Y$ to begin there. In that case, the process moves quickly away from $0$ and never returns there. Typically, economic considerations require the boundaries to be unattainable; however, they say little about how the process would behave if it were to start at the boundary, or whether that is even possible, and hence it is sensible to allow both types of boundary behavior.
the boundaries.⁹ Next, I show that the diffusion $Y$ admits a smooth transition density:

Proposition 2: Under Assumptions 1–3, $Y$ admits a transition density $p_Y(\Delta, y \mid y_0;\theta)$ that is continuously differentiable in $\Delta > 0$, infinitely differentiable in $y \in D_Y$ and $y_0 \in D_Y$, and three times continuously differentiable in $\theta \in \Theta$. Furthermore, there exists $\bar\Delta > 0$ such that for every $\Delta \in (0, \bar\Delta)$, there exist positive constants $C_i$, $i = 0, \dots, 4$, and $D_0$ such that for every $\theta \in \Theta$ and $(y, y_0) \in D_Y^2$:

$$0 < p_Y(\Delta, y \mid y_0;\theta) \le C_0\,\Delta^{-1/2}\, e^{-3(y-y_0)^2/8\Delta}\, e^{C_1|y-y_0||y_0| + C_2|y-y_0| + C_3|y_0| + C_4 y_0^2} \tag{2.4}$$

$$\left|\partial p_Y(\Delta, y \mid y_0;\theta)/\partial y\right| \le D_0\,\Delta^{-1/2}\, e^{-3(y-y_0)^2/8\Delta}\, P(|y|, |y_0|)\, e^{C_1|y-y_0||y_0| + C_2|y-y_0| + C_3|y_0| + C_4 y_0^2} \tag{2.5}$$

where $P$ is a polynomial of finite order in $(|y|, |y_0|)$, with coefficients uniformly bounded in $\theta \in \Theta$. Finally, if $\mu_Y \le 0$ near the right boundary $+\infty$ and $\mu_Y \ge 0$ near the left boundary (either $0^+$ or $-\infty$), then $\bar\Delta = +\infty$.

The next result shows that these properties essentially extend to the diffusion $X$ of original interest.

Corollary 1: Under Assumptions 1–3, (1.1) admits a weak solution $\{X_t \mid t \ge 0\}$, unique in probability law, for every distribution of its initial value $X_0$. The boundaries of $D_X$ are unattainable, in the sense that $\text{Prob}(T_X = \infty) = 1$ where $T_X \equiv \inf\{t \ge 0 \mid X_t \notin D_X\}$. In addition, $X$ admits a transition density $p_X(\Delta, x \mid x_0;\theta)$ which is continuously differentiable in $\Delta > 0$, infinitely differentiable in $x \in D_X$ and $x_0 \in D_X$, and three times continuously differentiable in $\theta \in \Theta$.

2.2. Second Transformation $Y \to Z$

The bound (2.4) implies that the tails of $p_Y$ have a Gaussian-like upper bound. In light of the discussion at the beginning of Section 2 about the requirements for convergence of a Hermite series, this is a big step forward. However, while $Y$, thanks to its unit diffusion $\sigma_Y = 1$, is "closer" to a Normal variable than $X$ is, it is not practical to expand $p_Y$. This is due to the fact that $p_Y$ gets peaked around the conditional value $y_0$ when $\Delta$ gets small. And a Dirac mass is not a particularly appealing leading term for an expansion. For that reason, I perform a further transformation.
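For given $\Delta > 0$, $\theta \in \Theta$ and $y_0 \in R$, define the "pseudo-normalized" increment of $Y$ as

$$Z \equiv \Delta^{-1/2}(Y - y_0) \tag{2.6}$$

⁹ For instance, both an Ornstein-Uhlenbeck process, where $\mu_Y(y;\theta) = -\theta y$, and a Brownian motion, where $\mu_Y(y;\theta) = 0$, satisfy the assumptions made, and both have natural boundaries at $-\infty$ and $+\infty$. Yet the former process is stationary, due to mean-reversion, while the latter (null recurrent) is not.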
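The two transformations can be sketched for the power-diffusion example $\sigma(x;\theta) = x^{\rho}$ on $D_X = (0,+\infty)$ given earlier. The code below is an illustrative sketch only: the function names and the drift used in the usage line are hypothetical choices, not taken from the paper. It implements $\gamma$ from (2.1), its inverse, the drift $\mu_Y$ from (2.2), and the increment $Z$ from (2.6).

```python
import math

def lamperti(x, rho):
    """Y = gamma(x): a primitive of 1 / sigma for sigma(x) = x**rho on (0, +inf)."""
    if rho == 1.0:
        return math.log(x)
    return x ** (1.0 - rho) / (1.0 - rho)

def lamperti_inverse(y, rho):
    """x = gamma^{-1}(y) for the same sigma."""
    if rho == 1.0:
        return math.exp(y)
    return ((1.0 - rho) * y) ** (1.0 / (1.0 - rho))

def drift_of_y(y, mu, rho):
    """mu_Y(y) = mu(x)/sigma(x) - sigma'(x)/2 evaluated at x = gamma^{-1}(y), as in (2.2)."""
    x = lamperti_inverse(y, rho)
    sigma = x ** rho
    dsigma_dx = rho * x ** (rho - 1.0)
    return mu(x) / sigma - 0.5 * dsigma_dx

def pseudo_normalized_increment(y, y0, delta):
    """Z = delta**(-1/2) * (Y - y0), as in (2.6)."""
    return (y - y0) / math.sqrt(delta)

# Hypothetical usage: mu(x) = 0.1 * x with rho = 1 gives Y = ln X with constant drift 0.1 - 1/2.
print(drift_of_y(0.3, lambda x: 0.1 * x, 1.0))
```

With $\rho = 1$ and a drift proportional to $x$, the transformed drift is constant, which is the familiar log transformation of a geometric Brownian motion.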
Of course, since I do not require that $\Delta \to 0$, I make no claim regarding the degree of accuracy of this standardization device, hence the term pseudo. However, I will show that for fixed $\Delta$, $Z$ defined in (2.6) happens to be close enough to a $N(0,1)$ variable to make it possible to create a convergent series of expansions for its density $p_Z$ around a $N(0,1)$. In other words, $Z$ turns out to be the appropriate transformation of $X$ if we are going to start an expansion with a $N(0,1)$ term. Expansions starting with a different leading term could be considered (with matching orthogonal polynomials) but, should $\Delta$ in fact be small, they would have the drawback of starting with an inadequate leading term and therefore requiring additional correction.¹⁰

Let $p_Y(\Delta, y \mid y_0;\theta)$ denote the conditional density of $Y_{t+\Delta} \mid Y_t$, and define the density function of $Z$

$$p_Z(\Delta, z \mid y_0;\theta) \equiv \Delta^{1/2}\, p_Y\left(\Delta, \Delta^{1/2} z + y_0 \mid y_0;\theta\right) \tag{2.7}$$

Once I have obtained a sequence of approximations to the function $(z, y_0) \mapsto p_Z(\Delta, z \mid y_0;\theta)$, I will backtrack and infer a sequence of approximations to the function $(y, y_0) \mapsto p_Y(\Delta, y \mid y_0;\theta)$ by inverting (2.7):

$$p_Y(\Delta, y \mid y_0;\theta) \equiv \Delta^{-1/2}\, p_Z\left(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0;\theta\right) \tag{2.8}$$

and then back to the object of interest $(x, x_0) \mapsto p_X(\Delta, x \mid x_0;\theta)$, by applying again the Jacobian formula for the change of density:

$$p_X(\Delta, x \mid x_0;\theta) = \sigma(x;\theta)^{-1}\, p_Y\left(\Delta, \gamma(x;\theta) \mid \gamma(x_0;\theta);\theta\right) \tag{2.9}$$

2.3. Approximation of the Transition Function of the Transformed Data

So this leaves us with the need to approximate the density function $p_Z$. For that purpose, I construct a Hermite series expansion for the conditional density of the variable $Z_t$, which has been constructed precisely so that it is close enough to a $N(0,1)$ variable for an expansion around a $N(0,1)$ density to converge. The classical Hermite polynomials are

$$H_j(z) \equiv e^{z^2/2}\,\frac{d^j}{dz^j}\left[e^{-z^2/2}\right], \quad j \ge 0, \tag{2.10}$$

and let $\phi(z) \equiv e^{-z^2/2}/\sqrt{2\pi}$ denote the $N(0,1)$ density function. Also, define

$$p_Z^{(J)}(\Delta, z \mid y_0;\theta) \equiv \phi(z) \sum_{j=0}^{J} \eta_Z^{(j)}(\Delta, y_0;\theta)\, H_j(z) \tag{2.11}$$

as the Hermite expansion of the density function $z \mapsto p_Z(\Delta, z \mid y_0;\theta)$ (for fixed $\Delta$, $y_0$, and $\theta$).¹¹ By orthonormality of the Hermite polynomials, divided by $(j!)^{1/2}$,

¹⁰ This is because the limiting form of the density for a diffusion, which is driven by a Brownian motion, is Gaussian.
However a different leading term would be appropriate for processes of a different kind (for example driven by a non-Brownian Lévy process).

¹¹ Hence the boundary behavior of the transition density approximation is designed to match that of the true density as the forward variable (not the backward variable) nears the boundaries of the support: under the assumptions made, $p_Z \to 0$ near the boundaries.
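with respect to the $L^2$ scalar product weighted by the Normal density, the coefficients $\eta_Z^{(j)}$ are given by

$$\eta_Z^{(j)}(\Delta, y_0;\theta) \equiv (1/j!) \int_{-\infty}^{+\infty} H_j(z)\, p_Z(\Delta, z \mid y_0;\theta)\, dz \tag{2.12}$$

Section 4 will indicate how to approximate these coefficients in closed-form, yielding a fully explicit sequence of approximations to $p_Z$. By analogy with (2.8), I can then form the sequence of approximations to $p_Y$ as

$$p_Y^{(J)}(\Delta, y \mid y_0;\theta) \equiv \Delta^{-1/2}\, p_Z^{(J)}\left(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0;\theta\right) \tag{2.13}$$

and finally approximate $p_X$ by mimicking (2.9), i.e.,

$$p_X^{(J)}(\Delta, x \mid x_0;\theta) \equiv \sigma(x;\theta)^{-1}\, p_Y^{(J)}\left(\Delta, \gamma(x;\theta) \mid \gamma(x_0;\theta);\theta\right) \tag{2.14}$$

The following theorem proves that the expansion (2.14) converges uniformly as more terms are added, and that the limit is indeed the true (but unknown) density function $p_X$.

Theorem 1: Under Assumptions 1–3, there exists $\bar\Delta > 0$ (given in Proposition 2) such that for every $\Delta \in (0, \bar\Delta)$, $\theta \in \Theta$ and $(x, x_0) \in D_X^2$:

$$p_X^{(J)}(\Delta, x \mid x_0;\theta) \xrightarrow[J \to \infty]{} p_X(\Delta, x \mid x_0;\theta) \tag{2.15}$$

In addition, the convergence is uniform in $\theta$ over $\Theta$ and in $x_0$ over compact subsets of $D_X$. If $\sigma(x;\theta) > c > 0$ on $D_X$, then the convergence is further uniform in $x$ over the entire domain $D_X$. If $D_X = (0,+\infty)$ and $\lim_{x \to 0^+} \sigma(x;\theta) = 0$, then the convergence is uniform in $x$ in each interval of the form $[\epsilon, +\infty)$, $\epsilon > 0$.

3. A SEQUENCE OF APPROXIMATIONS TO THE MAXIMUM-LIKELIHOOD ESTIMATOR

I now study the properties of the sequence of maximum-likelihood estimators $\hat\theta_n^{(J)}$ derived from maximizing over $\theta$ in $\Theta$ the approximate likelihood function computed from $p_X^{(J)}$, i.e., (1.2) with $p_X$ replaced by $p_X^{(J)}$.¹² I will show that $\hat\theta_n^{(J)}$ converges as $J \to \infty$ to the true (but uncomputable in practice) maximum-likelihood estimator $\hat\theta_n$. I further prove that when the sample size gets larger ($n \to \infty$), one can find $J_n \to \infty$ such that $\hat\theta_n^{(J_n)}$ converges to the true parameter value $\theta_0$.¹³

¹² The roots of the Hermite polynomials are such that $p_X^{(J)} > 0$ on an interval $[-c_J, c_J]$ with $c_J \to \infty$ as $J \to \infty$. Let $a_J$ be a positive sequence converging to $0$ as $J \to \infty$. Define $\tau_J$ as a (smooth) version of the trimming index taking value $1$ if $p_X^{(J)} > a_J$ and $a_J$ otherwise. Before taking the logarithm, replace $p_X^{(J)}$ by $\tau_J\, p_X^{(J)}$. It is shown in the Appendix that such trimming is asymptotically irrelevant.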
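The construction (2.10)–(2.13) can be sketched in code. This is a minimal illustration, not the paper's Mathematica implementation: the function names are hypothetical, and the coefficients are supplied for the special case of a constant drift $\mu_Y = c$ (an assumption made only because $Z$ is then exactly $N(c\Delta^{1/2}, 1)$, so that (2.12) gives $\eta_Z^{(j)} = (-c\Delta^{1/2})^j/j!$ and the exact density is available for comparison).

```python
import math

def hermite(j, z):
    """H_j(z) in the convention of (2.10): H_0 = 1, H_1 = -z, H_{j+1} = -z*H_j - j*H_{j-1}."""
    h_prev, h = 1.0, -z
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h = h, -z * h - k * h_prev
    return h

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def p_z_approx(z, eta):
    """Truncated expansion (2.11): phi(z) * sum_j eta[j] * H_j(z)."""
    return normal_pdf(z) * sum(e * hermite(j, z) for j, e in enumerate(eta))

def p_y_approx(y, y0, delta, eta):
    """Back-transformation (2.13) from the density of Z to the density of Y."""
    return p_z_approx((y - y0) / math.sqrt(delta), eta) / math.sqrt(delta)

def eta_constant_drift(c, delta, J):
    """Coefficients (2.12) for the illustrative constant drift mu_Y = c."""
    m = c * math.sqrt(delta)
    return [(-m) ** j / math.factorial(j) for j in range(J + 1)]

def p_y_exact_constant_drift(y, y0, delta, c):
    """Exact N(y0 + c*delta, delta) transition density for constant drift c."""
    return normal_pdf((y - y0 - c * delta) / math.sqrt(delta)) / math.sqrt(delta)

# Hypothetical numbers: the error shrinks as J grows with delta held fixed.
y, y0, delta, c = 0.3, 0.05, 0.25, 0.4
for J in (0, 2, 6):
    approx = p_y_approx(y, y0, delta, eta_constant_drift(c, delta, J))
    print(J, abs(approx - p_y_exact_constant_drift(y, y0, delta, c)))
```

For a general drift the coefficients are not available exactly; Section 4 of the paper describes how to obtain them as Taylor series in $\Delta$.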
¹³ This setup is different from either the pseudo-maximum likelihood one (see White (1982) and Gouriéroux, Monfort, and Trognon (1984)), or the semi-nonparametric case (Gallant and Nychka
3.1. Likelihood Function: Initial Observation and Random Sampling Extension

When defining the log-likelihood function in (1.2), I ignored the unconditional density of the first observation, $\ln \pi(X_0;\theta)$, because it is dominated by the sum of the conditional density terms $\ln p_X(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta)$ as $n \to \infty$. The sample contains only one observation on the unconditional density $\pi$ and $n$ on the transition function, so that the information on $\pi$ contained in the sample does not increase with $n$. All the distributional properties below will be asymptotic, so the definition (1.2) is appropriate for the log-likelihood function (see Billingsley (1961)). In any case, re-introducing the term $\ln \pi(X_0;\theta)$ back into the log-likelihood poses no difficulty.

Note also that I have assumed for convenience that $\Delta$ is identical across pairs of successive observations. If instead $\Delta$ varies deterministically, say $\Delta_i$ is the time interval between the $(i-1)$th and $i$th observations, then it is clear from (1.2) that it suffices to replace $\Delta$ by its actual value $\Delta_i$ when evaluating the transition density for the $i$th pair of observations. If the sampling interval is random, then one can write down the joint likelihood function of the pair of observations and $\Delta_i$ and utilize Bayes' Rule to express it as the product of the conditional density of the $i$th observation $X_i$ given the $(i-1)$th and $\Delta_i$, times the marginal density $q$ of $\Delta_i$: that is $p_X(\Delta_i, X_i \mid X_{i-1};\theta) \times q(\Delta_i;\kappa)$ where $\kappa$ is a parameter vector parameterizing the sampling density.¹⁴ If the sampling process is independent of $X$ and $\theta$, then the marginal density is irrelevant for the likelihood maximization and the conditional density is the same function $p_X$ as before, evaluated at the realization $\Delta_i$. Hence for the purpose of estimating $\theta$, the criterion function (1.2) is unchanged and as in the deterministic case it suffices to replace $\Delta$ by the realization $\Delta_i$ corresponding to the time interval between the $(i-1)$th and $i$th observations.
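The criterion (1.2) with observation-specific intervals $\Delta_i$ can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the transition density plugged in is the exact Gaussian density of a Vasicek-type model $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t$, a parameterization chosen here only because its closed form is standard. Any function returning $p_X^{(J)}$ could be passed in its place.

```python
import math

def ou_transition_density(delta, x, x0, kappa, alpha, sigma):
    """Exact Gaussian transition density of dX = kappa*(alpha - X) dt + sigma dW."""
    mean = alpha + (x0 - alpha) * math.exp(-kappa * delta)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(density, observations, deltas, *theta):
    """Criterion (1.2), with delta replaced by the interval delta_i for the i-th pair.

    observations holds X_0, ..., X_n; deltas holds the n intervals between them.
    The initial term ln pi(X_0) is ignored, as in the text.
    """
    return sum(
        math.log(density(d, x, x_prev, *theta))
        for d, x_prev, x in zip(deltas, observations[:-1], observations[1:])
    )

# Hypothetical data: three observations separated by one month and then two months.
obs = [0.05, 0.06, 0.055]
deltas = [1.0 / 12.0, 1.0 / 6.0]
print(log_likelihood(ou_transition_density, obs, deltas, 0.5, 0.06, 0.03))
```

Maximizing this function over the parameters with any standard optimizer is then an ordinary maximum-likelihood problem.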
By contrast, when the sampling interval is random and informative about the parameters of the underlying process (for example, if more rapid arrivals of trade signal an increase of price volatility), then the joint density cannot be integrated out as simply. I now return to the base case of fixed sampling at interval $\Delta$.

3.2. Properties of the Maximum-Likelihood Estimator

To analyze the properties of the estimators $\hat\theta_n$ and $\hat\theta_n^{(J)}$, I introduce the following notation. Define the $K \times K$ identity matrix as $Id$ and $L_i(\theta) \equiv$

(1987)). We are in a somewhat atypical situation in the sense that the pseudo-likelihood does approximate the true likelihood function, and we wish to exploit this fact. In particular, the choice of $J$ is independent of $n$ and $J$ can always be chosen sufficiently large to make the resulting estimator arbitrarily close to the true MLE. This paper is not concerned with the potential misspecification of the true likelihood function, i.e., it accepts (1.1) as the data generating process, but then does not require that the densities belong to specific classes such as the linear exponential family.

¹⁴ To insure that Theorem 1 remains applicable when $\Delta$ is not constant, assume that the distribution of $\Delta$ has a support contained in an interval of the form $[\Delta_{\min}, \Delta_{\max}]$ where $0 < \Delta_{\min} < \Delta_{\max} < \bar\Delta$. In this case, the convergence in Theorem 1 is uniform in $\Delta$.
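$\ln p_X(\Delta, X_{i\Delta} \mid X_{(i-1)\Delta};\theta)$. $\dot{L}_i(\theta)$ (and additional dots) denotes differentiation with respect to $\theta$, and $T$ denotes transposition. The score vector $V_n(\theta) \equiv \sum_{i=1}^{n} \dot{L}_i(\theta)$ is a martingale. Corollary 1 proved that $p_X$ admits three continuous derivatives with respect to $\theta$ in $\Theta$; the same holds for $p_X^{(J)}$ by direct inspection of its expression given in Section 2.3. Next define

$$i_n(\theta) \equiv \sum_{i=1}^{n} E_\theta\left[\dot{L}_i(\theta)\dot{L}_i(\theta)^T\right], \quad H_n(\theta) \equiv -\sum_{i=1}^{n} \ddot{L}_i(\theta), \quad I_n(\theta) \equiv \text{diag}\{i_n(\theta)\}, \quad T_n(\theta) \equiv -\sum_{i=1}^{n} \dddot{L}_i(\theta) \tag{3.1}$$

The finiteness of $i_n(\theta)$ for every $n$ is proved as part of Proposition 3 below. Note that if the process is not stationary, $E_\theta[\dot{L}_i(\theta)\dot{L}_i(\theta)^T]$ varies with the time index $i$ because it depends on the joint distribution of $(X_{i\Delta}, X_{(i-1)\Delta})$. The square roots of the diagonal elements of $i_n(\theta)$ will determine the appropriate speeds of convergence for the corresponding components of $\hat\theta_n$, and I define the local $I_n^{1/2}(\theta)$-neighborhoods of the true parameter as the sets $N_n(\theta)$ of $\tilde\theta \in \Theta$ for which $\|I_n^{1/2}(\theta)(\tilde\theta - \theta)\|$ is bounded, where $\|\cdot\|$ denotes the Euclidean norm on $R^K$. Recall that $E_\theta[H_n(\theta)] = i_n(\theta)$.¹⁵ To identify the parameters, we make the following assumption.

Assumption 4 (Identification): The true parameter vector $\theta_0$ belongs to $\Theta$, $I_n(\theta)$ is invertible,

$$I_n^{-1}(\theta) \xrightarrow{a.s.} 0 \text{ as } n \to \infty, \text{ uniformly in } \theta \in \Theta, \tag{3.2}$$

and $R_n(\theta, \tilde\theta) \equiv I_n^{-1/2}(\theta)\, T_n(\tilde\theta)\, I_n^{-1/2}(\theta)$ is uniformly bounded in probability for all $\tilde\theta$ in an $I_n^{1/2}(\theta)$-neighborhood of $\theta$.

If $X$ is a stationary diffusion, a sufficient condition that guarantees (3.2) is that for all $k = 1, \dots, K$, $\theta \in \Theta$ and $x_0 \in D_X$,

$$0 < I_{kk}(\theta) = \int_{\underline{x}}^{\bar{x}} \int_{\underline{x}}^{\bar{x}} \left\{\partial \ln p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k\right\}^2 p_X(\Delta, x, x_0;\theta)\, dx\, dx_0 < +\infty \tag{3.3}$$

uniformly in $\theta$ (where $p_X(\Delta, x, x_0;\theta) = p_X(\Delta, x \mid x_0;\theta)\,\pi(x_0;\theta)$ denotes the joint density of observations sampled $\Delta$ units of time apart) since in that case $I_n^{-1}(\theta) = n^{-1} I^{-1}(\theta) \to 0$ a.s. For the upper bound, it is sufficient that $\partial \ln p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k$ remain bounded as $x$ varies in $D_X$, but not necessary. For the lower bound, it is sufficient that $\partial p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k$ not be zero in a

¹⁵ The order of differentiation with respect to $\theta$ and integration with respect to the conditional density $p_X$ (i.e., computation of conditional expectations) can be interchanged due to the smoothness of the log-likelihood resulting from Corollary 1.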
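region $(x, x_0)$ where the joint density has positive mass, i.e., the transition function $p_X$ must not be uniformly flat in the direction of any one of the parameters $\theta_k$. Otherwise $\partial p_X(\Delta, x \mid x_0;\theta)/\partial\theta_k \equiv 0$ for all $(x, x_0)$ and the parameter vector cannot be identified. Furthermore, a sufficient condition for (3.3) is that $\mu(x;\theta) = \mu(x;\tilde\theta)$ and $\sigma(x;\theta) = \sigma(x;\tilde\theta)$ for $\pi$-almost all $x$ imply $\theta = \tilde\theta$. I show in the proof of Proposition 3 that the boundedness condition on $R_n$ in Assumption 4 is automatically satisfied in the stationary case. A nonstationary example is provided in Section 5.

The strategy I employ to study the asymptotic properties of $\hat\theta_n^{(J)}$ is to first determine those of $\hat\theta_n$ (see Proposition 3) and then show that $\hat\theta_n^{(J)}$ and $\hat\theta_n$ share the same asymptotic properties provided one lets $J$ go to infinity with $n$ (Theorem 2). In Proposition 3, I show that general results pertaining to time series asymptotics (see, e.g., Basawa and Scott (1983) and Jeganathan (1995)) can be applied to the present context. These properties follow from first establishing that the likelihood ratio has the locally asymptotically quadratic (LAQ) structure, i.e.,

$$\ell_n\left(\theta + I_n^{-1/2}(\theta)h_n\right) - \ell_n(\theta) = h_n^T S_n(\theta) - h_n^T G_n(\theta) h_n/2 + o_p(1) \tag{3.4}$$

for every bounded sequence $h_n$ such that $\theta + I_n^{-1/2}(\theta)h_n \in \Theta$, where $S_n(\theta) \equiv I_n^{-1/2}(\theta)V_n(\theta)$ and $G_n(\theta) \equiv I_n^{-1/2}(\theta)H_n(\theta)I_n^{-1/2}(\theta)$. Then, depending upon the joint distribution of $(S_n, G_n)$, different cases arise:

Proposition 3: Under Assumptions 1–4, and for $\Delta \in (0, \bar\Delta)$, the likelihood ratio satisfies the LAQ structure (3.4), the MLE $\hat\theta_n$ is consistent and has the following properties:

i. (Locally Asymptotically Mixed Normal Structure): If

$$\left(S_n(\theta_0), G_n(\theta_0)\right) \xrightarrow{d} \left(G^{1/2}(\theta_0) \times Z,\; G(\theta_0)\right) \tag{3.5}$$

where $Z$ is a $N(0, Id)$ variable independent of the possibly random but almost surely finite and positive definite matrix $G(\theta_0)$, then

$$I_n^{1/2}(\theta_0)\left(\hat\theta_n - \theta_0\right) \xrightarrow{d} G^{-1/2}(\theta_0) \times N(0, Id) \tag{3.6}$$

Suppose that $\tilde\theta_n$ is an alternative estimator such that for any $h \in R^K$ and $\theta \in \Theta$,

$$I_n^{1/2}(\theta)\left(\tilde\theta_n - \theta - I_n^{-1/2}(\theta)h\right) \xrightarrow{d} F_\theta \quad \text{under } P_{\theta + I_n^{-1/2}(\theta)h} \tag{3.7}$$

where $F_\theta$ is a proper law, not necessarily Normal. Then $\hat\theta_n$ has maximum concentration in that class, i.e., is closer to $\theta_0$ than $\tilde\theta_n$ is, in the sense that for any $\epsilon > 0$

$$\lim_{n \to \infty} \text{Prob}_{\theta_0}\left(I_n^{1/2}(\theta_0)(\hat\theta_n - \theta_0) \in C_\epsilon\right) \ge \lim_{n \to \infty} \text{Prob}_{\theta_0}\left(I_n^{1/2}(\theta_0)(\tilde\theta_n - \theta_0) \in C_\epsilon\right) \tag{3.8}$$

where $C_\epsilon \equiv [-\epsilon, +\epsilon]^K$. Further, if $\tilde\theta_n$ has the distribution $I_n^{1/2}(\theta_0)(\tilde\theta_n - \theta_0) \xrightarrow{d} G^{-1/2}(\theta_0) \times N(0, \tilde{V}(\theta_0))$ under $P_{\theta_0}$, then $\tilde{V}(\theta_0) - Id$ is non-negative definite.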
ii. (Locally Asymptotically Normal Structure): If $X$ is a stationary diffusion, then a special case of the LAMN structure arises where (3.3) is a sufficient condition for Assumption 4, $i(\theta) \equiv E_\theta[\dot{L}_1(\theta)\dot{L}_1(\theta)^T]$ is Fisher's information matrix, $i_n(\theta) = n\,i(\theta)$, $I(\theta) \equiv \text{diag}\{i(\theta)\}$, $I_n(\theta) = n\,I(\theta)$, $G(\theta) = I^{-1/2}(\theta)\,i(\theta)\,I^{-1/2}(\theta)$ is a nonrandom matrix and (3.6) reduces to

$$n^{1/2}\left(\hat\theta_n - \theta_0\right) \xrightarrow{d} N\left(0, i(\theta_0)^{-1}\right) \tag{3.9}$$

The efficiency result simplifies to the Fisher-Rao form: $i(\theta_0)^{-1}$ is the smallest possible asymptotic variance among that of all consistent and asymptotically Normal estimators of $\theta_0$.

iii. (Locally Asymptotically Brownian Functional structure): If

$$\left(S_n(\theta_0), G_n(\theta_0)\right) \xrightarrow{d} \left(\int_0^1 M_\tau\, dW_\tau,\; \int_0^1 M_\tau M_\tau^T\, d\tau\right) \tag{3.10}$$

where $(M_\tau, W_\tau)$ is a Gaussian process such that $W$ is a standard Brownian motion, then

$$I_n^{1/2}(\theta_0)\left(\hat\theta_n - \theta_0\right) \xrightarrow{d} \left(\int_0^1 M_\tau M_\tau^T\, d\tau\right)^{-1} \int_0^1 M_\tau\, dW_\tau \tag{3.11}$$

If $M$ and $W$ are independent, then LABF is a special case of LAMN, but not otherwise.

If one had normed the difference $(\hat\theta_n - \theta_0)$ by the stochastic factor $\text{diag}\{H_n(\hat\theta_n)\}^{1/2}$ rather than by the deterministic factor $I_n^{1/2}(\theta_0)$, then the asymptotic distribution of the estimator would have been $N(0, Id)$ rather than $G^{-1/2}(\theta_0) \times N(0, Id)$ (see the example in Section 5). In other words, the stochastic norming, while intrinsically more complicated, may be useful if the distribution of $G(\theta_0)$ is intractable, since in that case, the distribution of $I_n^{1/2}(\theta_0)(\hat\theta_n - \theta_0)$ need not be asymptotically Normal (and depends on $\theta_0$) whereas that of the stochastically normed difference would simply be $N(0, Id)$. None of these difficulties are present in the stationary case, where $G(\theta_0)$ is nonrandom.¹⁶

Sufficient conditions can be given that insure that the LAMN structure holds: for instance, if $G_n(\theta) \xrightarrow{p} G(\theta)$ uniformly in $\theta$ over compact subsets of $\Theta$ then (3.5) necessarily holds by applying Theorem 1 in Basawa and Scott (1983, page 34). Note also that when the parameter vector is multidimensional, the $K$ diagonal terms of $i_n^{1/2}(\theta_0)$ do not necessarily go to infinity at the same rate, unlike the common rate $n^{1/2}$ in the stationary case (see again the example in Section 5).

Proposition 3 is not an end in itself since in our context $\hat\theta_n$ cannot be computed explicitly.
16 In the terminology of Basawa and Scott (1983), when $G(\theta_0)$ is deterministic (resp. random), the model is called ergodic (resp. nonergodic). But the LAMN situation where $G(\theta_0)$ is random is only one particularly tractable form of nonergodicity.

It becomes useful, however, when one proves that the approximate maximum-likelihood estimator $\hat\theta_n^{(J)}$ is a good substitute for $\hat\theta_n$, in the sense
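that the asymptotic properties of $\hat\theta_n$ identified in Proposition 3 carry over to $\hat\theta_n^{(J)}$. For technical reasons, a minor refinement of Assumption 2 is needed:

Assumption 5 (Strengthening of Assumption 2 in the limiting case where $\alpha = 1$ and the diffusion is degenerate at 0): Recall the constant $\rho$ in Assumption 2(2), and the constants $\alpha$ and $\kappa$ in Assumption 3(1). If $\alpha = 1$, then either $\rho \ge 1$ with no restriction on $\kappa$, or $\kappa \ge 2\rho/(1-\rho)$ if $0 \le \rho < 1$. If $\alpha > 1$, no restriction is required.

The following theorem shows that $\hat\theta_n^{(J)}$ inherits the asymptotic properties of the (uncomputable) true MLE $\hat\theta_n$:

Theorem 2: Under Assumptions 1-5, and for $\Delta \in (0, \bar\Delta)$:
i. Fix the sample size $n$. Then as $J \to \infty$, $\hat\theta_n^{(J)} \to_p \hat\theta_n$ under $P_{\theta_0}$.
ii. As $n \to \infty$, a sequence $J_n \to \infty$ can be chosen sufficiently large to deliver any rate of convergence of $\hat\theta_n^{(J_n)}$ to $\hat\theta_n$. In particular, there exists a sequence $J_n \to \infty$ such that $\hat\theta_n^{(J_n)} - \hat\theta_n = o_p(I_n^{-1/2}(\theta_0))$ under $P_{\theta_0}$, which then makes $\hat\theta_n^{(J_n)}$ and $\hat\theta_n$ share the same asymptotic distribution described in Proposition 3.

4. Explicit Expressions for the Density Expansion

I now turn to the explicit computation of the terms in the density expansion. Theorem 1 showed that

$$p_Z(\Delta, z \mid y_0; \theta) = \phi(z) \sum_{j=0}^{\infty} \eta_Z^{(j)}(\Delta, y_0; \theta)\, H_j(z) \tag{4.1}$$

Recall that $p_Z^{(J)}(\Delta, z \mid y_0; \theta)$ denotes the partial sum in (4.1) up to $j = J$. From (2.12), we have

$$\begin{aligned}
\eta_Z^{(j)}(\Delta, y_0; \theta) &= (1/j!) \int H_j(z)\, p_Z(\Delta, z \mid y_0; \theta)\, dz \\
&= (1/j!) \int H_j(z)\, \Delta^{1/2}\, p_Y(\Delta, \Delta^{1/2} z + y_0 \mid y_0; \theta)\, dz \\
&= (1/j!) \int H_j\big(\Delta^{-1/2}(y - y_0)\big)\, p_Y(\Delta, y \mid y_0; \theta)\, dy \\
&= (1/j!)\, E\big[ H_j\big(\Delta^{-1/2}(Y_{t+\Delta} - y_0)\big) \mid Y_t = y_0; \theta \big]
\end{aligned} \tag{4.2}$$

so that the coefficients $\eta_Z^{(j)}$ are specific conditional moments of the process $Y$. As such, they can be computed in a number of ways, including for instance Monte Carlo integration. A particularly attractive alternative is to calculate explicitly a Taylor series expansion in $\Delta$ for the coefficients $\eta_Z^{(j)}$. Let $f(y, y_0)$ be a polynomial.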
238 yacie aït-sahalia Taylor s Theorem applied to the fuctio s E f t+s y t = y yields (4.3) E [ f t+ y t = y ] = K k= A k f y y k k! + E [ A K+1 f t+ y t = y ] K+1 K + 1! where A is the ifiitesimal geerator of the diffusio, defied as the operator A f f/ y+ 1/2 2 f/ y 2. The followig propositio provides sufficiet coditios uder which the series (4.3) is coverget: Propositio 4: Uder Assumptios 1 3, suppose that for the relevat boudaries of D = y ȳ, ear ȳ =+ y Ky for some >1; ear y = y K y for some >1; ear y = y y for some >1 ad >; ad ear ȳ = y y for some >1 ad >. The the diffusio is statioary with ucoditioal desity ad the series (4.3) coverges i L 2 for fixed >. J K Now let pz deote the Taylor series up to order K i of p J Z. The series for the first seve Hermite coefficiets j = 6 are give by Z = 1, ad to order K = 3 by: (4.4) (4.5) (4.6) (4.7) (4.8) (4.9) 1 3 Z 2 3 Z 3 3 Z 4 3 Z 5 3 Z 6 3 Z = 1/2 ( 2 1 ( 4 1 2 = ( 2 + 1 + ( 28 2 1 2 + 4 2 2 ) / ( 2 + 6 2 + 21 2 2 + 32 1 = ( 3 + 3 1 + ) / 2 3/2 4 + 6 1 2 1 + 28 2 3 3 + 2 + 22 2 2 + 24 1 = ( 4 + 6 2 1 + ( 2 4 1 + 18 1 2 + 3 1 2 + 5 3 2 2 = ( 5 + 1 3 1 + 1 1 2 = ( 6 + 15 4 1 + 1 2 2 + 4 1 3 + 5 3 + 4 1 2 + 16 1 3 + 4 3 + 7 2 + 16 3 2 + 16 4 + 3 5 ) / 3/2 6 ( 12 3 1 + 14 3 + 4 2 + 15 1 2 + 15 1 3 + ) / 4 5/2 24 ) / 2 12 ) 3 / 96 + 3 4 + 3 + 1 2 1 2 + 4 + 15 2 3 + 6 1 + 2 3 + 88 1 2 + 28 1 2 ) / 5/2 48 ) 2 / 24 + 5 2 3 + 34 2 2 + 52 1 + 1 2 2 ) / 5/2 12 3 + 2 3 2 + 15 1 2 + 6 4 3 + 23 4 + 4 5 ) 3 / 24 + 45 2 1 2 ) / 3 72 + 5
where I have used the more compact notation $\mu_Y^{[k]m}$ for $(\partial^k \mu_Y(y_0; \theta)/\partial y_0^k)^m$. Different ways of gathering the terms are available (as in the CLT, where for example both the Edgeworth and Gram-Charlier expansions are based on a Hermite expansion). Here, if we gather all the terms according to increasing powers of $\Delta$ instead of increasing order of the Hermite polynomials, and let $\tilde p_Z^{(K)} \equiv p_Z^{(\infty, K)}$ (and similarly for $Y$, so that $\tilde p_Y^{(K)}(\Delta, y \mid y_0; \theta) = \Delta^{-1/2}\, \tilde p_Z^{(K)}(\Delta, \Delta^{-1/2}(y - y_0) \mid y_0; \theta)$, and then for $X$), we obtain an explicit representation of $\tilde p_Y^{(K)}$, given by

$$\tilde p_Y^{(K)}(\Delta, y \mid y_0; \theta) = \Delta^{-1/2}\, \phi\!\left(\frac{y - y_0}{\Delta^{1/2}}\right) \exp\!\left(\int_{y_0}^{y} \mu_Y(w; \theta)\, dw\right) \sum_{k=0}^{K} c_k(y \mid y_0; \theta)\, \frac{\Delta^k}{k!} \tag{4.10}$$

where $c_0(y \mid y_0; \theta) = 1$ and for all $j \ge 1$:

$$c_j(y \mid y_0; \theta) = j\, (y - y_0)^{-j} \int_{y_0}^{y} (w - y_0)^{j-1} \left\{ \lambda_Y(w; \theta)\, c_{j-1}(w \mid y_0; \theta) + \big(\partial^2 c_{j-1}(w \mid y_0; \theta)/\partial w^2\big)/2 \right\} dw \tag{4.11}$$

Finally, note that in general, conditional moments of the process need not be analytic in time,$^{17}$ in which case (4.3) and (4.10) must be interpreted strictly as Taylor series. Even when that is the case, their relevance for empirical work lies in the fact that including a small number of terms (one, two, or three) makes the approximation very accurate for the values these variables typically take in financial econometrics, as we shall now see.$^{18}$

5. Accuracy of the Approximations and Monte Carlo Evidence

While Figure 1 shows that the approximation of $p_X$ was extremely accurate as a function of the state variables, it does not by itself imply that the resulting parameter estimates would in practice be close to the true MLE, as was proved theoretically in Theorem 2. To answer that question, I perform Monte Carlo experiments. Consider first the Ornstein-Uhlenbeck specification, $dX_t = -\kappa X_t\, dt + \sigma\, dW_t$, where $\theta \equiv (\kappa, \sigma^2)$ and $D_X = (-\infty, +\infty)$. The process $X$ has a Gaussian transition density with mean $x_0 e^{-\kappa\Delta}$ and variance $(1 - e^{-2\kappa\Delta})\sigma^2/(2\kappa)$. In this case, $Y = \gamma(X; \theta) = \sigma^{-1} X$, $\mu_Y(y; \theta) = -\kappa y$, and the additional terms in

17 Note however that as a result of Theorem 1 the transition function is analytic in the forward state variable. The expansion is designed to deliver an approximation of the density function $y \mapsto p_Y(\Delta, y \mid y_0; \theta)$ for a fixed value of the backward (conditioning) variable $y_0$.
Therefore, except in the limit where $\Delta$ becomes infinitely small, it is not designed to reproduce the limiting behavior of $p_Y$ in the limit where $y_0$ tends to the boundaries. The expansion delivers the correct behavior for $y$ tending to the boundaries, except in the limiting situation of $\mu_Y(y; \theta) \ge \kappa y^{-1}$ in Assumption 3.1, where it is only appropriate if $\Delta$ becomes infinitesimally small.

18 See also the companion paper (Aït-Sahalia (1999)) for examples and an application to the estimation of interest rate models.
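the approximation $p_Z^{(J)}$ need only correct for the inadequacy of the conditional moments in the leading term $p_Z^{(0)}$, not for any non-Gaussianity. In other words, the transformation from $X$ to $Y$ being linear, there is no deformation or stretching of the Gaussian leading term when going from the approximation of $p_Y$ to that of $p_X$. By specializing Proposition 3 to this model, one obtains the following asymptotic distributions for the MLE:$^{19}$

Corollary 2 (Asymptotic Distribution of the MLE for the Ornstein-Uhlenbeck Model):

i. If $\kappa > 0$ (LAN, stationary case):

$$\sqrt{n}\left( \begin{pmatrix} \hat\kappa_n \\ \hat\sigma_n^2 \end{pmatrix} - \begin{pmatrix} \kappa \\ \sigma^2 \end{pmatrix} \right) \to_d N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
\dfrac{e^{2\kappa\Delta} - 1}{\Delta^2} & \dfrac{\sigma^2 (e^{2\kappa\Delta} - 1 - 2\kappa\Delta)}{\kappa \Delta^2} \\[2ex]
\dfrac{\sigma^2 (e^{2\kappa\Delta} - 1 - 2\kappa\Delta)}{\kappa \Delta^2} & \dfrac{\sigma^4 \big( (e^{2\kappa\Delta} - 1)^2 + 2\kappa^2\Delta^2 (e^{2\kappa\Delta} + 1) - 4\kappa\Delta (e^{2\kappa\Delta} - 1) \big)}{\kappa^2 \Delta^2 (e^{2\kappa\Delta} - 1)}
\end{pmatrix} \right) \tag{5.1}$$

ii. If $\kappa < 0$ (LAMN, explosive case), assume $X_0 = 0$; then

$$\frac{\Delta\, e^{-(n+1)\kappa\Delta}}{e^{-2\kappa\Delta} - 1}\, (\hat\kappa_n - \kappa) \to_d G^{-1/2} \times N(0, 1) \quad \text{and} \quad \sqrt{n}\, (\hat\sigma_n^2 - \sigma^2) \to_d N(0, 2\sigma^4) \tag{5.2}$$

where $G$ has a $\chi^2[1]$ distribution independent of the $N(0, 1)$. $G^{-1/2} \times N(0, 1)$ is a Cauchy distribution.

iii. If $\kappa = 0$ (LAQ, unit root case), assume $X_0 = 0$; then

$$n\Delta\, (\hat\kappa_n - \kappa) \to_d \frac{1 - W_1^2}{2 \int_0^1 W_t^2\, dt} \quad \text{and} \quad \sqrt{n}\, (\hat\sigma_n^2 - \sigma^2) \to_d N(0, 2\sigma^4) \tag{5.3}$$

where $W_t$ denotes a standard Brownian motion.

In Monte Carlo experiments, I study the behavior of the true MLE of $\theta$ (which is computable in this example since the transition function is known in closed form), the Euler estimator, and the estimators of this paper corresponding to one and two orders in $\Delta$ respectively. The Euler approximation corresponds to a simple discretization of the continuous-time stochastic differential equation, where the differential equation (1.1) is replaced by the difference equation $X_{t+\Delta} - X_t = \mu(X_t; \theta)\Delta + \sigma(X_t; \theta)\Delta^{1/2}\varepsilon_{t+\Delta}$ with $\varepsilon_{t+\Delta} \sim N(0, 1)$, so that

$$p_X^{\mathrm{Euler}}(\Delta, x \mid x_0; \theta) = \big(2\pi\Delta\sigma^2(x_0; \theta)\big)^{-1/2} \exp\!\left\{ -\frac{\big(x - x_0 - \mu(x_0; \theta)\Delta\big)^2}{2\Delta\sigma^2(x_0; \theta)} \right\} \tag{5.4}$$

19 Since there is no confusion possible in what follows, the subscript 0 is omitted when denoting the true parameter values.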
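To make the comparison between the exact Ornstein-Uhlenbeck transition density and the Euler density (5.4) concrete, the following minimal sketch evaluates both log-densities on a grid measured in conditional standard deviations. The function names, the grid, and the numerical values ($\kappa = 5$, $\sigma^2 = 1$, weekly and ten-times-finer sampling intervals) are illustrative assumptions, not taken from the text.

```python
import math

def ou_exact_logpdf(x, x0, kappa, sigma2, delta):
    """Exact Gaussian log-density of dX = -kappa*X dt + sigma dW (kappa != 0)."""
    mean = x0 * math.exp(-kappa * delta)
    var = sigma2 * (1.0 - math.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def ou_euler_logpdf(x, x0, kappa, sigma2, delta):
    """Euler log-density: first-order Taylor approximations of mean and variance."""
    mean = x0 * (1.0 - kappa * delta)
    var = sigma2 * delta
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def max_abs_gap(kappa, sigma2, delta, x0=0.5):
    """Largest |exact - Euler| log-density gap within +/- 3 Euler standard deviations."""
    sd = math.sqrt(sigma2 * delta)
    grid = [x0 + 0.25 * k * sd for k in range(-12, 13)]
    return max(
        abs(ou_exact_logpdf(x, x0, kappa, sigma2, delta)
            - ou_euler_logpdf(x, x0, kappa, sigma2, delta))
        for x in grid
    )

if __name__ == "__main__":
    # Hypothetical sampling intervals: weekly, and ten times finer.
    for delta in (1.0 / 52.0, 1.0 / 520.0):
        print(delta, max_abs_gap(kappa=5.0, sigma2=1.0, delta=delta))
```

Because the Euler density replaces the exact conditional mean and variance by their first-order expansions in $\Delta$, the gap shrinks as the sampling interval shrinks but does not vanish at a fixed $\Delta$.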
I set the true value of $\sigma^2$ at 1.0, and examine the behavior of the various estimators of $\kappa$ and $\sigma^2$ for the different cases of Corollary 2, by setting $\kappa = 10$, 5, and 1 (stationary root LAN), $\kappa = 0$ (unit root LABF), and $\kappa = -1$ (explosive root LAMN). For each value of the parameters, I perform $M = 5{,}000$ Monte Carlo simulations of the sample paths generated by the model, each containing $n = 1{,}000$ observations. These Monte Carlo experiments answer four separate questions.

Firstly, how accurate are the various asymptotic distributions in Corollary 2? This question is answered in Figures 2 and 3, where I plot the finite sample distributions of the estimators (histograms) and the corresponding asymptotic distribution (solid line). The asymptotic distribution of $\hat\kappa$ reported in Panels A-C of Figure 2 is from (5.1). Not surprisingly, as the drift parameter makes the process closer and closer to a unit root ($\kappa$ decreasing from 10 to 1), the quality of the asymptotic approximation (5.1) deteriorates and the small sample distribution starts to resemble (5.3), which is strongly skewed. This only affects the drift parameter; the estimator of $\sigma^2$ behaves in small samples as predicted by the asymptotic distribution, which is compatible with the fact that the distribution for estimating $\sigma^2$ is continuous when going through the $\kappa = 0$ boundary. Panel A of Figure 3 reports results for the unit root case, with the asymptotic distribution given in (5.3). In the explosive case $\kappa < 0$, Panel B of Figure 3 is based on the Cauchy distribution (5.2), while Panel C exploits the possibility of random norming to obtain a Gaussian asymptotic distribution of the drift coefficient (see (A.69) in the Appendix). The diffusion estimator is identical in both Panels B and C, and is therefore not repeated in Panel C. Since the rate of convergence in nonstationary cases varies, both Panels B and C report the distribution of the drift estimator scaled by the relevant rate of convergence, rather than the raw distribution of $\hat\kappa - \kappa$ as in all other panels.
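One replication of such an experiment can be sketched as follows for the Ornstein-Uhlenbeck model. The sketch relies on the fact that the exact discretization is a Gaussian AR(1) with slope $e^{-\kappa\Delta}$, so that, conditioning on the first observation, the exact MLE of $\kappa$ is $-\log(\hat a)/\Delta$ and the Euler estimator is $(1 - \hat a)/\Delta$, with $\hat a$ the no-intercept least squares slope. The function names, the seed, and the value $\Delta = 1/52$ are assumptions made for illustration; this is not the author's simulation code.

```python
import math
import random

def simulate_ou(kappa, sigma2, delta, n, x0=0.0, seed=0):
    """Draw n exact transitions of dX = -kappa*X dt + sigma dW at spacing delta."""
    rng = random.Random(seed)
    a = math.exp(-kappa * delta)
    if kappa != 0.0:
        sd = math.sqrt(sigma2 * (1.0 - a * a) / (2.0 * kappa))
    else:
        sd = math.sqrt(sigma2 * delta)  # unit root limit of the variance
    path = [x0]
    for _ in range(n):
        path.append(a * path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def ar1_slope(path):
    """No-intercept least squares slope of X_{t+delta} on X_t."""
    num = sum(u * v for u, v in zip(path[:-1], path[1:]))
    den = sum(u * u for u in path[:-1])
    return num / den

def kappa_exact_mle(path, delta):
    """Exact (conditional) MLE of kappa; requires a positive estimated slope."""
    return -math.log(ar1_slope(path)) / delta

def kappa_euler(path, delta):
    """Estimator of kappa obtained by maximizing the Euler likelihood."""
    return (1.0 - ar1_slope(path)) / delta

if __name__ == "__main__":
    delta = 1.0 / 52.0  # hypothetical weekly sampling
    path = simulate_ou(kappa=5.0, sigma2=1.0, delta=delta, n=1000, seed=1)
    print(kappa_exact_mle(path, delta), kappa_euler(path, delta))
```

Since $1 - e^{-x} \le x$ for every real $x$, the Euler estimate of $\kappa$ never exceeds the exact MLE in this model, which gives a simple deterministic sense in which the Euler discretization distorts the drift estimate at a fixed sampling interval.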
The simulations show that in both nonstationary cases, and in the stationary case when sufficiently far away from a unit root, the asymptotic distribution of the drift estimator is an accurate guide to its small sample distribution.

The second question these experiments address is: what is the dispersion of the MLE around the true value? Tables I and II report the first four moments of the finite sample and asymptotic distributions. For each of the parameter values and the $M$ samples, I also report in these tables the first two moments of the differences between the true MLE estimators of $\kappa$ and $\sigma^2$, their Euler versions, and the estimators obtained using the method of this paper with one and two terms. This makes it possible, thirdly, to compare the MLE dispersion, or sampling noise, to the distance between the MLE and the various approximations under consideration. In particular, when selecting the order of approximation, it is unnecessary to select a value larger than what is required to make the distance between $\hat\theta_n^{(J)}$ and $\hat\theta_n$ an order of magnitude smaller than the distance between $\hat\theta_n$ and the true value $\theta_0$ (as measured by the exact MLE sampling distribution). These simulations show that the parameter estimates obtained with one and even more so two terms are several orders of magnitude closer to the exact MLE than
Figure 2. Small sample and asymptotic distributions of the MLE for the Ornstein-Uhlenbeck process: stationary processes.

the MLE is to the parameter, so that the approximate estimates can be used in place of the exact MLE in practice. Finally, these Monte Carlo experiments make it possible to compare the relative accuracy of the three approximate estimators: the one based on the Euler discretization approximation, and those of this paper. The results of the bottom part of Tables I and II,
Figure 3. Small sample and asymptotic distributions of the MLE for the Ornstein-Uhlenbeck process: nonstationary processes.

comparing the differences between the approximate and exact estimators, show that the estimators with one and even more so with two terms are substantially more accurate than the Euler estimator, even though the latter is in an ideal situation in this example. Indeed, since the true transition function is Gaussian, the only approximation involved in the Euler estimation consists in using first order Taylor series expansions of the true conditional mean and variances rather
244 yacie aït-sahalia TABLE I Compariso of Approximate Estimators for the Orstei-Uhlebeck Process Statioary Processes Pael A: = 1 Pael B: = 5 Pael C: = 1 ˆ MLE TRUE Mea Asymptotic Sample 11826 11522 1965 Stad. Dev. Asymptotic 1 12619 7529 32561 Sample 1 12894 7712 36598 Skewess Asymptotic Sample 362 457 959 Kurtosis Asymptotic 3 3 3 Sample 3 295 3 432 4 484 ˆ EUL ˆ MLE Mea Sample 933615 248697 13134 Stad. Dev. Sample 23568 74819 91453 ˆ 1 ˆ MLE Mea Sample 279554 783 28979 Stad. Dev. Sample 162931 49566 46431 ˆ 2 ˆ MLE Mea Sample 13228 129 5 Stad. Dev. Sample 5978 671 163 ˆv MLE v TRUE Mea Asymptotic Sample 244 38 4357 Stad. Dev. Asymptotic 49117 46891 451521 Sample 54349 48553 46289 Skewess Asymptotic Sample 67 54 61 Kurtosis Asymptotic 3 3 3 Sample 3 66 3 46 3 41 ˆv EUL ˆv MLE Mea Sample 171692 92254 214 Stad. Dev. Sample 218769 14914 735 ˆv 1 ˆv MLE Mea Sample 395 732 992 Stad. Dev. Sample 14829 439 59 ˆv 2 ˆv MLE Mea Sample 182 1 1 Stad. Dev. Sample 433 75 46 Cov ˆ MLE ˆv MLE Asymptotic 22831 1673 226 Sample 23627 1958 224 Cov ˆ EUL ˆv EUL Sample 529 27 298 Cov ˆ 1 ˆv 1 Sample 2238 1632 222 Cov ˆ 2 ˆv 2 Sample 23785 1978 2242 Notes: The model is dx t =X t dt + dw t. I the table, v desigates the diffusio parameter 2, whose true value is 1.. The superscripts (MLE), (EUL), (1) ad (2) refer to the exact estimator, the estimator based o the Euler approximatio, ad the estimators based o the methods of this paper with oe ad two terms respectively (see (4.1)). Paels A, B, ad C i this table correspod to the same paels i Figure 2. The asymptotic values correspod to the asymptotic distributio give i Corollary 2. The sample momets are averages over 5, Mote Carlo simulatios.
maximum likelihood estimatio 245 TABLE II Compariso of Approximate Estimators for the Orstei-Uhlebeck Process No-Statioary Processes Uit Root Explosive Root Explosive Root Pael A: = Pael B: = 1 Pael C: = 1 ˆ MLE TRUE Mea Asymptotic 9226 Sample 9274 626 153 Stad. Dev. Asymptotic 1671 + 1 Sample 16494 7 266 1 239 Skewess Asymptotic 2 265 udefied Sample 2 154 27 685 428 Kurtosis Asymptotic 11 582 udefied 3 Sample 9 527 1588 61 3 2 ˆ EUL ˆ MLE Mea Sample 34314 Stad. Dev. Sample 14 ˆ 1 ˆ MLE Mea Sample 179 Stad. Dev. Sample 3434 ˆ 2 ˆ MLE Mea Sample 39 Stad. Dev. Sample 313 ˆv MLE v TRUE Mea Asymptotic Sample 2837 127 127 Stad. Dev. Asymptotic 447214 4473 4473 Sample 443634 44962 44962 Skewess Asymptotic Sample 125 97 97 Kurtosis Asymptotic 3 3 3 Sample 2 982 3 79 3 79 ˆv EUL ˆv MLE Mea Sample 1782 Stad. Dev. Sample 3171 ˆv 1 ˆv MLE Mea Sample 13 Stad. Dev. Sample 45 ˆv 2 ˆv MLE Mea Sample 11 Stad. Dev. Sample 44 Cov ˆ MLE ˆv MLE Asymptotic Sample 396 1 9 1 1 1 9 1 1 Cov ˆ EUL ˆv EUL Sample 1255 Cov ˆ 1 ˆv 1 Sample 3955 Cov ˆ 2 ˆv 2 Sample 3953 Notes: The same otes as i Table I apply. I the explosive case, the dispersio of the simulated data aroud the mea of the process (zero) makes it impractical to simulate the approximate estimators. The paels match those of Figure 3. The diffusio estimators i Paels B ad C are idetical.
246 yacie aït-sahalia TABLE III Compariso of Approximate Estimators for the Vasicek, Cox-Igersoll-Ross, ad Black-Scholes Models Vasicek Cox-Igersoll-Ross Black-Scholes dx t = X t dt dx t = X t dt dx t = X t dt + dw t + Xt 5 dw t + X t dw t ˆ MLE TRUE Mea 99674 9711 2561 Stad. Dev. 178366 18772 468815 ˆ EUL ˆ MLE Mea 15993 164 17667 Stad. Dev. 9873 325 8121 ˆ 1 ˆ MLE Mea 3675 53 2946 Stad. Dev. 53 15 1752 ˆ 2 ˆ MLE Mea 12 36 197 Stad. Dev. 27 494 294 ˆ MLE TRUE Mea 23341 6947 ot applicable Stad. Dev. 978321 11893 ot applicable ˆ EUL ˆ MLE Mea 3 89 ot applicable Stad. Dev. 71 1789 ot applicable ˆ 1 ˆ MLE Mea 112 1747 ot applicable Stad. Dev. 19126 257 ot applicable ˆ 2 ˆ MLE Mea 17 9 ot applicable Stad. Dev. 3544 1322 ot applicable ˆ MLE TRUE Mea 869 56 165 Stad. Dev. 11568 495 966928 ˆ EUL ˆ MLE Mea 7362 2768 562312 Stad. Dev. 2212 1492 189766 ˆ 1 ˆ MLE Mea 43 29 5585 Stad. Dev. 248 391 625 ˆ 2 ˆ MLE Mea 2 13 58 Stad. Dev. 29 387 117 Cov ˆ MLE ˆ MLE 38 22 3267 Cov ˆ EUL ˆ EUL 77 17 227254 Cov ˆ 1 ˆ 1 35 199 3588 Cov ˆ 2 ˆ 2 38 2 3262 Cov ˆ MLE ˆ MLE 368 12 ot applicable Cov ˆ EUL ˆ EUL 345 99 ot applicable Cov ˆ 1 ˆ 1 371 12 ot applicable Cov ˆ 2 ˆ 2 367 12 ot applicable Cov ˆ MLE ˆ MLE 3112 6 ot applicable Cov ˆ EUL ˆ EUL 2616 57 ot applicable Cov ˆ 1 ˆ 1 3134 6 ot applicable Cov ˆ 2 ˆ 2 3112 6 ot applicable Notes: The true values of the parameters, chose to be realistic for US iterest rates (Vasicek ad CIR) ad stock prices (Black-Scholes) respectively, are: = 5 = 6 = 3 (Vasicek), = 5 = 6 = 15 (CIR), ad = 2 = 3 (Black-Scholes). All momets reported are averages over 5, Mote Carlo replicatios.
than the exact expressions. By contrast, the approximate estimators corresponding to one and two terms mimic the moments of the MLE finite sample distribution extremely closely, often to multiple accurate decimal places. Further Monte Carlo experiments for three standard models in finance (Black-Scholes (1973), Vasicek (1977), Cox-Ingersoll-Ross (1985)) reported in Table III reveal that the estimators proposed here outperform by orders of magnitude the Euler estimator, especially in non-Gaussian situations.

6. Conclusions

This paper has constructed a series of explicit functions, based on Hermite expansions and converging to the conditional density of the diffusion process, under mild regularity conditions. This method makes maximum-likelihood a practical option for the estimation of parameters in discretely-sampled diffusion models. Beyond maximum-likelihood, the formulae for the expansion of $p_X$ apply to any specification of $(\mu, \sigma^2)$, including nonparametric ones. Different types of evidence have been provided in favor of this method. First, it largely outperforms discrete approximations, binomial trees, PDE methods, and simulation-based methods in a direct comparison of speed and accuracy (Figure 1). Second, Monte Carlo experiments show that maximizing the log-likelihood approximation provides parameter estimates that are very close to the true MLE (Tables I, II, and III) and outperforms by several orders of magnitude the alternative methods, not only in terms of computational speed and ease of implementation but also in terms of accuracy.

Extensions to multi-dimensional diffusions (including unobservable state variables to be integrated out of the likelihood function, such as stochastic volatility) and applications to derivative pricing will be considered in future work. A further appeal of this method lies in its potential to be generalized to yet other types of Markov processes, such as those driven by non-Brownian Lévy processes for instance.
As I remarked earlier, this generalization would involve different scaling $X \to Z$, a non-Gaussian leading term for $p_Z$ (in this case a natural choice is the limiting transition density of the driving process), and orthogonal functions that correspond to this leading term. But the basic principle remains valid: first form an orthogonal series to approximate the density and prove its convergence; then determine its coefficients using repeated iterations of the infinitesimal generator of the Markov process under consideration.

Department of Economics, Princeton University, Princeton, NJ 08544-1021, U.S.A., and NBER; yacine@princeton.edu; http://www.princeton.edu/~yacine

Manuscript received December, 1997; final revision received October, 2000.

APPENDIX: Proofs

Proof of Proposition 1: I treat the case where $D_Y = (-\infty, +\infty)$, the other boundary configurations being dealt with similarly. Let $s(v) \equiv \exp\{-2\int^{v} \mu_Y(u; \theta)\, du\}$ be the scale density of $Y$ and
248 yacie aït-sahalia S y y s v dv its scale fuctio. 2 I each case, the lower limit of itegratio is a fixed value i D, the choice of which is irrelevat i what follows (i.e., for the purpose of determiig whether or ot the relevat quatities below are ifiite or ot). Let m v 1/s v be the speed desity of. Step 1 Existece ad uicity i law of a weak solutio: This follows from the Egelbert-Schmidt criterio (see, e.g., Theorem 5.5.15 i Karatzas ad Shreve (1991), replacig by D throughout). To apply this result, ote that cotiuity of (ad of course = 1) implies the local itegrability requiremets for / 2 ad 1/ 2. Explosios are ruled out i Step 2 of this proof. Step 2 Uattaiability of the boudaries ad + : Defie { v { m u du s v dv= s v dv m u du y y y u (A.1) y { y y { u m u du s v dv= s v dv m u du v From Feller s test for explosios, Prob T = = 1 if ad oly if = ad = (see, e.g., Karatzas ad Shreve (1991, Theorem 5.5.29) or Karli ad Taylor (1981, Sectio 15.6)). Near ȳ =+, Assumptio 3.1 gives the upper boud y Ky for all y E (without restraiig how egative ca get); thus { = s v dv s 1 u du= e v u 2 (A.2) w dw dvdu y u y u e { v u 2Kw dw dvdu = e Kv2 dv e Ku2 du Now by itegratio by parts u y u e Kv2 dv = u y u v 1 ve Kv2 dv = 2Ku 1 e Ku2 2K 1 v 2 e Kv2 dv ad, sice v 2 e Kv2 dv < u 2 + e Kv2 dv, it follows that u u ( 1 + 2K 1 u 2) e Kv2 dv > 2Ku 1 e Ku2 or Therefore (A.3) u u e Kv2 dv > ( 2Ku + u 1) 1 e Ku2 y { u e Kv2 dv e Ku2 du y 2Ku + u 1 1 e Ku2 e Ku2 du =+ If y =, there exist costats such that for all <y ad y y where either >1 ad >, or = 1 ad k 1. If >1, we have for <v { { (A.4) s v = exp 2 w dw exp 2 w dw = exp { 2 1 v 1 v v ad hece u s v dv=+. If however = 1, { (A.5) s v exp 2 w 1 dw = k exp 2 l v = k v 2 v 2 The scale fuctio has the followig ituitive iterpretatio: with x <a<x <b< x, the probability that X will reach a before b (resp. b before a) startig from x is ( S b S x / S b S a ) (resp. oe mius this umber). 
Taking the limit $b \to \bar{x}$ and $a \to \underline{x}^{+}$ respectively, we see that under Assumption 2.2 the probability that $X$ will reach either boundary of $D_X$ in finite time is zero.
maximum likelihood estimatio 249 ad u s v dv u k v 2 dv =+ agai sice we have assumed that 1 whe = 1 (i fact, 1/2 would be eough to obtai a etrace boudary, but we have also required that 1 to isure that lim y + y < + sice y = 1 y 2 if y = y 1 ). I all these iequalities, k deotes a differet positive ad fiite costat. It follows from u s v dv=+ ad the fiiteess of the measure m i the secod equality defiig that =, i.e., y = too is uattaiable. Step 3 Boudary classificatio for ȳ =+ : The boudary + is a atural boudary whe = N =, ad a etrace boudary whe = ad N < (see, e.g., Karli ad Taylor (1981, Table 6.2)), where { v (A.6) N s u du m v dv= y y y { u m v dv s u du Uder Assumptio 3, cosider first the case where there exists E> such that Ky y Ky for all y E. We the have { N = m v dv m 1 u du= e v u 2 (A.7) w dw dv du y u y u e { v u 2Kwdw dv du = e Kv2 dv e Ku2 du =+ y u as i (A.3). If istead we have y Ky >1, for all y E, the y u (A.8) N = = y u y { u e v u 2 w dw dv du e v +1 dv e u +1 du y u e v u 2Kw dw dv du where 2 + 1 1 K. By itegratio by parts hece u (A.9) u e v +1 dv = u v v e v +1 dv = + 1 1 u e u +1 1 + 1 2 v 1 e v +1 dv e v +1 dv < 2K 1 u e u +1.So { N e v +1 dv e u +1 du< 2K 1 u e u +1 e u +1 du < + y u Step 4 Boudary classificatio for y = : Amog uattaiable boudaries (i.e., give that = ), whether is a etrace or a atural boudary depeds upo whether N < or N = respectively, where (A.1) N y { y v s u du m v dv= y { u y u m v dv s u du We have i all cases w w 1 for some > (sice if >1 w w > w 1 ; ote that this costat is ot ecessarily 1/2). The we ca boud N as follows: y u { v y u { u (A.11) N = exp 2 w dw dv du = exp 2 w dw dv du u v y u e y { u u y { v 2 /wdw dv du = v 2 dv u 2 du = 2 + 1 1 u 2 +1 u 2 du = 2 + 1 1 y 2 /2 < + Therefore y = is a etrace boudary for all 1.
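The Gaussian tail bound used in Step 2 to obtain the divergence in (A.3), namely $\int_u^{+\infty} e^{-Kv^2}\, dv > (2Ku + u^{-1})^{-1} e^{-Ku^2}$ for $u > 0$, can be checked numerically. The sketch below is only a sanity check under the stated reading of that inequality; the function names and the grid of $(K, u)$ values are illustrative assumptions.

```python
import math

def gaussian_tail(K, u):
    """Integral of exp(-K v^2) over [u, +inf), written with the complementary error function."""
    return 0.5 * math.sqrt(math.pi / K) * math.erfc(math.sqrt(K) * u)

def lower_bound(K, u):
    """Right-hand side of the bound: exp(-K u^2) / (2 K u + 1/u), for u > 0."""
    return math.exp(-K * u * u) / (2.0 * K * u + 1.0 / u)

if __name__ == "__main__":
    for K in (0.5, 1.0, 2.0):
        for u in (0.1, 0.5, 1.0, 2.0, 5.0):
            # The tail should always exceed the bound; the ratio tends to 1 as u grows.
            print(K, u, gaussian_tail(K, u) / lower_bound(K, u))
```

After multiplying by $e^{Ku^2}$, the bound leaves an integrand of order $(2Ku + u^{-1})^{-1}$, which decays like $1/u$ and is therefore not integrable at $+\infty$; this is the source of the divergence asserted in (A.3).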
25 yacie aït-sahalia Proof of Propositio 2: Step 1 Existece of the trasitio desity p : Cosider first the case where D = +. The fact that Girsaov s Theorem ca be applied to follows from Karatzas ad Shreve (1991, 5.5.38); ote that the explosio time of T, is ifiity with probability 1 as was proved i Propositio 1. By Girsaov s formula, for every A i the usual -field, (A.12) Prob ( A = y ) = E M 1 W A W = y where 1 deotes the idicator fuctio ad the oegative supermartigale (A.13) { M exp W dw 1 2 2 W d is i fact a martigale for all >. Settig y y E M W = y W = y, (A.12) becomes (A.14) Prob A = y = 1 y A y y p BM y y dy where p BM y y = 2 1/2 exp { y y 2 / 2. The existece of the trasitio desity p follows from (A.14), ad is give by p y y = y y p BM y y. Itegratio by parts iside the coditioal expectatio defiig ad the scalig property of Browia motio allows to be further simplified (see Gihma ad Skorohod (1972, Chapter 3.13), Dacuha-Castelle ad Flores-Zmirou (1986), or Rogers (1985)) so that (A.15) p y y = 2 1/2 e y y 2 /2 + ( ) ] y y w dw E [e 1 1 u y +uy+ 1/2 Bu du where B u /u 1 desigates a Browia Bridge with B = B 1 =. Step 2 Boud for p : The strict positivity of p (lower boud) follows from (A.15). From Assumptio 3, we obtai y y w dw H + L y y 1 + y + Q y y 2 for all y y i D 2, where H L, ad Q are positive costats (if y, decompose the itegral from y to E, where is bouded as a cotiuous fuctio o a compact iterval, ad the from E to y, where is bouded by Ky; a similar argumet holds for y ). Hece i geeral Q = K. This is a upper boud for the itegral itself, ot its absolute value. The by the cotiuity of w i w, ad its limit behavior ear the boudaries uder Assumptio 3, it follows that there exists such that w for all w> ad (i geeral, however, will ot be bouded below). 
Therefore (A.16) [ { E exp 1 Collectig all terms we have that ( 1 u y + uy + 1/2 B u ) ] du e (A.17) p y y 2 1/2 e y y 2 /2 +H+L y y 1+ y +K y y 2 e C 1/2 e 3 y y 2 /8 e C 1 y y y +C 2 y y +C 3 y +C 4 y2 provided that 1/ 2 + Q 3/ 8, i.e., that < 8Q 1. It is clear from the argumet that we could replace 3/ 8 i the boud for p by ay umber less tha but arbitrarily close to 1/ 2, at the cost of reducig, but this will ot be ecessary. Further, whe ear + ad ear, Q ca be set to i the boud for y y w dw ad hece =+ (i which case we could also replace 3/ 8 by 1/ 2 ). Step 3 Differetiability of p : Suppose for ow that we are allowed to differetiate uder the expectatio sig i (A.15). It follows from the assumed smoothess of ad (hece ad )
maximum likelihood estimatio 251 that (A.18) p y y y = 2 1/2 e y y 2 + y 2 y w dw {{ y y [ + E 1 ( ) ] + y E [e 1 1 u y +uy+ 1/2 Bu du u ( 1 u y + uy + 1/2 B u ) du ( ) e 1 1 u y +uy+ 1/2 Bu ] du where w w / w. The fuctios uder the expectatios deped cotiuously o y ad I will ow show that they are bouded by variables havig costat expectatio themselves. By uiform covergece, differetiatig uder the expectatio will the have bee legitimate ad result i a cotiuous derivative. First, we have y y / + y Q 1 y y where Q 1 is a polyomial of degree oe i y y, with coefficiets uiformly bouded i. Secod E [ Ae B] E [ A e B] combied with (A.16) imply 1 ( ) ( ) ] [ E u 1 u y + uy + 1/2 B u du e 1 1 u y +uy+ 1/2 Bu du (A.19) [ 1 ( ] E u 1 u y + uy + 1/2 B u ) du e To boud the expected value o the right-had side, recall that w has at most polyomial growth, thus i particular at most expoetial growth. Hece there exists > ad G> such that w Ge w ad thus [ 1 E u ( 1 u y + uy + 1/2 B u ) ] (A.2) du [ 1 ] GE ue 1 u y +uy+ 1/2Bu du = G 1 ue [ e 1 u y +uy+ 1/2 Bu ] du G 1 ue 1 u y + uy E [ e 1/2 Bu ] du B u is distributed as N u 1 u. IfN is distributed as N 2, the desity of N is give by 2 2 1/2 1 exp x 2 /2 2 x. Therefore for ay costat a: E e a N = 2 2 1/2 1 e ax e x2 /2 2 dx = 2 2 1/2 1 e a2 2 /2 e x a 2 2 /2 (A.21) 2 dx = e a2 2 /2 2 1/2 1 e x a 2 2 /2 2 dx = e a2 2 /2 ad it follows that E e 1/2 Bu = e u 1 u /2. Hece [ 1 E u ( 1 u y + uy + 1/2 B u ) ] (A.22) du G 1 ue 1 u y +u y + u 1 u /2 du Ge y + y (sice u rus from to 1) ad we obtai (2.5) for all < <, where the costat D is uiform i ad P is a polyomial of fiite degree with coefficiets also uiform i. Step 4 Cosider fially (briefly) the case where D = +. What is required i the proof of Theorem 1 is to show that the itegral e w2 /2 p Z w y / w 2 dw coverges. That is, after a chage of variable Z, we eed to show that the itegral 1/2 e y y 2 /2 p y y / y 2 dy
converges at both boundaries $0^+$ and $+\infty$. The boundary $0^+$ is either an entrance or a natural boundary for $Y$, and in both cases $\lim_{y \to 0^+} \partial p_Y(\Delta, y \mid y_0; \theta)/\partial y = 0$ (see McKean (1956, Remark 4.2, page 541)). Hence the integral converges at the left boundary. The change of measure in Step 1 above is no longer applicable in its simplest form, because the distribution of $Y$ and that of a Brownian motion are no longer absolutely continuous with respect to one another, since $Y$ is now distributed on a subset of the real line whereas a Brownian motion is distributed on the entire real line. However, we can still transform $Y$ into a Brownian motion, but the Radon-Nikodym derivative is only a local martingale instead of a martingale. Girsanov's Theorem now gives for $y > y_0 > 0$:

$$p_Y(\Delta, y \mid y_0; \theta) = p_{BM}(\Delta, y \mid y_0)\, e^{\int_{y_0}^{y} \mu_Y(w; \theta)\, dw}\, E\!\left[ e^{\int_0^{\Delta} \lambda_Y(W_u; \theta)\, du} \,\Big|\, W_0 = y_0,\ W_\Delta = y,\ \Delta < T \right] \tag{A.23}$$

where inside the expectation $W$ follows the law of a Brownian motion and $T$ indicates the first time $W$ hits 0. From (A.23), the same bounds can be derived.

Proof of Corollary 1: The existence and unicity in law of a solution of (1.1) follows, as in Proposition 1, from an application of Theorem 5.5.15 in Karatzas and Shreve (1991), replacing $\mathbb{R}$ by $D_X$ throughout. Note that $\sigma(x; \theta) > 0$ for every $x$ in $D_X$ and $\theta$ in $\Theta$; hence the nondegeneracy condition of the theorem is fulfilled (the only possible local degeneracy of $\sigma$, if any, occurs as $x \to 0^+$, but $0 \notin D_X$). The continuity of $\mu$ and $\sigma$ implies the local integrability requirements for $\mu/\sigma^2$ and $1/\sigma^2$. Explosions are ruled out by showing that $\mathrm{Prob}(T_X = \infty) = 1$. This in turn follows from the fact that $Y_t = \gamma(X_t; \theta)$. The fact that $\gamma(x; \theta)$ tends to one of the boundaries of $D_Y$ when $x$ tends to one of the boundaries of $D_X$ means that $X$ would not be able to reach one of its boundaries without $Y$ also doing so. But we already know that $Y$ cannot do it (recall Proposition 1). Hence $X$ cannot explode. Finally, the existence of $p_X$ and its derivatives follows from the Jacobian formula; specifically $p_X(\Delta, x \mid x_0; \theta) = \sigma(x; \theta)^{-1}\, p_Y(\Delta, \gamma(x; \theta) \mid \gamma(x_0; \theta); \theta)$, and the differentiability of $p_Y$ proved in Proposition 2 (and of course the differentiability of $\gamma$ and $\sigma$ which results from Assumption 2) extends to $p_X$.
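The Jacobian formula at the end of the proof of Corollary 1 can be illustrated on a model where both sides are available in closed form. The sketch below assumes the Black-Scholes specification $dX_t = \mu X_t\, dt + \sigma X_t\, dW_t$, for which $\gamma(x) = \log(x)/\sigma$ and $Y$ is a Brownian motion with constant drift $\mu/\sigma - \sigma/2$, and compares $\sigma(x)^{-1} p_Y(\Delta, \gamma(x) \mid \gamma(x_0))$ with the textbook lognormal density. The function names and parameter values are illustrative assumptions.

```python
import math

def p_y(delta, y, y0, mu, sigma):
    """Transition density of Y = log(X)/sigma: Gaussian with unit diffusion."""
    drift = mu / sigma - sigma / 2.0
    z = y - y0 - drift * delta
    return math.exp(-z * z / (2.0 * delta)) / math.sqrt(2.0 * math.pi * delta)

def gamma(x, sigma):
    """Transformation X -> Y for sigma(x) = sigma * x."""
    return math.log(x) / sigma

def p_x_via_jacobian(delta, x, x0, mu, sigma):
    """p_X = p_Y(gamma(x) | gamma(x0)) / sigma(x), with sigma(x) = sigma * x."""
    return p_y(delta, gamma(x, sigma), gamma(x0, sigma), mu, sigma) / (sigma * x)

def p_x_lognormal(delta, x, x0, mu, sigma):
    """Direct lognormal transition density of geometric Brownian motion."""
    m = math.log(x0) + (mu - 0.5 * sigma ** 2) * delta
    s2 = sigma ** 2 * delta
    return math.exp(-(math.log(x) - m) ** 2 / (2.0 * s2)) / (x * math.sqrt(2.0 * math.pi * s2))

if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    args = (1.0 / 12.0, 1.05, 1.0, 0.2, 0.3)
    print(p_x_via_jacobian(*args), p_x_lognormal(*args))
```

The two functions agree up to floating-point rounding, which is the content of the Jacobian formula in a case where no approximation is involved.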
Proof of Theorem 1: Step 1 Let > be the costat defied i Propositio 2 (possibly = ). Let A X be a compact set cotaied i D X, ad cosider x i A X.LetA be the compact set that cotais the values of x as x varies i A X ad i the closure of (recall that is bouded). Defie x x 1/2 x x. We seek to boud: (A.24) p X x x p J X x x = x 1 1/2 p Z x x x p J Z x x x For that purpose, we will boud the jth coefficiet i the approximatig fuctio p J Z. The J Z s i (2.12) are well-defied sice by (2.4), the momets u y j y j p y y dy are fiite for all j as a result of u y j e C 3 y +C 4 y2 C w y 1/2 j e 3w2 /8 +C 1 y w +C (A.25) 2 w dw where the variable y has bee chaged to w = y y. For each ad y there exists a value ȳ y such that for all w w ȳ y implies that 3w 2 /8 + C 1 y w +C 2 w 5w 2 /16. Next, itegratio by parts with j + 1 H j z = dh j+1 z /dz yields (A.26) j Z y = j! 1 H j w p Z w y dw =j! 1 j + 1 1 H w p j+1 Z w y dw ] + = j + 1! 1 H j+1 w p Z w y + j + 1! 1 H j+1 w p Z w y / w dw
maximum likelihood estimatio 253 With y = y + 1/2 w ad (2.4), we have that (A.27) <p Z w y a exp 3w 2 /8 exp a 1 w y +a 2 w +a 3 y +a 4 y 2 where the costats a i i= 4, are uiform i. By Theorem II i Stoe (1928), there exists a costat K such that for all z i R ad every iteger j, H j z K j! 1/2 j 1/4 1 + 2 5/4 z 5/2 e z2 /4. Therefore (A.28) j + 1! 1 H j+1 w p Z w y j + 1! 1/2 j + 1 1/4 K 1 + w 5/2 /2 5/4 e w2 /4 a e 3w2 /8 e a 1 w y +a 2 w +a 3 y +a 4 y2 ad hece j + 1! 1 H j+1 w p Z w y ] + =. Step 2 Proof that the expasio p J Z of p Z coverges: Defie (A.29) j y j! 1 H j w p Z w y / w dw We ca boud the terms of order j 1 i the series for p Z accordig to (A.3) j Z y H j z = j + 1! 1 H j+1 w p Z w y / w dw H j z = j+1 y H j z K { 1 + z 5/2 /2 5/4 e z2 /4 j 1/4 j! 1/2 j+1 y K 1 + z 5/2 /2 5/4 e z2 /4 j 1/4 j + 1 1/2 j + 1! 1/2 j+1 y K 1 + z 5/2 /2 5/4 e z2 /4 j 1/2 j + 1 1 + j + 1! 2 j+1 y /2 sice 2 + 2 /2. The first series o the right-had side, j 1/2 j + 1 1, is coverget. It remais to prove that the series j= j! 2 y j coverges. The itegral ew2 /2 p Z w y / w 2 dw coverges, sice from (2.5) oe ca coclude that: (A.31) p Z w y / w b e 3w2 /8 R w y e b 1 w y +b 2 w +b 3 y +b 4 y2 where R is a polyomial of fiite order i w y with coefficiets uiform i, ad where the costats b i i= 4, are uiform i. The expad the squared term i { J 2 e w2 /2 p Z w y / w w j y H j w dw = = e w2 /2 p Z w y / w 2 dw 2 2 1/2 + 2 1 J J j y j= j= k= J j y k y j= p Z w y / w H J w dw e w2 /2 p Z w y / w 2 dw 2 1/2 e w2 /2 H j w H k w dw J j! 2 y j ad the (domiated) covergece of the series o the right-had side follows. Further, the series coverges uiformly with respect to i ad to y i the compact set A. j=
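Hence $p_Z^{(J)}(\Delta,z|y_0;\theta) = \phi(z)\sum_{j=0}^{J}\eta_Z^{(j)}(\Delta,y_0;\theta)\,H_j(z)$ is convergent as $J \to \infty$. Note that the convergence is uniform in $z$ over the entire real line since the two series in (A.30) are independent of $z$ and hence converge uniformly with respect to $z$. The convergence is also uniform with respect to $\theta$ in $\Theta$ and $y_0$ in the compact set $A_\gamma$.

Step 3. Proof that the limit of $p_Z^{(J)}(\Delta,z|y_0;\theta)$ (which we now know exists) is indeed $p_Z(\Delta,z|y_0;\theta)$: Let $q_Z(\Delta,z|y_0;\theta) \equiv \lim_{J\to\infty} p_Z^{(J)}(\Delta,z|y_0;\theta)$. $q_Z$ is continuous in $z$ as the uniform limit of a series of continuous functions. Further, with $\beta_{j+1} \equiv j^{-1/2}(j+1)^{-1} + (j+1)!\,\alpha_{j+1}^2(\Delta,y_0;\theta)$, where $\alpha_{j+1} = \eta_Z^{(j)}$ is the coefficient appearing in (A.30), note that there exists a constant $K'$ such that

$$\left|\phi(z)\,\eta_Z^{(j)}(\Delta,y_0;\theta)\,H_j(z)\right| \le K\left(1 + |z|^{5/2}/2^{5/4}\right)e^{-z^2/4}\,\beta_{j+1} \le K' e^{-3z^2/8}\,\beta_{j+1} \tag{A.32}$$

(for $z$ large enough) and hence $q_Z$ satisfies the same bound as $p_Z$ (which itself follows from that of $p_Y$ in Proposition 2). Therefore the integral $(k!)^{-1}\int_{-\infty}^{+\infty} q_Z(\Delta,w|y_0;\theta)\,H_k(w)\,dw$ exists and

$$(k!)^{-1}\int_{-\infty}^{+\infty} p_Z^{(J)}(\Delta,w|y_0;\theta)\,H_k(w)\,dw = (k!)^{-1}\sum_{j=0}^{J}\eta_Z^{(j)}(\Delta,y_0;\theta)\,(2\pi)^{-1/2}\int_{-\infty}^{+\infty} e^{-w^2/2}H_j(w)\,H_k(w)\,dw = \begin{cases}\eta_Z^{(k)}(\Delta,y_0;\theta) & \text{if } k \le J \\ 0 & \text{if } k > J\end{cases} \tag{A.33}$$

because $(2\pi)^{-1/2}\int_{-\infty}^{+\infty} e^{-w^2/2}H_j(w)\,H_k(w)\,dw = j!$ if $k = j$, and $0$ otherwise (see, e.g., Sansone (1991, page 38)). Hence it follows that $(k!)^{-1}\int_{-\infty}^{+\infty} q_Z(\Delta,w|y_0;\theta)\,H_k(w)\,dw = \eta_Z^{(k)}(\Delta,y_0;\theta)$, and so $p_Z$ and $q_Z$ have the same $\eta^{(k)}$ coefficients for all $k$. To finish, consider two continuous functions satisfying the same first bound as in Proposition 2 and sharing the same $\eta^{(k)}$ coefficients for all $k$: they must be equal. Indeed, define the difference $r_Z(\Delta,w|y_0;\theta) \equiv q_Z(\Delta,w|y_0;\theta) - p_Z(\Delta,w|y_0;\theta)$. The integral of $r_Z$ against polynomials $w^k$ of all orders $k$ is equal to zero (since any polynomial of order $k$ is a linear combination of the Hermite polynomials $H_0,\dots,H_k$) and therefore by Weierstrass's approximation theorem the function $r_Z$ is identically zero.

Step 4. Back to $p_X$: I have shown that, for every $\varepsilon > 0$, there exists $J_A$ such that for all $J \ge J_A$, the bound $\left|p_Z(\Delta,z|y_0;\theta) - p_Z^{(J)}(\Delta,z|y_0;\theta)\right| \le \varepsilon$ holds for all $z \in R$, $y_0 \in A_\gamma$, and $\theta \in \Theta$. If $\sigma$ is globally nondegenerate under Assumption 2(1), $\sigma^{-1}(x;\theta) < c^{-1} < +\infty$ implies, through (A.24), that for all $J \ge J_A$ the difference $\left|p_X(\Delta,x|x_0;\theta) - p_X^{(J)}(\Delta,x|x_0;\theta)\right|$ is bounded by a multiple of $\varepsilon$ that does not depend on $x$, $x_0$, or $\theta$, for all $x$ in $R$, $x_0 \in A_X$ and $\theta \in \Theta$. If not, for every $\xi > 0$, there exists a constant $c_\xi$ such that $\sigma^{-1}(x;\theta) < c_\xi^{-1} < +\infty$ for all $x \in [\xi,+\infty)$ and $\theta \in \Theta$.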
Therefore the uiform covergece of p J Z to p Z for z i R implies the uiform covergece of p J X to p X for x i + sice for such x s equatio (A.24) implies (A.34) p X x x p J X x x c 1 1/2 p Z x x x p J Z x x x Proof of Propositio 3: First verify that (3.4) holds. By Taylor s Theorem, we have ( ) l + I 1/2 (A.35) h l = h I 1/2 l + h T I 1/2 l I 1/2 h /2 for every bouded sequece h such that + I 1/2 h, with betwee ad + I 1/2 h. Now uder Assumptio 4, we have, agai by Taylor s Theorem, (A.36) I 1/2 l I 1/2 G I 1/2... l I 1/2 I 1/2 where the first term o the right-had side is bouded i probability as the orm of R, while the secod term, which arises because both ad are i the same I 1/2 -eighborhood, goes to zero, so (A.37) ( ) l + I 1/2 h l = h S h T G h /2 + o p 1
maximum likelihood estimatio 255 Therefore uder (3.5) we have the LAMN structure (see, e.g., Jegaatha (1995, Defiitio 3, page 837)), ad uder (3.1) the LABF structure (see, e.g., Jegaatha (1995, Defiitio 4, page 85)). By Taylor s Theorem applied to the score fuctio, l l ˆ = l ˆ, i.e., S = I 1/2 V = I 1/2 H I 1/2 I 1/2 ˆ so (A.38) I 1/2 ˆ = [ I 1/2 H I 1/2 ] 1 S ad hece as i (A.36) we have (A.39) I 1/2 ˆ G 1 S = o p 1 Now both (3.6) ad (3.11) follow from the joit covergece i distributio of S G uder LAMN ad LABF respectively, ad the Cotiuous Mappig Theorem (e.g., Hall ad Heyde (198, Theorem A.3, page 276) applied to (A.39). The efficiecy statemet (3.8) uder LAMN follows from applyig Theorem 3 i Basawa ad Scott (1983, Chapter 2.4, Theorem 3, page 6); the Normal asymptotic variace compariso follows from Chapter 2.3, Corollary 2, page 53. Uder statioarity, the covergece i (3.5) follows from the Cetral Limit Theorem ad the Law of Large Numbers (see, e.g., Hall ad Heyde (198)), ad the fact that E [ 1 H ] = 1 i = i, so (A.4) G = I 1/2 H I 1/2 = I 1/2 [ 1 H ] I 1/2 p I 1/2 i I 1/2 G G is a oradom positive defiite matrix provided that i is (which is guarateed by (3.2)), ad we obtai the classical result (3.9) (see, e.g., Billigsley (1961)). I ow show that the coditio o R i Assumptio 4 is automatically satisfied uder statioarity. I (A.15), let (A.41) (A.42) y ( b w dw f u 1 u y + uy + 1/2 B u ) c y q l ( p y y ) = l 2 /2 y y 2 /2 + b + l ( E e c ) 1 f u du ad recall that ad, hece f u ad c, are three times differetiable i uder Assumptio 2.1. From (A.42), it follows that (A.43) (A.44) q = ḃ + E ċ ec E e c q = b + E c + ċ 2 e c E e c... q =... b + E... c + 3ċ c + ċ 3 e c E e c 3E c + ċ 2 e c E ċ e c + 2E ċ ec 3 E e c 2 E e c 3 E ċ ec 2 E e c 2 where for simplicity I use the same otatio as if the parameter vector were oe-dimesioal. Let v play the role of ċ c +ċ 2, ad... 
c +3ċ c +ċ 3 respectively ad apply Hölder s Iequality, with p = 4/3 ad r = 4, to (A.45) E v e c E v e c = E e c /p e c /q v E e c 1/p E e c v q 1/q Uder Assumptio 3, the fuctio v has at most polyomial growth i w ad as a result it follows from the same calculatios as i (A.19) (A.22), ad the fact that c that (A.46) E e c v q 1/q Ge + y + y
256 yacie aït-sahalia (where G is a costat) for all the v listed above. We therefore have show that E v e c /E e c Ge + y + y E e c 1/p 1 Ge + y... + y E e c 1/4. Recall ext that, ad all have at most polyomial growth. Hece there exist a costat G ad a fiite order polyomial P such that q P y y +Ge a + y + y E e c 1/4 (A.47) q P y y +Ge a + y + y E e c 1/4 +E e c 1/2... q P y y +Ge a + y + y E e c 1/4 +E e c 1/2 +E e c 3/4 From this it follows that E [ q ] ȳ (A.48) = y {P y y + Ge a + y + y E e c 1/4 y 2 1/2 e y y 2 /2 +b E e c dy is fiite (the egative powers of E e c get compesated), ad similarly for E q = y ad E... q = y. Hece E... q is bouded, ad by the Law of Large Numbers (A.49) R = I 1/2 T I 1/2 p I 1/2 E [... q ] I 1/2 which is a fiite costat, uiformly bouded i. By usig the top iequality i (A.47) ad squarig it, it also follows as i (A.48) that E q 2 = y is bouded (the highest egative power becomes E e c 1/4 2 ad therefore i is fiite. The fact that the derivatives of l p X are bouded follows from the bouds just give for the derivatives of l p, ad the differetiatio chai rule applied to (2.9). Uder Assumptio 2, 1/ is bouded (except possibly ear a boudary) ad the fuctio defied i (2.1) ad its derivatives have at most polyomial growth. The same bouds as i (A.48) (with egative powers E e c 1/2 ad E e c 3/4 similarly aihilated by E e c apply to the secod ad third derivatives of l p with respect to y ad y rather tha. Thus (A.5) l p X = + l p + x l p + x y l p y (ad the ext two derivatives) are bouded similarly to (A.48) usig the bouds for the derivatives of l p i (A.5). Proof of Theorem 2: Step 1 Fix > ad x R. Let r J X x x = p X x x p J X x x, (A.51) R J X x x sup r J x x X ad also defie the correspodig quatities for ad Z. By Theorem 1, the covergece of p J y y to p y y is uiform i y over D ad i over, ad i y over bouded subsets A of D y. Hece there exists J A such that for all J J A sup sup y D sup y A r J y y <. 
Now recall: p X x x p J X x x = x 1 p x x p J x x ad for give x let A be the set of y described by x as varies i. Sice is bouded ad is cotiuous i ( is by Assumptio 2.1), A is bouded. It follows that for all J J A (A.52) { X x x sup x 1 sup R J sup y D sup r J y y { sup x 1 y A
maximum likelihood estimatio 257 Let 1 x sup x 1, which is fiite by the boudedess of ad the cotiuity of 1 i. The for m = 1 ad m = 2, we have that [ {R E J X X t+ X t ] m x (A.53) X t = x R J X x x m p X x x dx x x m m x p X x x dx [{ J i.e., lim J E R X X t+ X t m ] X t = x = for m = 1 2 provided that we prove that the two itegrals x x m x p X x x dx m = 1 2 coverge. Step 2 Boudig the itegral i the right-had side of (A.53): A difficulty oly arises whe D X = + lim x + x = (otherwise m x c m ad the x x m x p X x x dx c m ). Applyig the chage of variable X, I will prove covergece of the itegral ȳ y m 1 y p y y dy 1 meas 1/ whereas 1 represets the iverse of the fuctio ). We eed to cosider the two cases where y lim x + x is either + or. Uder Assumptio 2.1, we have that 1 x 1 x for all <x ad. For <x, we have x du/ u x 1 u du = 1 1 1 x 1 if <1, ad therefore y = + by takig the limit as x teds to +.Letx = 1 y, ad I have just show that for y ear + y 1 1 1 x 1, from which it follows that 1 y 1 y 1/ 1 ad cosequetly x (A.54) m 1 y m 1 y m m 1 y m / 1 So aturally the upper boud teds to + as y teds to +. The issue is whether this upper boud icreases faster tha p decreases as y teds to +. To aswer this questio, we eed to call upo Assumptio 3.1. For <y, (A.55) { y e w dw = e y w dw e y w dw = y if = 1 e 1 1 1 y 1 if >1 will provide a upper boud to p for y ear + (see the proof of Propositio 2; the other terms are bouded ear + ). It is clear that if >1 the left tail of p decays expoetially fast, while the upper boud for (A.56) m 1 y m 1 y m m 1 y m / 1 icreases oly geometrically, so the itegral will coverge. If = 1, the the tail of p is bouded above by y ad therefore the itegral will coverge if 2 / 1. This is give by Assumptio 5. If istead 1, the ad (A.57) x y = lim 1 u du= x + + + x 1 u du+ lim 1 u du x + where { x x 1 l x lim 1 u du lim 1 u du = lim x + x + x + 1 1 1 x 1 + 1 u du if = 1 if >1 which is, so y = whe 1. I that case, we have for y ear y m 1 y m 1 y m.letx = 1 y. 
From the same calculatio as above, we have y 1 l x if = 1. Thus 1 y e y ad therefore m 1 y m e m y. Now from (A.17), we kow that p is bouded above by a term of the form e 3y2 /8, so the itegral of e m y e 3y2 /8 coverges for y ear. If >1 y 1 1 1 x 1 ad therefore 1 y 1 y 1/ 1, ad thus
258 yacie aït-sahalia m 1 y m 1 y m / 1, which agai teds to + as y teds to, but ot fast eough to overcome the decay e 3y2 /8 of p. Hece the itegral ȳ y m 1 y p y y dy coverges ear y =whe 1. Therefore from (A.53) we coclude that (A.58) lim E [ J ] J R X X t+ X t m X t = x = for m = 1 2 Step 3 The covergece of its first two momets give by (A.58) to zero imply by Chebyshev s Iequality that the sequece R J X X t+ X t coverges to zero i probability, give X t = x, that is: (A.59) lim Prob( R J X X t+ X t ) > X t = x = J By Bayes Rule we have (A.6) Prob ( R J X X t+ X t ) > = Prob ( R J X X t+ X t > X t = x ) t x dx where t x Prob X t x / x deotes the ucoditioal (or margial) desity of X t at the true parameter value. Note that sice we are ot assumig that the process is strictly statioary, that desity depeds o t. Now sice Prob ( R J X X t+ X t ) > X t = x 1 ad t x dx = 1 it follows from Lebesgue s Domiated Covergece Theorem (see, e.g., Haaser ad Sulliva (1991, Theorem 6.8.6)) that (A.61) ( lim Prob R J X X t+ X t ) > = J Step 4 Covergece as J : I have ow established that p J p X X t+ X t p X X t+ X t as J Before takig the logarithm of p J X, we eed to trim it to isure that it is positive o the etire support D X. The roots of the Hermite polyomials are such that p J X > o a iterval c J c J with c J as J.Leta J be a positive sequece covergig to as J. Defie J as a (smooth) versio of the trimmig idex takig value 1 if p J X >a J ad a J otherwise. As a cosequece of the boud (2.4), p X is tight, i.e., for every > there always exists a compact space K D X that cotais 1 of the mass of the desity p X. As a result, trimmig by J is asymptotically irrelevat (as J p ), that is J 1. It follows that J p J p X p X ad furthermore l J p J p X X t+ X t l p X X t+ X t by the cotiuity of the logarithm. Thus for ay give we have obtaied that l J p l as J, uiformly i. 
The convergence of the respective argmax, $\hat\theta_n^{(J)} \xrightarrow{p} \hat\theta_n$ as $J \to \infty$, is then an application of standard methods since $\ell_n^{(J)}$ and $\ell_n$ and their derivatives are both continuous in $\theta$ for all $n$ and $J$. This proves part (i) of the Theorem.

Step 5. Convergence as $n \to \infty$: From Step 4, a value of $J$ can be chosen for each $n$ to make $\left\|\hat\theta_n^{(J)} - \hat\theta_n\right\|$ arbitrarily small in probability. In particular, one can select $J_n$ such that $\hat\theta_n^{(J_n)} - \hat\theta_n = o_p\left(I_n^{-1/2}(\theta_0)\right)$ as $n \to \infty$. This proves part (ii) of the theorem.

Proof of Proposition 4 (footnote 21: I am grateful to Ernst Schaumburg for providing a key element in this proof): Under the assumptions made on $\mu_Y$, the boundaries are entrance (see Proposition 1). The former insures stationarity of $Y$ (see the discussion following Proposition 1). It is also the case that the scale measure $s(v) \equiv \exp\left\{-\int^{v} 2\mu_Y(u;\theta)\,du\right\}$ diverges exponentially fast at each boundary. The spectrum of $A$ is discrete, from Section 4.1 in Hansen, Scheinkman,
maximum likelihood estimatio 259 ad Touzi (1998). 22 That is, there exists a coutable umber of eigevalues p ad eigefuctios p p = 1, formig a orthoormal basis i L 2 such that for all fuctios f i the domai of A, A f = p= p f p p where deotes the atural ier product of L 2. Now polyomials f ad their iterates (by repeated applicatio of the geerator) retai their polyomial growth characteristic ear the boudaries; so they are all i L 2 ad satisfy lim y y f y /s y = lim y ȳ f y /s y =. This follows from the expoetial divergece of s y ear both boudaries whereas polyomials ad their iterates diverge at most polyomially (recall that uder Assumptio 3 ad its derivatives have at most polyomial growth; multiplyig ad addig fuctios with polyomial growth yields a fuctio still with polyomial growth). Usig the the Hase, Scheikma, ad Touzi (1998, page 1) characterizatio of the domai of the geerator of a scalar diffusio, polyomials ad their iterates are i the domai of the geerator. Sice f is i L 2, it follows that f = p= f p p with 2 p= f p <. Moreover, A k f = p= p k f p p. Therefore: (A.62) ( ) ( ) K K A k fk! 1 k = k f K p p p k! 1 k = k p k! 1 k f p p k= k= p= p= k= by Fubii s Theorem. beig a time-reversible diffusio, its eigevalues p are all real ad egative. Therefore K k= k p k! 1 k 1, with limit exp p 1, ad it follows that (A.63) ( ) 2 K k p k! 1 k f p 2 f p 2 < p= k= p= for all J ad by the domiated covergece theorem the series K k= Ak fk! 1 k coverges as K. Proof of Corollary 2: Step 1 Part (i) follows directly from (3.9). Step 2 Part (ii): With the otatio i X i e X i 1, the matrices H I ad G of Sectio 3 ca be calculated explicitly. It is easy to see that the terms of H are of the form: (A.64) (A.65) (A.66) [ H ] = a 11 1 + a 2 X 2 + a i 1 3 X i 1 i + a 4 2 i [ H ] = b 22 1 + b 2 2 i i=1 i=1 i=1 i=1 [ H ] = c 12 2 X i 1 i + c 3 2 i i=1 i=1 where a k b k, ad c k are fuctios of the parameters. 
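Part (ii) of this proof concerns the explosive case, where the limiting distribution of the drift estimator under deterministic scaling is Cauchy, following White (1958). The simulation below is a hedged illustration of that classical result and not a reproduction of the paper's calculation. It assumes an AR(1) recursion $X_i = \rho X_{i-1} + \varepsilon_i$ with $X_0 = 0$, $\rho > 1$ and unit-variance Gaussian innovations; here `rho` is a hypothetical stand-in for the autoregressive coefficient of the discretely sampled process, and the innovation variance is normalized to one for simplicity. Under these assumptions the statistic $\rho^n(\rho^2-1)^{-1}(\hat\rho - \rho)$ should be approximately standard Cauchy, so that the median of its absolute value is close to 1.

```python
import numpy as np


def explosive_ar1_statistic(rho, n, reps, seed=0):
    """Simulate `reps` explosive AR(1) paths of length n started at X_0 = 0
    and return rho**n / (rho**2 - 1) * (rho_hat - rho) for each path, where
    rho_hat is the least-squares (Gaussian ML) estimator of rho."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((reps, n))
    x_prev = np.zeros(reps)   # X_{i-1}, starting from X_0 = 0
    num = np.zeros(reps)      # running sum of X_{i-1} * eps_i
    den = np.zeros(reps)      # running sum of X_{i-1}**2
    for i in range(n):
        num += x_prev * eps[:, i]
        den += x_prev**2
        x_prev = rho * x_prev + eps[:, i]
    # rho_hat - rho = sum(X_{i-1} eps_i) / sum(X_{i-1}^2)
    return rho**n / (rho**2 - 1.0) * (num / den)


stats = explosive_ar1_statistic(rho=1.05, n=200, reps=4000)

# For a standard Cauchy variable C, P(|C| <= 1) = 1/2, so median(|C|) = 1.
median_abs = float(np.median(np.abs(stats)))
frac_within_one = float(np.mean(np.abs(stats) <= 1.0))
```

The heavy tails of the simulated statistic make sample means and variances uninformative, which is why the sketch summarizes the draws with a median and a coverage fraction instead.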
Now E [ [ i=1 i 1 ] X2 ad E i=1 i ] 2 are asymptotically equivalet as to e 2 2 /2 e 2 1 ad 2 e 2 1 /2 respectively, ad E [ X ] i=1 i 1 i = (see White (1958) ad Aderso (1959)). So to calculate a asymptotic equivalet for I = diag E H we oly eed a 2 = 2 2 e 2 / 2 e 2 1, b 1 = 1/2 4 22 Natural boudaries do ot ecessarily lead to a discrete spectrum (for example, i some istaces a mixed discrete-cotiuous spectrum results). A statioary Orstei-Uhlebeck process (i.e., oe with positive mea reversio) is a example of a process with atural boudaries ad a discrete spectrum. What Propositio 4 shows is that havig a discrete spectrum is a sufficiet coditio for the covergece of the series (4.3).
26 yacie aït-sahalia ad b 2 = /2 6 e 2 1 to obtai that I 11 is equivalet to e 2 +1 2 / e 2 1 2 while I 22 = /2 4. Fially, (A.67) ( H 11 / I 11 H 12 / ) I 11 I 22 G = H 12 / p G I 11 I 22 H 22 / I 22 = ( ) 2 1 1 p because 2 1 e 2 / 2 e 2 i=1 X2 i 1 2 1 = N 1 2, while i=1 2 = O i p 1/2 ad X i=1 i 1 i = O p e. Therefore (5.2) follows, which is a o-gaussia distributio uder determiistic scalig by the asymptotic equivalet of I 11. Sice (A.68) Prob ( G 1/2 11 N 1 z ) =Prob ( N 1 z G 1/2 11 ) = z g e 2 /2 2 e g/2 2 g ddg yields by differetiatio with respect to z the desity 1/ 1 + z 2 G 1/2 11 N 1 is the Cauchy distributio. Alteratively, we obtai a Gaussia distributio uder radom scalig by the asymptotic equivalet of H 11 : 2 2 i=1 X2 i 1 (A.69) ˆ 2 e 2 1 d N 1 Step 3 Part (iii): I 11 is asymptotically equivalet to 2 E [ i=1 X2 i 1 ], i.e., to 2 2 /2, whe =. Further, (A.7) { S 1 = I 1/2 11 2 1 2 X i 1 i 2 1 2 i=1 2 i i=1 (A.71) { G 11 = I 1 11 6 1 2 + 2 i=1 X 2 i 1 + 2 i=1 X i 1 i + 3 1 2 2 i i=1 Thus ( S 1 G 11 ) d ( 2 1/2 1 W 2 1 2 1 W 2 d ) sice from White (1958): (A.72) ( X i=1 i 1 i 2 2 i=1 X2 i 1 2 2 ) ( d 1 W 2 1 1 W 2 d ) Ad (5.3) follows from (3.11) with M = W ad 2 1 W dw = 1 dw 2 1 d = W 2 1 1. I this case the covergece G G occurs i distributio but ot i probability (which as discussed i the text would have bee sufficiet to isure a LAMN likelihood ratio structure). I both ostatioary cases, Assumptio 4, icludig the boudedess coditio o R, is verified explicitly from the exact expressio of the likelihood fuctio (whose secod ad third derivatives diverge at the same rate: differetiate oce more with respect to the expressios (A.64) (A.66)). Fially, whe but ot whe >, the asymptotic distributio of the diffusio coefficiet 2 is uaffected by the estimatio of the drift, sice the covergece rate of the latter is faster whe. REFERENCES Aït-Sahalia,. (1996a): Noparametric Pricig of Iterest Rate Derivative Securities, Ecoometrica, 64, 527 56.
Aït-Sahalia, Y. (1996b): "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385-426.
Aït-Sahalia, Y. (1999): "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361-1395.
Anderson, T. W. (1959): "On Asymptotic Distributions of Parameters of Stochastic Difference Equations," Annals of Mathematical Statistics, 30, 676-687.
Azencott, R. (1981): Géodésiques et Diffusions en Temps Petit, ed. by R. Azencott. Paris, France: Société Mathématique de France.
Basawa, I. V., and B. L. S. Prakasa Rao (1980): Statistical Inference for Stochastic Processes. London, UK: Academic Press.
Basawa, I. V., and D. J. Scott (1983): Asymptotic Optimal Inference for Non-ergodic Models, Lecture Notes in Statistics 17. New York, NY: Springer Verlag.
Billingsley, P. (1961): Statistical Inference for Markov Processes. Chicago, IL: The University of Chicago Press.
Black, F., and M. Scholes (1973): "The Pricing of Options and Corporate Liabilities," Journal of Political Economy, 81, 637-654.
Cox, J. C. (1975): "The Constant Elasticity of Variance Option Pricing Model," published in a special issue of The Journal of Portfolio Management, 1996, 15-17.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985): "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385-407.
Cramér, H. (1925): "On Some Classes of Series Used in Mathematical Statistics," Proceedings of the Sixth Scandinavian Mathematical Congress, 399-425.
Dacunha-Castelle, D., and D. Florens-Zmirou (1986): "Estimation of the Coefficients of a Diffusion from Discrete Observations," Stochastics, 19, 263-284.
Duffie, D., and P. Glynn (1997): "Estimation of Continuous-Time Markov Processes Sampled at Random Time Intervals," Mimeo, Stanford University.
Eraker, B. (1997): "MCMC Analysis of Diffusion Models with Application to Finance," Mimeo, Norwegian School of Economics, Bergen.
Florens, D. (1999): "Estimation of the Diffusion Coefficient from Crossings," Statistical Inference for Stochastic Processes, 1, 175-195.
Gallant, A. R., and D. W. Nychka (1987): "Semi-nonparametric Maximum-Likelihood Estimation," Econometrica, 55, 363-390.
Gallant, A. R., and G. Tauchen (1996): "Which Moments to Match?" Econometric Theory, 12, 657-681.
Gihman, I. I., and A. V. Skorohod (1972): Stochastic Differential Equations. New York, NY: Springer-Verlag.
Gouriéroux, C., A. Monfort, and E. Renault (1993): "Indirect Inference," Journal of Applied Econometrics, 8, S85-S118.
Gouriéroux, C., A. Monfort, and A. Trognon (1984): "Pseudo Maximum Likelihood Methods: Theory," Econometrica, 52, 681-700.
Haaser, N. B., and J. A. Sullivan (1991): Real Analysis. New York, NY: Dover.
Hall, P., and C. C. Heyde (1980): Martingale Limit Theory and its Applications. San Diego, CA: Academic Press.
Hansen, L. P., and J. A. Scheinkman (1995): "Back to the Future: Generating Moment Implications for Continuous Time Markov Processes," Econometrica, 63, 767-804.
Hansen, L. P., J. A. Scheinkman, and N. Touzi (1998): "Identification of Scalar Diffusions Using Eigenvectors," Journal of Econometrics, 86, 1-32.
Jeganathan, P. (1995): "Some Aspects of Asymptotic Theory with Applications to Time Series Models," Econometric Theory, 11, 818-887.
Jensen, B., and R. Poulsen (1999): "A Comparison of Approximation Techniques for Transition Densities of Diffusion Processes," Mimeo, Aarhus University.
Jones, C. S. (1997): "Bayesian Analysis of the Short-Term Interest Rate," Mimeo, The Wharton School, University of Pennsylvania.
Karatzas, I., and S. E. Shreve (1991): Brownian Motion and Stochastic Calculus. New York, NY: Springer-Verlag.
Karlin, S., and H. M. Taylor (1981): A Second Course in Stochastic Processes. New York, NY: Academic Press.
Kessler, M., and M. Sorensen (1999): "Estimating Equations Based on Eigenfunctions for a Discretely Observed Diffusion," Bernoulli, 5, 299-314.
Lo, A. W. (1988): "Maximum Likelihood Estimation of Generalized Itô Processes with Discretely Sampled Data," Econometric Theory, 4, 231-247.
McKean, H. P., Jr. (1956): "Elementary Solutions for Certain Parabolic Partial Differential Equations," Transactions of the American Mathematical Society, 82, 519-548.
Melino, A. (1994): "Estimation of Continuous-Time Models in Finance," in Advances in Econometrics, Sixth World Congress, Volume II, ed. by C. Sims. Cambridge, UK: Cambridge University Press.
Merton, R. C. (1980): "On Estimating the Expected Return on the Market: An Exploratory Investigation," Journal of Financial Economics, 8, 323-361.
Pedersen, A. R. (1995): "A New Approach to Maximum-Likelihood Estimation for Stochastic Differential Equations Based on Discrete Observations," Scandinavian Journal of Statistics, 22, 55-71.
Rogers, L. C. G. (1985): "Smooth Transition Densities for One-Dimensional Diffusions," Bulletin of the London Mathematical Society, 17, 157-161.
Sansone, G. (1991): Orthogonal Functions. New York, NY: Dover.
Santa-Clara, P. (1995): "Simulated Likelihood Estimation of Diffusion with an Application to the Short Term Interest Rate," Mimeo, UCLA.
Stanton, R. (1997): "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973-2002.
Stone, M. H. (1928): "Development in Hermite Polynomials," Annals of Mathematics, 29, 1-13.
Vasicek, O. (1977): "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177-188.
White, H. (1982): "Maximum Likelihood Estimation of Misspecified Models," Econometrica, 50, 1-25.
White, J. S. (1958): "The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case," Annals of Mathematical Statistics, 29, 1188-1197.
Wong, E. (1964): "The Construction of a Class of Stationary Markov Processes," in Stochastic Processes in Mathematical Physics and Engineering, Proceedings of Symposia in Applied Mathematics, 16, ed. by R. Bellman. Providence, RI: American Mathematical Society, pp. 264-276.