The Statistical Analysis of Self-Exciting Point Processes

Transcription

1 Statistica Sinica (2011): Preprint 1 The Statistical Analysis of Self-Exciting Point Processes KA KOPPERSCHMDT and WNFRED STUTE The Nielsen Company (Germany) GmbH, Frankfurt and Mathematical nstitute, University of Giessen, Germany Abstract: n this paper we study minimum distance estimation for self-exciting point processes. These processes allow for a flexible semi-parametric modelling of the compensator. As a main result, we show the strong consistency and asymptotic normality of our estimator. We also present an application to a real data set from micro-economics. Key words and phrases: Asymptotic normality, consistency, market research, selfexciting point processes. 1. ntroduction Let N = N t, t, be a point process over a compact interval, and let Λ = Λ t, t, be its cumulative intensity or hazard process. This means that Λ is the compensator in the Doob-Meyer decomposition of N. n other words, M t N t Λ t is a (local) martingale w.r.t. a given filtration F t, t. Typically, we observe N over time while Λ serves as a predictor for future values of N. For the homogeneous Poisson process we have dλ t = λdt, where λ > 0 is a constant intensity. A simple extension which takes into account seasonal effects is the so-called heterogeneous Poisson process in which λ = λ t is (only) a function of t. As another simple point process we may consider N t = 1 {X t}, t,

2 2 KA KOPPERSCHMDT and WNFRED STUTE the so-called Single Event process. Here X is a random variable with distribution function F. The corresponding Λ equals 1 {X x} Λ t = 1 F (x ) F (dx), (,t] with F ( ) denoting the left-continuous version of F. The hazard measure Λ(dx) 1 {X x} 1 F (x ) F (dx) vanishes on x > X. This reflects the fact that for the Single Event process no further events may be expected once X has been observed. f rather than one X we observe an i.i.d. sample X 1,..., X n from the distribution function F, we may look at the empirical distribution function Then the martingale equals F n (t) = n 1 M n (t) = F n (t) (,t] n i=1 1 {Xi t}. 1 F n (x ) 1 F (x ) F (dx). Similar to before the hazard measure vanishes right to the largest order statistic. t is worth mentioning that in the two last examples the intensity measure is random. When F admits a Lebesgue density f, then where Λ n (dx) = [1 F n (x )]λ(x)dx, λ(x) = f(x) 1 F (x) is the hazard function of F. This is a special case of a multiplicative model, in which the density of Λ n is a product of an unknown deterministic function and an observable predictable process. n a parametric framework the functions f, F and hence λ may depend on an unknown parameter ϑ, so that the Radon-Nikodym derivative of dλ n w.r.t. Lebesgue measure may be written as a product of a deterministic parametric function and an observable predictable random process not depending on ϑ.

3 Self Exciting Point Processes 3 This example has motivated many researchers to model also point processes much more general than Single Event processes in a similar way. Let N 1,..., N n be a sample of i.i.d. point processes with the same distribution as N. As before, we may aggregate all N i to come up with an extension of F n, which aggregates the Single Event processes. A popular assumption for the cumulative intensity process of N i is the parametric multiplicative intensity model dλ i ϑ = α i(x, ϑ)y i (x)dx. n many cases it is assumed that α i is the same for all i. Fundamental contributions to the statistical inference on the true parameter ϑ 0 may be found in Borgan (1984) and Jacobsen (1982, 1984). The monograph of Andersen et al. (1993), in its chapter V, is based on Borgan (1984). Another relevant textbook reference is Karr (1986). He remarks (p. 175) that multiplicative intensity models provide the broadest setting in which good asymptotic theory based on i.i.d. copies of point processes is available. At the same time he admits (p. 170) that precise results may be obtained when α i is not very or not at all random. The statistical inference about the true parameter ϑ 0 is usually performed through maximum likelihood. This presumes that the model under study is dominated. The technical justification may come from Theorem 2.31 in Karr (1986), which describes densities of point processes obtained from exponential martingale transformations of the Poisson process. An important role in the context of point processes, intensities and martingales is the choice of the filtration F t, t. To make the process (N t ) t adapted, F t must include σ(n s : s t). Very often, one assumes equality, i.e., F t = σ(n s : s t) for all t, (1.1) the left-hand side sometimes being enriched by some events known at t = 0. Hence they have no significant contribution to the dynamics of the process. Andersen et al. (1993), p. 73, call any point process satisfying (1.1) self-exciting irrespective of whether (N t ) t has special features or not. Historically, this notion was first coined by Hawkes (1971a, b). He studied

4 4 KA KOPPERSCHMDT and WNFRED STUTE a point process whose Λ was given by Λ(t) = ν + g(t u)n(du), (1.2) (,t] where N is a stationary process. By this, Λ captures the information given by the past values of N and transferred through g. For example, if g decreases exponentially fast, the process with cumulative hazard function Λ features so-called shot noise effects. Hawkes and Oakes (1974) studied the connection of these processes with immigration-birth processes. Ogata and Akaike (1982) modified the righthand side of (1.2) to also incorporate convolution integrals w.r.t. other processes. Moller and Rasmussen (2005, 2006) studied algorithms to simulate Hawkes processes. To the best of our knowledge, Snyder (1975) was the first monograph discussing point processes featuring such special dynamics. A somewhat different model was proposed by Engle and Russell (1997, 1998). They introduced and studied so-called autoregressive conditional duration models, which constitute an adaptation of ARCH-time series to the point process context. The following question motivated the present work and will be discussed in Section 3. n market research it is important to understand the purchase behavior of customers. Clearly, each customer gives rise to a point process, where each point denotes the time of a purchase of a certain pre-specified fast-moving consumer good. Very often, just after a purchase, we may observe saturation effects leading to a downward jump of λ t, where for the moment we assume that the cumulative intensity process Λ admits a Lebesgue-density λ t : Λ(dt) = λ t dt. Hence in general λ t is a stochastic process admitting jumps. This fact does not mean that our model has been designed to incorporate structural changes (change points). n our situation, discontinuities of λ t do not constitute structural changes but are systemic parts of the model. n addition, λ t often also contains components being in charge of seasonal effects which only depend on time and not on individual customer issues. Finally, we may also include random components which result from external effects and not from the internal purchase history of the customer.

5 Self Exciting Point Processes 5 One of these components which will be discussed in detail in Section 3, is the impact of TV-advertising. From the company s point of view advertising hopefully creates an impulse leading to an upward jump in the intensity process. As is well known, in practice, such effects are followed by certain adstock phenomena, namely, that customers tend to forget about advertising when time passes by. t is then of interest to know how these partial effects enter into the overall Λ and how repeated advertising may overcome the adstock effect. From our experience in market research, any kind of proportionality of λ t across individuals will be too restrictive to explain the behavior of different customers. We only mention in passing that in a medical context, similar questions arise when one wants to analyze the impact of a treatment or other impulses on both the short and long run. n such a complicated situation there is little reason to assume that λ t is multiplicative or that the driving processes are stationary. Worse than that, Λ will not be predictable (or even adapted) w.r.t. the filtration σ(n s : s t) but take respect of external effects. Generally, the model is not dominated so that likelihood methods don t apply. Summarizing, to obtain acceptable models for real life point processes, one often needs to accept intensity processes which combine the internal history of the process with external shocks to the effect that the model is no longer dominated and straightforward likelihood methods don t exist allow for jumps which typically are followed by special patterns like, e.g., shot noise effects the relevant filtration is strictly larger than the internal history of the process. Let F t, t, denote the filtration containing the information up to time t. n this paper the notion of self-exciting will be used in a very general way to describe any filtered adapted point process N. n particular, in most cases of interest F t strictly includes σ(n s : s t).

6 6 KA KOPPERSCHMDT and WNFRED STUTE As a final comment, the simple notation λ t only expresses the dependence on time. This is only for ease of notation. As outlined above, in our approach, λ t may also depend on previous values N(s), s < t, or external measurements taken before t. Typically, this part has a nonparametric flavor. n addition, there may be unknown parameters which are part of a function connecting the nonparametric input processes. At the end of the day, λ t resp. Λ t will constitute a flexible semi-parametric model containing an unknown multivariate parameter ϑ. t is the aim of this paper to provide a methodology for estimating the parameter of interest when complicated input processes are present. As indicated above, since dominance cannot be guaranteed, we shall not dwell on likelihood but stick to the fact that N t Λ t is a (local) martingale with a possibly complicated dynamics. Remark 1. The main focus of this paper is on statistical inference of the parametric part in i.i.d. copies of a point process with complicated dynamics. There is also a rich literature on estimating parameters when only one point process is available, i.e., when n = 1. f we denote with = [0, T ] the observational period, increasing information is then coming in from letting T go to infinity. Proposition 3.23 in Karr (1986) is a prototype of such results. t yields asymptotic normality for the intensity in a simple Poisson Process, as T. mportant extensions are, e.g., due to Ogata (1978). Rathbun (1996), Schoenberg (2005) and Waagepetersen and Guan (2009), who extended the Poisson case to spatiotemporal processes satisfying some stationarity and mixing conditions. For many applications, however, T is fixed so that rather than extending the observational period i.i.d. copies of N need to be sampled. One advantage of this latter approach seems to be that complicated dynamics may be handled without requiring stationarity or mixing. 2. Main Results Let N = N(t), t, be a point process over a compact interval = [t, t]. For each t, let F t be the σ-field of events observable prior to t. Since we assume that N is observable we have σ(n(s) : s t) F t.

7 Self Exciting Point Processes 7 Moreover, the filtration F s, s, is increasing. n practice, at time t, there will be more variables or processes other than N(s), s t, known to us and which are likely to have some impact on future values N(s), s > t, of N. To make our approach as wide as possible applicable, we don t make assumptions which usually turn out to be technically very convenient, such as independence of increments, stationarity or a Markovian property. n such a general situation it will be difficult if not impossible to compute exact distributions. A powerful methodology then is to incorporate the compensator Λ of N. Hence Λ is a predictable process such that M t = N t Λ t, t, is a zero-mean (local) martingale w.r.t. F t, t. The process Λ t serves as a predictor process for N t. n particular, if N t is integrable, EN t = EΛ t. From time to time, we assume that dλ t = λ t dt. Here λ t, t, is a predictable stochastic process, the hazard or intensity process, with possible jumps. The λ-process takes its values in the space of left-hand continuous functions with right hand limits. This is the predictable analog of the better known Skorokhod space D(). See Billingsley (1968). To allow for a flexible modelling we may expect some nonparametric components for Λ as well as a parametric part. This offers a flexible model to the practitioner. Hence our approach is semiparametric with essentially no distributional conditions, except for some weak smoothness assumptions. Let ϑ Θ R d denote the parameter of interest, and let M = {Λ ϑ : ϑ Θ} be the associated model for the cumulative hazard process, the dependence on the nonparametric part being suppressed throughout this section. n the following

8 8 KA KOPPERSCHMDT and WNFRED STUTE we let ϑ 0 be the true parameter, i.e., Λ = Λ ϑ0 for some ϑ 0 Θ. Hence the associated innovation martingale equals M = N Λ ϑ0. t is the purpose of this work to provide some methodology on how to estimate ϑ 0. This estimate takes into account n independent replicates of N, say N 1,..., N n. f each N i is a simple Single Event process, our results take on a special form for empirical distributions. For general self-exciting processes our approach yields a contribution to the analysis of dynamic functional (point process) data. For further motivation, let Λ ϑ,i, for 1 i n and ϑ Θ, be the individual cumulative hazard processes. The unknown parameter ϑ 0 Θ is the same for all 1 i n and hence is intended to describe effects which are equal among the group. The remaining components typically differ from i to i and are thus individual. Usually they are random, but their distributional properties remain unspecified and may be complicated. As a consequence, likelihood methods are unlikely to be applicable. Rather, we apply a minimum distance method which yields robust consistent estimates of ϑ 0 and does not require additional distributional assumptions. As we shall see the Doob-Meyer decomposition and modelling of N through the hazard process is an adequate tool for the estimation of ϑ 0. We only remark in passing, that for the empirical distribution function, most approaches are based on modelling of the density of F so that statistical procedures mostly consist in minimizing the distance between F and F ϑ and not Λ ϑ. We now formulate our main result. To begin with, let N 1,..., N n be i.i.d. copies of N observed over. For each 1 i n, let F i (t), t, be an increasing filtration comprising the relevant information about N i. n particular, σ(n i (s), s t) F i (t). For interesting applications, the inclusion will be strict. Let Λ ϑ,i with ϑ Θ R d be a given class of semiparametric cumulative intensities such that the true Λ i of N i satisfies Λ i = Λ ϑ0,i for some ϑ 0 Θ. Let µ be a finite measure on. f f and g are two square integrable functions

9 w.r.t. µ, we set f, g µ = fgdµ with corresponding semi-norm f µ = Self Exciting Point Processes 9 1/2 f 2 (t)µ(dt). Our first f will be the difference between the aggregated point process and the aggregated compensator: N n = 1 n n i=1 N i Λϑ,n = 1 n n Λ ϑ,i. Hence n N n is the point process obtained by pooling the points in the individual processes. The associated innovation martingale M n is given through d M n = d N n d Λ ϑ0,n. f, for µ, we take µ = N n, the quantity i=1 N n Λ ϑ,n Nn represents an overall measure of fit of Λ ϑ,n to N n. Our final estimator of ϑ 0 will be the parameter minimizing the above distance: ϑ n = arg inf ϑ Θ N n Λ ϑ,n Nn. Other measures respectively cumulative functions which will appear later are Λ ϑ and EΛ ϑ, where throughout the paper we assume without further mentioning that EN( t) < and EΛ ϑ ( t) < for each ϑ Θ. Our first result shows that under weak identifiability and smoothness conditions, ϑ n will be a strongly consistent estimator of ϑ 0. Theorem 1. Let Θ R d be a bounded open set and suppose that for each ε > 0 inf EΛ ϑ 0 EΛ ϑ EΛϑ0 > 0. (2.1) ϑ ϑ 0 ε

10 10 KA KOPPERSCHMDT and WNFRED STUTE The process (t, ϑ) Λ ϑ (t) is continuous with probability one (2.2) and admits a continuous extension to Θ c, where Θ c is the closure of Θ. Then we have lim ϑ n = ϑ 0 with probability one. n Remark 2. Condition (2.1) is a weak identifiability condition. Since ϑ EΛ ϑ admits a continuous extension to Θ c, the infimum in (2.1) is attained, and (2.1) is equivalent to the condition EΛ ϑ0 (t) EΛ ϑ (t) for all ϑ ϑ 0 and t ϑ, where ϑ is a set of t s with µ( ϑ ) > 0. Here µ is the measure with cumulative function EΛ ϑ0. As to (2.2), in our applications Λ ϑ will have a (random) Lebesgue density λ ϑ with values in an appropriate Skorokhod space. This guarantees continuity (but not differentiability) of Λ ϑ in t on the one hand and allows for unexpected jumps in the intensity function λ ϑ as well. The boundedness of Θ was assumed only for convenience. The assertion of Theorem 1 also holds if Λ has a continuous extension to the compactification of Θ in the extended Euclidean space. For our second result, we need to assume that ϑ Λ ϑ (t) is a twice continuously differentiable function in a neighborhood of ϑ 0. This is not only a technical assumption, but first and second order derivatives will explicitly appear in the distributional approximation of ϑ n (see Theorem 2). Define Φ 0 (ϑ) = ϑ a matrix-valued function. (EΛ ϑ (t) EΛ ϑ0 (t)) E ϑ Λ ϑ(t) T EΛ ϑ0 (dt), Here and in the following T denotes transposition. Under standard regularity conditions (see (2.4) in Theorem 2 below) we may

11 Self Exciting Point Processes 11 interchange differentiation and integration to come up with Φ 0 (ϑ) = E ϑ Λ ϑ(t)e ϑ Λ ϑ(t) T EΛ ϑ0 (dt) + (EΛ ϑ (t) EΛ ϑ0 (t)) E 2 ϑ 2 Λ ϑ(t) T EΛ ϑ0 (dt). At ϑ = ϑ 0, the second integral vanishes so that Φ 0 (ϑ 0 ) = E ϑ Λ ϑ(t)e ϑ Λ ϑ(t) T EΛ ϑ0 (dt). (2.3) ϑ=ϑ0 The matrix inside the integral is nonnegative definite for each t. f, on a set of positive EΛ ϑ0 measure, these matrices are positive definite, so is Φ 0 (ϑ 0 ). nterestingly, this matrix may be viewed as an adaptation of the Fisher information matrix to the area of self-exciting point processes. Theorem 2. Assume that the conditions (2.1) and (2.2) of Theorem 1 are satisfied. Furthermore, assume that ϑ (EΛ ϑ(t) EΛ ϑ0 (t)) E ϑ Λ ϑ(t) T C(t) (2.4) for all ϑ in a neighborhood of ϑ 0, where the majorant function C is integrable w.r.t. EΛ ϑ0. The vector-valued function φ(x) = E ϑ Λ ϑ(t)eλ ϑ0 (dt), t x t (2.5) ϑ=ϑ0 [x, t] is square integrable w.r.t. EΛ ϑ0. Then we have n 1/2 Φ 0 (ϑ 0 )(ϑ n ϑ 0 ) = n 1/2 E ϑ Λ ϑ(t)eλ ϑ0 (dt) M n (dx) + o P (1) ϑ=ϑ0 n 1/2 [x, t] φ(x) M n (dx) + o P (1). (2.6) Under (2.5), the integral in (2.6) is a sum of centered i.i.d. square-integrable random vectors to which the Central Limit Theorem may be applied. The following Corollary yields the corresponding limit.

12 12 KA KOPPERSCHMDT and WNFRED STUTE Corollary 1. Under the assumptions of Theorem 2, we have, as n, n 1/2 Φ 0 (ϑ 0 )(ϑ n ϑ 0 ) N d (0, C(ϑ 0 )) in distribution, where C(ϑ 0 ) is a d d matrix with entries C ij (ϑ 0 ) = φ i (x)φ j (x)eλ ϑ0 (dx) and φ = (φ 1,..., φ d ) T. The matrix Φ 0 (ϑ 0 ) is symmetric and nonnegative definite. f it s nonsingular we get, as n, n 1/2 (ϑ n ϑ 0 ) N d (0, Σ) in distribution, where Σ = Φ 1 0 (ϑ 0)C(ϑ 0 )Φ 1 0 (ϑ 0). n the classical case, when we make single i.i.d. measurements in Euclidean spaces rather than observing complicated point processes, Φ 1 0 and C cancel out for the maximum likelihood estimator. n the point process situation studied in the present paper, this need no longer be the case. Note that compared with previous results in the literature Theorem 2 holds under surprisingly mild regularity assumptions. Remark 3. The i.i.d. representation (2.6) is also extremely useful when we plan to derive some goodness-of-fit tests for H 0 : Λ 0 M, where M is a semiparametric model for the true Λ 0. Since this is beyond the scope of the paper, we only sketch the arguments. The basic test process is the process t n 1/2 [N t ˆΛ t ], where ˆΛ is taken from M with estimated parameter ϑ n. Under H 0, this process has the expansion n 1/2 [N Λ 0 ] n 1/2 [ˆΛ Λ 0 ]. (2.7)

13 Self Exciting Point Processes 13 The first part of (2.7) is a standardized martingale, while under the smoothness assumptions of Theorem 2 and because of (2.6), Taylor-expansion yields an i.i.d. representation of the second part of (2.7). For aggregated Single Event processes, i.e., for empirical processes, this decomposition was studied by Durbin (1973), and for regression by Stute (1997). Estimation of unknown parameters typically changes the distributional character of the test process. Stute et al. (1998) discusses how in the regression case a martingale transformation may be applied to come up with some asymptotically distribution-free tests. n principle, this may be done also in the context of this paper but will be studied in detail in future work. 3. A Real Data Example The main motivation for our work was to develop and analyze a model for purchase timing which is able to measure the impact of TV-advertising. n Kopperschmidt and Stute (2009) we gave a review of the most prominent timing models in market research. These models are unable to provide a dynamic framework for advertising effects. n the model to be discussed in this paper this is possible if we are willing to model Λ t respectively λ t in a more flexible way. We now start with the modelling of λ t in a seasonal market. The product of interest was a premium brand of packaged ice cream on the German market. Our λ t will comprise three major effects: seasonality effects the internal purchase history advertising effects The seasonality effect is assumed to be the same for all households. t will serve as a baseline and depends on four parameters α, β, γ and δ through λ 1 (t) = α sin(βt + γ) + δ. δ is a basic consumption rate which is the same throughout the year. The rest is a properly shifted sinus curve which is assumed to have a peak in the months June- September. The parameters α and β denote the amplitude and the frequency

14 14 KA KOPPERSCHMDT and WNFRED STUTE of this periodic component. Prior to a deeper analysis we expect that due to seasonal effects δ > α > 0, β 2π 365 and γ π 2. Next we model the contribution of the past purchases. For household i, we denote with Y i1 < Y i2 <... < Y iki the ordered purchase times. For some households k i = 0. Through our approach this information was not neglected but properly incorporated into our analysis. The quantity Y ini (t ) is the time of the last purchase before t. Hence t Y ini (t ) is the age of the system, i.e., the time elapsed between the last purchase and t. One may assume that the intensity increases as t passes by and the consumer s stock of ice cream diminishes. The speed for recovery of the purchase inclination may be controlled by some parameter ε > 0 which enters into some λ i 2 (t) in the following form: ( ) λ i 2(t) = 1 e ε(t Y in (t )) i 1 {t>yi1 }. Our final component will be W i (t) λ i 3(t) = ξ e η(t Xih). h=1 Here W i (t) is the number of advertising contacts observed by household i up to time t. The advertising times are X i1 < X i2 <.... Hence t X ih denotes the time elapsed between t and the h th advertising contact. The parameter η is the so-called adstock parameter. f η < 0 is small, this would imply that the impact of advertising rapidly decreases as time passes by. The parameter ξ is the final parameter of interest. t measures the mean impulse of an advertising contact at the moment the message is received. At each X ih there is a jump of size ξ followed by a decay. Such effects are usually called shot noise. Hence λ i 3 (t) is a superposition of W i (t) of these effects. Our final λ ϑ,i will be λ ϑ,i = λ 1 λ i 2 + λ i 3. We used a multiplicative-additive version for λ ϑ,i since through this the advertising effect could be better isolated from the rest. For the parameter ϑ, we of course have ϑ = (α, β, γ, δ, ε, η, ξ). Our next comment is on the data set. t stems from a Single Source Panel of AC Nielsen, Germany. We only present those aspects which suffice to un-

15 derstand our application. Self Exciting Point Processes 15 Until December 2006, AC Nielsen equipped several thousand households of the panel during the time of their participation with technical devices that allow for the capture of their purchase and TV behavior. Each household provided information on purchases of arbitrary fast-moving consumer goods on a daily basis. This was achieved by scanning the barcodes at home. n our present analysis the amount of money spent was neglected. Each panel household was also provided with a device by which we could find out whether and when an advertising spot was watched. This gave us the information contained in W i and hence in λ i 3. The statistical analysis was based on 1660 households, 1375 of which purchased at least once. The households were observed over a period of 546 days. Hence = [0, 546]. Only 27 households did not watch a single TV advertisement for packaged ice cream throughout the 18 months. The whole sample was divided into 11 subgroups of size according to sociodemographic factors income and size of household. n the following we present the results of our analysis for two of these 11 subgroups differing in household size and monthly income. Group 1 2 HH Size 4 2 ncome (Euro) < n α n β n γ n δ n ε n ξ n η n We see that the plausibility check β 2π was fulfilled for both groups. The model also recognized the annual seasonality with a seasonal peak in the summer. small. ξ and η. The δ s being in charge of an off-seasonal demand are The most important parameters for the producer of the product were nterestingly enough, the adstock parameter η was always close to

16 16 KA KOPPERSCHMDT and WNFRED STUTE 0.5. The parameter ξ describing the strength of the advertising act differed among the groups. Typically households with children showed the tendency of having a smaller reaction parameter ξ (ξ n = ) compared with small but richer households (ξ n = ). This may be due to the fact that the product of interest was a premium brand so that families with children though attracted by the advertisement showed a tendency to buy cheaper products from the discounter. 4. Proofs First we introduce some notation. For ε > 0 we let B ε (ϑ 0 ) = {ϑ : ϑ ϑ 0 < ε} be the open ε-ball with center ϑ 0. Furthermore, for ε > 0 and r > 0, let B ε r(ϑ) = { ϑ Θ c \ B ε (ϑ 0 ) : ϑ ϑ < r } be the part of the r-ball with center ϑ, which does not belong to B ε (ϑ 0 ). n the following lemmas, let ϑ Θ be a given but fixed parameter. Lemma 1. Under the assumptions of Theorem 1, we have with probability one: { } 1 n lim N i Λ ϑ n n,i 2 N i = E sup N Λ ϑ 2 N (4.1) ϑ Br(ϑ) ε lim n i=1 1 n(n 1) lim n sup ϑ B ε r(ϑ) i j lim n sup ϑ B ε r(ϑ) 1 n(n 1) i j { = E 1 n(n 1)(n 2) { = E N i Λ ϑ,i 2 N j = E sup ϑ B ε r(ϑ) { } sup N 1 Λ ϑ,1 2 N 2 ϑ Br(ϑ) ε (4.2) N i Λ ϑ,i, N j Λ ϑ,j Ni (4.3) } sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N1 ϑ Br(ϑ) ε i j k sup N i Λ ϑ,i, N j Λ ϑ,j Nk (4.4) ϑ Br ε(ϑ) } sup ϑ B ε r (ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3

17 Self Exciting Point Processes 17 Proof. The four statements are immediate consequences of (5.1), our strong law for U-statistics of point processes. f, in the previous lemma, we replace the sup over ϑ Br(ϑ) ε by ϑ, we obtain the limit results as formulated in the following lemma. Proofs again are based on (5.1). Lemma 2. With probability one, lim n 1 lim n n n i=1 1 n(n 1) N i Λ ϑ,i 2 N i = E { N Λ ϑ 2 } N N i Λ ϑ,i 2 N j = E { N 1 Λ ϑ,1 2 } N 2 i j (4.5) (4.6) lim n 1 n(n 1) N i Λ ϑ,i, N j Λ ϑ,j Ni = E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N1 } (4.7) i j lim n 1 n(n 1)(n 2) N i Λ ϑ,i, N j Λ ϑ,j Nk (4.8) i j k = E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } n the following lemma we compute the limit in (4.8). Lemma 3. We have E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } = EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0. (4.9) Proof. The expectation of the left-hand side equals E (N 1 (t) Λ ϑ,1 (t)) (N 2 (t) Λ ϑ,2 (t)) N 3 (dt). Conditionally on N 3, the expectation equals, by independence of N 1 Λ ϑ,1 and N 2 Λ ϑ,2 and by Fubini s Theorem, E 2 [N 1 (t) Λ ϑ,1 (t)] N 3 (dt) = Now integrate out to get the result. [EΛ ϑ0 (t) EΛ ϑ (t)] 2 N 3 (dt).

18 18 KA KOPPERSCHMDT and WNFRED STUTE n our next result we study local fluctuations of the process ϑ (N 1 (t) Λ ϑ,1 (t)) (N 2 (t) Λ ϑ,2 (t)) N 3 (dt). (4.10) Lemma 4. For given ε > 0 and δ > 0, and for each ϑ Θ c, there is a small but positive r > 0 such that { } E sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 inf EΛ ϑ Br(ϑ) ε ϑ Br(ϑ) ε ϑ 0 EΛ ϑ 2 EΛ ϑ0 δ (4.11) and E { inf ϑ Br ε(ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 δ (4.12) Proof. By assumption, the process (4.10) is continuous in ϑ. Hence sup ϑ B ε r (ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 as r 0. By monotone convergence, the expectation of the left-hand side converges to by Lemma 3. Similarly, E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } = EΛ ϑ0 EΛ ϑ 2 Λ ϑ0, inf ϑ B ε r(ϑ) EΛ ϑ 0 EΛ ϑ 2 EΛ ϑ0 EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 as r 0. This proves the first part of the lemma. The second part is shown in a similar way. Lemma 5. For each fixed ϑ Θ, we have N n Λ ϑ,n 2 Nn EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 with probability one. n particular, for ϑ = ϑ 0, the limit equals zero. Proof. We have N n Λ [ ϑ,n 2 Nn = Nn (t) Λ ϑ,n (t) ]2 Nn (dt) n = n 3 N i Λ ϑ,i, N j Λ ϑ,j Nk. i,j,k=1

19 Self Exciting Point Processes 19 By (4.5) (4.7), the last triple sum is dominated by the sum restricted to i j k i. The limit follows from (5.1), (4.8) and Lemma 3. Lemma 6. For each ε > 0 we have, with probability one, inf N n Λ ϑ,n Nn inf EΛ ϑ 0 EΛ ϑ EΛϑ0. ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 ε Proof. For a given ε > 0, let δ > 0 be an arbitrary positive number. According to Lemma 4, there exists, for each ϑ Θ c, a positive radius r = r(ϑ) > 0 such that (4.11) and (4.12) are satisfied. Then ϑ Θ c \B ε(ϑ 0 ) B ε r(ϑ) (ϑ) = Θc \ B ε (ϑ 0 ) is an open covering of the compact set Θ c \ B ε (ϑ 0 ). We may select finitely many ϑ 1,..., ϑ q so that Θ c \ B ε (ϑ 0 ) is already covered by the B ε r l (ϑ l ), where r l = r(ϑ l ). Conclude that inf N n Λ ϑ,n 2 Nn inf EΛ ϑ 0 EΛ ϑ 2 EΛ ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 ε ϑ0 = min inf N n Λ ϑ,n 2 Nn min inf EΛ 1 l q ϑ Br ε (ϑ l l ) 1 l q ϑ Br ε ϑ 0 EΛ ϑ 2 EΛ (ϑ l l ) ϑ0 max inf 1 l q N n Λ ϑ,n 2 Nn inf EΛ ϑ Br ε (ϑ l l ) ϑ Br ε ϑ 0 EΛ ϑ 2 EΛ (ϑ l l ) ϑ0. t suffices to prove that for each 1 l q, with probability one, the limsup of the term in brackets in absolute value is less than or equal to δ. But inf N n Λ ϑ,n 2 Nn n 3 ϑ Br ε (ϑ l l ) i,j,k sup ϑ B ε r l (ϑ l ) N i Λ ϑ,i, N j Λ ϑ,j Nk which again is dominated by the sub-sum pertaining to i j k i. With Lemma 1 this converges to { E } sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3. ϑ Br ε (ϑ l l ) By Lemma 4 and the choice of r l, this term is within δ-distance of inf ϑ B ε rl (ϑ l ) EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0. The corresponding lower bound may be obtained in a similar way. This concludes the proof of the lemma.

20 20 KA KOPPERSCHMDT and WNFRED STUTE Proof of Theorem 1. Let ε > 0 be given. We need to show that ( ) P lim sup { ϑ n ϑ 0 ε} n = 0. But { } { ϑ n ϑ 0 ε} inf N n Λ ϑ,n Nn < inf N n Λ ϑ,n Nn ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 <ε { } inf N n Λ ϑ,n Nn < N n Λ ϑ0,n Nn. ϑ: ϑ ϑ 0 ε From Lemma 6, the left-hand side goes to inf EΛ ϑ 0 EΛ ϑ EΛϑ0, ϑ: ϑ ϑ 0 ε which is positive by (2.1). At the same time, the right-hand side tends to EΛ ϑ0 EΛ ϑ0 EΛϑ0 = 0. This proves Theorem 1. The following lemma will play a crucial role in the proof of Theorem 2. t expresses certain point process integrals in terms of the associated innovation martingale M n. Lemma 7. We have [ Λϑ,n (t) Λ ϑ0,n(t) ] ϑ Λ ϑ,n (t) N n (dt) ϑ=ϑn (4.13) = M n (t) ϑ Λ ϑ,n (t) M n (dt) ϑ=ϑn + M n (t) ϑ Λ ϑ,n (t) Λ ϑ0,n(dt) ϑ=ϑn. Proof. The equation is an immediate consequence of the identity d N n = d M n + d Λ ϑ0,n (4.14) and the fact, that ϑ n minimizes N n Λ ϑ,n 2 Nn in ϑ and is in the inner open set of Θ, whence [ Nn (t) Λ ϑ,n (t) ] ϑ Λ ϑ,n (t) N n (dt) = 0 at ϑ = ϑ n.