The Statistical Analysis of SelfExciting Point Processes


 Samson Jefferson
 1 years ago
 Views:
Transcription
1 Statistica Sinica (2011): Preprint 1 The Statistical Analysis of SelfExciting Point Processes KA KOPPERSCHMDT and WNFRED STUTE The Nielsen Company (Germany) GmbH, Frankfurt and Mathematical nstitute, University of Giessen, Germany Abstract: n this paper we study minimum distance estimation for selfexciting point processes. These processes allow for a flexible semiparametric modelling of the compensator. As a main result, we show the strong consistency and asymptotic normality of our estimator. We also present an application to a real data set from microeconomics. Key words and phrases: Asymptotic normality, consistency, market research, selfexciting point processes. 1. ntroduction Let N = N t, t, be a point process over a compact interval, and let Λ = Λ t, t, be its cumulative intensity or hazard process. This means that Λ is the compensator in the DoobMeyer decomposition of N. n other words, M t N t Λ t is a (local) martingale w.r.t. a given filtration F t, t. Typically, we observe N over time while Λ serves as a predictor for future values of N. For the homogeneous Poisson process we have dλ t = λdt, where λ > 0 is a constant intensity. A simple extension which takes into account seasonal effects is the socalled heterogeneous Poisson process in which λ = λ t is (only) a function of t. As another simple point process we may consider N t = 1 {X t}, t,
2 2 KA KOPPERSCHMDT and WNFRED STUTE the socalled Single Event process. Here X is a random variable with distribution function F. The corresponding Λ equals 1 {X x} Λ t = 1 F (x ) F (dx), (,t] with F ( ) denoting the leftcontinuous version of F. The hazard measure Λ(dx) 1 {X x} 1 F (x ) F (dx) vanishes on x > X. This reflects the fact that for the Single Event process no further events may be expected once X has been observed. f rather than one X we observe an i.i.d. sample X 1,..., X n from the distribution function F, we may look at the empirical distribution function Then the martingale equals F n (t) = n 1 M n (t) = F n (t) (,t] n i=1 1 {Xi t}. 1 F n (x ) 1 F (x ) F (dx). Similar to before the hazard measure vanishes right to the largest order statistic. t is worth mentioning that in the two last examples the intensity measure is random. When F admits a Lebesgue density f, then where Λ n (dx) = [1 F n (x )]λ(x)dx, λ(x) = f(x) 1 F (x) is the hazard function of F. This is a special case of a multiplicative model, in which the density of Λ n is a product of an unknown deterministic function and an observable predictable process. n a parametric framework the functions f, F and hence λ may depend on an unknown parameter ϑ, so that the RadonNikodym derivative of dλ n w.r.t. Lebesgue measure may be written as a product of a deterministic parametric function and an observable predictable random process not depending on ϑ.
3 Self Exciting Point Processes 3 This example has motivated many researchers to model also point processes much more general than Single Event processes in a similar way. Let N 1,..., N n be a sample of i.i.d. point processes with the same distribution as N. As before, we may aggregate all N i to come up with an extension of F n, which aggregates the Single Event processes. A popular assumption for the cumulative intensity process of N i is the parametric multiplicative intensity model dλ i ϑ = α i(x, ϑ)y i (x)dx. n many cases it is assumed that α i is the same for all i. Fundamental contributions to the statistical inference on the true parameter ϑ 0 may be found in Borgan (1984) and Jacobsen (1982, 1984). The monograph of Andersen et al. (1993), in its chapter V, is based on Borgan (1984). Another relevant textbook reference is Karr (1986). He remarks (p. 175) that multiplicative intensity models provide the broadest setting in which good asymptotic theory based on i.i.d. copies of point processes is available. At the same time he admits (p. 170) that precise results may be obtained when α i is not very or not at all random. The statistical inference about the true parameter ϑ 0 is usually performed through maximum likelihood. This presumes that the model under study is dominated. The technical justification may come from Theorem 2.31 in Karr (1986), which describes densities of point processes obtained from exponential martingale transformations of the Poisson process. An important role in the context of point processes, intensities and martingales is the choice of the filtration F t, t. To make the process (N t ) t adapted, F t must include σ(n s : s t). Very often, one assumes equality, i.e., F t = σ(n s : s t) for all t, (1.1) the lefthand side sometimes being enriched by some events known at t = 0. Hence they have no significant contribution to the dynamics of the process. Andersen et al. (1993), p. 73, call any point process satisfying (1.1) selfexciting irrespective of whether (N t ) t has special features or not. Historically, this notion was first coined by Hawkes (1971a, b). He studied
4 4 KA KOPPERSCHMDT and WNFRED STUTE a point process whose Λ was given by Λ(t) = ν + g(t u)n(du), (1.2) (,t] where N is a stationary process. By this, Λ captures the information given by the past values of N and transferred through g. For example, if g decreases exponentially fast, the process with cumulative hazard function Λ features socalled shot noise effects. Hawkes and Oakes (1974) studied the connection of these processes with immigrationbirth processes. Ogata and Akaike (1982) modified the righthand side of (1.2) to also incorporate convolution integrals w.r.t. other processes. Moller and Rasmussen (2005, 2006) studied algorithms to simulate Hawkes processes. To the best of our knowledge, Snyder (1975) was the first monograph discussing point processes featuring such special dynamics. A somewhat different model was proposed by Engle and Russell (1997, 1998). They introduced and studied socalled autoregressive conditional duration models, which constitute an adaptation of ARCHtime series to the point process context. The following question motivated the present work and will be discussed in Section 3. n market research it is important to understand the purchase behavior of customers. Clearly, each customer gives rise to a point process, where each point denotes the time of a purchase of a certain prespecified fastmoving consumer good. Very often, just after a purchase, we may observe saturation effects leading to a downward jump of λ t, where for the moment we assume that the cumulative intensity process Λ admits a Lebesguedensity λ t : Λ(dt) = λ t dt. Hence in general λ t is a stochastic process admitting jumps. This fact does not mean that our model has been designed to incorporate structural changes (change points). n our situation, discontinuities of λ t do not constitute structural changes but are systemic parts of the model. n addition, λ t often also contains components being in charge of seasonal effects which only depend on time and not on individual customer issues. Finally, we may also include random components which result from external effects and not from the internal purchase history of the customer.
5 Self Exciting Point Processes 5 One of these components which will be discussed in detail in Section 3, is the impact of TVadvertising. From the company s point of view advertising hopefully creates an impulse leading to an upward jump in the intensity process. As is well known, in practice, such effects are followed by certain adstock phenomena, namely, that customers tend to forget about advertising when time passes by. t is then of interest to know how these partial effects enter into the overall Λ and how repeated advertising may overcome the adstock effect. From our experience in market research, any kind of proportionality of λ t across individuals will be too restrictive to explain the behavior of different customers. We only mention in passing that in a medical context, similar questions arise when one wants to analyze the impact of a treatment or other impulses on both the short and long run. n such a complicated situation there is little reason to assume that λ t is multiplicative or that the driving processes are stationary. Worse than that, Λ will not be predictable (or even adapted) w.r.t. the filtration σ(n s : s t) but take respect of external effects. Generally, the model is not dominated so that likelihood methods don t apply. Summarizing, to obtain acceptable models for real life point processes, one often needs to accept intensity processes which combine the internal history of the process with external shocks to the effect that the model is no longer dominated and straightforward likelihood methods don t exist allow for jumps which typically are followed by special patterns like, e.g., shot noise effects the relevant filtration is strictly larger than the internal history of the process. Let F t, t, denote the filtration containing the information up to time t. n this paper the notion of selfexciting will be used in a very general way to describe any filtered adapted point process N. n particular, in most cases of interest F t strictly includes σ(n s : s t).
6 6 KA KOPPERSCHMDT and WNFRED STUTE As a final comment, the simple notation λ t only expresses the dependence on time. This is only for ease of notation. As outlined above, in our approach, λ t may also depend on previous values N(s), s < t, or external measurements taken before t. Typically, this part has a nonparametric flavor. n addition, there may be unknown parameters which are part of a function connecting the nonparametric input processes. At the end of the day, λ t resp. Λ t will constitute a flexible semiparametric model containing an unknown multivariate parameter ϑ. t is the aim of this paper to provide a methodology for estimating the parameter of interest when complicated input processes are present. As indicated above, since dominance cannot be guaranteed, we shall not dwell on likelihood but stick to the fact that N t Λ t is a (local) martingale with a possibly complicated dynamics. Remark 1. The main focus of this paper is on statistical inference of the parametric part in i.i.d. copies of a point process with complicated dynamics. There is also a rich literature on estimating parameters when only one point process is available, i.e., when n = 1. f we denote with = [0, T ] the observational period, increasing information is then coming in from letting T go to infinity. Proposition 3.23 in Karr (1986) is a prototype of such results. t yields asymptotic normality for the intensity in a simple Poisson Process, as T. mportant extensions are, e.g., due to Ogata (1978). Rathbun (1996), Schoenberg (2005) and Waagepetersen and Guan (2009), who extended the Poisson case to spatiotemporal processes satisfying some stationarity and mixing conditions. For many applications, however, T is fixed so that rather than extending the observational period i.i.d. copies of N need to be sampled. One advantage of this latter approach seems to be that complicated dynamics may be handled without requiring stationarity or mixing. 2. Main Results Let N = N(t), t, be a point process over a compact interval = [t, t]. For each t, let F t be the σfield of events observable prior to t. Since we assume that N is observable we have σ(n(s) : s t) F t.
7 Self Exciting Point Processes 7 Moreover, the filtration F s, s, is increasing. n practice, at time t, there will be more variables or processes other than N(s), s t, known to us and which are likely to have some impact on future values N(s), s > t, of N. To make our approach as wide as possible applicable, we don t make assumptions which usually turn out to be technically very convenient, such as independence of increments, stationarity or a Markovian property. n such a general situation it will be difficult if not impossible to compute exact distributions. A powerful methodology then is to incorporate the compensator Λ of N. Hence Λ is a predictable process such that M t = N t Λ t, t, is a zeromean (local) martingale w.r.t. F t, t. The process Λ t serves as a predictor process for N t. n particular, if N t is integrable, EN t = EΛ t. From time to time, we assume that dλ t = λ t dt. Here λ t, t, is a predictable stochastic process, the hazard or intensity process, with possible jumps. The λprocess takes its values in the space of lefthand continuous functions with right hand limits. This is the predictable analog of the better known Skorokhod space D(). See Billingsley (1968). To allow for a flexible modelling we may expect some nonparametric components for Λ as well as a parametric part. This offers a flexible model to the practitioner. Hence our approach is semiparametric with essentially no distributional conditions, except for some weak smoothness assumptions. Let ϑ Θ R d denote the parameter of interest, and let M = {Λ ϑ : ϑ Θ} be the associated model for the cumulative hazard process, the dependence on the nonparametric part being suppressed throughout this section. n the following
8 8 KA KOPPERSCHMDT and WNFRED STUTE we let ϑ 0 be the true parameter, i.e., Λ = Λ ϑ0 for some ϑ 0 Θ. Hence the associated innovation martingale equals M = N Λ ϑ0. t is the purpose of this work to provide some methodology on how to estimate ϑ 0. This estimate takes into account n independent replicates of N, say N 1,..., N n. f each N i is a simple Single Event process, our results take on a special form for empirical distributions. For general selfexciting processes our approach yields a contribution to the analysis of dynamic functional (point process) data. For further motivation, let Λ ϑ,i, for 1 i n and ϑ Θ, be the individual cumulative hazard processes. The unknown parameter ϑ 0 Θ is the same for all 1 i n and hence is intended to describe effects which are equal among the group. The remaining components typically differ from i to i and are thus individual. Usually they are random, but their distributional properties remain unspecified and may be complicated. As a consequence, likelihood methods are unlikely to be applicable. Rather, we apply a minimum distance method which yields robust consistent estimates of ϑ 0 and does not require additional distributional assumptions. As we shall see the DoobMeyer decomposition and modelling of N through the hazard process is an adequate tool for the estimation of ϑ 0. We only remark in passing, that for the empirical distribution function, most approaches are based on modelling of the density of F so that statistical procedures mostly consist in minimizing the distance between F and F ϑ and not Λ ϑ. We now formulate our main result. To begin with, let N 1,..., N n be i.i.d. copies of N observed over. For each 1 i n, let F i (t), t, be an increasing filtration comprising the relevant information about N i. n particular, σ(n i (s), s t) F i (t). For interesting applications, the inclusion will be strict. Let Λ ϑ,i with ϑ Θ R d be a given class of semiparametric cumulative intensities such that the true Λ i of N i satisfies Λ i = Λ ϑ0,i for some ϑ 0 Θ. Let µ be a finite measure on. f f and g are two square integrable functions
9 w.r.t. µ, we set f, g µ = fgdµ with corresponding seminorm f µ = Self Exciting Point Processes 9 1/2 f 2 (t)µ(dt). Our first f will be the difference between the aggregated point process and the aggregated compensator: N n = 1 n n i=1 N i Λϑ,n = 1 n n Λ ϑ,i. Hence n N n is the point process obtained by pooling the points in the individual processes. The associated innovation martingale M n is given through d M n = d N n d Λ ϑ0,n. f, for µ, we take µ = N n, the quantity i=1 N n Λ ϑ,n Nn represents an overall measure of fit of Λ ϑ,n to N n. Our final estimator of ϑ 0 will be the parameter minimizing the above distance: ϑ n = arg inf ϑ Θ N n Λ ϑ,n Nn. Other measures respectively cumulative functions which will appear later are Λ ϑ and EΛ ϑ, where throughout the paper we assume without further mentioning that EN( t) < and EΛ ϑ ( t) < for each ϑ Θ. Our first result shows that under weak identifiability and smoothness conditions, ϑ n will be a strongly consistent estimator of ϑ 0. Theorem 1. Let Θ R d be a bounded open set and suppose that for each ε > 0 inf EΛ ϑ 0 EΛ ϑ EΛϑ0 > 0. (2.1) ϑ ϑ 0 ε
10 10 KA KOPPERSCHMDT and WNFRED STUTE The process (t, ϑ) Λ ϑ (t) is continuous with probability one (2.2) and admits a continuous extension to Θ c, where Θ c is the closure of Θ. Then we have lim ϑ n = ϑ 0 with probability one. n Remark 2. Condition (2.1) is a weak identifiability condition. Since ϑ EΛ ϑ admits a continuous extension to Θ c, the infimum in (2.1) is attained, and (2.1) is equivalent to the condition EΛ ϑ0 (t) EΛ ϑ (t) for all ϑ ϑ 0 and t ϑ, where ϑ is a set of t s with µ( ϑ ) > 0. Here µ is the measure with cumulative function EΛ ϑ0. As to (2.2), in our applications Λ ϑ will have a (random) Lebesgue density λ ϑ with values in an appropriate Skorokhod space. This guarantees continuity (but not differentiability) of Λ ϑ in t on the one hand and allows for unexpected jumps in the intensity function λ ϑ as well. The boundedness of Θ was assumed only for convenience. The assertion of Theorem 1 also holds if Λ has a continuous extension to the compactification of Θ in the extended Euclidean space. For our second result, we need to assume that ϑ Λ ϑ (t) is a twice continuously differentiable function in a neighborhood of ϑ 0. This is not only a technical assumption, but first and second order derivatives will explicitly appear in the distributional approximation of ϑ n (see Theorem 2). Define Φ 0 (ϑ) = ϑ a matrixvalued function. (EΛ ϑ (t) EΛ ϑ0 (t)) E ϑ Λ ϑ(t) T EΛ ϑ0 (dt), Here and in the following T denotes transposition. Under standard regularity conditions (see (2.4) in Theorem 2 below) we may
11 Self Exciting Point Processes 11 interchange differentiation and integration to come up with Φ 0 (ϑ) = E ϑ Λ ϑ(t)e ϑ Λ ϑ(t) T EΛ ϑ0 (dt) + (EΛ ϑ (t) EΛ ϑ0 (t)) E 2 ϑ 2 Λ ϑ(t) T EΛ ϑ0 (dt). At ϑ = ϑ 0, the second integral vanishes so that Φ 0 (ϑ 0 ) = E ϑ Λ ϑ(t)e ϑ Λ ϑ(t) T EΛ ϑ0 (dt). (2.3) ϑ=ϑ0 The matrix inside the integral is nonnegative definite for each t. f, on a set of positive EΛ ϑ0 measure, these matrices are positive definite, so is Φ 0 (ϑ 0 ). nterestingly, this matrix may be viewed as an adaptation of the Fisher information matrix to the area of selfexciting point processes. Theorem 2. Assume that the conditions (2.1) and (2.2) of Theorem 1 are satisfied. Furthermore, assume that ϑ (EΛ ϑ(t) EΛ ϑ0 (t)) E ϑ Λ ϑ(t) T C(t) (2.4) for all ϑ in a neighborhood of ϑ 0, where the majorant function C is integrable w.r.t. EΛ ϑ0. The vectorvalued function φ(x) = E ϑ Λ ϑ(t)eλ ϑ0 (dt), t x t (2.5) ϑ=ϑ0 [x, t] is square integrable w.r.t. EΛ ϑ0. Then we have n 1/2 Φ 0 (ϑ 0 )(ϑ n ϑ 0 ) = n 1/2 E ϑ Λ ϑ(t)eλ ϑ0 (dt) M n (dx) + o P (1) ϑ=ϑ0 n 1/2 [x, t] φ(x) M n (dx) + o P (1). (2.6) Under (2.5), the integral in (2.6) is a sum of centered i.i.d. squareintegrable random vectors to which the Central Limit Theorem may be applied. The following Corollary yields the corresponding limit.
12 12 KA KOPPERSCHMDT and WNFRED STUTE Corollary 1. Under the assumptions of Theorem 2, we have, as n, n 1/2 Φ 0 (ϑ 0 )(ϑ n ϑ 0 ) N d (0, C(ϑ 0 )) in distribution, where C(ϑ 0 ) is a d d matrix with entries C ij (ϑ 0 ) = φ i (x)φ j (x)eλ ϑ0 (dx) and φ = (φ 1,..., φ d ) T. The matrix Φ 0 (ϑ 0 ) is symmetric and nonnegative definite. f it s nonsingular we get, as n, n 1/2 (ϑ n ϑ 0 ) N d (0, Σ) in distribution, where Σ = Φ 1 0 (ϑ 0)C(ϑ 0 )Φ 1 0 (ϑ 0). n the classical case, when we make single i.i.d. measurements in Euclidean spaces rather than observing complicated point processes, Φ 1 0 and C cancel out for the maximum likelihood estimator. n the point process situation studied in the present paper, this need no longer be the case. Note that compared with previous results in the literature Theorem 2 holds under surprisingly mild regularity assumptions. Remark 3. The i.i.d. representation (2.6) is also extremely useful when we plan to derive some goodnessoffit tests for H 0 : Λ 0 M, where M is a semiparametric model for the true Λ 0. Since this is beyond the scope of the paper, we only sketch the arguments. The basic test process is the process t n 1/2 [N t ˆΛ t ], where ˆΛ is taken from M with estimated parameter ϑ n. Under H 0, this process has the expansion n 1/2 [N Λ 0 ] n 1/2 [ˆΛ Λ 0 ]. (2.7)
13 Self Exciting Point Processes 13 The first part of (2.7) is a standardized martingale, while under the smoothness assumptions of Theorem 2 and because of (2.6), Taylorexpansion yields an i.i.d. representation of the second part of (2.7). For aggregated Single Event processes, i.e., for empirical processes, this decomposition was studied by Durbin (1973), and for regression by Stute (1997). Estimation of unknown parameters typically changes the distributional character of the test process. Stute et al. (1998) discusses how in the regression case a martingale transformation may be applied to come up with some asymptotically distributionfree tests. n principle, this may be done also in the context of this paper but will be studied in detail in future work. 3. A Real Data Example The main motivation for our work was to develop and analyze a model for purchase timing which is able to measure the impact of TVadvertising. n Kopperschmidt and Stute (2009) we gave a review of the most prominent timing models in market research. These models are unable to provide a dynamic framework for advertising effects. n the model to be discussed in this paper this is possible if we are willing to model Λ t respectively λ t in a more flexible way. We now start with the modelling of λ t in a seasonal market. The product of interest was a premium brand of packaged ice cream on the German market. Our λ t will comprise three major effects: seasonality effects the internal purchase history advertising effects The seasonality effect is assumed to be the same for all households. t will serve as a baseline and depends on four parameters α, β, γ and δ through λ 1 (t) = α sin(βt + γ) + δ. δ is a basic consumption rate which is the same throughout the year. The rest is a properly shifted sinus curve which is assumed to have a peak in the months June September. The parameters α and β denote the amplitude and the frequency
14 14 KA KOPPERSCHMDT and WNFRED STUTE of this periodic component. Prior to a deeper analysis we expect that due to seasonal effects δ > α > 0, β 2π 365 and γ π 2. Next we model the contribution of the past purchases. For household i, we denote with Y i1 < Y i2 <... < Y iki the ordered purchase times. For some households k i = 0. Through our approach this information was not neglected but properly incorporated into our analysis. The quantity Y ini (t ) is the time of the last purchase before t. Hence t Y ini (t ) is the age of the system, i.e., the time elapsed between the last purchase and t. One may assume that the intensity increases as t passes by and the consumer s stock of ice cream diminishes. The speed for recovery of the purchase inclination may be controlled by some parameter ε > 0 which enters into some λ i 2 (t) in the following form: ( ) λ i 2(t) = 1 e ε(t Y in (t )) i 1 {t>yi1 }. Our final component will be W i (t) λ i 3(t) = ξ e η(t Xih). h=1 Here W i (t) is the number of advertising contacts observed by household i up to time t. The advertising times are X i1 < X i2 <.... Hence t X ih denotes the time elapsed between t and the h th advertising contact. The parameter η is the socalled adstock parameter. f η < 0 is small, this would imply that the impact of advertising rapidly decreases as time passes by. The parameter ξ is the final parameter of interest. t measures the mean impulse of an advertising contact at the moment the message is received. At each X ih there is a jump of size ξ followed by a decay. Such effects are usually called shot noise. Hence λ i 3 (t) is a superposition of W i (t) of these effects. Our final λ ϑ,i will be λ ϑ,i = λ 1 λ i 2 + λ i 3. We used a multiplicativeadditive version for λ ϑ,i since through this the advertising effect could be better isolated from the rest. For the parameter ϑ, we of course have ϑ = (α, β, γ, δ, ε, η, ξ). Our next comment is on the data set. t stems from a Single Source Panel of AC Nielsen, Germany. We only present those aspects which suffice to un
15 derstand our application. Self Exciting Point Processes 15 Until December 2006, AC Nielsen equipped several thousand households of the panel during the time of their participation with technical devices that allow for the capture of their purchase and TV behavior. Each household provided information on purchases of arbitrary fastmoving consumer goods on a daily basis. This was achieved by scanning the barcodes at home. n our present analysis the amount of money spent was neglected. Each panel household was also provided with a device by which we could find out whether and when an advertising spot was watched. This gave us the information contained in W i and hence in λ i 3. The statistical analysis was based on 1660 households, 1375 of which purchased at least once. The households were observed over a period of 546 days. Hence = [0, 546]. Only 27 households did not watch a single TV advertisement for packaged ice cream throughout the 18 months. The whole sample was divided into 11 subgroups of size according to sociodemographic factors income and size of household. n the following we present the results of our analysis for two of these 11 subgroups differing in household size and monthly income. Group 1 2 HH Size 4 2 ncome (Euro) < n α n β n γ n δ n ε n ξ n η n We see that the plausibility check β 2π was fulfilled for both groups. The model also recognized the annual seasonality with a seasonal peak in the summer. small. ξ and η. The δ s being in charge of an offseasonal demand are The most important parameters for the producer of the product were nterestingly enough, the adstock parameter η was always close to
16 16 KA KOPPERSCHMDT and WNFRED STUTE 0.5. The parameter ξ describing the strength of the advertising act differed among the groups. Typically households with children showed the tendency of having a smaller reaction parameter ξ (ξ n = ) compared with small but richer households (ξ n = ). This may be due to the fact that the product of interest was a premium brand so that families with children though attracted by the advertisement showed a tendency to buy cheaper products from the discounter. 4. Proofs First we introduce some notation. For ε > 0 we let B ε (ϑ 0 ) = {ϑ : ϑ ϑ 0 < ε} be the open εball with center ϑ 0. Furthermore, for ε > 0 and r > 0, let B ε r(ϑ) = { ϑ Θ c \ B ε (ϑ 0 ) : ϑ ϑ < r } be the part of the rball with center ϑ, which does not belong to B ε (ϑ 0 ). n the following lemmas, let ϑ Θ be a given but fixed parameter. Lemma 1. Under the assumptions of Theorem 1, we have with probability one: { } 1 n lim N i Λ ϑ n n,i 2 N i = E sup N Λ ϑ 2 N (4.1) ϑ Br(ϑ) ε lim n i=1 1 n(n 1) lim n sup ϑ B ε r(ϑ) i j lim n sup ϑ B ε r(ϑ) 1 n(n 1) i j { = E 1 n(n 1)(n 2) { = E N i Λ ϑ,i 2 N j = E sup ϑ B ε r(ϑ) { } sup N 1 Λ ϑ,1 2 N 2 ϑ Br(ϑ) ε (4.2) N i Λ ϑ,i, N j Λ ϑ,j Ni (4.3) } sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N1 ϑ Br(ϑ) ε i j k sup N i Λ ϑ,i, N j Λ ϑ,j Nk (4.4) ϑ Br ε(ϑ) } sup ϑ B ε r (ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3
17 Self Exciting Point Processes 17 Proof. The four statements are immediate consequences of (5.1), our strong law for Ustatistics of point processes. f, in the previous lemma, we replace the sup over ϑ Br(ϑ) ε by ϑ, we obtain the limit results as formulated in the following lemma. Proofs again are based on (5.1). Lemma 2. With probability one, lim n 1 lim n n n i=1 1 n(n 1) N i Λ ϑ,i 2 N i = E { N Λ ϑ 2 } N N i Λ ϑ,i 2 N j = E { N 1 Λ ϑ,1 2 } N 2 i j (4.5) (4.6) lim n 1 n(n 1) N i Λ ϑ,i, N j Λ ϑ,j Ni = E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N1 } (4.7) i j lim n 1 n(n 1)(n 2) N i Λ ϑ,i, N j Λ ϑ,j Nk (4.8) i j k = E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } n the following lemma we compute the limit in (4.8). Lemma 3. We have E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } = EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0. (4.9) Proof. The expectation of the lefthand side equals E (N 1 (t) Λ ϑ,1 (t)) (N 2 (t) Λ ϑ,2 (t)) N 3 (dt). Conditionally on N 3, the expectation equals, by independence of N 1 Λ ϑ,1 and N 2 Λ ϑ,2 and by Fubini s Theorem, E 2 [N 1 (t) Λ ϑ,1 (t)] N 3 (dt) = Now integrate out to get the result. [EΛ ϑ0 (t) EΛ ϑ (t)] 2 N 3 (dt).
18 18 KA KOPPERSCHMDT and WNFRED STUTE n our next result we study local fluctuations of the process ϑ (N 1 (t) Λ ϑ,1 (t)) (N 2 (t) Λ ϑ,2 (t)) N 3 (dt). (4.10) Lemma 4. For given ε > 0 and δ > 0, and for each ϑ Θ c, there is a small but positive r > 0 such that { } E sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 inf EΛ ϑ Br(ϑ) ε ϑ Br(ϑ) ε ϑ 0 EΛ ϑ 2 EΛ ϑ0 δ (4.11) and E { inf ϑ Br ε(ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 δ (4.12) Proof. By assumption, the process (4.10) is continuous in ϑ. Hence sup ϑ B ε r (ϑ) N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 as r 0. By monotone convergence, the expectation of the lefthand side converges to by Lemma 3. Similarly, E { N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3 } = EΛ ϑ0 EΛ ϑ 2 Λ ϑ0, inf ϑ B ε r(ϑ) EΛ ϑ 0 EΛ ϑ 2 EΛ ϑ0 EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 as r 0. This proves the first part of the lemma. The second part is shown in a similar way. Lemma 5. For each fixed ϑ Θ, we have N n Λ ϑ,n 2 Nn EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0 with probability one. n particular, for ϑ = ϑ 0, the limit equals zero. Proof. We have N n Λ [ ϑ,n 2 Nn = Nn (t) Λ ϑ,n (t) ]2 Nn (dt) n = n 3 N i Λ ϑ,i, N j Λ ϑ,j Nk. i,j,k=1
19 Self Exciting Point Processes 19 By (4.5) (4.7), the last triple sum is dominated by the sum restricted to i j k i. The limit follows from (5.1), (4.8) and Lemma 3. Lemma 6. For each ε > 0 we have, with probability one, inf N n Λ ϑ,n Nn inf EΛ ϑ 0 EΛ ϑ EΛϑ0. ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 ε Proof. For a given ε > 0, let δ > 0 be an arbitrary positive number. According to Lemma 4, there exists, for each ϑ Θ c, a positive radius r = r(ϑ) > 0 such that (4.11) and (4.12) are satisfied. Then ϑ Θ c \B ε(ϑ 0 ) B ε r(ϑ) (ϑ) = Θc \ B ε (ϑ 0 ) is an open covering of the compact set Θ c \ B ε (ϑ 0 ). We may select finitely many ϑ 1,..., ϑ q so that Θ c \ B ε (ϑ 0 ) is already covered by the B ε r l (ϑ l ), where r l = r(ϑ l ). Conclude that inf N n Λ ϑ,n 2 Nn inf EΛ ϑ 0 EΛ ϑ 2 EΛ ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 ε ϑ0 = min inf N n Λ ϑ,n 2 Nn min inf EΛ 1 l q ϑ Br ε (ϑ l l ) 1 l q ϑ Br ε ϑ 0 EΛ ϑ 2 EΛ (ϑ l l ) ϑ0 max inf 1 l q N n Λ ϑ,n 2 Nn inf EΛ ϑ Br ε (ϑ l l ) ϑ Br ε ϑ 0 EΛ ϑ 2 EΛ (ϑ l l ) ϑ0. t suffices to prove that for each 1 l q, with probability one, the limsup of the term in brackets in absolute value is less than or equal to δ. But inf N n Λ ϑ,n 2 Nn n 3 ϑ Br ε (ϑ l l ) i,j,k sup ϑ B ε r l (ϑ l ) N i Λ ϑ,i, N j Λ ϑ,j Nk which again is dominated by the subsum pertaining to i j k i. With Lemma 1 this converges to { E } sup N 1 Λ ϑ,1, N 2 Λ ϑ,2 N3. ϑ Br ε (ϑ l l ) By Lemma 4 and the choice of r l, this term is within δdistance of inf ϑ B ε rl (ϑ l ) EΛ ϑ0 EΛ ϑ 2 EΛ ϑ0. The corresponding lower bound may be obtained in a similar way. This concludes the proof of the lemma.
20 20 KA KOPPERSCHMDT and WNFRED STUTE Proof of Theorem 1. Let ε > 0 be given. We need to show that ( ) P lim sup { ϑ n ϑ 0 ε} n = 0. But { } { ϑ n ϑ 0 ε} inf N n Λ ϑ,n Nn < inf N n Λ ϑ,n Nn ϑ: ϑ ϑ 0 ε ϑ: ϑ ϑ 0 <ε { } inf N n Λ ϑ,n Nn < N n Λ ϑ0,n Nn. ϑ: ϑ ϑ 0 ε From Lemma 6, the lefthand side goes to inf EΛ ϑ 0 EΛ ϑ EΛϑ0, ϑ: ϑ ϑ 0 ε which is positive by (2.1). At the same time, the righthand side tends to EΛ ϑ0 EΛ ϑ0 EΛϑ0 = 0. This proves Theorem 1. The following lemma will play a crucial role in the proof of Theorem 2. t expresses certain point process integrals in terms of the associated innovation martingale M n. Lemma 7. We have [ Λϑ,n (t) Λ ϑ0,n(t) ] ϑ Λ ϑ,n (t) N n (dt) ϑ=ϑn (4.13) = M n (t) ϑ Λ ϑ,n (t) M n (dt) ϑ=ϑn + M n (t) ϑ Λ ϑ,n (t) Λ ϑ0,n(dt) ϑ=ϑn. Proof. The equation is an immediate consequence of the identity d N n = d M n + d Λ ϑ0,n (4.14) and the fact, that ϑ n minimizes N n Λ ϑ,n 2 Nn in ϑ and is in the inner open set of Θ, whence [ Nn (t) Λ ϑ,n (t) ] ϑ Λ ϑ,n (t) N n (dt) = 0 at ϑ = ϑ n.
How to Use Expert Advice
NICOLÒ CESABIANCHI Università di Milano, Milan, Italy YOAV FREUND AT&T Labs, Florham Park, New Jersey DAVID HAUSSLER AND DAVID P. HELMBOLD University of California, Santa Cruz, Santa Cruz, California
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationIf You re So Smart, Why Aren t You Rich? Belief Selection in Complete and Incomplete Markets
If You re So Smart, Why Aren t You Rich? Belief Selection in Complete and Incomplete Markets Lawrence Blume and David Easley Department of Economics Cornell University July 2002 Today: June 24, 2004 The
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289. Compressed Sensing. David L. Donoho, Member, IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289 Compressed Sensing David L. Donoho, Member, IEEE Abstract Suppose is an unknown vector in (a digital image or signal); we plan to
More informationSampling 50 Years After Shannon
Sampling 50 Years After Shannon MICHAEL UNSER, FELLOW, IEEE This paper presents an account of the current state of sampling, 50 years after Shannon s formulation of the sampling theorem. The emphasis is
More informationControllability and Observability of Partial Differential Equations: Some results and open problems
Controllability and Observability of Partial Differential Equations: Some results and open problems Enrique ZUAZUA Departamento de Matemáticas Universidad Autónoma 2849 Madrid. Spain. enrique.zuazua@uam.es
More informationA Googlelike Model of Road Network Dynamics and its Application to Regulation and Control
A Googlelike Model of Road Network Dynamics and its Application to Regulation and Control Emanuele Crisostomi, Steve Kirkland, Robert Shorten August, 2010 Abstract Inspired by the ability of Markov chains
More informationA direct approach to false discovery rates
J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiplehypothesis
More informationLevel sets and extrema of random processes and fields. JeanMarc Azaïs Mario Wschebor
Level sets and extrema of random processes and fields. JeanMarc Azaïs Mario Wschebor Université de Toulouse UPS CNRS : Institut de Mathématiques Laboratoire de Statistiques et Probabilités 118 Route de
More informationDecoding by Linear Programming
Decoding by Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles, CA 90095
More informationFoundations of Data Science 1
Foundations of Data Science John Hopcroft Ravindran Kannan Version /4/204 These notes are a first draft of a book being written by Hopcroft and Kannan and in many places are incomplete. However, the notes
More informationFoundations of Data Science 1
Foundations of Data Science John Hopcroft Ravindran Kannan Version 2/8/204 These notes are a first draft of a book being written by Hopcroft and Kannan and in many places are incomplete. However, the notes
More informationON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION. A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT
ON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT A numerical study of the distribution of spacings between zeros
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationSpaceTime Approach to NonRelativistic Quantum Mechanics
R. P. Feynman, Rev. of Mod. Phys., 20, 367 1948 SpaceTime Approach to NonRelativistic Quantum Mechanics R.P. Feynman Cornell University, Ithaca, New York Reprinted in Quantum Electrodynamics, edited
More informationAnalysis of dynamic sensor networks: power law then what?
Analysis of dynamic sensor networks: power law then what? (Invited Paper) Éric Fleury, JeanLoup Guillaume CITI / ARES INRIA INSA de Lyon F9 Villeurbanne FRANCE Céline Robardet LIRIS / CNRS UMR INSA de
More informationONEDIMENSIONAL RANDOM WALKS 1. SIMPLE RANDOM WALK
ONEDIMENSIONAL RANDOM WALKS 1. SIMPLE RANDOM WALK Definition 1. A random walk on the integers with step distribution F and initial state x is a sequence S n of random variables whose increments are independent,
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1 ActiveSet Newton Algorithm for Overcomplete NonNegative Representations of Audio Tuomas Virtanen, Member,
More informationIntroduction to Stochastic ActorBased Models for Network Dynamics
Introduction to Stochastic ActorBased Models for Network Dynamics Tom A.B. Snijders Gerhard G. van de Bunt Christian E.G. Steglich Abstract Stochastic actorbased models are models for network dynamics
More informationA point process model for the dynamics of limit order books
A point process model for the dynamics of limit order books Ekaterina Vinkovskaya Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts
More informationExplaining Cointegration Analysis: Part II
Explaining Cointegration Analysis: Part II David F. Hendry and Katarina Juselius Nuffield College, Oxford, OX1 1NF. Department of Economics, University of Copenhagen, Denmark Abstract We describe the concept
More informationScalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Seventh IEEE International Conference on Data Mining Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park
More informationProbability in High Dimension
Ramon van Handel Probability in High Dimension ORF 570 Lecture Notes Princeton University This version: June 30, 2014 Preface These lecture notes were written for the course ORF 570: Probability in High
More informationBeyond Correlation: Extreme Comovements Between Financial Assets
Beyond Correlation: Extreme Comovements Between Financial Assets Roy Mashal Columbia University Assaf Zeevi Columbia University This version: October 14, 2002 Abstract This paper investigates the potential
More informationAn Introduction to the NavierStokes InitialBoundary Value Problem
An Introduction to the NavierStokes InitialBoundary Value Problem Giovanni P. Galdi Department of Mechanical Engineering University of Pittsburgh, USA Rechts auf zwei hohen Felsen befinden sich Schlösser,
More informationRegression. Chapter 2. 2.1 Weightspace View
Chapter Regression Supervised learning can be divided into regression and classification problems. Whereas the outputs for classification are discrete class labels, regression is concerned with the prediction
More informationSteering User Behavior with Badges
Steering User Behavior with Badges Ashton Anderson Daniel Huttenlocher Jon Kleinberg Jure Leskovec Stanford University Cornell University Cornell University Stanford University ashton@cs.stanford.edu {dph,
More informationOPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
More informationIntroduction to partitioningbased clustering methods with a robust example
Reports of the Department of Mathematical Information Technology Series C. Software and Computational Engineering No. C. 1/2006 Introduction to partitioningbased clustering methods with a robust example
More informationGraphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations Jure Leskovec Carnegie Mellon University jure@cs.cmu.edu Jon Kleinberg Cornell University kleinber@cs.cornell.edu Christos
More information