Multivariate Extremes, Max-Stable Process Estimation and Dynamic Financial Modeling


Multivariate Extremes, Max-Stable Process Estimation and Dynamic Financial Modeling

by Zhengjun Zhang

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics.

Chapel Hill 2002

Approved by
Advisor: Professor Richard L. Smith
Reader: Professor A. Ronald Gallant
Reader: Professor Harry Hurd
Reader: Professor Chuanshu Ji
Reader: Professor Pranab K. Sen
Reader: Professor Gordon Simons

(c) 2002 Zhengjun Zhang. ALL RIGHTS RESERVED.

ABSTRACT

ZHENGJUN ZHANG: Multivariate Extremes, Max-Stable Process Estimation and Dynamic Financial Modeling. (Under the direction of Professor Richard L. Smith)

Studies have shown that time series data from finance, insurance, the environment, etc. are fat tailed and clustered when extremal events occur. In order to characterize such extremal processes, max-stable processes or min-stable processes have been proposed since the 1980s and some probabilistic properties have been obtained, but the applications are very limited due to the lack of efficient statistical estimation methods. In this work, some probabilistic properties of the processes are proved and a series of estimation procedures for the underlying max-stable processes, i.e. multivariate maxima of moving maxima processes, are proposed. The first proposed method is purely probabilistic. It is designed for time series with only one signature pattern, which can be regarded as a clustering pattern. It gives true parameter values if the model is correct. The second proposed method is a two-step estimating method. At the first step, the method gives exact parameter values within each signature pattern; it then estimates the proportions of the different signature patterns in the process. Consistency and asymptotic properties of the estimators are proved. The third proposed method is a generalized version of the second one but is not tied to the data, i.e. the data are not assumed to follow the model exactly. It is practically applicable. Three variants of the third method are proposed. They are designed to provide more specific estimators for special cases of the model, such as symmetric, monotone and asymmetric data structures respectively. All the estimators have been proved to be consistent and asymptotically normal. To date, the exceedance over threshold approach, which uses a generalized Pareto distribution (GPD), has been advocated. Assuming the population distribution belongs to the multivariate domain of attraction of a multivariate extreme value distribution, we develop threshold methods to estimate the parameters of the underlying max-stable process from the observed data. All six previously developed methods have corresponding versions under the threshold approach.

How to manage a portfolio efficiently, with the highest expected return for a given level of risk, or equivalently, the least risk for a given level of expected return, is a key to the success or failure of a financial system. As an application of max-stable processes, financial time series data are standardized and transformed, and the new time series are modelled as max-stable processes. The VaR (Value at Risk, the maximal possible loss of a portfolio under a given confidence level) of portfolios is calculated, and portfolio optimization under VaR constraints is then studied.

ACKNOWLEDGEMENTS

I am deeply indebted to Prof. Richard L. Smith, my advisor, for his stimulating suggestions, encouragement and complete support throughout all my time and research here at Chapel Hill. I would like to express my gratitude to my committee members, Professor A. Ronald Gallant, Professor Harry Hurd, Professor Chuanshu Ji, Professor Pranab K. Sen and Professor Gordon Simons, who gave me their full support in completing this thesis. A series of presentations given to a study group organized by Professor Smith and Professor Gallant increased my knowledge base, along with that of other students, at a time when I was beginning to investigate applications of my research. Professor Gallant suggested studying stock return data as an application of max-stable process modeling. Professor Sen suggested several extreme value theory references and put forth two theoretical considerations which, once I proved them, formed the basis of several important theorems and examples in this thesis. The discussions I had with Professor Ji, Professor Simons and Professor Hurd helped me to better understand the problems I was studying and helped me make progress. I would also like to express my deep thanks to Professor Jianqing Fan, Professor Edward Carlstein, Professor Amarjit Budhiraja, Professor Paul Embrechts from ETH-Zentrum, Professor Ruey S. Tsay from the University of Chicago, Professor Robert Stine from the University of Pennsylvania and Dr. Dougal Goodman from the Foundation for Science and Technology of the United Kingdom for their comments and suggestions. Given that part of the results were developed at the Newton Institute for Mathematical Sciences at Cambridge University, I would like to thank Professor H. K. Moffatt and the institute for supporting and assisting me in participating in the Managing Uncertainty Programme. I have taken courses from almost all the professors in this department. I want to thank them for teaching me and helping me develop my career. Professor Fan has given me constant support in many ways. Professor Carlstein guided me in becoming a good teacher throughout the undergraduate courses I have taught while at Chapel Hill.

I enjoyed my three years as computer coordinator, a job which gave me opportunities to help our faculty, staff and graduate students. Everything was beautiful at Chapel Hill. Especially, I would like to give my special thanks to my wife, Dr. Chunming Zhang, for her endless support and love.

Contents

1 Introduction
  1.1 General introduction
  1.2 Multivariate extreme value theory
    1.2.1 Extreme value theory for univariate random variables
    1.2.2 Limit laws of multivariate extremes
    1.2.3 Basic properties of multivariate extreme value distributions
  1.3 Subclasses of max-stable processes

2 Probabilistic Properties of Multivariate Maxima of Moving Maxima Processes and Basic Estimation of Parameters
  2.1 Introduction
  2.2 Extended properties
  2.3 Examples of M4 processes
    2.3.1 Case L = 1
    2.3.2 Case L > 1
  2.4 Estimation of weight parameters
    2.4.1 Estimation using independent observed values from a dependent sequence
    2.4.2 Estimation using the whole dependent sequence
    2.4.3 Results from discrete time Markov chain theory
  2.5 Approximation of two M4 processes

3 Estimation Based on Bivariate Distribution
  3.1 Introduction
  3.2 Modeling time dependence
    3.2.1 Preliminary estimation
    3.2.2 Asymptotics for the case of D = 1
    3.2.3 Asymptotics for a special case of L = 1 and D > 1
    3.2.4 Simulation examples
  3.3 Modeling temporal and inter-serial dependence
    3.3.1 Inter-serial dependence
    3.3.2 Temporal and inter-serial dependence
    3.3.3 The estimators and asymptotics
    3.3.4 Simulation examples
  3.4 Weighted least squares estimation
  3.5 Threshold methods
  3.6 Multivariate domains of attraction and non unit Fréchet margins

4 Modeling Extreme Processes with Parametric Structures
  4.1 Symmetric geometric parameter structure model
    4.1.1 Case L = 1
    4.1.2 Case L > 1
  4.2 Asymmetric geometric parameter structure model
    4.2.1 Case L = 1
    4.2.2 Case L > 1
  4.3 Monotone parameter structure model
    4.3.1 Case L = 1
    4.3.2 Case L > 1
  4.4 Proof of Lemma

5 Multivariate Extreme Value Analysis to Value at Risk
  5.1 Introduction
  5.2 VaR methods
    5.2.1 Variance-Covariance approach
    5.2.2 Historical simulation approach
    5.2.3 Monte-Carlo simulation approach
  5.3 Extreme value approaches
  5.4 Optimal portfolio theories
  5.5 Dynamic financial data modeling
    5.5.1 Mean excess plot
    5.5.2 Z-statistics and W-statistics
    5.5.3 GARCH(1,1) model
  5.6 Initial diagnostics
  5.7 Data transformation
  5.8 Model selection and parameter estimations
  5.9 VaR calculation and portfolio optimization
    5.9.1 Using variance-covariance approach
    5.9.2 Using M4 process approach
    5.9.3 Historical simulation approach

6 Summary
  6.1 General discussion
  6.2 Directions of future research

List of Figures

2.1 Left figure is a time series plot of 50 observations of the process Y_{td} = max(a_{1d} Z_{t-1}, a_{0d} Z_t) for some d. Right figure is a time series plot of 3000 observations of the ratios Y_{td}/(Y_{td} + Y_{t+1,d}). The value of a_{0d} can be read from the right figure.
2.2 Left figure is a time series plot of 50 observations of the process Y_{td} = max(a_{1d} Z_{t-1}, a_{0d} Z_t, a_{-1,d} Z_{t+1}) for some d. Right figure is a time series plot of 3000 observations of the ratios Y_{td}/(Y_{td} + Y_{t+1,d} + Y_{t+2,d}). The value of a_{-1,d} can be read from the right figure.
2.3 A demo of multiple signature patterns.
2.4 Transition diagram of the constructed Markov chain.
2.5 A demo of q(x), its slope q'(x) and ratio change points.
3.1 The left plot is the ratios Y_i/(Y_i + Y_{i+1}) at the threshold level 10 under the model (3.26). The right plot is the ratios Y_i/(Y_i + Y_{i+1}) at the threshold level 10 under the model (3.27).
3.2 The left plot is the ratios around 1.3 with distance 1.0 at the threshold level 10 under the model (3.26). The right plot is the ratios around 1.3 with distance 1.0 at the threshold level 10 under the model (3.27).
3.3 A demo of two different bivariate processes. The blue curves are drawn from the original process. The red curves are drawn from the permuted process.
5.1 These figures show that there are extreme observations, and that the greatest drop happened on the same day in all three time series, i.e. October 19, 1987, the date of the Wall Street crash.
5.2 The mean excess plot usually suggests whether an extreme value distribution fit is appropriate or not. The plot for the Pfizer data suggests an extreme value distribution fit since the plot is contained in its corresponding confidence interval. The other two are more doubtful since the plots go outside the confidence bands, though further analysis shows that the extreme value approximation is reasonable in these cases also.
5.3 W-plots show that a generalized extreme value distribution fit is appropriate. Some caution should be exercised since a few points, partly the result of the October 1987 crash, are away from the straight line.
5.4 W-plots show that a generalized extreme value distribution fit is appropriate.
5.5 W-plots show that a generalized extreme value distribution fit is appropriate.
5.6 Estimated volatility.
5.7 Standardized series.
5.8 W-plots show that a generalized extreme value distribution fit is appropriate. Some caution should be exercised since a few points, partly the result of the October 1987 crash, are away from the straight line.
5.9 W-plots show that a generalized extreme value distribution fit is appropriate.
5.10 W-plots show that a generalized extreme value distribution fit is appropriate.
5.11 The negative returns after being transformed to the unit Fréchet scale.
5.12 (i) All plots are based on Fréchet-transformed exceedances of a high threshold based on negative log returns (so they represent price drops, not price rises); (ii) the purpose of the plots is to look for dependence among neighboring values; (iii) the numbers in parentheses show expected and observed numbers of simultaneous exceedances by the two variables, where "expected" is calculated under an independence assumption.
5.13 Number of signature patterns L vs Err plot.
5.14 Simulated time series.
5.15 VaR for a portfolio of 3 stock products.
5.16 Expected returns for a portfolio of 3 stock products.
5.17 VaR for a portfolio of 3 stock products, standardized data.
5.18 Expected returns for a portfolio of 3 stock products, standardized data.
5.19 VaR for a portfolio of 3 stock products.
5.20 Portfolio optimization using extreme value theory.
5.21 Portfolio optimization using the historical simulation approach.
5.22 Portfolio optimization using the historical simulation approach.

Chapter 1

Introduction

1.1 General introduction

Extreme events or rare events have major impacts (bad or good) on the real world. Imagine the major damage caused by a disastrous hurricane or the impact of winning three million dollars in the lottery. Such rare events are part of our life. We must face, understand and study the phenomena and problems caused by rare events. Indeed, the study of extreme events has become very important and has drawn major attention in probability and statistical research.

The extreme type theorems play a central role in the study of extreme value theory. In the literature, Fisher and Tippett (1928) were the first to discover the extreme type theorems, and later these results were proved in complete generality by Gnedenko (1943). Galambos (1987), Leadbetter, Lindgren and Rootzén (1983), and Resnick (1987) are excellent reference books on the probabilistic aspects. Smith (1990) gives a comprehensive account of the statistical aspects, especially maximum likelihood methods in parameter estimation. A recent book by Embrechts, Klüppelberg, and Mikosch (1997) gives an excellent viewpoint on modeling extremal events. The extreme type theorems say that for a sequence of i.i.d. random variables with suitable normalizing constants, the limiting distribution of the maximum, if it exists, follows one of three types of extreme value distributions. In the multivariate context, the maximum is taken componentwise and there is no specific parametric type of limiting distribution. However, there have been many attempts to characterize the possible limits, such as de Haan and Resnick (1977), de Haan (1985) and Resnick's (1987) point process approach, and Pickands's (1981) representation theorem for multivariate extreme value distributions with unit Fréchet margins. Some efforts have been devoted to extending

the classical i.i.d. results to dependent sequences under conditions such as stationarity, mixing conditions, etc. For instance, Leadbetter et al. (1983) contains an abundant account of the theory of extreme values for dependent sequences, both stationary and non-stationary, as well as for stationary continuous time processes, at a rigorous mathematical level. The extremal index, originated by Cartwright (1958), Newell (1964), Loynes (1965), O'Brien (1974) and Leadbetter (1983), is a quantity which allows one to associate the limiting distribution of a dependent sequence to the extreme value distributions. In a multivariate analog, Nandagopalan (1990, 1994) introduced the multivariate extremal index and derived some elementary properties.

On the statistical side, the focus is on extremal event modeling, parameter estimation and testing of hypotheses. There are many applications to real problems. Among them, extreme value theory has been largely applied to environmental problems such as river flow, wind speed, sea level, temperature and rainfall, and to insurance and finance (cf. Smith 1990 and Embrechts, Klüppelberg, and Mikosch 1997). To model extremal events in a univariate context, usually the generalized extreme value distribution is adopted. To date, the exceedance over threshold approach, which uses a generalized Pareto distribution (GPD) (Pickands 1975, Davison and Smith 1990), and the fixed number of extreme order statistics approach (Weissman 1978, Smith 1986, Tawn 1988, Robinson and Tawn 1995, Smith 1997) have been advocated.

Although there are well-developed approaches to modeling univariate extremal processes, problems concerning the environment, finance, insurance, etc. are multivariate in character: for example, floods may occur at different sites along a coastline; the failure of a portfolio management strategy may be caused by a single extreme price movement or by multiple movements. Here multivariate extreme modeling is essential for risk management and for precision of modeling. What one needs is to choose or develop appropriate multivariate extreme value distributions which can be used in modeling multivariate extremal processes. As we mentioned earlier, multivariate extreme value distributions have no specific parametric form. Fortunately, several parametric models have been developed that are multivariate extreme value distributions. Among them, it is worth mentioning the Bivariate Logistic model of Marshall and Olkin (1983), the Bivariate Exponential model of Mardia (1970), the Asymmetric Logistic model of Tawn (1990), the Negative Asymmetric Logistic model of Joe (1989), the Dirichlet model of Coles and Tawn (1991), the Bilogistic model of Smith (1990), the Nested Logistic model of McFadden (1978) and the Time Series Logistic model of Coles and Tawn (1991). These models are

listed in Coles and Tawn (1991). Coles and Tawn (1994) also demonstrate how statistical methods for multivariate extremes may be applied to a very practical problem of data analysis.

In general, a multivariate distribution function characterizes the dependence structure within the random vector. It does not show the time-dependent structure of the vector time series. Since multivariate extreme value distributions are in fact max-stable distributions (Resnick 1987), and the extreme values of a multivariate stationary process may be characterized in terms of a limiting max-stable process under quite general conditions (Smith and Weissman 1996), it is very natural to model extreme processes by max-stable processes. In this work, I mainly focus on proving probabilistic properties of a certain class of max-stable processes and proposing a series of estimation procedures for the underlying max-stable process. The consistency and asymptotic properties of all estimators are proved. Applications of max-stable processes in finance will also be addressed.

In the rest of this chapter we give some background results. We discuss extreme value theory for multivariate random variables in Section 1.2. Also in Section 1.2, extreme value theory for univariate random variables is briefly reviewed, since the marginal distributions of a multivariate extreme value distribution (MEVD) have to be univariate extreme value distributions. Multivariate maxima of moving maxima processes are discussed in Section 1.3. In Chapter 2, we study the properties of multivariate maxima of moving maxima processes, extend probabilistic properties, propose statistical estimation methods for the parameters and prove consistency and asymptotics. Some examples are given in that chapter to demonstrate the processes and the statistical aspects. In Chapter 3, we consider modeling multivariate maxima of moving maxima processes by using the bivariate joint distribution of the sequence of dependent random variables. We study estimation of parameters and consistency properties as well as asymptotics. Examples are illustrated. In Chapter 4, we consider parametric structures for multivariate maxima of moving maxima processes. Estimation of parameters, consistency and asymptotic properties are addressed. In Chapter 5, we first review the literature on the applications of extreme value theory to finance and insurance. We briefly review the definition of VaR and some typical calculation methods. Then extreme value approaches are discussed. Finally we fit multivariate extreme value distributions to multivariate financial time series

data and illustrate VaR calculation as well as portfolio optimization. Comparisons between extreme value approaches and other approaches are illustrated as well.

1.2 Multivariate extreme value theory

1.2.1 Extreme value theory for univariate random variables

Suppose X_1, X_2, ..., X_n is an i.i.d. sequence with distribution function F(x), and let

M_n = max(X_1, X_2, ..., X_n).    (1.1)

Then M_n has the distribution function

Pr{M_n <= x} = Pr{X_1 <= x, ..., X_n <= x} = F^n(x).    (1.2)

It is clear that the maximum of a sample simply tends to the right-hand endpoint of the distribution almost surely, no matter whether it is finite or infinite. Let X_F be the right endpoint; since

Sum_{n=1}^infinity Pr{|M_n - X_F| > eps} = Sum_{n=1}^infinity Pr{M_n < X_F - eps} = Sum_{n=1}^infinity Pr^n{X_1 < X_F - eps} = Pr{X_1 < X_F - eps} / (1 - Pr{X_1 < X_F - eps}) < infinity,

the Borel-Cantelli lemma shows that M_n -> X_F a.s. What we are interested in is the form of the limits

lim_{n->infinity} F^n(a_n x + b_n) = lim_{n->infinity} Pr((M_n - b_n)/a_n <= x) = H(x)    (1.3)

for suitable normalizing constants a_n > 0 and b_n. If (1.3) holds, we say F (or X) belongs to the (maximum) domain of attraction of H and write F in MDA(H) (or X in MDA(H)). H has one of the following three parametric forms (which are generally called extreme value distributions):

Type I: H(x) = exp{-exp(-x)}, -infinity < x < infinity;

Type II: H(x) = 0 if x <= 0, and H(x) = exp(-x^{-alpha}) if x > 0;

Type III: H(x) = exp(-(-x)^alpha) if x < 0, and H(x) = 1 if x >= 0.

In Types II and III, alpha is any positive number. The three types are also often called the Gumbel, Fréchet and Weibull types respectively.

The following theorems are very useful in finding the MDA(H) of F and the suitable normalizing constants. The proofs of the theorems can be found in Leadbetter et al. (1983), Resnick (1987), Galambos (1987), etc.

Theorem 1.1 Let 0 <= tau <= infinity and suppose that for suitable normalizing constants a_n > 0 and b_n, u_n = u_n(x) = x/a_n + b_n satisfies

n(1 - F(u_n)) -> tau as n -> infinity;    (1.4)

then

P(M_n <= u_n) -> e^{-tau} as n -> infinity.    (1.5)

Conversely, if (1.5) holds for some tau, 0 <= tau <= infinity, then (1.4) holds.

Theorem 1.2 Necessary and sufficient conditions for the distribution F to belong to the MDA of each type are as follows.

Type I: integral_t^{X_F} (1 - F(u)) du < infinity for t < X_F, and

lim_{t->X_F} (1 - F(t + x g(t))) / (1 - F(t)) = e^{-x} for all real x,

where

g(t) = integral_t^{X_F} (1 - F(u)) du / (1 - F(t)).

Type II: X_F = infinity and, for some alpha > 0,

lim_{t->infinity} (1 - F(tx)) / (1 - F(t)) = x^{-alpha} for each x > 0.

Type III: X_F < infinity and, for some alpha > 0,

lim_{h->0} (1 - F(X_F - xh)) / (1 - F(X_F - h)) = x^alpha for each x > 0.

Some other theoretical results may be very useful for finding the MDA(H) of F and the normalizing constants. Those results, and examples of distributions belonging to each of the three domains of attraction, can be found in Leadbetter et al. (1983), Resnick (1987), Galambos (1987), etc. As a simple example, we consider now the Pareto distribution

F(x) = 1 - kappa x^{-alpha}, alpha > 0, kappa > 0, x >= kappa^{1/alpha}.

We have

(1 - F(tx)) / (1 - F(t)) = (tx)^{-alpha} / t^{-alpha} = x^{-alpha},

so F belongs to the MDA of a Type II extreme value distribution. By setting n(1 - F(u_n)) = tau we have u_n = (kappa n / tau)^{1/alpha}. Putting tau = x^{-alpha} for x >= 0, we have

P{(kappa n)^{-1/alpha} M_n <= x} -> exp(-x^{-alpha}),

so a_n = (kappa n)^{1/alpha}, b_n = 0.

The extreme value distributions are max-stable distributions. We say a non-degenerate distribution H is max-stable if

H^n(a_n x + b_n) = H(x)

holds for some constants a_n > 0 and b_n for each n = 2, 3, .... The next result (Theorem 1.4.1 in Leadbetter et al. 1983) shows the relation.

Theorem 1.3 Every max-stable distribution is of extreme value type, i.e. equal to H(ax + b) for some a > 0 and b; conversely, each distribution of extreme value type is max-stable.

The three types of extreme value distributions can be written in a generalized extreme value (GEV) distribution form (which is very useful for statistical purposes):

H(x; mu, sigma, xi) = exp{-[1 + xi (x - mu)/sigma]^{-1/xi}},    (1.6)

where 1 + xi (x - mu)/sigma > 0, sigma > 0 and mu, xi are arbitrary. The case xi = 0 is interpreted as the limit xi -> 0, that is,

H(x; mu, sigma, 0) = exp{-exp[-(x - mu)/sigma]}.    (1.7)

Types II and III correspond to xi > 0 (xi = 1/alpha) and xi < 0 (xi = -1/alpha) respectively. Smith (1990) gives a detailed review of statistical treatments, applications and estimation of the GEV.
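Returning to the Pareto example above, the normalization a_n = (kappa n)^{1/alpha}, b_n = 0 is easy to check numerically. The following is a minimal simulation sketch (our own illustration, not part of the original text); the parameter values and sample sizes are arbitrary choices.

```python
import numpy as np

# Sketch: maxima of Pareto samples, normalized by a_n = (kappa*n)^(1/alpha),
# b_n = 0, should be approximately Type II (Frechet) distributed.
rng = np.random.default_rng(0)
alpha, kappa, n, reps = 2.0, 1.0, 1000, 20000

u = rng.uniform(size=(reps, n))
x = (kappa / (1.0 - u)) ** (1.0 / alpha)       # inverse-CDF sampling from F
m = x.max(axis=1) / (kappa * n) ** (1.0 / alpha)

for q in (0.5, 1.0, 2.0):                      # empirical CDF vs exp(-q^(-alpha))
    print(q, round((m <= q).mean(), 4), round(np.exp(-q ** (-alpha)), 4))
```

The printed pairs should agree to within simulation error, illustrating the Type II limit of Theorem 1.2.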

Suppose now {X_i, i = 1, 2, ...} is a stationary sequence with a continuous marginal distribution function F(x), and {X*_i, i = 1, 2, ...} is the so-called associated sequence of i.i.d. random variables with the same marginal distribution function F. M_n stands for the maximum as usual, defined by (1.1), while M*_n denotes the corresponding maximum of {X*_1, ..., X*_n}. The limiting distribution of M_n can be related to the limiting distribution of M*_n via a quantity theta defined below. If for every tau > 0 there exists a sequence of thresholds {u_n} such that

Pr{M*_n <= u_n} -> e^{-tau}    (1.8)

and, under quite mild additional conditions,

Pr{M_n <= u_n} -> e^{-theta tau},    (1.9)

then theta is called the extremal index of the sequence {X_n}. This concept originated in papers by Cartwright (1958), Newell (1964), Loynes (1965) and O'Brien (1974); Leadbetter (1983) gave a formal definition. The index theta can take any value in [0, 1], and 1/theta is interpreted as the mean cluster size of exceedances over some high threshold. The value theta = 0 corresponds to strong dependence (infinite cluster sizes), though not so strong that all the values are the same. On the other hand, theta = 1 is a form of asymptotic independence of extremes, but it does not mean that the original sequence is independent. If (1.9) holds for some tau and corresponding {u_n}, then it holds for all tau' (equal or not equal to tau) and its corresponding {u'_n}. Estimators of the extremal index have been proposed by Leadbetter, Weissman, de Haan, and Rootzén (1989), Nandagopalan (1990) and Hsing (1993). Smith and Weissman (1994) gave a review of estimating the extremal index and proposed two estimating methods, the blocks method and the runs method. Other references include Chapter 8 of the book by Embrechts et al. (1997).
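The runs method just mentioned is easy to sketch in code. The following is a minimal illustration of a runs-type estimator in the spirit of Smith and Weissman (1994); it is our own sketch under the usual definition, not the thesis's implementation, and the run length r and threshold are user-chosen tuning parameters.

```python
import numpy as np

def runs_extremal_index(x, u, r):
    """Runs-type estimator of theta: the proportion of exceedances of the
    threshold u that are followed by at least r-1 consecutive
    non-exceedances, i.e. that terminate a cluster."""
    x = np.asarray(x)
    exc = x > u
    n = len(x)
    cluster_ends = sum(
        1 for i in range(n - r + 1)
        if exc[i] and not exc[i + 1:i + r].any()
    )
    total = int(exc[:n - r + 1].sum())
    return cluster_ends / total if total else np.nan

# Example: a moving-maximum series with equal weights 1/2 has theta = 1/2.
rng = np.random.default_rng(1)
z = 1.0 / rng.exponential(size=10000)          # unit Frechet noise
y = np.maximum(0.5 * z[1:], 0.5 * z[:-1])
print(runs_extremal_index(y, np.quantile(y, 0.98), r=5))   # roughly 0.5
```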

1.2.2 Limit laws of multivariate extremes

Suppose {X_i = (X_{i1}, ..., X_{iD}), i = 1, 2, ...} is a D-dimensional i.i.d. random process with distribution F(x) = F(x_1, ..., x_D) = Pr{X_{id} <= x_d, d = 1, ..., D} and marginal distributions F_d(x_d) = Pr{X_{id} <= x_d}, d = 1, ..., D. Let M_n = (M_{n1}, ..., M_{nD}) denote the vector of pointwise maxima, where M_{nd} = max{X_{id}, 1 <= i <= n}. If there exist normalizing constants a_n > 0, b_n such that

Pr{M_n <= a_n x + b_n} = Pr{M_{nd} <= a_{nd} x_d + b_{nd}, d = 1, ..., D}
= F^n(a_{n1} x_1 + b_{n1}, a_{n2} x_2 + b_{n2}, ..., a_{nD} x_D + b_{nD})
= F^n(a_n x + b_n) -> H(x) as n -> infinity,    (1.10)

where the limit distribution H is non-degenerate in the sense that each marginal H_i, i = 1, ..., D, is non-degenerate (and must be in the GEV family), then the distribution H is called a D-dimensional multivariate extreme value distribution and F is said to belong to the domain of attraction of H, which we write F in D(H). These distributions have received theoretical consideration in works by de Haan and Resnick (1977), de Haan (1985), Pickands (1981), and Resnick (1987).

In the characterization of multivariate extreme value distributions, max-stable (or min-stable) distributions play a central role. We say a distribution H(x) is max-stable if for every t > 0 there exist functions alpha(t) > 0, beta(t) such that

H^t(x) = H(alpha(t) x + beta(t)) = H(alpha_1(t) x_1 + beta_1(t), ..., alpha_D(t) x_D + beta_D(t)).    (1.11)

The following theorem describes the equivalence between multivariate extreme value distributions and max-stable distributions.

Theorem 1.4 The class of multivariate extreme value distributions is precisely the class of max-stable distribution functions with non-degenerate marginals.

This is Proposition 5.9 in Resnick (1987). After slight modification of Pickands' representation of a min-stable multivariate exponential into a representation for a max-stable multivariate Fréchet distribution, we have

Theorem 1.5 Suppose H(x) is a limit distribution satisfying (1.10) with unit Fréchet margins; then

H(x) = exp{- integral_{S_D} max_{1<=i<=D} (w_i / x_i) dG(w)},    (1.12)

where G is a positive finite measure on the unit simplex

S_D = {(w_1, ..., w_D) : Sum_{i=1}^D w_i = 1, w_i >= 0, i = 1, ..., D},

and G satisfies

integral_{S_D} w_i dG(w) = 1, i = 1, ..., D.    (1.13)
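For a concrete instance of (1.12), consider the standard symmetric logistic family (our own illustration at this point, not the thesis's example). Max-stability follows directly from homogeneity of the exponent, as the following worked computation shows.

```latex
% Bivariate symmetric logistic distribution, 0 < \alpha \le 1:
\[
  H(x_1,x_2) \;=\; \exp\bigl\{-\bigl(x_1^{-1/\alpha}+x_2^{-1/\alpha}\bigr)^{\alpha}\bigr\}.
\]
% Its exponent v(x) = (x_1^{-1/\alpha}+x_2^{-1/\alpha})^{\alpha} satisfies
% v(tx) = t^{-1} v(x), so for every t > 0
\[
  H^t(tx_1,tx_2) \;=\; \exp\{-t\,v(tx)\} \;=\; \exp\{-v(x)\} \;=\; H(x_1,x_2),
\]
% which is (1.11) with \alpha(t)=t and \beta(t)=0.  Letting x_2 \to \infty
% recovers the unit Fr\'echet margin \exp(-1/x_1), as required by Theorem 1.5.
```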

Note that

v(x) = integral_{S_D} max_{1<=i<=D} (w_i / x_i) dG(w)

is called the exponent measure by de Haan and Resnick (1977). So to model a multivariate extreme value distribution function is in fact to model the measure function G. De Haan (1985) gave a simple nonparametric procedure for modeling the measure function G. Coles and Tawn (1991) argued that parametric models are preferable when one wants simultaneously to estimate the exponent measure and the dependence structure.

In Section 1.2.1, we looked at the limit distribution of a dependent sequence of univariate random variables, and some of those results can be extended to the multivariate context. Suppose now {X_i = (X_{i1}, ..., X_{iD}), i = 1, 2, ...} is a D-dimensional stationary stochastic process with distribution function F and marginals F_d. Also let {X*_i} be the associated sequence of i.i.d. random vectors having the same distribution function F. M_n and M*_n are the pointwise maxima of {X_i} and {X*_i} respectively. Suppose

lim_{n->infinity} Pr{M_{n1} <= u_{n1}, ..., M_{nD} <= u_{nD}} = H(tau),
lim_{n->infinity} Pr{M*_{n1} <= u_{n1}, ..., M*_{nD} <= u_{nD}} = H*(tau)    (1.14)

both exist and are nonzero; then a quantity that Nandagopalan (1990, 1994) called the multivariate extremal index can relate the extreme value properties of a stationary process to those of the i.i.d. sequence. The multivariate extremal index theta(tau) is defined by

H(tau) = H*(tau)^{theta(tau)},    (1.15)

where theta(tau) satisfies

(i) 0 <= theta(tau) <= 1 for all tau;

(ii) theta(0, ..., 0, tau_d, 0, ..., 0) = theta_d for tau_d > 0, where theta_d is the extremal index of the d-th component process;

(iii) theta(c tau) = theta(tau) for all c > 0 (Theorem 1.1 of Nandagopalan 1994).

Smith and Weissman (1996) pointed out that these properties are not sufficient to characterize the function theta(tau). They also gave two reasons why one needs a more precise characterization, covering a much broader range of processes and corresponding to real stochastic processes, for instance the multivariate maxima of moving maxima processes which we are going to address in this work. The first reason is that the number of examples for which the multivariate extremal index has been calculated is currently very small (Nandagopalan 1994, Weissman 1994) and it is important to

be able to extend this class to cover a much broader range of processes. The second reason why one needs a characterization is statistical: crude estimators of theta(tau) are easy to construct, but they would not correspond to the multivariate extremal index of any real stochastic process.

1.2.3 Basic properties of multivariate extreme value distributions

In this subsection, we study some basic properties of multivariate extreme value distribution functions. The following two lemmas are very general and are not restricted to MEVDs; they are Theorem 5.1.1 and Lemma 5.2.1 in Galambos (1987).

Lemma 1.6 Let F(x) be a D-dimensional distribution function with marginals F_d(x_d), 1 <= d <= D. Then, for all x_1, x_2, ..., x_D,

max(0, Sum_{d=1}^D F_d(x_d) - D + 1) <= F(x_1, x_2, ..., x_D) <= min(F_1(x_1), F_2(x_2), ..., F_D(x_D)).

Lemma 1.7 Let F_n(x) be a sequence of D-dimensional distribution functions and let F_{nd}(x_d) be the d-th univariate marginal of F_n(x). If F_n(x) converges weakly to a nondegenerate continuous distribution function F(x), then, for each d with 1 <= d <= D, F_{nd}(x_d) converges weakly to the d-th marginal F_d(x_d) of F(x).

The copula, or dependence function, is a very useful concept in the investigation of limit distributions for normalized extremes. It is a multivariate distribution with all marginals being uniform U(0, 1).

Definition 1.1 Let F(x) be a D-dimensional distribution function with d-th univariate margin F_d. The copula associated with F is a distribution function C : [0, 1]^D -> [0, 1] that satisfies

F(x_1, x_2, ..., x_D) = C[F_1(x_1), F_2(x_2), ..., F_D(x_D)].

Write C_F = C_F(y) = C(y) over the unit cube 0 <= y_d <= 1, 1 <= d <= D. Based on the function C(y), we now re-state theorems which relate the univariate marginals and the multivariate or dependence structure of the limit distributions.

Theorem 1.8 If (1.10) holds, then the dependence function C_H of the limit H(x) satisfies

C_H^k(y_1^{1/k}, y_2^{1/k}, ..., y_D^{1/k}) = C_H(y_1, y_2, ..., y_D),

where k is an arbitrary positive integer. (This is Theorem 5.2.1 of Galambos 1987.)

Theorem 1.9 A D-dimensional distribution function H(x) is a limit in (1.10) if and only if its univariate marginals are of the same type as one of the three types of extreme value distributions and its copula C_H satisfies the condition of Theorem 1.8. (This is a theorem of Galambos 1987.)

Theorem 1.9 tells us in principle that if we want to determine a_n and b_n we just need to determine the components from the marginal limit convergence forms. Let us look at a simple example to illustrate how Theorem 1.9 works.

Example 1.1 Let (X, Y) have a bivariate exponential distribution function F(x, y). If (M_n - b_n)/a_n converges weakly to a nondegenerate distribution function H(x, y), we can choose a_n = (1, 1) and b_n = (log n, log n).

1.3 Subclasses of max-stable processes

Davis and Resnick (1989) studied what they called the max-autoregressive moving average (MARMA(p, q)) process: a stationary process {X_n} satisfying the MARMA recursion

X_n = max(phi_1 X_{n-1}, ..., phi_p X_{n-p}, Z_n, theta_1 Z_{n-1}, ..., theta_q Z_{n-q}) for all n,

where phi_i, theta_j >= 0, 1 <= i <= p, 1 <= j <= q, and {Z_n} is i.i.d. with common distribution function F(x) = exp{-sigma x^{-1}}. For any given {phi_i}, {theta_j}, the corresponding process is a max-stable process. They argued that it is unlikely that another subclass of the max-stable processes can be found which is as broad and tractable as the MARMA class. Some basic properties of the MARMA processes have been established, and the prediction of a max-stable process has been studied relatively completely. However, much less is known about estimation of MARMA processes. For prediction, see also Davis and Resnick (1993). A naive estimation procedure for the phi_i, theta_j's when the order q = 1 is given in Davis and Resnick (1989).
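The MARMA recursion is simple to simulate. The following minimal sketch (our own, assuming the MARMA(1,1) special case with arbitrary parameter values) makes the clustering behavior of the process easy to visualize.

```python
import numpy as np

def marma11(n, phi, theta, sigma=1.0, seed=0):
    """Simulate a MARMA(1,1) max-stable process via the Davis-Resnick
    recursion X_t = max(phi * X_{t-1}, Z_t, theta * Z_{t-1}), where the
    Z_t are i.i.d. Frechet with F(z) = exp(-sigma/z).  The start-up value
    is not stationary; discard a burn-in in practice."""
    rng = np.random.default_rng(seed)
    z = sigma / rng.exponential(size=n + 1)    # Frechet(sigma) noise
    x = np.empty(n)
    x[0] = z[1]
    for t in range(1, n):
        x[t] = max(phi * x[t - 1], z[t + 1], theta * z[t])
    return x

x = marma11(500, phi=0.7, theta=0.4)
print(x[:5])
```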

Deheuvels (1983) defined what he called the moving minimum (MM) process as

T_i = min{delta_k Z_{i-k}, -infinity < k < infinity}, -infinity < i < infinity,

where delta_k > 0 and the {Z_k} are i.i.d. exponential. The main theorem of Deheuvels (1983) is stated exactly as the following theorem.

Theorem 1.10 If (T_0, ..., T_m) follows a joint multivariate extreme value distribution for minima with exponentially E(1) distributed margins, then there exist m + 1 sequences {a_k^i(n), -infinity < k < infinity}, depending on n = 1, 2, ..., of positive numbers, such that, if

T_i(n) = min{a_k^i(n) Z_k, -infinity < k < infinity}, i = 0, ..., m,

then (T_0(n), ..., T_m(n)) converges in distribution to (T_0, ..., T_m) as n -> infinity.

The results of Deheuvels (1983) are very strong, but the model itself is still not easily tractable for the estimation of parameters. Notice that the reciprocal of T_i gives the moving maximum process

T'_i = max{delta_k^{-1} Z'_{i-k}, -infinity < k < infinity}, -infinity < i < infinity,

where the {Z'_k} are i.i.d. unit Fréchet random variables. Smith and Weissman (1996) extended this definition to a more general framework which is more realistic and is called the multivariate maxima of moving maxima (henceforth M4) process. The definition is

Y_{id} = max_l max_k a_{l,k,d} Z_{l,i-k}, d = 1, ..., D,    (1.16)

where {Z_{l,i}, l >= 1, -infinity < i < infinity} is an array of independent unit Fréchet random variables. The constants {a_{l,k,d}, l >= 1, -infinity < k < infinity, 1 <= d <= D} are nonnegative and satisfy

Sum_{l=1}^infinity Sum_{k=-infinity}^infinity a_{l,k,d} = 1 for d = 1, ..., D.    (1.17)

As we have seen, M4 processes deal with D-dimensional random processes whereas MM processes deal with univariate processes (D = 1). Under the model (1.16), Smith and Weissman (1996) have shown very attractive results, some parallel to the results of Deheuvels (1983). Although MM processes are only specified over one index, they extend easily over two indexes, and the extension of MM processes to M4 processes raises the hope of estimating the model parameters easily.
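Once the weight array is truncated to finitely many nonzero a_{l,k,d}, the M4 process (1.16) is straightforward to simulate. The following is a minimal sketch of our own for a single component d, with a hypothetical weight layout (column j of the array holding lag k = j).

```python
import numpy as np

def simulate_m4_component(n, a, seed=0):
    """Simulate one component of the M4 process (1.16):
    Y_i = max_{l,k} a[l][k] * Z_{l,i-k}, with Z_{l,i} i.i.d. unit Frechet.
    `a` is an (L, W) array of nonnegative weights with a.sum() == 1, and
    column j corresponds to lag k = j (so k runs over 0, ..., W-1)."""
    a = np.asarray(a, dtype=float)
    L, W = a.shape
    rng = np.random.default_rng(seed)
    z = 1.0 / rng.exponential(size=(L, n + W))   # unit Frechet noise
    y = np.zeros(n)
    for i in range(n):
        # Z_{l, i-k} sits at array position i - k + W (a fixed index shift)
        y[i] = max(a[l, k] * z[l, i - k + W]
                   for l in range(L) for k in range(W))
    return y

# Single pattern with weights (0.2, 0.5, 0.3); since the weights sum to 1,
# the marginal distribution of each Y_i is again unit Frechet.
y = simulate_m4_component(1000, [[0.2, 0.5, 0.3]])
print(y[:5])
```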

Following de Haan (1984), (1.16) defines a max-stable process, because for any finite number r and positive constants {y_{id}} we have

Pr{Y_{id} <= y_{id}, 1 <= i <= r, 1 <= d <= D}
= Pr{Z_{l,i-k} <= y_{id}/a_{l,k,d} for l >= 1, -infinity < k < infinity, 1 <= i <= r, 1 <= d <= D}
= Pr{Z_{l,m} <= min_{1-m<=k<=r-m} min_{1<=d<=D} y_{m+k,d}/a_{l,k,d}, l >= 1, -infinity < m < infinity}
= exp[- Sum_{l=1}^infinity Sum_{m=-infinity}^infinity max_{1-m<=k<=r-m} max_{1<=d<=D} a_{l,k,d}/y_{m+k,d}].    (1.18)

This is (2.5) of Smith and Weissman (1996), and from it we have

Pr^n{Y_{id} <= n y_{id}, 1 <= i <= r, 1 <= d <= D} = Pr{Y_{id} <= y_{id}, 1 <= i <= r, 1 <= d <= D},

which tells us that {Y_i} is max-stable. Smith and Weissman argued that the extreme values of a multivariate stationary process may be characterized in terms of a limiting max-stable process under quite general conditions. They also showed that a very large class of max-stable processes may be approximated by M4 processes, mainly because those processes have the same multivariate extremal index (Theorem 2.3 in Smith and Weissman 1996). The theorem and conditions appear below.

Now fix tau = (tau_1, ..., tau_D) with 0 <= tau_d < infinity, d = 1, ..., D. Let {u_{nd}, n >= 1} be a sequence of thresholds such that n{1 - F_d(u_{nd})} -> tau_d under the model assumption. Since Z_{lk} is unit Fréchet we can take u_{nd} = n/tau_d. Denote u_n = (u_{n1}, ..., u_{nD}), and let B_j^k(u_n) be the sigma-field generated by the events {X_{id} <= u_{nd}, j <= i <= k, 1 <= d <= D} for 1 <= j <= k <= n. Define

alpha_{n,t} = sup{|P(A intersect B) - P(A)P(B)| : A in B_1^k(u_n), B in B_{k+t}^n(u_n)},    (1.19)

where the supremum is taken over 1 <= k <= n - t and the two respective sigma-fields. If there exists a sequence {t_n, n >= 1} such that

t_n -> infinity, t_n/n -> 0, alpha_{n,t_n} -> 0 as n -> infinity,    (1.20)

the mixing condition Delta(u_n) is said to hold (Nandagopalan 1994, Smith and Weissman 1996). Further, there exists a sequence {k_n, n >= 1} such that

k_n -> infinity, k_n t_n/n -> 0, k_n alpha_{n,t_n} -> 0 as n -> infinity.    (1.21)

Let r_n = [n/k_n], the integer part of n/k_n. We now state exactly a lemma and a theorem (Lemma 2.2 and the main theorem, Theorem 2.3, of Smith and Weissman 1996).

Lemma 1.11 Suppose (1.19)-(1.21) hold. Then

theta(tau) = lim_{n->infinity} Pr{Y_{id} <= u_{nd}, 2 <= i <= r_n, 1 <= d <= D | max_d (Y_{1d}/u_{nd}) > 1}.    (1.22)

Alternatively, if we assume

lim_{r->infinity} limsup_{n->infinity} Sum_{i=r}^{r_n} Sum_{d=1}^D Pr{Y_{id} > u_{nd} | max_d (Y_{1d}/u_{nd}) > 1} = 0,    (1.23)

then (1.22) is equivalent to

theta(tau) = lim_{r->infinity} lim_{n->infinity} Pr{Y_{id} <= u_{nd}, 2 <= i <= r, 1 <= d <= D | max_d (Y_{1d}/u_{nd}) > 1}.    (1.24)

This lemma is basically a restatement of results of O'Brien, for example O'Brien (1987).

Theorem 1.12 Suppose Delta(u_n) and (1.23) hold for {Y_i}, so that the multivariate extremal index theta_Y(tau) is given by (1.24). Suppose also that the same assumptions hold for {X_i} (with the same t_n, k_n sequences), so that the multivariate extremal index theta_X(tau) is also given by (1.24) with X_{id} replacing Y_{id} everywhere. Then theta_Y(tau) = theta_X(tau).

The extremal index of the process defined by (1.16) is

theta(tau) = Sum_l max_k max_d (a_{l,k,d} tau_d) / Sum_l Sum_k max_d (a_{l,k,d} tau_d).    (1.25)

Although these theoretical results have been obtained, the estimation of parameters in both MARMA(p, q) and M4 processes is not well developed and the applications of the two processes are very limited.
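As a small illustration of (1.25) (our own check, not taken from the text), consider D = 1, L = 1 and a single pattern with two equal weights a_{1,0,1} = a_{1,1,1} = 1/2. Then for any tau > 0,

```latex
\[
  \theta(\tau)
  \;=\;
  \frac{\sum_{l}\max_{k}\max_{d} a_{l,k,d}\,\tau_d}
       {\sum_{l}\sum_{k}\max_{d} a_{l,k,d}\,\tau_d}
  \;=\;
  \frac{\tfrac12\,\tau}{\tfrac12\,\tau+\tfrac12\,\tau}
  \;=\;
  \frac12 ,
\]
```

so the mean cluster size of extremes is 1/theta = 2, matching the fact that each large Z_t produces two consecutive large values of Y.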

27 Theorem.3 Under conditions then F has support on the positive half-line, and is in the domain of attraction of a Type II extreme value distribution. each a i is nonnegative and, for some ɛ (0, r), 0 < i ar ɛ i <. sup Pr(Y y,..., Yk y k Y,..., Y n ) Pr(Y y,..., Y k y k ) 0 <y,...,y k < (.28) where Y j is defined by Y j = sup{â j i Z i, < i < } â j i and θ are solutions of (.26) and Z i has distribution function F (. θ). Moreover, if m C 4 (log n) 2 for C 4 sufficiently large, the rate of convergence in (.28) is O p (n (/2)+δ ) for all δ > 0. Our present work on the estimation of M4 processes is somewhat parallel to Hall et al. (200) s work. In contrast to the bootstrapped processes which Hall et al. (200) used to construct confidence intervals and prediction intervals, we directly construct parameter estimators and prove their asymptotic properties. We will systematically solve the estimation problems of M4 processes in this work. 5

Chapter 2

Probabilistic Properties of Multivariate Maxima of Moving Maxima Processes and Basic Estimation of Parameters

2.1 Introduction

In this chapter we consider the model specified in Smith and Weissman (1996), extend some of its properties, and propose estimating procedures which determine the maximal moving range, the signature patterns and estimates of the parameters. Let {Z_{lk}, l >= 1, -infinity < k < infinity} be an array of independent unit Fréchet random variables. Smith and Weissman (1996) studied the following process, which they call the M4 process:

Y_{id} = max_l max_k a_{l,k,d} Z_{l,i-k}, d = 1, ..., D,

for nonnegative constants {a_{l,k,d}, l >= 1, -infinity < k < infinity} satisfying

Sum_{l=1}^infinity Sum_{k=-infinity}^infinity a_{l,k,d} = 1 for d = 1, ..., D.

Since in practice we will not have infinitely many parameters, usually we have l = 1, ..., L and -K_1 <= k <= K_2 for some finite numbers L, K_1 and K_2. Here L corresponds to the maximum number of moving patterns, and K_1 and K_2 characterize the range of the sequence dependence. We will focus on the finite-dimensional M4 process, which we state as

Y_{id} = max_{1<=l<=L} max_{-K_1<=k<=K_2} a_{l,k,d} Z_{l,i-k}, d = 1, ..., D,    (2.1)

where Sum_{l=1}^L Sum_{k=-K_1}^{K_2} a_{l,k,d} = 1 for d = 1, ..., D.

The model assumptions made here can be related to real motivations in insurance and finance as well as environmental engineering. For example, insurance claims result from different factors (or patterns), and claims are usually made within a certain period. Stock market variation results from an internal or external big market price movement and will last a certain period. As we mentioned in Section 1.1, exceedance over threshold approaches have been advocated in modern extreme value theory applications. The exceedances over a high threshold of the observed process are modeled as the exceedances over threshold of an M4 process; we are not modeling the whole process as M4.

Under model (2.1), it is easy to obtain the finite-dimensional distribution of {Y_{id}, 1 <= i <= r, 1 <= d <= D} as a consequence of (1.18). The distribution has the following form:

Pr{Y_{id} <= y_{id}, 1 <= i <= r, 1 <= d <= D} = exp[- Sum_{l=1}^L Sum_{m=1-K_2}^{r+K_1} max_{1-m<=k<=r-m} max_{1<=d<=D} a_{l,k,d}/y_{m+k,d}],

where a_{l,k,d} = 0 for k < -K_1 or k > K_2. The goal is to estimate all parameters {a_{l,k,d}} under the constraints that all parameters are nonnegative and the summation is equal to 1 for each d = 1, ..., D. Due to the singularities that appear in the distribution function, the maximum likelihood method is not directly applicable. One way to avoid this problem is to use a grouped likelihood approach, which has been advocated in similar circumstances by Barnard (1965) and Kempthorne (1966), and developed in detail by Giesbrecht and Kempthorne (1976) for the particular case of a three-parameter log-normal distribution. But this is not so easy to apply in a multivariate context, so we consider alternative approaches.

In this work, we first study the structure of model (2.1) and prove probabilistic properties which can be used to construct estimating procedures. Second, we study empirical distribution functions of finite collections of the random variables. Guaranteed by the strong law of large numbers or ergodicity, we are able to construct estimators of all parameters and prove the consistency and asymptotics of the proposed estimators. We start from simple examples which help us understand the model structure and easily construct a basic estimating procedure based on the probabilistic properties of the M4 process; the related results are illustrated in Section 2.2. Then in Section 2.4 we study the more general case and develop an estimating procedure which first estimates the weights within the same pattern and then estimates the weights among patterns.
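The finite-dimensional distribution displayed above is easy to evaluate numerically, which is useful for checking simulations or building estimating equations later on. The following is a minimal sketch of our own (the array layout, with column j holding lag k = j - K1, is a hypothetical convention).

```python
import numpy as np

def m4_joint_cdf(y, a, K1):
    """Evaluate Pr{Y_id <= y[i-1, d-1], 1 <= i <= r, 1 <= d <= D} for the
    finite M4 model (2.1), following the displayed formula:
    exp(-sum_l sum_m max over {k: 1 <= m+k <= r} max_d a_{l,k,d}/y_{m+k,d}).
    `a` has shape (L, K1+K2+1, D), column j holding lag k = j - K1;
    `y` has shape (r, D)."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    r, D = y.shape
    L, width, _ = a.shape
    K2 = width - 1 - K1
    total = 0.0
    for l in range(L):
        for m in range(1 - K2, r + K1 + 1):   # positions of Z_{l,m}
            best = 0.0
            for j in range(width):
                i = m + (j - K1)              # Y index influenced by Z_{l,m}
                if 1 <= i <= r:
                    best = max(best, (a[l, j, :] / y[i - 1, :]).max())
            total += best
    return np.exp(-total)

# Sanity check: r = 1, one pattern -> unit Frechet marginal exp(-1/y).
print(m4_joint_cdf([[2.0]], [[[0.2], [0.5], [0.3]]], K1=1), np.exp(-0.5))
```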

2.2 Extended properties

Under model (2.1), it is possible that a big value of Z_{lk} (unobservable) dominates all other values within a certain period of length K_1 + K_2 + 1, and there is then strong dependence among the big values (this is part of the motivation for the model (2.1)). In fact this occurs infinitely many times in the whole process; this will become clear after looking at some examples and theoretical results later on. Consider now a simplified model,

Y_{id} = max_{-K_1<=k<=K_2} a_{kd} Z_{i-k},    (2.2)

which corresponds to L = 1 (a single pattern). Define

A_t^d = [a_{-K_1,d} Z_{t+K_1} >= max_{-K_1<=k<=K_2, k != -K_1} a_{kd} Z_{t-k},
a_{-K_1+1,d} Z_{t+K_1} >= max_{-K_1<=k<=K_2, k != -K_1+1} a_{kd} Z_{t+1-k},
...,
a_{K_2,d} Z_{t+K_1} >= max_{-K_1<=k<=K_2, k != K_2} a_{kd} Z_{t+K_1+K_2-k}],    (2.3)

the event that the single value Z_{t+K_1} dominates each of Y_{td}, Y_{t+1,d}, ..., Y_{t+K_1+K_2,d}. We have Pr(A_t^d) > 0 for all t. We now derive the explicit form of Pr(A_t^d). For a fixed value z of Z_{t+K_1}, denote

A_t^{zd} = [a_{-K_1,d} z >= max_{k != -K_1} a_{kd} Z_{t-k},
a_{-K_1+1,d} z >= max_{k != -K_1+1} a_{kd} Z_{t+1-k},
...,
a_{K_2,d} z >= max_{k != K_2} a_{kd} Z_{t+K_1+K_2-k}].    (2.4)

Based on (2.3), we can draw the following diagram, in which each row lists the coefficients that multiply the Z's in the representation of one of Y_{td}, Y_{t+1,d}, ..., (the column of Z_{t+K_1} carries the coefficients a_{-K_1,d}, a_{-K_1+1,d}, ..., a_{K_2,d} of the dominating variable):

              Z_{t-K_2}  ...  Z_{t-1}   Z_t       Z_{t+1}  ...  Z_{t+K_1}    ...
Y_{td}:       a_{K_2,d}  ...  a_{1d}    a_{0d}    a_{-1,d} ...  a_{-K_1,d}
Y_{t+1,d}:               ...  a_{2d}    a_{1d}    a_{0d}   ...  a_{-K_1+1,d} ...
...                                                             ...
Y_{t+K_1+K_2,d}:                                                a_{K_2,d}    ...

Each successive row shifts the coefficient sequence a_{-K_1,d}, ..., a_{K_2,d} by one time position.

Collecting, for each Z_s with s != t + K_1, the most stringent of the constraints in (2.4), we get

Pr(A_t^{zd}) = Pr( Z_{t-K_2} <= (a_{-K_1,d}/a_{K_2,d}) z,
Z_{t-K_2+1} <= min(a_{-K_1,d}/a_{K_2-1,d}, a_{-K_1+1,d}/a_{K_2,d}) z,
...,
Z_{t+K_1+1} <= min(a_{-K_1+1,d}/a_{-K_1,d}, a_{-K_1+2,d}/a_{-K_1+1,d}, ..., a_{K_2,d}/a_{K_2-1,d}) z,
Z_{t+K_1+2} <= min(a_{-K_1+2,d}/a_{-K_1,d}, a_{-K_1+3,d}/a_{-K_1+1,d}, ..., a_{K_2,d}/a_{K_2-2,d}) z,
...,
Z_{t+2K_1+K_2} <= (a_{K_2,d}/a_{-K_1,d}) z )
= exp[-(1/z) Delta_d],

where

Delta_d = Sum_{j=1}^{K_1+K_2} [ max_{1<=i<=j} a_{K_2-j+i,d}/a_{-K_1+i-1,d} + max_{1<=i<=K_1+K_2-j+1} a_{-K_1+i-1,d}/a_{-K_1+j+i-1,d} ].

Integrating against the unit Fréchet density of Z_{t+K_1} then gives

Pr(A_t^d) = integral_0^infinity Pr(A_t^{zd}) z^{-2} e^{-1/z} dz = integral_0^infinity e^{-Delta_d/z} z^{-2} e^{-1/z} dz = 1/(1 + Delta_d).    (2.5)

For P(A_t^d intersect A_{t+m}^d), it is clear that P(A_t^d intersect A_{t+m}^d) = (P(A_t^d))^2 if m > K_1 + K_2. Suppose 1 <= m <= K_1 + K_2; then from A_t^d we get

a_{-K_1+m,d} Z_{t+K_1} >= a_{-K_1,d} Z_{t+m+K_1},    (2.6)

and from A_{t+m}^d we get

a_{-K_1,d} Z_{t+m+K_1} >= a_{-K_1+m,d} Z_{t+K_1}.    (2.7)

So (2.6) and (2.7) imply a_{-K_1+m,d} Z_{t+K_1} = a_{-K_1,d} Z_{t+m+K_1}, an event of probability zero; thus

P(A_t^d intersect A_{t+m}^d) = (P(A_t^d))^2 if m > K_1 + K_2, and P(A_t^d intersect A_{t+m}^d) = 0 if 1 <= m <= K_1 + K_2.    (2.8)
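To make (2.5) concrete, here is the smallest case worked out (our own illustration): take K_1 = 0 and K_2 = 1, so that Y_{td} = max(a_{0d} Z_t, a_{1d} Z_{t-1}) and A_t^d = [a_{0d} Z_t >= a_{1d} Z_{t-1}, a_{1d} Z_t >= a_{0d} Z_{t+1}].

```latex
% Conditioning on Z_t = z, the two constraints are independent events:
\[
  \Pr(A_t^{zd})
  = \exp\!\Bigl(-\frac{a_{1d}}{a_{0d}\,z}\Bigr)
    \exp\!\Bigl(-\frac{a_{0d}}{a_{1d}\,z}\Bigr)
  = \exp\!\Bigl(-\frac{\Delta_d}{z}\Bigr),
  \qquad
  \Delta_d=\frac{a_{1d}}{a_{0d}}+\frac{a_{0d}}{a_{1d}} ,
\]
% and integrating against the unit Fr\'echet density z^{-2} e^{-1/z} gives
\[
  \Pr(A_t^d)=\int_0^\infty e^{-\Delta_d/z}\,z^{-2}e^{-1/z}\,dz
  =\frac{1}{1+\Delta_d} .
\]
% For a_{0d} = a_{1d} = 1/2 this yields \Delta_d = 2 and \Pr(A_t^d) = 1/3.
```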

We have the following lemma.

Lemma 2.1 Under the model (2.2), for each d we have Pr(A_t^d, i.o.) = 1.

Proof. Let {t_1, t_2, ...} be a subsequence of {t} such that t_{i+1} - t_i > K_1 + K_2 for all i >= 1; then {A_{t_i}^d} is an independent sequence of events. By the Borel-Cantelli lemma for independent events we get

Pr( intersect_{i=1}^infinity union_{n=i}^infinity A_{t_n}^d ) = 1,

since P(A_{t_i}^d) > 0 for all i and Sum_i P(A_{t_i}^d) = infinity. But each union_{n>=t} A_n^d contains the corresponding union of the A_{t_n}^d, so

Pr(A_t^d, i.o.) = Pr( intersect_{t=1}^infinity union_{n=t}^infinity A_n^d ) = 1,

and this completes the proof.

This lemma tells us there are an infinite number of time periods within which the process is driven by a single extreme jump. For example, a real-world interpretation might be that a flood in a certain region and a certain time period is caused by a specific hurricane. The strengths of different hurricanes are different and the costs are different; or we say they follow different patterns. We have the following results.

Corollary 2.2 Under the model (2.2), for each d we have

P(Y_{td} = a_{-K_1,d} Z_{t+K_1}, Y_{t+1,d} = a_{-K_1+1,d} Z_{t+K_1}, ..., Y_{t+K_2+K_1,d} = a_{K_2,d} Z_{t+K_1}, i.o.) = 1,

or

P( Y_{td} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = a_{-K_1,d}, i.o. ) = 1,    (2.9)

and equivalently

P( Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = a_{-K_1+m,d}, i.o. ) = 1, m = 0, 1, ..., K_1 + K_2.    (2.10)

Proof. The event A_t^d implies Y_{td} = a_{-K_1,d} Z_{t+K_1}, Y_{t+1,d} = a_{-K_1+1,d} Z_{t+K_1}, ..., Y_{t+K_2+K_1,d} = a_{K_2,d} Z_{t+K_1}; since the coefficients a_{-K_1,d}, ..., a_{K_2,d} sum to 1, the ratio statements follow, and hence by the lemma the corollary is proved.

Theorem 2.3 Under the model (2.2), if P( Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = c_{md}, i.o. ) = 1 for some m in {0, 1, ..., K_1 + K_2}, then c_{md} = a_{-K_1+m,d}, and those {Y_{td}, ..., Y_{t+K_1+K_2,d}} with Y_{t+m,d} / (Y_{td} + ... + Y_{t+K_2+K_1,d}) = c_{md} form the events A_t^d.

Remark: The theorem says, for example when m = 0, that there is only one constant c_{0d} = a_{-K_1,d} such that (2.9) is true. And if Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = c_{md} is true for one m, it is true for all m.

Proof. We only prove the case m = 0. Define random variables T_{td} = s if Y_{td} = a_{t-s,d} Z_s. Notice that t - K_2 <= s <= t + K_1 and that T_{td} is uniquely defined for each t, and hence for all t, because the Z_t's have an absolutely continuous distribution. The event A_t^d corresponds to T_{td} = T_{t+1,d} = ... = T_{t+K_1+K_2,d} = t + K_1. Suppose now we have that

Y_{td} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = p_d    (2.11)

occurs infinitely many times for some p_d != a_{-K_1,d}; then T_{td}, T_{t+1,d}, ..., T_{t+K_1+K_2,d} must follow one of the following two cases:

(1) T_{td} = T_{t+1,d} = ... = T_{t+K_1+K_2,d} = t + K', where K' != K_1, K' in {-K_2, -K_2+1, ..., K_1};

(2) T_{td}, T_{t+1,d}, ..., T_{t+K_1+K_2,d} contain at least two different values.

For case (1), Y_{td} = a_{-K',d} Z_{t+K'}, Y_{t+1,d} = a_{-K'+1,d} Z_{t+K'}, ..., Y_{t+K_1+K_2,d} = a_{K_1+K_2-K',d} Z_{t+K'} = 0, since K_1 + K_2 - K' > K_2 and a_{K_2+1,d} = a_{K_2+2,d} = ... = 0. This is a contradiction to Y_{td} > 0 for all t. For case (2), the LHS of (2.11) is a function of at least two different Z_t's, because if the T's were all the same, the common value would have to be t + K_1, which corresponds to p_d = a_{-K_1,d}; otherwise it is case (1). Thus (2.11) can be written as

a_{t-s_1,d} Z_{s_1} / (a_{t-s_1,d} Z_{s_1} + a_{t+1-s_2,d} Z_{s_2} + ... + a_{t+K_1+K_2-s_{K_1+K_2+1},d} Z_{s_{K_1+K_2+1}}) = p_d,    (2.12)

where s_1, s_2, ..., s_{K_1+K_2+1} depend on t. Since the range of t - s_1, t + 1 - s_2, ..., t + K_1 + K_2 - s_{K_1+K_2+1} is finite under the assumption of (2.12), there are fixed numbers

h_0, h_1, ..., h_{K_1+K_2} such that

a_{h_0,d} Z_{s_1} / (a_{h_0,d} Z_{s_1} + a_{h_1,d} Z_{s_2} + ... + a_{h_{K_1+K_2},d} Z_{s_{K_1+K_2+1}}) = p_d    (2.13)

occurs infinitely many times. This implies

Pr( a_{h_0,d} Z_{s_1} / (a_{h_0,d} Z_{s_1} + a_{h_1,d} Z_{s_2} + ... + a_{h_{K_1+K_2},d} Z_{s_{K_1+K_2+1}})
= a_{h_0,d} Z_{s'_1} / (a_{h_0,d} Z_{s'_1} + a_{h_1,d} Z_{s'_2} + ... + a_{h_{K_1+K_2},d} Z_{s'_{K_1+K_2+1}}) ) > 0    (2.14)

for some s_1, s_2, ..., s_{K_1+K_2+1} and s'_1, s'_2, ..., s'_{K_1+K_2+1} such that max(s_1, ..., s_{K_1+K_2+1}) < min(s'_1, ..., s'_{K_1+K_2+1}). But (2.14) is not possible, since all the Z_t's are continuous random variables and the quantity on the left-hand side inside the bracket in (2.14) is independent of the quantity on the right-hand side inside the bracket. This shows that (2.11) cannot be true. Suppose now that (2.11) occurs at t = t_1 and t_2, i.e.

Pr( a_{h_0,d} Z_{s_1} / (a_{h_0,d} Z_{s_1} + ... + a_{h_{K_1+K_2},d} Z_{s_{K_1+K_2+1}})
= a_{h_0,d} Z_{s'_1} / (a_{h_0,d} Z_{s'_1} + ... + a_{h_{K_1+K_2},d} Z_{s'_{K_1+K_2+1}}) = p_d ) > 0;    (2.15)

then s_1, ..., s_{K_1+K_2+1} and s'_1, ..., s'_{K_1+K_2+1} must have some common values, for otherwise (2.15) cannot be true. But (2.15) implies that (2.13) occurs infinitely often, and (2.13) implies (2.14), and so we conclude that (2.11) cannot occur even twice. Both cases yield contradictions for p_d != a_{-K_1,d}. So c_{0d} = a_{-K_1,d}, and those {Y_{td}, ..., Y_{t+K_1+K_2,d}} form the events A_t^d; the proof is therefore complete.

The following theorem tells us that the window cannot contain more than K_2 + K_1 + 1 terms if we are to obtain infinitely many ratios equal to a constant.

Theorem 2.4 Under the model (2.2), for each d,

P( Y_{td} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1+1,d}) = c_d, i.o. ) = 0

for any constant c_d.

Proof. Because Y_{td} and Y_{t+K_1+K_2+1,d} cannot be written as functions of just one Z_t, the ratio Y_{td} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1+1,d}) is a function of at least two different Z_t's. The proof then follows by the same arguments as in Theorem 2.3.

Theorems 2.3 and 2.4 tell us that in order to estimate the parameter a_{-K_1+m,d} we only need to observe two equal values of

Y_{t_1+m,d} / (Y_{t_1,d} + ... + Y_{t_1+K_1+K_2,d}) and Y_{t_2+m,d} / (Y_{t_2,d} + ... + Y_{t_2+K_1+K_2,d})

at times t_1 and t_2, where t_2 - t_1 > K_1 + K_2. Naturally one wants to know how fast the sequence Y_{t+m,d} / (Y_{td} + ... + Y_{t+K_1+K_2,d}), t = 1, 2, ..., reaches the desired value a_{-K_1+m,d}. We use a discrete time Markov chain method to study this problem in Section 2.4.3.

2.3 Examples of M4 processes

In order to gain insight into what the theorems have shown, we now give some examples to demonstrate applications of the theorems.

2.3.1 Case L = 1

Example 2.1 Consider the model

Y_{td} = max(a_{0d} Z_t, a_{1d} Z_{t-1}), d = 1, ..., D.

Define

A_t^d = [a_{0d} Z_t >= a_{1d} Z_{t-1}, a_{1d} Z_t >= a_{0d} Z_{t+1}].

We have P(A_t^d) > 0, so by Theorem 2.3,

P(Y_{td} = a_{0d} Z_t, Y_{t+1,d} = a_{1d} Z_t, i.o.) = 1,

or

P( Y_{t+1,d} / Y_{td} = a_{1d}/a_{0d} = (1 - a_{0d})/a_{0d}, i.o. ) = 1.

We seek all those ratios Y_{t+1,d}/Y_{td} that are equal to a constant, or equivalently consider

P( Y_{td} / (Y_{td} + Y_{t+1,d}) = a_{0d}, i.o. ) = 1.

These ratios are bounded in [0, 1], and we have a_{1d} = 1 - a_{0d}. In this way we can find accurate estimates of the parameters. Figure 2.1 plots part of the original sequence and the points from (2.9). The points falling onto a horizontal line (in the right figure) correspond to the spikes with the same shape (in the left figure); those spikes follow the same moving pattern. The intercept of the line on the vertical axis gives the value of a_{0d}.
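In code, the diagnostic of Example 2.1 amounts to computing the ratio series and looking for a value that repeats exactly. The following is a minimal sketch of our own (the rounding tolerance `decimals` is an arbitrary choice used to group floating-point ties).

```python
import numpy as np

def read_a0(y, decimals=6):
    """Sketch of the ratio diagnostic of Example 2.1: on the event A_t^d the
    ratio Y_t / (Y_t + Y_{t+1}) equals a_{0d} exactly, and by Corollary 2.2
    this happens infinitely often; elsewhere the ratio is continuous and
    essentially never repeats.  So the most repeated ratio value is a_{0d}."""
    y = np.asarray(y, dtype=float)
    r = y[:-1] / (y[:-1] + y[1:])
    vals, counts = np.unique(np.round(r, decimals), return_counts=True)
    return vals[counts.argmax()]

# Simulate Y_t = max(0.7 * Z_t, 0.3 * Z_{t-1}) and recover a_0 = 0.7.
rng = np.random.default_rng(2)
z = 1.0 / rng.exponential(size=5001)
y = np.maximum(0.7 * z[1:], 0.3 * z[:-1])
print(read_a0(y))   # approximately 0.7
```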

Figure 2.1: Left figure is a time series plot of 50 observations of the process Y_{td} = max(a_{1d} Z_{t-1}, a_{0d} Z_t) for some d. Right figure is a time series plot of 3000 observations of the ratios Y_{td}/(Y_{td} + Y_{t+1,d}). The value of a_{0d} can be read from the right figure.

Example 2.2 Consider

Y_{td} = max(a_{1d} Z_{t-1}, a_{0d} Z_t, a_{-1,d} Z_{t+1}).

Define

A_t^d = [a_{-1,d} Z_{t+1} >= max(a_{1d} Z_{t-1}, a_{0d} Z_t),
a_{0d} Z_{t+1} >= max(a_{1d} Z_t, a_{-1,d} Z_{t+2}),
a_{1d} Z_{t+1} >= max(a_{0d} Z_{t+2}, a_{-1,d} Z_{t+3})];

we have P(A_t^d) > 0, so by Theorem 2.3,

P(Y_{td} = a_{-1,d} Z_{t+1}, Y_{t+1,d} = a_{0d} Z_{t+1}, Y_{t+2,d} = a_{1d} Z_{t+1}, i.o.) = 1,

or

P( Y_{td} / (Y_{td} + Y_{t+1,d} + Y_{t+2,d}) = a_{-1,d}, i.o. ) = 1.

Equivalently,

P( Y_{t+1,d} / (Y_{td} + Y_{t+1,d} + Y_{t+2,d}) = a_{0d}, i.o. ) = 1

and

P( Y_{t+2,d} / (Y_{td} + Y_{t+1,d} + Y_{t+2,d}) = a_{1d}, i.o. ) = 1.

Figure 2.2 again shows the significant pattern of the value a_{-1,d}.

Examples 2.1 and 2.2 have shown how to get the moving coefficients in each individual process, but we are mainly interested in multivariate processes. In other words,

Figure 2.2: Left figure is a time series plot of 50 observations of the process Y_{td} = max(a_{1d} Z_{t-1}, a_{0d} Z_t, a_{-1,d} Z_{t+1}) for some d. Right figure is a time series plot of 3000 observations of the ratios Y_{td}/(Y_{td} + Y_{t+1,d} + Y_{t+2,d}). The value of a_{-1,d} can be read from the right figure.

we need to know how to distinguish different processes. For example, we have the two bivariate processes

Y_{i1} = (1/2) max(Z_{1,i-1}, Z_{1,i}), Y_{i2} = (1/3) max(Z_{1,i-1}, Z_{1,i}, Z_{1,i+1});    (2.16)

Y_{i1} = (1/2) max(Z_{1,i-1}, Z_{1,i}), Y_{i2} = (1/3) max(Z_{1,i}, Z_{1,i+1}, Z_{1,i+2}).    (2.17)

By plotting

Y_{i1}/(Y_{i1} + Y_{i+1,1}), Y_{i+1,1}/(Y_{i1} + Y_{i+1,1}),
Y_{i2}/(Y_{i2} + Y_{i+1,2} + Y_{i+2,2}), Y_{i+1,2}/(Y_{i2} + Y_{i+1,2} + Y_{i+2,2}), Y_{i+2,2}/(Y_{i2} + Y_{i+1,2} + Y_{i+2,2}),

we can get all the coefficients 1/2 and 1/3, which can be read off from the pictures. But we need to know which model, (2.16) or (2.17), is the true model. We now study this. When the extremes come from Z_{1,i}, then

Y_{i1}/(Y_{i1} + Y_{i+1,1}) = 1/2, Y_{i+1,1}/(Y_{i1} + Y_{i+1,1}) = 1/2,
Y_{i-1,2}/(Y_{i-1,2} + Y_{i,2} + Y_{i+1,2}) = 1/3, Y_{i,2}/(Y_{i-1,2} + Y_{i,2} + Y_{i+1,2}) = 1/3, Y_{i+1,2}/(Y_{i-1,2} + Y_{i,2} + Y_{i+1,2}) = 1/3    (2.18)

for model (2.16), but

Y_{i-2,2}/(Y_{i-2,2} + Y_{i-1,2} + Y_{i,2}) = 1/3, Y_{i-1,2}/(Y_{i-2,2} + Y_{i-1,2} + Y_{i,2}) = 1/3, Y_{i,2}/(Y_{i-2,2} + Y_{i-1,2} + Y_{i,2}) = 1/3    (2.19)

for model (2.17). So if (2.18) is the case we conclude model (2.16); otherwise it is model (2.17). Other comparisons can also be made in order to distinguish the models.

2.3.2 Case L > 1

Now consider L > 1, where the model is (2.1). Define for each l

A_t^{ld} = [a_{l,-K_1,d} Z_{l,t+K_1} >= max_{-K_1<=k<=K_2, k != -K_1} a_{l,k,d} Z_{l,t-k},
a_{l,-K_1+1,d} Z_{l,t+K_1} >= max_{k != -K_1+1} a_{l,k,d} Z_{l,t+1-k},
...,
a_{l,K_2,d} Z_{l,t+K_1} >= max_{k != K_2} a_{l,k,d} Z_{l,t+K_1+K_2-k}].    (2.20)

Remark: we could define such events for all l simultaneously, but we do not need that here. Notice that P(A_t^{ld}) > 0, so by Theorem 2.3, for each m = 0, 1, ..., K_1 + K_2, we have

P( Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = a_{l,-K_1+m,d} / (a_{l,-K_1,d} + a_{l,-K_1+1,d} + ... + a_{l,K_2,d}), i.o. ) = 1.    (2.21)

We expect to see L signature patterns on the plot of Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}), and these patterns give estimates of a_{l,-K_1+m,d} / (a_{l,-K_1,d} + a_{l,-K_1+1,d} + ... + a_{l,K_2,d}), 1 <= l <= L. Figure 2.3 shows three different signature patterns (points fall onto 3 horizontal lines), which correspond to L = 3.

As we have already seen, the plots give accurate estimates of the ratios. When L = 1, we can exactly recover all the values of a_{kd}, but for L > 1 we cannot. Even for L = 1, we have assumed that the model assumptions are exactly satisfied, which is not something we would expect in practice. Also, the whole method presupposes that the margins have been transformed to unit Fréchet, and this would not be exact in practice either. We need to develop estimation procedures that obtain estimates of a_{l,k,d} in a more practical way; we study this in Chapter 3. In the next section we still assume that the model assumptions are exactly satisfied, but L is greater than 1.

Figure 2.3: A demo of multiple signature patterns.

2.4 Estimation of weight parameters

In the previous section, (2.21) gives estimates of a_{l,-K_1+m,d} / (a_{l,-K_1,d} + a_{l,-K_1+1,d} + ... + a_{l,K_2,d}), not the parameters themselves. We solve this problem in this section. Rewrite the model as

Y_{id} = max_{1<=l<=L} max_{-K_1<=k<=K_2} a_{l,k,d} Z_{l,i-k} = max_{1<=l<=L} b_{ld} max_{-K_1<=k<=K_2} c_{l,k,d} Z_{l,i-k},    (2.22)

where b_{ld} is the weight of the l-th signature pattern, and Sum_l b_{ld} = 1 and Sum_k c_{l,k,d} = 1 for each l and d. It is easy to show from (1.18) that P(Y_{1d} <= y_d) = e^{-1/y_d}. As mentioned in Section 2.1, our goal is to approximate the distribution function and, from the approximation, obtain estimates of all parameters. Since the univariate distribution studied here does not relate any parameters to the distribution function, we seek a jointly k-variate distribution function approximation. Throughout this work we consider k = 2 only, because the cases of k > 2 can be generalized from the case k = 2. First we have

P(Y_{1d} <= y_{1d}, Y_{2d} <= y_{2d}) = exp[- Sum_{l=1}^L b_{ld} Sum_{m=1-K_2}^{2+K_1} max( c_{l,1-m,d}/y_{1d}, c_{l,2-m,d}/y_{2d} )],    (2.23)

where c_{l,K_2+1,d} = 0 and c_{l,-K_1-1,d} = 0. Under the model (2.22), all the c_{l,k,d} can be estimated by looking into

P( Y_{t+m,d} / (Y_{td} + Y_{t+1,d} + ... + Y_{t+K_2+K_1,d}) = c_{l,-K_1+m,d}, i.o. ) = 1,    (2.24)

and therefore we only need to estimate the b_{ld}. It is known that an empirical process approximates the random process from which the observations are obtained. Let A be any subset of R_+^2 = (0, infinity) x (0, infinity) and define X_{id} = I_A(Y_{id}, Y_{i+1,d}). Let mu_d be the mean of X_{id}; then

E(X_{id}) = P((Y_{id}, Y_{i+1,d}) in A) = mu_d,
Var(X_{id}) = E(X_{id}^2) - (E(X_{id}))^2 = mu_d - mu_d^2.

By appropriately choosing A, we can construct parameter estimators.

2.4.1 Estimation using independent observed values from a dependent sequence

Now let A_{1d} = (0, x_{1d}) x (0, x'_{1d}), ..., A_{L-1,d} = (0, x_{L-1,d}) x (0, x'_{L-1,d}) be distinct regions and define

Xbar_{A_{jd}} = (1/n) Sum_{i=1}^n I_{A_{jd}}(Y_{id}, Y_{i+1,d}),    (2.25)

where the (Y_{id}, Y_{i+1,d}) are i.i.d. pairs taken from an M-dependent time series, which we study in this work. Then the SLLN implies

Xbar_{A_{jd}} -> P(A_{jd}) = P(Y_{1d} <= x_{jd}, Y_{2d} <= x'_{jd}) a.s.    (2.26)

Now let

exp[- Sum_{l=1}^L b_{ld} Sum_{m=1-K_2}^{2+K_1} max( c_{l,1-m,d}/x_{jd}, c_{l,2-m,d}/x'_{jd} )] = Xbar_{A_{jd}}, j = 1, ..., L-1;    (2.27)

then we can construct parameter estimators by solving D systems of linear equations:

Sum_{l=1}^L b_{ld} Sum_{m=1-K_2}^{2+K_1} max( c_{l,1-m,d}/x_{1d}, c_{l,2-m,d}/x'_{1d} ) = -log(Xbar_{A_{1d}}),
...,
Sum_{l=1}^L b_{ld} Sum_{m=1-K_2}^{2+K_1} max( c_{l,1-m,d}/x_{L-1,d}, c_{l,2-m,d}/x'_{L-1,d} ) = -log(Xbar_{A_{L-1,d}}),
Sum_{l=1}^L b_{ld} = 1.    (2.28)

We choose values of x_{1d}, x'_{1d}, ..., x_{L-1,d}, x'_{L-1,d} such that this system of linear equations has a unique solution. Since the c_{l,k,d} are now known, we are able to choose values of

41 x d, x d,, x L,d, x L,d such that the determinant of the system of linear equations is not zero. This may not be true for some special cases, for example when all coefficients are identical we get L l= b ld = for each equation of those L equations in (2.28). We say two signature patterns, lth and l th, are identical when the coefficients can be written as b ld (c l, K,d, c l, K +,d,, c l,k2,d), b l d(c l, K,d, c l, K +,d,, c l,k2,d). The identical patterns have the following property: max(b ld max c l,k,d Z l,i k, b l d max c l,k,dz l,i k) = d (b ld + b l d) K k K 2 K k K 2 max K k K 2 c l,k,dz l,i k where Z l,i k s are unit Fréchet. We assume there are no identical patterns in models considered in this section. The results may apply to some cases when there exist identical patterns. For a specific d, define 2+K max( c, m,d x d, c 2+K,2 m,d ) x d m= K 2. d = K max( c, m,d x L,d, c 2+K,2 m,d ) x L,d m= K 2 max( c L, m,d x d m= K 2 max( c L, m,d x L,d m= K 2, c L,2 m,d x d, c L,2 m,d x L,d i.e. d is the determinant of the system of linear equations. Assume now the L determinants of the (L ) (L ) matrices formed from the bottom L rows are not all zero. For fixed x d, since c l,k,d are known and 2+K m= K 2 c l,i m,d =, i =, 2, then there exist x min,d and x max,d such that when x d < x min,d or x d > x max,d, all elements of first row in d are x d or respectively. And so when x x d < x min,d or d x d > x max,d, d = 0. When x d varies in [x min,d, x max,d ], denote d by d (x d ), then where d j 0, d j ). ) d (x d ) = cijd d j + ci x d x j d d j (2.29) d 0 are the (, j) or (, j ) minors of d. Both summations are over all non-zero minors of the first row of d and the corresponding c ijd x d or c i j d. x d If d (x d ) = 0, by varying x d in [x min,d, x max,d ], at some point x, some x d c ijd d j of the summation cijd x d d j change to c x i j d d j and add to d x ci j d d j, d or vice versa, and this change results in d (x) 0. Hence it cannot be true that 29

42 d = 0 for all x d. This argument can be applied to lower dimension matrices. On the other hand, we can start from a 2 2 matrix and extend it to L L matrix such that the determinant is not zero as required. Therefore, there exist constants x d, x d,, x L,d, x L,d such that each system of linear equations (2.28) has a unique solution. So we get bld = L j= θ ljd log( X Ajd ) + constant ld a.s. L j= θ (2.30) ljd log(pr(a jd )) + constant ld = b ld for suitable constants θ ljd which are the elements of the inverse of d. Let µ jd = E(I Ajd (Y d, Y 2d)) = Pr(A jd ) µ ijd = E(I Aid (Y d, Y 2d)I Ajd (Y d, Y 2d)) = E(I Aid A jd (Y d, Y 2d)) = Pr(A id A jd ) then we have well-known asymptotic normality properties. Proposition 2.5 If (Y id, Y i+,d ) are i.i.d pairs, and X Ajd (i) n( X Ajd µ jd ) d N(0, µ jd µ 2 jd ) (ii) n(log( X Ajd ) log(µ jd )) N(0, µ jd µ 2 jd ) (iii) n where X Ad. X AL,d µ. µ L d µ 2 jd d N(0, Σ d ), { σ ijd = µ ijd µ id µ jd if i j, σ ijd = µ id µ 2 id if i = j. is defined as in (2.25), then The proofs are trivial and can be found in most theoretical statistical books, for example Arnold (990), Chen (98). Theorem 2.6 For each l, n( bld b ld ) d N(0, σ 2 ld) where σld 2 = ( θ ld,, θ l,l,d )Σ d µ d µ L,d θ ld µ d. θ l,l,d µ L,d 30

43 Proof. By the mean-value theorem, n( bld b ld ) = n[ dh dξ ] X Ad. X AL,d µ d. µ L,d where h(µ d ) = L j= θ ljd log(µ jd ), [ dh dµ d ] = ( θ ld µ d,, θ l,l,d µ L,d ) and ξ is between ( X Ad,, X AL,d ) and (µ d,, µ L,d ). By the proposition and Slutsky theorem, X Ad µ d nh (ξ) T. X AL,d and this completes the proof.. µ L,d d [ dh dµ d ] Z, Z N(0, Σ d ) N(0, [ dh dµ d ] Σ[ dh dµ d ]) A generalization of the theorem to the joint distribution of b d,..., b Ld obtained and the proof arguments are similar. We have Theorem 2.7 n( b d b d ) bld d N(0, Θ d Σ d Θ d ), where bd b d θ d /µ d θ 2d /µ 2d θ,l,d /µ L,d b2d b d =., b b 2d d =., Θ θ 2d /µ d θ 22d /µ 2d θ 2,L,d /µ L,d d = b Ld θ L,d /µ d θ L,2d /µ 2d θ L,L,d /µ L,d Note: Θ d Σ d Θ d is singular because L l= b ld =. We still need to specify the asymptotic joint distribution of all b ld. Now let µ ijdd = E(I Aid (Y d, Y 2d)I Ajd (Y d, Y 2d )). Σ = (Σ dd ) can be where each component of Σ is a covariance matrix. Σ dd = Σ d, Σ dd (ij) = µ ijdd µ ijd µ ijd. Then we have the following generalization. Corollary 2.8 n( b b) where d N(0, ΘΣΘ ), b b Θ b 2 b =., b = b 2., Θ = Θ 2 b D b D... Θ D. 3

44 2.4.2 Estimation using the whole dependent sequence In subsection 2.4., we were using independent draws from the bivariate distribution of (Y td, Y t+,d ) for each d. Since the observed process is a dependent process, independent draws do not contain all information from the data. The estimates may not be accurate. If we use the entire observed process to estimate all parameters, the efficiency of the estimators may be higher and of course it is a more realistic scenario for practical applications. Like previous subsection, we estimate parameters associated to dth series by the observed values of that series. We now drop the sub-index d from Y td, a l,k,d, etc. We write Y t, a lk only. We use the same notations ( without index d ) as in section 2.4., for A j s and define X Aj = n n I Aj (Y i, Y i+ ) i= which means we use the original observations which are an M-dependent sequence (M = K + K 2 + here). In order to derive similar results as in section 2.4. without loss of data information, we will apply ergodicity. We quote an ergodic theorem here, whose proof can be found in Billingsley (995). Let (Ω, F, µ) be a complete probability space and T : Ω Ω be a one-to-one onto map such that T and T are both measurable: T F = T F = F. Assume further that µ(t E) = µ(e) for all E F. A map T satisfying these conditions is called a measure-preserving transformation (or m.p.t. for short). The F-set A is invariant under T if T A = A; it is a nontrivial invariant set if 0 < µ(a) <. And T is said ergodic if there are no nontrivial invariant sets in F. A measurable function f is invariant if f(t ω) = f(ω) for all ω. Theorem 2.9 The ergodic theorem Suppose that T is a m.p.t. on (Ω, F, µ) and that f is measurable and integrable. Then lim n n n f(t k ω) = f(ω) k= with probability, where f is invariant and integrable and E[ f] = E[f]. If T is ergodic, then f = E[f] with probability. 32

45 If f = I A and T is ergodic, we have n lim I A (T k ω) = P (A) n n with probability. By the mean-value ergodic theorem X Aj k= a.s. E(I Aj (Y i, Y i+ )) = Pr(A j ) where T is taken to be a Bernoulli shift. And again we apply the same method we had in section 2.4., we get bl = L j= θ lj log( X Aj ) + constant l a.s. L j= θ lj log(pr(a j )) + constant l = b l In order to study asymptotic normality, we introduce the following proposition which is Theorem 27.4 in Billingsley (995). First we introduce the so-called α-mixing condition. For a sequence Y, Y 2,... of random variables, let α n be a number such that P (A B) P (A)P (B) α n for A σ(y,..., Y k ), B σ(y k+n, Y k+n+,...), and k, n. When α n 0, the sequence {Y n } is said to be α-mixing. This means that Y k and Y k+n are approximately independent. Proposition 2.0 Suppose that X, X 2,, is stationary and α-mixing with α n = O(n 5 ) and that E[X n ] = 0 and E[Xn 2 ] <. If S n = X + + X n, then n Var[S n ] σ 2 = E[X] E[X X +k ], where the series converges absolutely. If σ > 0, then S n /σ n d N(0, ). Remark: The constants α n = O(n 5 ) and E[X 2 n ] < are stronger than necessary as stated in the remark followed Theorem 27.4 in Billingsley (995) to avoid technical complication in the proof. Proposition 2. If σ j > 0, n( X Aj µ j ) where k= d N(0, σ 2 j ), K +K 2 + σj 2 = µ j µ 2 j + 2 (Pr(Y x j, Y 2 x j, Y +k x j, Y 2+k x j) µ 2 j) k= 33

46 Proof. Let X n = I Aj (Y n, Y n+ ) µ j, then E[X n ] = 0 and E[X 2 n ] < because X n is bounded. And the α-mixing condition is satisfied since Y n s are M-dependent. So the conditions of Proposition 2.0 are satisfied. Then the proof follows after calculating the following values and applying Proposition 2.0. X 2 = I Aj (Y, Y 2 ) 2µ j I Aj (Y, Y 2 ) + µ 2 j and EX 2 = µ j 2µ 2 j + µ 2 j = µ j µ 2 j X X +k = (I Aj (Y, Y 2 ) µ j )(I Aj (Y +k, Y 2+k ) µ j ) = I Aj (Y, Y 2 )I Aj (Y +k, Y 2+k ) µ j I Aj (Y, Y 2 ) µ j I Aj (Y +k, Y 2+k ) + µ 2 j Lemma 2.2 n where E(X X +k ) = Pr(Y x j, Y 2 x j, Y +k x j, Y 2+k x j) µ 2 j X A. X AL µ. µ L σ ij = µ ij µ i µ j, the matrix W k has entries w ij k x j) µ i µ j, µ ii = µ i. d N(0, Σ + K +K 2 + k= {W k + W k }) = Pr(Y x i, Y 2 x i, Y +k x j, Y 2+k Proof. Let U = (I A (Y, Y 2 ) µ,, I AL (Y, Y 2 ) µ L ), U +k = (I A (Y +k, Y 2+k ) µ,, I AL (Y +k, Y 2+k ) µ L ), and α = (α,, α L ) 0 be an arbitrary vector. Let X = α U, X 2 = α U 2,, then E[X n ] = 0 and E[X 2 n ] <. And so Proposition 2.0 can apply. We say expectation are applied on all elements if expectation is applied on a random matrix. But E[X 2 ] = α E[U U ]α = α Σα, E[X X +k ] = α E[U U +k ]α = α W k α where E[(I Ai (Y, Y 2 ) µ i )(I Aj (Y, Y 2 ) µ j )] = µ ij µ i µ j E[(I Ai (Y, Y 2 ) µ i )(I Aj (Y +k, Y 2+k ) µ j )] = Pr(Y x i, Y 2 x i, Y +k x j, Y 2+k x j) µ i µ j So the proof is completed by applying the Cramér-Wold device (see below). 34

47 Cramér-Wold device: (Cambanis and Leadbetter 994) Let ξ = (ξ,..., ξ D ), ξ n = (ξ n,..., ξ nd ), n =, 2,..., be random vectors. Then ξ n d ξ as n if and only if α ξ n + + α D ξ nd d α ξ + + α D ξ D as n for all α,..., α D R. Theorem 2.3 For each l, n( bl b l ) d N(0, σ 2 l ) where σl 2 = ( d l,, d K +K 2 + l,l )(Σ + {W k + W µ µ k}) L k= d l µ. d l,l µ L Proof. Theorem 2.6. This follows from lemma 2.2 using the same argument as in the proof of A generalization of the theorem to the joint distribution of b,..., b L can be obtained and the proof arguments are similar. We have Theorem 2.4 n( b b) K +K 2 + d N(0, Θ(Σ + {W k + W k})θ ), where b, b and Θ are defined as the same as in Theorem 2.7. k= For all b ld, a similar result of Corollary 2.8 can be obtained. 2.5 Results from discrete time Markov chain theory In order to simplify the notation, we only consider the case of D = and L = in this section. Now define X t = I At, t = 0, ±, ±2,... 35

48 and suppose the sequence {X t } starts at t = 0 with X t k = 0, k =,..., K + K 2. If X t = 0, then Pr(X t+ = 0 X t = 0) = Pr(A c t+ A c t) = Pr(Ac t Ac t+ ) = 2 Pr(At) Pr(A t) = p Pr(A c t ) = Pr(At) Pr(A t+) Pr(A t) If X t =, then Pr(X t+ = 0 X t = ) =, Pr(X t+2 = 0 X t = 0) =,..., Pr(X t+k +K 2 + = 0 X t+k +K 2 = 0) =. If we consider all the 0 s before the sequence {X t } reaches are different, and similarly for the 0 s after reaching, then we can construct a Markov chain which has the following transition diagram. Where b corresponds to A c t, b2 corresponds to A c t+, p p p p p b b2 b3 bk q q q q ak a2 a Figure 2.4: Transition Diagram of the Constructed Markov Chain. and similarly for b3,..., b K, K = K + K 2 +. Once the chain reaches the state b K, it moves either to or to b. Because A t+k +K 2 +2 is independent of A t, so we can think the chain return to b and restart again. Once the chain reaches the state, it must move at least K + K 2 + steps to reach the state again. Or after K + K 2 steps, the chain restart again. We have the following transition probability matrix of 36

49 {b, b 2,..., b K,, a, a 2,..., a K }. 0 p 0 0 q p 0 q p q p q P = Define now a = 2, a 2 = 3,..., a K = K, b = K +, b 2 = K + 2,..., b K = 2K +, and and T = min{n : X n = }, u i (n) = Pr{T n X 0 = i}, m i = E(T X 0 = i). Then we have standard DTMC results: u i (n) = P i + m i = + 2K+ j=2 P ij u j (n ), i 2, n, u i () = 0, i, 2K+ j=2 P ij m j, i = 2,..., 2K +. The proofs of these results can be found from books such as Kulkarni (995). We give an example to show how fast we can get a desired value. Example 2.3 Let then by (2.5) we can have Y t = max{.z t 2,.2Z t,.4z t,.2z t+,.z t+2 } Pr(A t ) =.033, p = 2 Pr(A t) Pr(A t ) =.9658, q = The value of.0342 approximately tells among 00 independent events, we can get 3 times of the desired value. But to get 00 independent events, we need to have 600 A t s. Since A t s are dependent, the number of 600 can be dramatically reduced. In fact from m i = E(T X 0 = i), 29 is the mean number of needed A t s to get the desired value once. 37

50 2.6 Approximation of two M 4 processes The characterization of extreme observations of a stationary process in Smith and Weissman (996) are infinite order M 4 processes. But in practice, it is unrealistic to estimate infinite many parameters. It is natural to apply models with finite number of parameters to real data if the candidate model well approximates the true model. In this section, we will create conditions under which two processes are arbitrarily close and the following two theorems show that. Lemma 2.5 Suppose <i< α i =, i >K α i = δ, X = where {Z i } are i.i.d Fréchet. Let Z δ = δ Y, then lim P [ Z δ X Y > ɛ] = 0. δ 0 i >K α i Z i and Y = Proof. It is easy to check that X, Y, X Y, Z δ have the distributions: X e δ x, Y e δ y, X Y e x, Zδ e z, i K α i Z i, Now P (X > Y ) = = 0 0 P [X > y] δ y 2 ( e δ y ) δ y 2 = ( δ) = δ, 0 e δ y dy e δ y dy e y 2 y dy P [Z δ Y > ɛ] = P [( )Y > ɛ] = P [ δ Y > ɛ] δ δ = P [Y > ( δ)ɛ ] = e ( δ)δ ( δ)ɛ δ = e δ ɛ. P [ Z δ X Y > ɛ] = P [Z δ X Y > ɛ] + P [X Y Z δ > ɛ] = P [Z δ X > ɛ, X > Y ] + P [Z δ Y > ɛ, Y > X] +P [X Z δ > ɛ, X > Y ] + P [Y Z δ > ɛ, Y > X] 2P [X > Y ] + P [Z δ Y > ɛ] + 0 = 2δ + e δ ɛ which proves the assertion. 38

51 Now pick δ n so that P [ Z δ X Y > ɛ] 2 n for δ δ n ; then So Z δn So a.s. X Y. P [ lim Z δn X Y ] = P [ Z δn X Y > ɛ, i.o.] n = P [ Z δj X Y > ɛ] n= j=n = lim P [ Z δj X Y > ɛ] n j=n lim P [ Z δj X Y > ɛ] n j=n Apply this result to a process, we have Z iδn lim 2 n+ = 0 n a.s. X i Y i, each i, and P [ lim Z iδ n = X i Y i ] = P [ lim Z iδ n X i Y i ] i= n i= n P [ lim Z iδn X i Y i ] =. n P [ i= m= n=m i= Z iδn X i Y i > ɛ] = 0. We now state the theorem which shows how a finite moving range model arbitrarily closely approximates an infinite range moving process. The proof is just a generalization of the arguments above. Theorem 2.6 Suppose index set. l= k= Y id = max l a l,k,d =, {lk} K a l,k,d = δ d > 0, where K is a finite max a l,k,d Z l,i k, d =,, D, (2.3) k Ỹ iδd = max b l,k,d Z l,i k, d =,, D, (2.32) {lk} K where L K2 l= k= K b l,k,d = for d =,, D. And b l,k,d = δ d a l,k,d for {lk} K, then there exist {δ md }, δ md 0 as m, such that D P [ d= i= n= m=n Ỹiδ md Y id > ɛ] = 0. Therefore we conclude {Ỹiδ d } {Y id } for all i and d with probability one. 39

52 Chapter 3 Estimation Based on Bivariate Distribution 3. Introduction The methods developed in previous chapter are idealized methods because they assume the model holds exactly. Also the M4 process is itself just an approximation to the general max-stable process. The process may be max-stable without being exactly M4. In practice we may not be able to estimate the ratios accurately as stated in sections 2.4. and 2.4.2, especially when the data are with error. In this chapter, we will develop methods which can be applied to estimate a l,k,d directly. In the previous chapter, the bivariate distribution functions were used to estimate the weight parameters. Here our intention is to determine when the bivariate distribution functions determine all the a id s, and then to construct estimators. Define A d, A 2d,, A L (K +K 2 +),d similarly as we did in the previous chapter and then solve a system of nonlinear equations L 2+K l= m= K 2 max( âl, m,d x d, âl,2 m,d ) = log( X x Ad ) d L 2+K l= ) = log( X A2d ) L l= 2+K m= K 2 max( âl, m,d x 2d, âl,2 m,d x 2d â m= K 2 max( l, m,d, x L (K +K 2 +),d. â l,2 m,d ) x L (K +K 2 +),d = log( X ) AL (K +K 2 +),d Under some conditions, this will give unique solutions which converge to true parameter values for each d, i.e. the bivariate distribution function determines the whole process. We will introduce such conditions and prove some theoretical results. Like Chapter 2, we drop the index d when we deal with a single process or we treat multiple processes separately.

53 3.2 Modeling time dependence In this section we mainly focus on the time dependence of each single process and we will not use the index d in sub-sections 3.2. and Preliminary estimation First we let L = and study the structure of bivariate distribution function P (Y y, Y 2 y 2 ) = exp[ where a K2 + = 0, a K = 0. Now define q(x) = a K + K 2 then P (Y, Y 2 x) = exp[ q(x)/x]. Define 2+K max( a m y m= K 2, a 2 m y 2 )] (3.) j= K max(xa j, a j+ ) + xa K2, (3.2) M(x) = {j : a j+ a j > x} (3.3) where we include K M(x), K 2 M(x) complement of M(x) for all x (0, ). Note that M(x) as x. Then q(x) = x j M(x) a j + j M(x) a j+, (3.4) and q (x) = j M(x) a j everywhere except when x is one of a K + a K, a K +2 a K +,, a K2 a K2. A typical q(x) picture is shown in Figure 3.. As x 0, q(x) K 2 K a j =. For x sufficiently large, q(x) = a K + x a j = a K + x. So if r < r 2 < < r p denote the p K + K 2 distinct values of a j+ a j, we can deduce from q(x), (i) a K, (ii) the values of r, r 2,, r p, (iii) all sums of the form j A(i) a j and j A(i) a j+ where A(i) = {j : a j+ a j = r i }. This is true because j A(i) a j is just the change in q (x) at x = r i, while j A(i) a j+ = r i j A(i) a j. 4

54 3.5 A typical q(x) picture q(x) x Figure 3.: A demo of q(x) and its slope q (x) and ratio change points. In particular, when all the ratios a j+ a j are distinct we can deduce all the r i s and the values of a j and a j+ = r i a j corresponding to each r i. However, the model may still be non-identifiable if there are non-trivial permutations of the a j s that preserve the r i s. are distinct, the model is uniquely identi- Proposition 3. If all ( K +K 2 ) + a 2 ratios j a j fied by q(x). The reason why Proposition 3. is true is that in this case, any permutation of the a j s must create a new set of values of r,, r K +K 2. Remark : This justifies statements like for almost all (w.r.t Lebesgue measure) choices of coefficients a K,, a K2, the model is identifiable from q(x). Remark 2: The uniqueness means the values of the vector are uniquely determined. (a K, a K +,..., a K2 ) following two processes without further analysis. The reason is because we can not simply distinguish the Y i = max(.2z i,.3z i,.5z i+ ), Y i = max(.2z i,.3z i+,.5z i+2 ). 42

55 But it should be no ambiguity that we treat them as one model since they have the same joint distribution functions within each sequence. Since we use the bivariate distribution to construct estimators of parameters, from the previous arguments, if the condition of Proposition 3. is false then we may not be able to identify the model. But we may be able to identify the model via some higher-order joint distribution. We now construct an artificial example to demonstrate this idea. Example 3. This is a counterexample to show a process that is not identifiable via the bivariate joint distribution, but can be identifiable from the trivariate joint distribution. Let (a 0,, a 4 ) = (,, 2,, ) and (b 6 0,, b 4 ) = (, 2,,, ). We consider the two 6 processes generated by the sequences a 0,, a 4 and b 0,, b 4. Then p = 3, r =, r 2 2 =, r 3 = 2 for both configurations, so q(x) is the same and displayed in Figure 3. with 0 < x < 6, 2 q 2 (x) = < x <, 2 5 < x < 2, 6 2 < x i.e. we can t distinguish the a i s from the b i s on the basis of q(x). However, consider the formula log(pr(y y, Y 2 y 2, Y 3 y 3 )) = a 4 y + max( a 3 y, a 4 y 2 ) + max( a 2 y, a 3 y 2, a 4 + max( a y, a 2 y 2, a 3 y 3 ) + max( a 0 y, a y 2, a 2 + max( a 0 y 2, a y 3 ) + a 0 y 3 and let y =, y 2 = y 3 = c where c > 2. With a = (,, 2,, ): 6 With a = (, 2,,, ): 6 log P = 6 [ c + c ] = + 3c, log P = 6 [ c + c ] = + 2c, So the two values of Pr(Y y, Y 2 y 2, Y 3 y 3 ) are distinct in this case. In other words, the two possible models for a are distinguishable from their trivariate distributions, but not bivariate. y 3 ) y 3 ) However this is a specific example where we need trivariate distribution function, in most cases bivariate distributions are enough. 43

56 Now we turn to Case L >, we have P (Y y, Y 2 y 2 ) = exp[ where a l,k2 + = 0, a l, K = 0. And L l= 2+K max( a l, m y m= K 2, a l,2 m y 2 )] (3.5) q(x) = L [a l, K + l= K 2 j= K max(xa lj, a l,j+ ) + xa l,k2 ]. (3.6) And similarly, q(x) is a piecewise linear function and its change points are those adjacent ratios of the coefficients. We have Proposition 3.2 If all ( L (K +K 2 ) +) a 2 ratios lj a lj identified by q(x). are distinct, the model is uniquely Proof.. Since all the ratios are different and are points at which q(x) changes slopes or q (x) has jumps. So based on the jump points of q(x), the ratios of a l,j+ a lj determined. Let s now rewrite (3.6) as are uniquely q(x) = L b l [c l, K + l= K 2 j= K max(xc lj, a l,j+ ) + xc l,k2 ] (3.7) where j c lj = for each l and all c lj are uniquely determine by the ratios. We also write q(x) as q(x) = L 2+K xb l max(c l, m, c l,2 m )]. (3.8) x m= K 2 l= Suppose now q(x) has a different representation, say q(x) = L l= xb l 2+K max(c l, m, c l,2 m x m= K 2 )] (3.9) then L (b l b l) l= 2+K max(c l, m, c l,2 m x m= K 2 )] = 0 (3.0) for all x. 44

57 Suppose we have chosen x, x 2,..., x L and formed the matrix 2+K max(c, m,d, c 2+K,2 m,d x ) max(c L, m,d, c L,2 m,d x 2 ) m= K 2 m= K 2 d = K max(c, m,d, c 2+K,2 m,d x 3 ) max(c L, m,d, c L,2 m,d x L ) m= K 2 m= K 2 and 2+K max(c, m,d, c 2+K,2 m,d x ) max(c L, m,d, c L,2 m,d x 2 ) m= K 2 m= K 2 b b K max(c, m,d, c 2+K,2 m,d x 3 ) max(c L, m,d, c L,2 m,d x L ). = 0 m= K 2 m= K 2 b L b L we can follow the lines after (2.28) and show d 0, then conclude b l = b l, all l. So q(x) uniquely determine all a l,j Asymptotics for the case of D = Now we define L l= [ a l,k2 + max(a l,k2, a l,k 2 x + max(a l,k2 3, a l,k 2 2 x ) + max(a l,k 2 2, a l,k 2 ) x ) + + max(a l, K, a l, K + x so we have q(x) = xb(x). Let Y, Y 2,, Y n be observed values and then q(x) = x b(x). b(x) = log( n Theorem 3.3 For each x, we have where σ 2 = µx µ2 x +2 K +K 2 + ) + a l, K x ] = b(x)(3.) n I (Yi,Y i+ x)), (3.2) i= n( b(x) b(x)) d N(0, σ 2 ) k= (Pr(Y,Y 2 x,y +k,y 2+k x) µ 2 x ) µ 2 x, µ x = Pr(Y, Y 2 x). Proof. Directly apply Proposition 2.5 and

58 Corollary 3.4 For each x, we have d n( q(x) q(x)) N(0, x 2 σ 2 ) Now suppose the functions b(x), q(x), b(x), q(x) are evaluated at x = x, x 2,, x m, then we have the following theorem. Theorem 3.5 where where n( b b) K +K 2 + d N(0, Θ(Σ + {W k + W k})θ ), k= b(x ) b(x ) µ 0 0 b(x2 ) b =., b = b(x 2 )., Θ = 0 µ b(xm ) b(x m ) 0 0 µ m µ i = Pr(Y, Y 2 x i ), µ ij = Pr(Y, Y 2 min(x i, x j )), σ ij = µ ij µ i µ j, w ij k = Pr(Y, Y 2 x i, Y +k, Y 2+k x j ) µ i µ j, µ ii = µ i. Proof. This follows from Lemma 2.2, Proposition 2.5 and the same arguments in the proof of Theorem 2.7. Theorem 3.5 indicates that when the sample size n is sufficiently large, b b is an asymptotically normally distributed random vector. If the points x i s are used to get estimates of a i s, we have the following theorems. Theorem 3.6 If all the ratios of parameters in the model (2.22) are different, then (3.) uniquely determine all a lk. Proof. We only prove the case when L = and all the ratios are different. Similar proof for L > can be done. Now suppose we have different b i such that b K2 + max(b K2, b K 2 x ) + max(b K 2 2, b K 2 x ) + max(b K2 3, b K 2 2 ) + + max(b x K, b K + ) + b K x x and suppose {a i } are the true values in (3.). Define = b(x) (3.3) r i = a K 2 i a K2 i, r i = b K 2 i, i = 0,,..., K + K 2 +. b K2 i Let {r i } be a permutation of {r i } such that 0 = r 0 < r < r 2 < < r K +K 2 + = 46

59 then when x, y are varying within (r i, r i+), b(y) b(x) = a K y a K x + a k+ >r a k i since max(a k, a k+ ) is a x k when a k+ a k < x, and a k+ otherwise. x When x is in (ri, ri+) and y is in (ri+, ri+2), then b(y) b(x) = a K y a K x + a k a k x + ( a k y a k x ) (3.4) a k+ a k >r i+ ( a k y a k x ) (3.5) for some k since max(a k, a k+ ) = a x k implies max(a k, a k+ ) = a y k and there exists a k such that max(a k, a k+ ) = a k x x and max(a k, a k+ ) = a y k. Let {r i} be a permutation of {r i} such that 0 = r 0 < r < r 2 < < r K +K 2 + = and without loss of generality we assume r 0 < r 0, then let x < r 0, r 0 < y < r 0, then (3.4) and (3.5) give different b(y) b(x), and hence we must have r 0 = r 0 and therefore r i = r i, i.e. r i = r i, all i. Within (0, r 0), we have Within (r K +K 2, ), we have b(x) = a K2 + x = b K 2 + x which gives a K = b K, a K2 = b K2. Within(r K +K 2, r K +K 2 ), we have b(x) = + a K x b(y) b(x) = a K y a K x + = b K y b K x + = + b K x a k+ a k >r K +K 2 b k+ >r b k K +K 2 ( a k y a k x ) ( b k y b k x ) which gives a i = b j for i K, j K, inductively, we have a i = b j = a i, but by Proposition 3., we have all a i = b i. So the proof is completed. Remark: the conditions of Proposition 3. are stronger than necessary and can be weakened. 47

60 Now let Y, Y 2,, Y n be observed values and b(x) = log( n n i= I (Yi,Y i+ x)) which we expect to get (K + K 2 ) r i such that (3.4) and (3.5) are true when b(x) is replaced by b(x) and a i are replaced by â i. a.s. By the ergodic theorem, b(x) b(x) as n. Now suppose the model is identifiable from x,..., x m, where these values are different from the ratios of true parameters as stated in Corollary 3., and define b = (b(x ),..., b(x m )), and b = ( b(x ),..., b(x m )) a.s., then b b. Suppose the probability space is (Ω, F, P ), then there exists an A F such that Pr(A) = and for each ω A, b ω b. Notice that a permutation of l and l will not change the value of b(x) in (3.). This can be easily seen in the following example. Example 3.2 The following two processes 0.05Z,i, 0.Z,i, 0.03Z,i+ Y i = max 0.5Z 2,i, 0.2Z 2,i, 0.02Z 2,i+ 0.6Z 3,i, 0.7Z 3,i, 0.2Z 3,i+ 0.6Z,i, 0.7Z,i, 0.2Z,i+ Y i = max 0.05Z 2,i, 0.Z 2,i, 0.03Z 2,i+ 0.5Z 3,i, 0.2Z 3,i, 0.02Z 3,i+ have the same joint distributions. Unless we specifically say l and l are not permutable, otherwise we allow those l s in (3.) are permutable. Suppose the solutions of L [â l,k2 + max(â l,k2, âl,k 2 l= x ) + max(â l,k2 2, âl,k 2 x ) + max(â l,k2 3, âl,k 2 2 x ) + + max(â l, K, âl, K + x ) + âl, K x ] = b ω (x ) (3.6) L [â l,k2 + max(â l,k2, âl,k 2 l= x m ) + max(â l,k2 2, âl,k 2 x m ) + max(â l,k2 3, âl,k 2 2 x m ) + + max(â l, K, âl, K + x m ) + âl, K x m ] = b ω (x m ) 48

61 are â ω. This is equivalent to C nω â ω = b ω in matrix notations where C nω is uniquely determined by â ω. The elements of C nω are either, x i or + x i. And the solutions of L [a l,k2 + max(a l,k2, a l,k 2 ) + max(a l,k2 2, a l,k 2 x ) l= x + max(a l,k2 3, a l,k 2 2 x ) + + max(a l, K, a l, K + x ) + a l, K x ] = b(x ) (3.7) L [a l,k2 + max(a l,k2, a l,k 2 l= x m ) + max(a l,k2 2, a l,k 2 x m ) + max(a l,k2 3, a l,k 2 2 x m ) + + max(a l, K, a l, K + x m ) + a l, K x m ] = b(x m ) are a. And similarly this is equivalent to Ca = b. Since b ω b, which implies the estimated ratios converge to the true ratios, then â ω â as n by Theorem 3.6 and the assumption that the model is identifiable from x,..., x m. max(â l,k2 i, âl,k 2 i+ x k words, x k remains in intervals ( âl,k 2 i+ â l,k2 i But â l,k2 j+ â l,k2 j ) converges to max(a l,k2 i, a l,k 2 i+ x k ) for each i and k. In other, ) for some i and j when n is sufficiently large. So C nω C as n. And so we have proved the following theorem. Theorem 3.7 Suppose the model is identifiable from x,..., x m, where these values are different from the ratios of true parameters, then the solutions of (3.6) converge to the solutions of (3.7) almost surely. i.e. â a.s. a.s. a and C n C. Since the elements of both C n and C are either, x i or + x i, then for sufficiently large n, we have C n = C. Bearing this in mind, we have the following multivariate central limit theorem. Theorem 3.8 Suppose the model is identifiable from x,..., x m, where these values are different from the ratios of true parameters, then n(â a) K +K 2 + d N(0, BΘ(Σ + {W k + W k})θ B ) where B = (C C) C, Θ, Σ and W k are defined the same as in Proposition 2.5, Theorem 2.7 and Lemma 2.2. Proof. Since (C nc n ) C n a.s. (C C) C so k= n(â a) = n((c n C n ) C n b (C C) C b) = n(c nc n ) C n( b b) + n((c nc n ) C n (C C) C )b d N(0, BΘ(Σ + K +K 2 + k= {W k + W k })Θ B ). and hence this proves the theorem. 49

62 3.2.3 Asymptotics for a special case of L = and D > For all â l,k,d, a similar asymptotic joint normal distribution result of Corollary 2.8 can be obtained if the permutability of index l in each d are not an issue. But this is not the case in general, so we will not give the asymptotic joint distribution here based on the estimates obtained one component at a time since we didn t simultaneously estimate â l,k,d. We illustrate a special case here under which we can identify all parameters based on estimating each single process and putting all estimates together. We will discuss why this special case can not be generalized in section 3.3. Also in section 3.3 a method which can simultaneously estimate all parameters for the cases of L > and D > is proposed. We now consider the model Y id = where a k,d =, a k,d 0 for each d. k By Proposition 3., [a K2,d + max(a K2,d, a K 2,d x + max(a K2 3,d, a K 2 2,d x max a k,d Z i k, d =,..., D (3.8) K k K 2 ) + max(a K 2 2,d, a K 2,d ) (3.9) x ) + + max(a K,d, a K +,d x ) + a K,d ] = b d (x) x uniquely determines a k,d for each d when the values of b d (x) are given. But we just can not simply put all values obtained from (3.9) and form (3.8) because for some d, (3.9) may give different vector values of (a K,d, a K +,d,..., a K2,d), for example when K + K 2 + = 4 we may get something like (0,.2,.3,.5) or (.2,.3,.5, 0). But their functions in (3.8)are different and will result in a different multivariate joint distribution. The following proposition shows that under certain conditions we can simply put the solutions of (3.9) for each d together and then those solutions uniquely determine the true model (3.8). 50

63 Proposition 3.9 Suppose a K,d > 0, a K2,d > 0 for all d, and all ratios a j+,d a jd distinct for each d, then are [a K2,d + max(a K2,d, a K 2,d x + max(a K2 3,d, a K 2 2,d x d =,..., D ) + max(a K 2 2,d, a K 2,d ) (3.20) x ) + + max(a K,d, a K +,d x ) + a K,d ] = b d (x) x uniquely determine the matrix a K,, a K +,,..., a K2, (3.2) a K,D, a K +,D,..., a K2,D Furthermore, there exist points such that x d, x 2d,..., x md, d =,..., D [a K2,d + max(a K2,d, a K 2,d ) + max(a K2 2,d, a K 2,d ) (3.22) x jd x jd + max(a K2 3,d, a K 2 2,d x jd d =,..., D uniquely determine the matrix in (3.2). ) + + max(a K,d, a K +,d x jd ) + a K,d x jd ] = b d (x jd ) The proof of this proposition is obvious by noticing that a K,d > 0, a K2,d > 0 for all d. Now let A jd = (0, ) (0, x jd ), for j =,..., m, d =,..., D and define X Ajd = n I Ajd (Y id, Y i+,d ), (3.23) n Lemma 3.0 n where i= µ jd = E{I Ajd (Y id, Y i+,d )} = P (Y d, Y 2d x jd ), (3.24) µ jdj d = E{I A jd (Y id, Y i+,d )I Aj d (Y id, Y i+,d )} = P (Y d, Y 2d x jd, Y d, Y 2d x j d ), (3.25) X A. X Am X A2. X AmD µ. µ m µ 2. µ md d N(0, Σ + K +K 2 + k= {W k + W k }) 5

64 Σ has entries σ rs = µ jdj d µ jdµ j d, the matrix W k has entries wk rs = P (Y d, Y 2d x jd, Y +k,d, Y 2+k,d x j d ) µ r jdµ j d for d = [ ], j = r (d ) m, m d = [ s ], m j = s (d ) m. Proof. This can be done exactly as the proof of Lemma 2.2. We now state the asymptotic joint distribution of all â k,d in the following theorem. Theorem 3. Suppose the model is identifiable from x d,..., x md for each d, where these values are different from the ratios of true parameters, then under the conditions in Proposition 3.9 n(â a) K +K 2 + d N(0, BΘ(Σ + {W k + W k})θ B ) where C d is the matrix formed from (3.22) when dth model is considered. Θ = diag{ µ,..., µ m,..., µ md }, C = diag{c, C 2,..., C D }, B = (C C) C Simulation examples In this section we perform simulation studies. The first model illustrates a simulated M4 process with two signature patterns where each pattern has order of 2 and the k= second model adds Gaussian noise into the first model. Example 3.3 We perform two simulation experiments with the following two processes. Y i = max(.z,i,.4z,i,.35z 2,i,.5Z 2,i ) (3.26) and Y i = max(.z,i,.4z,i,.35z 2,i,.5Z 2,i ) + N i (3.27) where N i N(0,.0) are i.i.d. We plot the ratios Y i Y i +Y i+ for both models. Plots in Figure 3.2 look almost exactly the same. However, when a portion of the plot is magnified, as in Figure 3.3, we can see the difference. We now apply estimating methods developed in previous sections and list all results in the following tables. The estimated values are based on a sample of size The standard deviations are obtained by evaluating the formula in Theorem 3.8 with the true values 52

65 Figure 3.2: The left plot is the ratios of model (3.26). The right plot is the ratios of model (3.27). Y i Y i +Y i+ Y i Y i +Y i+ at the threshold level 0 under the at the threshold level 0 under the Figure 3.3: The left plot is the ratios around.3 with distance.0 at the threshold level 0 under the model (3.26). The right plot is the ratios.3 with distance.0 at the threshold level 0 under the model (3.27). Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.: Simulation results for model (3.26). x =(0.324, ,.0275,.3778,.6789, , , 4.547). 53

66 Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.2: Simulation results for model (3.26). x =(0.324, ,.5283, 2.979, 4.547). Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.3: Simulation results for model (3.27). x =(0.324, ,.0277,.3782,.6799, , , ). Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.4: Simulation results for model (3.27). x =(0.325, ,.5279, , ). 54

67 Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.5: Simulation results for model (3.26) Parameter a, a,0 a 2, a 2,0 True value Estimated value Standard Deviation Table 3.6: Simulation results for model (3.27) approximated by the empirical values. These simulation experiments show that the effectiveness of the estimating procedures proposed. The estimated values in Tables 3.5 and 3.6 are mean values of estimates based on 00 replications of sample size The standard deviations are sample standard deviations. 3.3 Modeling temporal and inter-serial dependence As we mentioned in section we can t just estimate the coefficient one component at a time and then put them all together to derive the full model for the joint distribution of the multivariate processes, even if each single component process only has one signature pattern. Example 3.2, in section 3.2.2, showed that two different processes have the same distribution functions. The reason is because the coefficients in the second process are permutations of the coefficients of the first process. The permutations are on index l. Proposition 3.2 actually tells that all the values of a l,k are uniquely determined by b(x) when the permutation of index l is allowed. But this is not the case when we have multivariate processes. We use the following artificial bivariate processes to illustrate why this is not the case. 55

68 Suppose we have the bivariate processes 0.05Z,i, 0.0Z,i, 0.03Z,i+ Y i = max 0.2Z 2,i, 0.6Z 2,i, 0.09Z 2,i+ 0.6Z 3,i, 0.7Z 3,i, 0.2Z 3,i+ 0.0Z,i, 0.3Z,i, 0.7Z,i+ Y i2 = max 0.07Z 2,i, 0.04Z 2,i, 0.03Z 2,i+ 0.Z 3,i, 0.2Z 3,i, 0.23Z 3,i+ (3.28) Suppose we have the observed values {Y i, Y i2 } and get estimates based on the methods developed for single component process. A possible representation or estimation of process {Y i } could be 0.6Z,i, 0.7Z,i, 0.2Z,i+ Y i = max 0.05Z 2,i, 0.0Z 2,i, 0.03Z 2,i+ (3.29) 0.2Z 3,i, 0.6Z 3,i, 0.09Z 3,i+ and a possible representation or estimation of process {Y i2 } could be 0.0Z,i, 0.3Z,i, 0.7Z,i+ Y i2 = max 0.Z 2,i, 0.2Z 2,i, 0.23Z 2,i+ (3.30) 0.07Z 3,i, 0.04Z 3,i, 0.03Z 3,i+ Note: we used the exact coefficients of original processes in these two representations, in real situation this may not be the case. What we do here is just for illustration. Now if we put the two estimated processes together, we have 0.6Z,i, 0.7Z,i, 0.2Z,i+ Y i = max 0.05Z 2,i, 0.0Z 2,i, 0.03Z 2,i+ 0.2Z 3,i, 0.6Z 3,i, 0.09Z 3,i+ 0.0Z,i, 0.3Z,i, 0.7Z,i+ i2 = max 0.Z 2,i, 0.2Z 2,i, 0.23Z 2,i+ 0.07Z 3,i, 0.04Z 3,i, 0.03Z 3,i+ Y (3.3) It is obvious {Y i } and {Y i} have the same joint distributions, {Y i2 } and {Y i2} have the same distributions, but {(Y i, Y i2 )} and {(Y i, Y i2)} don t have the same joint distributions. This can be seen from Figure Inter-serial dependence We now consider modeling spatial dependence of multivariate time series. We use a similar structure as we used for modeling time dependence, see section 3.2. What we 56

69 5.5 5 Original Permuted q ddp (x) x values Figure 3.4: A demo of two different bivariate processes. The blue curve are drawn from the original process. The red curve are drawn from the permuted process. q ddp (x) = x log(p (Y, Y 2 x)). do in this section is to estimate parameters based on the joint distribution of a pair of random sequence {(Y id, Y id )}, where d d, d, d D. It is easy to derive that Pr{Y d y d, Y d y d } = exp[ and simply we have L l= +K max( a l, m,d y,d m= K 2, al, m,d )] y,d Pr{Y d, Y d x} = exp[ L l= +K m= K 2 max(a l, m,d, a l, m,d )]. x Define b dd (x) = L l= +K bdd (x) = log( n m= K 2 max(a l, m,d, q dd (x) = xb dd (x), n i= q dd (x) = x b dd (x), I (Yid, Y id x)), a l, m,d ), (3.32) x then it s obvious that as n bdd (x) a.s. b dd (x), q dd (x) a.s. q dd (x). 57

70 Like section 3.2, we assume now all ratios a l,k,d a l,k,d are distinct for all l and k. Then b dd (x) or q dd (x) can uniquely determine all ratios a l,k,d a l,k,d since these ratios are the jump points of piecewise linear function of q dd (x), but b dd (x) or q dd (x) can not uniquely determine all a l,k,d and a l,k,d since b dd (x) doesn t distinguish the index k and the index l Temporal and inter-serial dependence We now combine (3.) and (3.32) together as a system of nonlinear equations. b d (x) = L l= [a l,k 2,d + max(a l,k2,d, a l,k 2,d ) + max(a x l,k2 2,d, a l,k 2,d ) x + max(a l,k2 3,d, a l,k 2 2,d ) + + max(a x l, K,d, a l, K +,d ) + a l, K,d ] x x b d (x) = L l= [a l,k 2,d + max(a l,k 2,d, a l,k 2,d ) + max(a x l,k2 2,d, a l,k 2,d ) x + max(a l,k2 3,d, a l,k 2 2,d ) + + max(a x l, K,d, a l, K +,d ) + a l, K,d (3.33) ] x x b dd (x) = L +K max(a l, m,d, al, m,d ) x m= K 2 l= then we will show (3.33) uniquely determine all parameters a l,k,d, a l,k,d. Proposition 3.2 Suppose all ratios a l,j,d a l,j,d a l,j,d a l,j,d for all l and j j are distinct, and all ratios a l,k,d a l,k,d then (3.33) uniquely determine all values of a l,k,d and a l,k,d. for all l and j j are distinct, all ratios for all l and k are distinct, Furthermore, there exist points x, x 2,..., x m, m 3L(K + K 2 + 2), such that b d (x i ) and b dd (x i ), i =,..., m uniquely determine all values of a l,k,d and a l,k,d. Proof. By Proposition 3.2, b d (x) and b d (x) uniquely determine all values of parameters a l,k,d and a l,k,d respectively. So we can get (a l, K,d, a l, K +,d,..., a l,k2,d), l =,..., L and (a l, K,d, a l, K +,d,..., a l,k 2,d ), l =,..., L. Since all ratios a l,k,d a l,k,d are distinct, any permutation of index l in a l,k,d will result in different ratios which will be different from the jump points of q dd (x), so the jump points of q dd (x) uniquely determine ( a l, K,d a l, K,d, a l, K +,d a l, K +,d,, 58 a l,k2,d a l,k2,d )

71 for some l and l. So (3.33) eventually uniquely determine all the true values of all parameters a l,k,d and a l,k,d. The reason why x, x 2,..., x m uniquely determine all values of a l,k,d and a l,k,d because q d (x), q d (x) and q dd (x) are piecewise linear functions which can be uniquely determined by finite number of points as long as there are at least two points between any two jump points. Proposition 3.3 Suppose all ratios a l,j,d for all l and j j are distinct for each a l,j,d d =,..., D and a l,k, for all l, l and k are distinct for each d = 2,..., D, then a l,k,d b d (x) = L l= [a l,k 2,d + max(a l,k2,d, a l,k 2,d ) + max(a x l,k2 2,d, a l,k 2,d ) x + max(a l,k2 3,d, a l,k 2 2,d ) + + max(a x l, K,d, a l, K +,d ) + a l, K,d ], x x d =,..., D b d (x) = L +K max(a l, m,, al, m,d ), d = 2,..., D, x m= K 2 l= uniquely determine all values of a l,k,d, d =,..., D, l =,..., L, K k K 2. Furthermore, there exist points x, x 2,..., x m, m (2D )(K + K 2 + ) + 2D, such that b d (x i ) and b d (x i ), i =,..., m, d =,..., D, d = 2,..., D uniquely determine all values of a l,k,d. is Proof.. This can be done by following the arguments in Proposition The estimators and asymptotics Now for suitable choice of x d, x 2d,..., x md, d =,..., D, x d, x 2d,..., x m d, d = 2,..., D, define U d (x jd ) = n I (Yid, Y n i+,d x jd ), j =,..., m, d =,..., D, i= bd (x jd ) = log(u d (x jd )), j =,..., m, d =,..., D, U d (x jd ) = n I (Yi, Y n id x jd ), j =,..., m, d = 2,..., D, i= 59

72 bd (x jd ) = log(u d (x jd )), j =,..., m, d = 2,..., D, b (x U (x ) ) U (x 2 ) b (x 2 ).. U (x m ) b (x m ) U 2 (x 2 ) b2 (x 2 ).. U U = D (x md ) bd (x U 2 (x, b = md ) 2) b2 (x U 2 (x. 2) 22) b2 (x 22). U 2 (x m 2 ). b2 U 3 (x (x m 3) 2 ) b3 (x 3). U D (x m D ). bd (x m D ) Let µ djd = E(U d (x jd )) = Pr(Y d, Y 2d x jd ), d =,..., D, j =,..., m. µ d jd = E(U d (x j d )) = Pr(Y, Y d x j d ), d = 2,..., D, j =,..., m. µ djd, d j d = E[(I (Y d,y 2d x jd ) µ djd )(I (Yd,Y 2d x j d ) µ d j d )] = Pr(Y d, Y 2d x jd, Y d, Y 2d x j d ) µ djdµ d j d, d, d =,..., D, j, j =,..., m. µ djd, d j d = E[(I (Y d,y 2d x jd ) µ djd )(I (Y,Y d x j d ) µ d j d )] = Pr(Y d, Y 2d x jd, Y, Y d x j d ) µ djdµ d j d, d =,..., D, j =,..., m, d = 2,..., D, j =,..., m. µ d j d, djd = E[(I (Y,Y d x j d ) µ d j d )(I (Y d,y 2d x jd ) µ djd )] = Pr(Y d, Y 2d x jd, Y, Y d x j d ) µ djdµ d j d = µ djd, d j d, d =,..., D, j =,..., m, d = 2,..., D, j =,..., m. µ djd, d j d = E[(I (Y,Y d x jd ) µ djd )(I (Y,Y d x j d ) µ d j d )] = Pr(Y, Y d x jd, Y, Y d x j d ) µ djdµ d j d, d = 2,..., D, j =,..., m, d = 2,..., D, j =,..., m. 60

73 w (k) djd, d j d = E[(I (Yd,Y 2d x jd ) µ djd )(I (Y+k,d,Y 2+k,d x j d ) µ d j d )] = Pr(Y d, Y 2d x jd, Y +k,d, Y 2+k,d x j d ) µ djdµ d j d, d, d =,..., D, j, j =,..., m. w (k) djd, d j d = E[(I (Yd,Y 2d x jd ) µ djd )(I (Y+k,,Y +k,d x j d ) µ d j d )] = Pr(Y d, Y 2d x jd, Y +k,, Y +k,d x j d ) µ djdµ d j d, d =,..., D, j =,..., m, d = 2,..., D, j =,..., m. w (k) d j d, djd = E[(I (Y,,Y,d x j d ) µ d j d )(I (Y +k,d,y 2+k,d x jd ) µ djd )] = Pr(Y,, Y,d x j d, Y +k,d, Y 2+k,d x jd ) µ djd µ d j d, d =,..., D, j =,..., m, d = 2,..., D, j =,..., m. w (k) djd, d j d = E[(I (Y,Y d x jd ) µ djd )(I (Y+k,,Y +k,d x j d ) µ d j d )] = Pr(Y, Y d x jd, Y +k,, Y +k,d x j d ) µ djdµ d j d, d = 2,..., D, j =,..., m, d = 2,..., D, j =,..., m. µ = µ µ 2. µ m µ 22. µ DmD µ 22 µ 222. µ 2m 2 µ 33. µ Dm D = µ µ 2. µ m µ m+. µ D m µ D m+ µ D m+2. µ D m+m µ D m+m +. µ D m+(d )m b (x ) b (x 2 ). b (x m ) b 2 (x 2 ). b, b = D (x md ) b 2 (x. 2) b 2 (x 22). b 2 (x m 2 ) b 3 (x 3). b D (x m D ) We have the following relations { µ djd µ r, d = [ r ] +, j = r (d ) m if r D m, m µ d jd µ r, d = [ r D m ] + 2, j = r D m (d 2) m if r > D m,. m 6

74 We now use the similar relations between the indexes of µ djd and the indexes of µ r define the following variables. and σ rs = w rs k = µ djd,d j d µ djd,d j d µ djd,d j d µ djd,d j d w (k) djd,d j d w (k) djd,d j d w (k) djd,d j d w (k) if, r D m, s D m if, r D m, s > D m if, r > D m, s D m if, r > D m, s > D m. if, r D m, s D m if, r D m, s > D m if, r > D m, s D m djd,d j d if, r > D m, s > D m. Σ = (σ rs ), W k = (w rs k ), Θ = (diag{µ}). We now put everything above together. Then we obtain the following lemma. Its proof is just simply following the lines used in lemma 2.2. Lemma 3.4 For the choices of x jd, x j d we have n(u µ) n( b b) and the definitions of each variables above, K +K 2 + d N(0, Σ + {W k + W k}) k= K +K 2 + d N(0, Θ(Σ + {W k + W k})θ ). Now consider the system of non-linear equations b d (x jd ) = L l= [a l,k 2,d + max(a l,k2,d, a l,k 2,d x jd ) + max(a l,k2 2,d, a l,k 2,d x jd ) + max(a l,k2 3,d, a l,k 2 2,d x jd ) + + max(a l, K,d, a l, K +,d x jd ) + a l, K,d x jd ] j =,..., m, d =,..., D (3.34) b d (x j d ) = L +K max(a l, m,d, al, m,d ) x l= m= K j d 2 j =,..., m, d = 2,..., D and denote the left hand side of (3.34) as b. Since (3.34)uniquely determine the values k= of all parameters a l,k,d, (3.34)has the matrix representation b = Ca (3.35) or equivalently (C C) C b = a. (3.36) 62

75 We now obtain our estimators by solving the system of non-linear equations bd (x jd ) = L l= [â l,k 2,d + max(â l,k2,d, âl,k 2,d x jd ) + max(â l,k2 2,d, âl,k 2,d x jd ) + max(â l,k2 3,d, âl,k 2 2,d x jd ) + + max(â l, K,d, âl, K +,d x jd ) + âl, K,d x jd ] j =,..., m, d =,..., D (3.37) b d (x j d ) = L +K max(â l, m,d, âl, m,d ) x l= m= K j d 2 j =,..., m, d = 2,..., D As n sufficiently large (3.37) can be written as the following matrix representation (C C) C b = â. (3.38) Summarize all arguments above we have obtained the following theorem Theorem 3.5 If all ratios a l,j,d and a l,k, a l,k,d a l,j,d for all l and j j are distinct for each d =,..., D for all l, l and k are distinct for each d = 2,..., D, of the multivariate processes {Y id }, then there exist such that the estimators â satisfies n(â a) where B = (C C) C Simulation examples x d, x 2d,..., x md, d =,..., D, x d, x 2d,..., x m d, d = 2,..., D, K +K 2 + d N(0, BΘ(Σ + {W k + W k})θ B ) We continue the simulation examples used in section and consider now the bivariate processes k= [ ] 0.0Z,i, 0.40Z Y i = max,i 0.35Z 2,i, 0.5Z 2,i [ ] 0.35Z,i, 0.5Z Y i2 = max,i 0.0Z 2,i, 0.40Z 2,i (3.39) We first generate data by simulating these bivariate processes, then based on the simulated data re-estimate all coefficients simultaneously and compute their asymptotic 63

76 Parameter a,, a,0, a 2,, a 2,0, a,,2 a,0,2 a 2,,2 a 2,0,2 True value Estimated value Standard Deviation Table 3.7: Simulation results for model covariance matrix. Notice that the coefficients in the second process are the permuted coefficients in the first process. The purpose here is to apply estimators developed in section and compare the asymptotic standard deviations obtained by Theorem 3.5 with those obtained in section Table 3.7 is obtained using simulated data with a sample size of If we compare Table 3.2 with Table 3.7, we find the standard deviations in Table 3.7 are smaller than those in Table 3.2. We think this is because the model in section uses more data and data information than other models use. 3.4 Weighted least squares estimation In this section, we assume the model is identifiable from the bivariate distributions. Proposition 3.2 has given sufficient conditions for this. From (3.4) and finite number of {x, x 2,, x K +K 2, x K +K 2 +} such that x < r < x 2 < r 2 < < x K +K 2 < r K +K 2 < x K +K 2 +, then the model is identified by q(x ), q(x 2 ),..., q(x K +K 2 +). And we have Corollary 3.6 Suppose the model is identifiable w.r.t. bivariate joint distribution. Let r 0 = 0, x 0 = 0, {x, x 2,, x m } are m points such that x < r, x m > r K +K 2, and min(x i+ x i, i = 0,, m) < min(r i+ r i, i = 0,, K + K 2 ) then the model is identified by q(x ), q(x 2 ),..., q(x m ). Under the assumptions in Proposition 3., solving a j s iteratively is equivalent to an optimizing problem min aj = a j >0 m (x i i= j M(x i ) a j + 64 j M(x i ) a j+ q(x i )) 2 (3.40)

77 which gives least squares solutions. In practice, what we observed is q(x i ), or an estimation of q(x i ), for each i. Based on Corollary 3.4, for sufficiently large n, q(x i ) = q(x i ) + ɛ i, i =,, m (3.4) where ɛ i are normally distributed and correlated with variance-covariance matrix V, a generalization of Corollary 3.4. So the weighted least squares estimates are optimum solutions of min aj = a j >0 where q = (q(x ),, q(x m )), q = ( q(x ),, q(x m )). ( q q) V ( q q). (3.42) Suppose now â is a solution of (3.42), if we restrict the parameter space to a neighborhood of â, written δ(â), then each q(x i ) is a linear combination of a j s and so there exists a matrix C which depends on â but can be exactly determined such that min aj = ( q Ca) V ( q Ca) (3.43) a j >0,a δ(â) has a solution of â = (C V C) C V q = B b, where B = (C V C) C V diag{x,, x m }, q(x) = xb(x). And then we have a corollary, Corollary 3.7 n(â a) K +K 2 + d N(0, BΘ(Σ + {W k + W k})θ B ) k= Proof. By the same arguments as in the Theorem Threshold Methods In the real world, a time series may not follow the assumed statistical model. But in our applications the tail probability of large observations is the main concern, for example, what is the probability of a big price movement of next day given today s information on the stock market. There has been an extensive development of threshold methods in extreme value statistical research. In this section, we develop a unified procedure for 65

78 modeling within-cluster behavior at extreme levels by using M 4 processes to model the temporal dependence of exceedances. We assume those values above certain threshold value u are actually observed and follow the tail distribution of unit Fréchet. Those values below the threshold u are not observed or treated as zero. Our development is focusing on univariate time series data, but without any difficulty it can be easily extended to multivariate time series data with multiple thresholds. {Pr(Y i < u + x, Y i+ < u) = e L 2+K l= Pr(Y i < u, Y i+ < u + x) = e L 2+K l= m= K 2 max( a l, m u+x m= K 2 max( a l, m u, a l,2 m u ), a l,2 m u+x ) Let A (x) = (0, u) (0, u + x), A 2 (x) = (0, u + x) (0, u), and define X Aj (x) = n then for a fixed u, as n, n I Aj (x)(y i, Y i+ ), j =, 2. i= X A (x) X A2 (x) a.s. Pr(Y i < u, Y i+ < u + x) = p (x), a.s. Pr(Y i < u + x, Y i+ < u) = p 2 (x). We have or equivalently L 2+K l= m= K 2 max( a l, m L 2+K l= m= K 2 max( a l, m u L 2+K l= m= K 2 max( a l, m L 2+K l= m= K 2 max( a l, m u u+x, a l,2 m, a l,2 m u+x m, a l,2 m, a l,2 m ) = log(p u (x )) u+x ) = log(p 2 (x )). ) = log(p u (x m )) u+x m ) = log(p 2 (x m )) L 2+K a l= m= K 2 max(a l, m, l,2 m ) = log(p (x )) u/(u+x ) u+x L 2+K a l= m= K 2 max(a l, m, l,2 m ) = log(p 2(x )) (u+x )/u u. L 2+K a l= m= K 2 max(a l, m, l,2 m ) = log(p (x m)) u/(u+x m) u+x m L 2+K a l= m= K 2 max(a l, m, l,2 m ) = log(p 2(x m)) (u+x m)/u u (3.44) (3.45) The left hand side is similar to (3.7) and hence according to Theorem 3.6 and assuming unit Fréchet margin for those observed values at extreme level, P (Y i+ < u+x, Y i < u) and P (Y i+ < u, Y i < u + x) uniquely determine all the parameters since we can take logarithm transformation to get (3.), so the model is identified from (3.44) or (3.45). 66

79 Lemma 3.8 Let ζ n = ( X A (x ), XA2 (x ),..., X A (x m), XA2 (x m)) and µ = (p (x ), p 2 (x ),..., p (x m ), p 2 (x m )) then n(ζn µ) where the elements of Σ have the forms K +K 2 + d N(0, Σ + {W k + W k}) k= E[(I Ai (x s)(y, Y 2 ) p i (x s ))(I Aj (x t)(y, Y 2 ) p j (x t ))] = EI Ai (x s) A j (x t)(y, Y 2 ) p i (x s )p j (x t ) = p ij (s, t) p i (x s )p j (x t ) with i, j =, 2; s, t =, 2,..., m, and the elements of W k have the forms E[(I Ai (x s)(y, Y 2 ) p i (x s ))(I Aj (x t)(y +k, Y 2+k ) p j (x t ))] = EI Ai (x s)(y, Y 2 )I Aj (x t)(y +k, Y 2+k ) p i (x s )p j (x t ) = p k ij(s, t) p i (x s )p j (x t ) with i, j =, 2; s, t =, 2,..., m. By setting L l= 2+K L l= 2+K L 2+K l= L 2+K l= m= K 2 max( âl, m u+x, âl,2 m m= K 2 max( âl, m u m= K 2 max( âl, m u+x m, âl,2 m m= K 2 max( âl, m u ) = log( X u A (x )), âl,2 m u+x ) = log( X A2 (x )). ) = log( X u A (x m)), âl,2 m u+x m ) = log( X A2 (x m)) (3.46) we get Theorem 3.9 Suppose the model is identified from x, x 2,..., x m, where these values satisfy that u u+x i or u+x i u n(â a) are different from the ratios of true parameter, then K +K 2 + d N(0, BΘ(Σ + {W k + W k}))θ B ) for B = (C C) C, Σ and W k are defined the same as in Lemma 3.8 and Θ is defined by k= Θ = diag( p (x ), p 2 (x ),..., p (x m ), p 2 (x m ) ) 67

80 3.6 Multivariate domains of attraction and non unit Fréchet margins So far the models studied assumed the data follow the M4 process exactly. Some natural relaxations of the assumption would be that the distribution of underlying random variables belongs to the domain of attraction of a multivariate extreme value distribution and the marginal distribution of Z s is non unit Fréchet. In the univariate case, from these assumptions, efficient estimating methods based on threshold exceedances and generalized Pareto distribution have been developed. We now develop a similar estimating procedure in the multivariate context. First, we rephrase Theorem of Galambos (987) into Lemma 3.20 under the assumption the marginal distribution of bivariate extreme value distribution is unit Fréchet. Lemma 3.20 Let x = (x, x 2 ), F be the population distribution and F n (a n x + b n ) H(x) (3.47) where H(x) has Fréchet marginals H 2,ξj (x j ), j =, 2. Let F has the same univariate marginals F and F 2 which are eventually strictly increasing, then F belongs to the bivariate domains of attraction of H if and only if as u. F (ux, ux 2 ) F (u) log H(x) (3.48) We now use this lemma to construct estimators for all parameters a l,k,d. Assume that F belongs to the bivariate domains of attraction of H which has the distribution H(x) = exp[ L l= 2+K Substitute F (ux, ux 2 ) and F (u) by n n I (Yi u) respectively, then for all u. n Pr { n i= lim n n max( a l, m x m= K 2, a l,2 m x 2 )] (3.49) n I (Yi ux, Y i+ ux 2 ) and n i= I (Yi ux, Y i+ ux 2 ) n I (Yi u) i= 68 i= = log H(x) } = (3.50)

81 From (3.50) we can construct estimation methods for parameters a lk s. Let x =, then log H(x) = b(x 2 ) which was defined in (3.), i.e. n Pr { n i= lim n n I (Yi u, Y i+ ux) n I (Yi u) i= = b(x) } =. (3.5) for all u. For any fixed u, let A u (x) = (0, u) (0, ux) and X Au(x) = n n I Au(x)(Y i, Y i+ ) i= then X Au(x) a.s. P (Y u, Y 2 ux) = p u (x) Lemma 3.2 Let ζ un = ( X Au( ), XAu(x ),..., X Au(x m)) and µ u = (p u ( ), p u (x ),..., (p u (x m )) then n(ζun µ u ) where the elements of Σ u have the forms K +K 2 + d N(0, Σ u + {W uk + W uk}) k= E[(I Au(x s)(y, Y 2 ) p u (x s ))(I Au(x t)(y, Y 2 ) p u (x t ))] = EI Au(x s) A u(x t)(y, Y 2 ) p u (x s )p u (x t ) = p u (x s x t ) p u (x s )p u (x t ) with s, t =, 2,..., m, and the elements of W uk have the forms with s, t =, 2,..., m. E[(I Au(x s)(y, Y 2 ) p u (x s ))(I Au(x t)(y +k, Y 2+k ) p u (x t ))] = EI Au(x s)(y, Y 2 )I Au(x t)(y +k, Y 2+k ) p u (x s )p u (x t ) = p uk (s, t) p u (x s )p u (x t ) Let b u (x i ) = p u(x i ) p u ( ), i =,..., m 69

82 then the Jacobian matrix of transforming p u (x i ) into b u (x i ) has the form: p u(x ) ( p u( )) 2 p u( ) 0 0 p u(x 2 ) J = ( p u( )) 0 2 p u( ) p u(x m) ( p u( )) p u( ) so we have (3.52) Lemma 3.22 Let n n i= b un = ( n n then Let n(bun b u ) I (Yi u, Y i+ ux ) I (Yi u) i= n n i=,..., n b u = (b u (x ),..., b u (x m )) I (Yi u, Y i+ ux m) n ) I (Yi u) i= K +K 2 + d N(0, J(Σ u + {W uk + W uk})j ) bu (x) = Suppose the solutions of L [â l,k2 + max(â l,k2, âl,k 2 l= x n n k= I (Yi u, Y i+ ux) i= n n I (Yi u) i= ) + max(â l,k2 2, âl,k 2 x ) + max(â l,k2 3, âl,k 2 2 x ) + + max(â l, K, âl, K + x ) + âl, K x ] = b u (x ) (3.53) L [â l,k2 + max(â l,k2, âl,k 2 l= x m ) + max(â l,k2 2, âl,k 2 x m ) + max(â l,k2 3, âl,k 2 2 x m ) + + max(â l, K, âl, K + x m ) + âl, K x m ] = b u (x m ) are â u. This is equivalent to Ĉunâ u = b u in matrix notations where C un is uniquely determined by â u. And the solutions of L [a u l,k 2 + max(a u l,k 2, au l,k 2 ) + max(a u l,k 2 2, au l,k 2 x ) l= x + max(a u l,k 2 3, au l,k 2 2 x ) + + max(a u l, K, au l, K + x ) + au l, K x ] = b u (x ) (3.54) L [a u l,k 2 + max(a u l,k 2, au l,k 2 l= x m ) + max(a u l,k 2 2, au l,k 2 x m ) + max(a u l,k 2 3, au l,k 2 2 x m ) + + max(a u l, K, au l, K + x m ) + au l, K x m ] = b u (x m ) 70.

83 are a u. And similarly this is equivalent to C u a u = b u. And the solutions of L [a l,k2 + max(a l,k2, a l,k 2 l= x ) + max(a l,k2 2, a l,k 2 x ) + max(a l,k2 3, a l,k 2 2 x ) + + max(a l, K, a l, K + x ) + a l, K x ] = b(x ) (3.55) L [a l,k2 + max(a l,k2, a l,k 2 l= x m ) + max(a l,k2 2, a l,k 2 x m ) + max(a l,k2 3, a l,k 2 2 x m ) + + max(a l, K, a l, K + x m ) + a l, K x m ] = b(x m ) are a. And similarly this is equivalent to Ca = b. The following theorem can be obtained. Theorem 3.23 Suppose the model is identified from x, x 2,..., x m, where these values are different from the ratios of true parameter, then n(âu a u ) K +K 2 + d N(0, B u J(Σ u + {W uk + W uk})j B u) for B u = (C uc u ) C u, J, Σ u and W uk are defined the same as in Lemma 3.2, 3.22, (3.52). k= We now study the limiting distribution of n(â u a). By Lemma 3.22, for each u and any vector α, we have X un = n α ( b un b u ) σ αu d N(0, ) as n where σ αu = α J(Σ u + K +K 2 + k= {W uk + W uk })J α. Denote the distribution function of X un by F un (y), and standard normal distribution function Φ(y), then Since Φ(y) is continuous, so F un (y) Φ(y), < y <, as n lim n sup F un (y) Φ(y) = 0, y for each u. Now suppose u(n) is a sequence of numbers chosen to satisfy the condition n max i m b u(n)(x i ) b(x i ) 0, as n. 7

84 These u(n) satisfy which implies so lim n sup F u(n),n (y) Φ(y) = 0 y X u(n),n = n α ( b u(n),n b u(n) ) σ αu(n) n(bu(n),n b u(n) ) d N(0, ), d N(0, Σ) where Σ = lim u J(Σ u + K +K 2 + k= {W uk + W uk })J. From the condition n max i m b u(n)(x i ) b(x i ) 0, as n, we know the ratios of parameters from b u(n) (x i ) converge to the ratios of parameters from b(x i ) since ratios are jump points. So the matrix C u(n) formed from (3.54) and the matrix formed from (3.55) are identical for sufficiently large n because all elements of C u(n) are also either, x i or + x i, and this gives n(a u(n) a) 0. Let B = (C C) C, then n(âu(n) a u(n) ) d N(0, BΣB ), and n(âu(n) a) = n(â u(n) a u(n) ) + n(a u(n) a) d N(0, BΣB ). We form these results into the following corollary. Corollary 3.24 Suppose the model is identified from x, x 2,..., x m, where these values are different from the ratios of true parameter, and under the condition n max i m b u(n)(x i ) b(x i ) 0, as n, we have n(âu(n) a) d N(0, BΣB ). 72

85 Chapter 4 Modeling Extreme Processes with Parametric Structures In chapters 2 and 3, we studied probabilistic properties of M4 processes, proposed estimation methods, proved consistency and asymptotic normality of the estimators. All the methods are not restricted to the number of parameters. But in numerical computation aspect, the more parameters, the less precision of the estimates. It s not just a numerical stability issue, of course. The statistical precision of the estimates will be poor if the number of parameters is too large. So in this chapter, we will consider several parametric structures which could be imposed on the parameters. For instance we consider a l,k,d being symmetric about k = 0 for each l and d and some other specific possibilities. In section 4., we shall study a symmetric parameter structure model. In section 4.2, we shall study an asymmetric geometry parameter structure model. In section 4.3, a monotone parameter structure model will be discussed. The following fact will be used many times in this chapter. It should be no ambiguity when it is used without mentioning it. Fact: If two lines f(x) = ax + b and g(x) = cx + d on the plane satisfy f(x ) = g(x ) and f(x 2 ) = g(x 2 ) for two different points x and x 2, then a = c, and b = d.

86 4. Symmetric geometric parameter structure model In this section we study a particular form of symmetric geometric parameter structure model, i.e., we assume a l,k,d = b ld λ k ld, k = K,..., K 2, d =,..., D (4.) for each l, where the unknown parameters are b ld and λ ld. We first consider the case L = and then the case L > in the following subsections. 4.. Case L = When L =, for simplicity we assume K = K 2, then we have a kd = b d λ k d, k = K,..., K, d =,..., D (4.2) Since K k= K a kd = b K d k= K λ k d =, b d = then by (.25) we have K k= K λ k d. Let τ = (0, 0,..., τ d, 0,..., 0), θ(τ ) = max k max d a kd τ d k max d a kd τ d which immediately implies = max k k b d λd τ d k b dλ k d τ d = max k λ k d k λ k d = θ d θ d = max b d λ k d k = max a kd. (4.3) k This tells that we can either from the estimation of max k a kd to get the estimation of θ d or vice versa. Especially, if λ d <, then θ d = b d ; if λ d >, then θ d = b d λ K d. Since we have k λ k d = + 2λ d ( + λ d + + λ K d ) = 2( + λ d + λ 2 d + + λk d ) + λ d + λ 2 d + + λ K d = 2 ( b d + ). (4.4) Let f(t) = + t + + t K, then f (t) = + 2t + + Kt K > 0, for t > 0. So f(t) is strictly increasing and λ d is uniquely determined by λ d = f ( ( 2 b d have a theorem. Proposition 4. Under the parameter structure (4.2), we have θ d = max a kd, λ d = f ( k 2 ( + )), d =,..., D. b d + )). And so we 74

87 By the definition of q(x) in (3.2), when D = we have q(x) = bλ K x + max(bλ K x, bλ K ) + + max(bλx, bλ 2 ) + max(bx, bλ) + max(bλx, b) + max(bλ 2 x, bλ) + + max(bλ K x, bλ K ) + bλ K. (4.5) When x 2 > max(λ, /λ), q(x 2 ) = x 2 + bλ K. (4.6) When /λ < x < λ, q(x ) = b(x + )(2λ K + λ K + + λ). (4.7) When λ < x 3 < /λ q(x 3 ) = b(x 3 + )(λ K + λ K + + ). (4.8) When use q(x 2 ) together q(x ) or q(x 3 ), we can uniquely determine b and λ. Now let x < x 2 be two points, the goal is to find a point x 2 such that x 2 > max(λ, /λ). We have the following two cases.. if q(x 2) q(x ) x 2 x 2. if q(x 2) q(x ) x 2 x =, then x 2 > x > max(λ, /λ), <, calculate the intercept point (x 3, q(x 3 )) of the line which goes through (x, q(x )) and (x 2, q(x 2 )) to the line y(x) = x. Let x = x 2, x 2 = x 3 and repeat this process until we have the case. Note: if x 2 > max(λ, /λ), then x 2 < min(λ, /λ). Suppose now x and q(x) are known, where x > max(λ, /λ), then we can get the value for b from or equivalently b K f ( 2 ( b + )) = (q(x) x)k (4.9) f ( 2 ( x + )) = (q(x) ) K b b or 2b + 2 = f((q(x) x ) K ). b When q(x) is replaced by q(x) in (4.9), we get the estimate of b, i.e. b. Since q(x) a.s. a.s. a.s. q(x), by continuous mapping theorem, b b, and hence λ λ. Since q(x) = bf ( ( + )) + x, so we have 2 b q = f K ( ( + )) + Kbf (K+) ( 2 ( b +)) b 2 b f ( 2 ( b +)) 2b 2 = f K ( ( + )) + 2 b 2Kb3 f (K+) ( 2 ( b +)) f ( 2 ( b +)) 75

88 so b q = f ( ( + )) 2 b f ( ( + ))f 2 b K ( ( + )) + 2 b 2Kb3 f (K+) ( ( + )) = δ 2 b This together with Corollary 3.4, We have proved the following theorem. Theorem 4.2 Suppose x > max(λ, /λ), then d n( b b) N(0, x 2 σ 2 δ 2 ) where σ is defined as in the Theorem 3.3. Since λ = f ( ( λ + )), 2 b = 2b 2 b f ( 2 ( b Corollary 4.3 Suppose x > max(λ, /λ), then d n( λ λ) N(0, x 2 σ 2 δ 2 λ 2 ) +)) = λ, we then have the following corollary. In order to obtain asymptotic properties of ( b, λ) we start from (4.6)-(4.8), the following is a corollary of Theorem 3.5. Corollary 4.4 Suppose min(λ, /λ) < x < max(λ, /λ), x 2 > max(λ, /λ), then where and n( q q) q = K +K 2 + d N(0, Θ(Σ + {W k + W k})θ ), ] [ q(x ), q = q(x 2 ) k= [ ] q(x ), Θ = q(x 2 ) [ x µ 0 x 0 2 µ 2 µ i = Pr(Y, Y 2 x i ), µ 2 = Pr(Y, Y 2 x ), σ ij = µ ij µ i µ j, w ij k = Pr(Y, Y 2 x i, Y +k, Y 2+k x j ) µ i µ j, µ ii = µ i. ]. Theorem 4.5 Suppose min(λ, /λ) < x < max(λ, /λ), x 2 > max(λ, /λ), then ( [ b ] [ ] b ) K +K d 2 + n N(0, J λ Θ(Σ + {W λ k + W k})θ (J ) ) where and J = { J, J 2, k= if /λ < x < λ if λ < x < /λ. [ (x + )(2λ J = K + λ K + + λ) b(x + )(2Kλ K + (K )λ K ) λ K Kbλ K [ (x + )(λ J 2 = K + λ K + + λ 2 + λ + ) b(x + )(Kλ K + (K )λ K λ + ) λ K Kbλ K ] ] 76

89 Proof. We prove the case when λ < x < /λ. Since q(x ) b = (x + )(λ K + λ K + + λ 2 + λ + ), q(x ) = b(x + )(Kλ K + (K )λ K λ + ), λ q(x 2 ) = λ K q(x 2 ), = Kbλ K. b λ So the Jacorbian matrix is ] J = [ q(x ) b q(x 2 ) b q(x ) λ q(x 2 ) λ = J 2. The determinant of J is J 2 = b(x + )(λ 2K 2 + 2λ 2K (K 2)λ K+ + (K )λ K + Kλ K ) > 0, so J exists. And then by the mean value theorem and Slutsky theorem the proof is completed Case L > We consider D = in this subsection. Define q l (x) = b l λ K l x + max(b l λ K l x, b l λ K l ) + + max(b l λ K l x, b l λ K l ) + b l λ K l. (4.0) then q(x) = q (x) + q 2 (x) + + q L (x). (4.) Without loss of generality, we assume λ < λ 2 < < λ L. Since q(x) is a piecewise linear function of x, the jumping points of q (x) are /λ, /λ 2,..., /λ L, λ, λ 2,..., λ L. We assume all these points are different. Suppose now where q(x) = q (x) all x and q l (x) = b l λ K l x + max(b l λ K l q (x) = q (x) + q 2(x) + + q L(x) (4.2) x, b l λ K l Lemma 4.6 If q(x) = q (x) all x, then ) + + max(b l λ K l (λ, λ 2,..., λ L ) = (λ, λ 2,..., λ L) (b,..., b L ) = (b,..., b L). x, b l λ K l ) + b l λ K l. (4.3) 77

90 Remark: Since both q(x) and q (x) are piecewise linear functions, we only need finite number of points such that the two are equal at those points. The following proof shows that. Proof. q (x) has the same jumping points as q (x) has. Suppose max(/λ, /λ 2,..., /λ L, λ, λ 2,..., λ L ) = λ L, max(/λ, /λ 2,..., /λ L, λ, λ 2,..., λ L ) = /λ, (4.4) Which imply λ L = /λ. Let f(λ) = λ + λ λ K. Let x > λ L and y lies in the range between the second largest jumping point and λ L. Then by the formulas (4.6) (4.8), we have q l (x) = b l x(2f(λ l ) + ) + b l λ K l q(x) = b l x(2f(λ l ) + ) + b l λ K l = x + b l λ K l (4.5) q l (y) = b l (y + )(λ K L + f(λ l )) q(y) = L l= b l y(2f(λ l ) + ) + L l= b l λ K l + b L (y + )(λ K L + f(λ L)) = y + L b l λ K l + b L y(λ K L f(λ L) ) + b L f(λ L ) l= (4.6) (4.5)-(4.6) gives q(x) q(y) = x y + b L y(λ K L f(λ L ) ) b L f(λ L ) (4.7) Now consider using λ s and for x > /λ we get For λ < y < /λ, we have q (x) = x + b l λ K l (4.8) q (y) = b (y + )(f(λ ) + ) q (y) = b (y + )(f(λ ) + ) + L b l y(2f(λ l ) + ) + L = y b yf(λ ) + L l=2 l=2 b l λ K l + b (f(λ ) + ) l=2 b l λ K l (4.9) 78

91 (4.8)-(4.9) gives q (x) q (y) = x y + b yf(λ ) b (f(λ ) + λ K ) (4.20) q(x) q(y) = q (x) q (y) gives b L (f(λ L ) λ K L + ) = b f(λ ) (4.2) b L f(λ L ) = b (f(λ ) + λ K ) (4.22) Now suppose both x and y are between the maximum and the second maximum values of λ, /λ, then which gives q(x) q(y) = x y b L (y x)(f(λ L ) λ K L + ) q (x) q (y) = x y b (x y)f(λ ) b L (f(λ L ) λ K L + ) = b f(λ ) (4.23) (4.2) and (4.23) can not be true simultaneously. So λ L = /λ can not be true, so λ L = λ L. And b L = b L is obvious. Now Suppose λ l = λ l, b l = b l, l = k +,..., L then q l (x) = q l (x), l = k +,..., L Suppose /λ k < λ k where λ k is the L k largest among /λ, /λ 2,..., /λ L, λ, λ 2,..., λ L and /λ k > λ k where /λ k is the L k largest among /λ, /λ 2,..., /λ L, λ, λ 2,..., λ L. So λ k = /λ k. Suppose x > max(λ l, /λ l ), l =,..., k then For /λ k < y < λ k, q(y) = k l= q(x) = k b l x(2f(λ l ) + ) + l= b l y(2f(λ l ) + ) + k l= k b l λ K l + l= L l=k+ q l (x) b l λ K l + b k (y + )(λ K k + f(λ k)) + L l=k+ = k b l y(2f(λ l ) + ) + k b l λ K l + b k (y + )(λ K k + f(λ k)) + L l= l= b k y(2f(λ k ) + ) b k λ K k 79 l=k+ q l (y) q l (y)

92 q(x) q(y) = (x y) k b l y(2f(λ l ) + ) + L l=k+ l= (q l (x) q l (y)) b k (y + )(λ K k + f(λ k)) +b k y(2f(λ k ) + ) + b k λ K k Suppose x > max(λ l, /λ l ), l =,..., k then For /λ k < y < λ k, q (x) = k b l x(2f(λ l ) + ) + l= k l= b l λ K l + L l=k+ q l (x) q (y) = b k (y + )(f(λ k k ) + ) + = k b l y(2f(λ l ) + ) + k l= +b k (f(λ k ) + λ K l= k ) l= b l y(2f(λ l q (x) q (y) = (x y) k b l y(2f(λ l ) + ) + l= +b k (f(λ k ) + λ K k ) + ) + l= b l λ K l b k yf(λ k ) + L ql (y) l=k+ k ) L l=k+ From q(x) q(y) = q (x) q (y) and k b l (2f(λ l ) + ) = k l= b l λ K l + L ql (y) l=k+ (q l (x) q l (y)) + b k yf(λ k ) l= b l (2f(λ l ) + ), we get b k (f(λ k ) λ K k + ) = b kf(λ k) (4.24) b k f(λ k ) = b k(f(λ k) λ K k + ) (4.25) These two equations can not be true simultaneously. So λ k = λ k, b k = b k. So by induction, we have all λ l = λ l, b l = b l. Therefore q(x) uniquely determines all parameters. Let q(x) evaluate at x, x 2,..., x m such that q(x ), q(x 2 ),..., q(x m ) uniquely determine all parameters. This can be done as long as there are at least two points between every two adjacent jumping points. then Now suppose that q(x ), q(x 2 ),..., q(x m ) are estimates of q(x ), q(x 2 ),..., q(x m ), ( q(x ), q(x 2 ),..., q(x m )) a.s. (q(x ), q(x 2 ),..., q(x m )) (4.26) where q(x) defined by (3.2). By (4.26), we have the following theorem. 80

93 Theorem 4.7 The solution of L ( b l λk l x + max( b l λk l l= L ( b l λk l x 2 + max( b l λk l= l= l l x, b l λk l ) + + max( b l λk l x, b l λk l ) + b l λk l ) = q(x ) x 2, b l λk l ) + + max( b l λk l x 2, b l λk l ) + b l λk l ) = q(x 2 ) L ( b l λk l x m + max( b l λk x m, b l λk l ) + + max( b l λk l x m, b l λk l ) + b l λk l ) = q(x m ) converges almost surely to the true parameter values. The following is a corollary of Theorem 3.5. Corollary 4.8 where where n( q q) K +K 2 + d N(0, Θ(Σ + {W k + W k})θ ), k= x q(x ) q(x ) µ 0 0 q(x 2 ) q =., q = q(x 2 ) x., Θ = 0 2 µ x q(x m ) q(x m ) 0 0 m µm µ i = Pr(Y, Y 2 x i ), µ ij = Pr(Y, Y 2 min(x i, x j )), σ ij = µ ij µ i µ j, w ij k = Pr(Y, Y 2 x i, Y +k, Y 2+k x j ) µ i µ j, µ ii = µ i. As long as x is not a jumping point, and we view q(x) as a function of all b l and λ l, then q(x) has all continuous first order partial derivatives in a neighborhood of b,..., b L, λ,..., λ L. So we can construct the transformation Jacobian matrix J. And we have the following theorem. Theorem 4.9 ( [ b ] n λ [ ] b ) K +K d 2 + N(0, J Θ(Σ + {W λ k + W k})θ (J ) ) where b = (b,..., b L ), λ = (λ,..., λ L ), b = ( b,..., b L ), λ = ( λ,..., λ L ). k= 8

94 4.2 Asymmetric geometric parameter structure model In this section we study asymmetric geometric parameter structure model, for each l. Here a l0d = b ld. a l,k,d = b ld λ (k) + ld φ (k) ld, k = K,..., K 2 (4.27) Note: the informal note by Smith and Weissman (997) had a general form Case L = When L =, the parameter structure becomes a kd = b d λ (k) + d φ (k) d, k = K,..., K 2, d =,..., D. (4.28) By the definition of q(x) in (3.2) when D = we have q(x) = bλ K 2 x + max(bλ K 2 x, bλ K 2 ) + + max(bλx, bλ 2 ) + max(bx, bλ) + max(bφx, b) + max(bφ 2 x, bφ) + + max(bφ K x, bφ K ) + bφ K. (4.29) When x < /φ, x < λ, q(x ) = bλ K 2 x +. (4.30) When /φ < x 2 < λ, q(x 2 ) = b[λ K 2 x 2 + (λ K 2 + λ K2 + + λ) + x 2 (φ + + φ K ) + φ K ]. (4.3) When λ < x 2 < /φ, q(x 2 ) = b[x 2 (λ K 2 + λ K2 + + ) + ( + φ + + φ K )]. (4.32) When x 3 > λ, x 3 > /φ, q(x 3 ) = x 3 + bφ K. (4.33) Note: parameters either satisfy (4.30), (4.3), and (4.33) or (4.30), (4.32) and (4.33). Consider now (4.30), (4.3), and (4.33). From (4.30) and (4.33), we have λ K 2 = q(x ) x (q(x 3 ) x 3 ) φk. Subtract = b(λ K 2 + λ K φ + + φ K ) from (4.3) and substitute λ K 2 with q(x ) x (q(x 3 ) x 3 ) φk, we have q(x 2 ) φ K = x 2(q(x ) ) q(x 3 ) x 3 x (q(x 3 ) x 3 ) φk + (x 2 )(φ + + φ K ) + φ K 82

95 or equivalently x 2 (q(x ) ) x (q(x 2 ) ) + x 2 x (q(x 3 ) x 3 ) φ K + (x 2 )(φ + + φ K ) = 0 (4.34) x (q(x 3 ) x 3 ) There are multiple solutions for this equation. To overcome this problem, we need to introduce additional points such that all parameters can be uniquely determined. Suppose x < x < min(/φ, λ) < x 2 < x 2 < max(/φ, λ) < x 3 < x 3, and q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 ) are known, then we can have the following three lines: L : y = q(x ) q(x ) x + x q(x ) x q(x ) x x x x L 2 : y = q(x 2) q(x 2) x + x 2q(x 2) x 2q(x 2 ) x 2 x 2 x 2 x 2 L 3 : y = q(x 3) q(x 3) x + x 3q(x 3) x 3q(x 3 ) x 3 x 3 x 3 x 3 The intercept points of L and L 2, L 2 and L 3 determine the values of and λ. What φ we need is to distinguish the values of φ and λ from the jumping points of q (x), i.e. the intercept points. By solving the intercept points, we then get b, λ, φ each is a function of (q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 )) and hence can calculate the transformation Jacobian matrix. Now suppose q(x) satisfies (4.30), (4.3), and (4.33) while q (x) satisfies (4.30), (4.32), and (4.33). Then φ = /λ, λ = /φ. And b φ K = bφ K, b λ K 2 = bλ K 2 imply (λφ) K 2 = (λφ) K which implies K 2 = K. Thus if K 2 K, there are no such q(x) and q (x). Assume K = K 2 = K, then we have b[λ K 2 x + (λ K 2 + λ K λ) + x(φ + + φ K ) + φ K ] = b [x(λ K 2 + λ K λ + ) + ( + φ + + φ K )] So and we have and bλ K 2 + b(φ + + φ K ) = b λ K 2 + b (λ K 2 + λ K2 + + λ + ) b(λ K 2 + λ K2 + + λ) + φ K = b ( + φ + + φ K ) b(φ + + φ K ) = b (λ K2 + λ K2 + + λ + ) b(λ K 2 + λ K2 + + λ) = b ( + φ + + φ K ) 83

96 From these two equations we have φ φ K + φ λ λ K + λ = φ K 2 φ λ K λ Substitute φ = /λ, λ = /φ in the above equation, we get Let then φ φ K φ = λ λk K λ K f(x) = x xt x t f (x) = (t )xt tx t + ( x t ) 2 By induction we can prove f (x) > 0 for t >, so f(x) is strictly increasing. Therefore λ = φ. So if the model follows (4.30), (4.3) and (4.33), then the values of φ and λ will not satisfy (4.30), (4.32) and (4.33), or vice versa. Summarize all the arguments we have a theorem in this subsection. Theorem 4.0 Under (4.28), when x < x < min(/φ, λ) < x 2 < x 2 < max(/φ, λ) < x 3 < x 3, and q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 ) are known, then (4.30)-(4.33) uniquely determine all parameters. We have the following corollary. Corollary 4. When replace q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 ) in (4.30)- (4.33) by q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 ) and denote the solutions by b, λ, φ, then ( b, λ, φ) a.s. (b, λ, φ) (4.35) as n. Proof. Since ( q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 )) a.s. (q(x ), q(x ), q(x 2), q(x 2 ), q(x 3), q(x 3 )) and the uniqueness of solutions of (4.30)-(4.33) by Theorem 4.0, (4.35) is true, and so the proof is completed. By following Corollary 4.8 and the arguments followed, the following theorem tells the limit joint distribution of ( b, λ, φ) after suitably normalized tends to a multivariate normal distribution. 84

97 Theorem 4.2 Suppose x < x < min(/φ, λ) < x 2 < x 2 < max(/φ, λ) < x 3 < x 3, then ( b b n λ λ ) K +K 2 + d N(0, JΘ(Σ + {W k + W k})θ J ) φ φ k= where W k, Σ, Θ are defined in Corollary 4.8 with m = 6 and the elements of J are J j = b, J q(x j ) 2j = λ, J q(x j ) j = φ q(x j, j =,..., 6. ) Proof. By the same arguments in Theorem 2.6 and Case L > We consider D = in this subsection. Define then q l (x) = b l λ K2 l x + max(b l λ K2 l x, b l λ K2 l ) + + max(b l λ l x, b l λ 2 l ) + max(b lx, b l λ l ) x, b l φ K l ) + b l φ K l. + max(b l φ l x, b l ) + max(b l φ 2 l x, b lφ l ) + + max(b l φ K l (4.36) q(x) = q (x) + q 2 (x) + + q L (x). (4.37) Without loss of generality, we assume λ < λ 2 < < λ L. Since q(x) is a piecewise linear function of x, the jumping points of q (x) are /φ, /φ 2,..., /φ L, λ, λ 2,..., λ L. We assume all these points are different. Suppose now where q(x) = q (x) all x and q l (x) = b l λ K2 l x + max(b l λ K2 l + max(b l φ l x, b l ) + max(b l φ 2 l q (x) = q (x) + q 2(x) + + q L(x) (4.38) ) + + max(b l λ l x, b l λ 2 l ) + max(b l x, b l λ l ) x, b l φ K l ) + b l φ K l. x, b l λ K2 l x, b l φ l ) + + max(b l φ K l (4.39) Then q (x) has the same jumping points as q (x) has. With out loss of generality we can assume or max(/φ, /φ 2,..., /φ L, λ, λ 2,..., λ L ) = λ L, (4.40) max(/φ, /φ 2,..., /φ L, λ, λ 2,..., λ L ) = /φ, (4.4) since the order in index l does not matter in the M4 process. We need to show that (4.37) and (4.38) agree with each other. We state this as the following lemma. 85

98 Lemma 4.3 If q(x) = q (x) all x, then (/φ, /φ 2,..., /φ L, λ, λ 2,..., λ L ) = (/φ, /φ 2,..., /φ L, λ, λ 2,..., λ L) is true for L = 2. (4.42) We leave the proof in the proofs section of this chapter. There are possibilities that this lemma can be generalized to case L > 2. But we will restrict our discussion in case L = 2 in this subsection. Now let x < x < x 2 < x 2 < x 3 < x 3 < x 4 < x 4 < x 5 < x 5 and there is one jumping point that belongs to (x i, x i+ ), i =, 2, 3, 4. Let q(x i ) be the estimation of q(x i ), and b = ( b, b2 ), λ = ( λ, λ2 ), φ = ( φ, φ2 ) be estimations of b = (b, b 2 ), λ = (λ, λ 2 ), φ = (φ, φ 2 ) respectively. Then the following theorem follows immediately. Theorem 4.4 ( b, λ, φ ) a.s. (b, λ, φ ) as n. Suppose J is the transformation Jacobian matrix, then the following theorem follows by the arguments before Theorem 4.9 and Theorem 4.4. Theorem 4.5 ( b b n λ λ ) K +K 2 + d N(0, JΘ(Σ + {W k + W k})θ J ) φ φ k= where W k, Σ, Θ are defined in Corollary 4.8 with m = Monotone parameter structure model In this section we study monotone parameter structure model, i.e. for each l. Note: this is a special asymmetric form. a lk = b K λ k+k l, k = K,..., K 2 (4.43) 86

99 4.3. Case L = We have a kd = b d λ k+k d, k = K,..., K 2, d =,..., D (4.44) Since K 2 k= K a kd = b K2 d k= K λ k+k d =, b d = K2 Let τ = (0, 0,..., τ d, 0,..., 0), then by (.25) we have k= K λ k+k d. θ(τ ) = max k max d a kd τ d k max d a kd τ d which immediately implies k+k = max k b d λd τ d k b = max k λ dλ k+k d τ d k+k d k λk+k d = θ d θ d = max b d λ k+k d k = max a kd. (4.45) k This tells that we can either from the estimation of max k a kd to get the estimation of θ d or vice versa. Especially, if λ d <, then θ d = b d ; if λ d >, then θ d = b d λ K d, where K = K + K 2 and hereafter. we have + λ d + λ 2 d + + λ K d = b d. (4.46) Let f(t) = + t + + t K, then f (t) = + 2t + + Kt K > 0, for t > 0. So f(t) is strictly increasing and λ d is uniquely determined by λ d = f ( b d ). And so we have a corollary. Corollary 4.6 Under the parameter structure (4.44), we have θ d = max a kd, λ d = f ( ), d =,..., D. k b d By the definition of q(x) in (3.2) when D = we have When x > λ, q(x) = b + max(bx, bλ) + + max(bλ K x, bλ K ) + bλ K x. (4.47) q(x) = b + bx + bλx + + bλ K x + bλ K x = b + x. (4.48) So b = q(x) x and λ = f ( ). When q(x) is replaced by q(x), we then have q(x) x b = q(x) x, λ = f ( ). (4.49) q(x) x Since q(x) a.s. q(x) and mapping f h is continuous, where h(q) = λ a.s. q x, so b a.s. b, λ. We have the following corollary which immediately follows Corollary

100 Corollary 4.7 When x > λ, we have d n( b b) N(0, x 2 σ 2 ). Since b = follows. λ λ K+, so b λ = +(K+)λK Kλ K+ ( λ K+ ) 2 = δ. Then a corollary immediately Corollary 4.8 When x > λ, we have n( λ λ) d N(0, x 2 σ 2 δ 2 ). In order to obtain asymptotic properties of ( b, λ), we consider x < λ and x 2 > λ. We have q(x ) = + bλ K x, q(x 2 ) = x 2 + b. (4.50) So q(x ) b = λ K x, q(x ) λ = bkλ K x, q(x 2 ) b So the transformation Jacobian matrix is ( ) λ J = K x bkλ K x 0 We then have the following theorem. =, q(x 2 ) λ = 0. (4.5) (4.52) Theorem 4.9 Suppose x < λ < x 2, then ( [ b ] [ ] b ) K +K d 2 + n N(0, J λ Θ(Σ + {W λ k + W k})θ (J ) ) where Θ, Σ, W k are defined as in Corollary Case L > As usual we consider D = only. Define q l (x) = b l + max(b l x, b l λ l ) + + max(b l λ K l x, b l λ K l ) + b l λ K l x. (4.53) k= then q(x) = q (x) + q 2 (x) + + q L (x). Without loss of generality, we assume λ < λ 2 < < λ L as change points. Let f(b, λ) = b + bλ + + bλ K where L f(b l, λ l ) =. l= 88

101 When x > λ l, When x < λ l, q l (x) = b l + f(b l, λ l )x q l (x) = f(b l, λ l ) + b l λ K l x Lemma 4.20 If q(x) = q (x) all x, then (λ, λ 2,..., λ L ) = (λ, λ 2,..., λ L) (b,..., b L ) = (b,..., b L) Proof. Suppose now x, x, x 2, x 2,..., x L, x L, x L+ satisfy then x < x < λ < x 2 < x 2 < λ 2 < < x L < x L < λ L < x L+ q(x ) = + x (b λ K + b 2 λ K b L λ K L ) q(x ) = + x (b λ K + b 2 λ K b L λ K L ) q(x 2 ) = L f(b l, λ l ) + x 2 (b 2 λ K b L λ K L ) + b + f(b, λ )x 2 Similarly l=2 = f(b, λ ) + x 2 (b λ K + b 2 λ K b L λ K L ) x 2b λ K + b + f(b, λ )x 2 q (x ) = + x (b λ K + b 2λ K b L λk L ) q (x ) = + x (b λ K + b 2λ K b L λk L ) q (x 2 ) = L f(b l, λ l) + x 2 (b 2λ K b L λk L ) + b + f(b, λ )x 2 l=2 = f(b, λ ) + x 2 (b λ K + b 2λ K b L λk L ) x 2b λ K + b + f(b, λ )x 2 From q(x ) q(x ) = q (x ) q (x ) we have b λ K + + b L λ K L = b λ K + + b Lλ K L From q(x 2 ) q(x ) = q (x 2 ) q (x ) we have b + f(b, λ ) = b + f(b, λ ) b λ K f(b, λ ) = b λ K f(b, λ ) Which give b = b. 89

102 Suppose when l = k we have b = b,..., b k = b k. Let λ k < x < λ k+ Which imply q(x) = q (x) = k L (b l + f(b l, λ l )x) + (f(b l, λ l ) + b l λ K l x) l= l=k+ k L (b l + f(b l, λ l )x) + (f(b l, λ l ) + b l λ K l x) l= l=k+ b k+ λ K k+ + + b L λ K L When λ k+ < x < λ k+2 q(x) = k L (b l + f(b l, λ l )x) + b k+ + f(b k+, λ k+ )x + (f(b l, λ l ) + b l λ K l x) l= l=k+2 q (x) = Which imply which imply k (b l + f(b l, λ l )x) + b k+ + f(b k+, λ k+ )x + l= b k+ + so by induction all b l = b l. L l=k+2 f(b l, λ l ) = b k+ + L l=k+2 L (f(b l, λ l ) + b l λ K l x) l=k+2 f(b l, λ l ) b k+ f(b k+, λ k+ ) = b k+ f(b k+, λ k+ ) b k+ = b k+ Let q(x i ) be the estimation of q(x i ), and b = ( b,... b L ), λ = ( λ,... λ L ), φ = ( φ,... φ L ) be estimations of b = (b,... b L ), λ = (λ,... λ L ), φ = (φ,... φ L ) respectively. Then the following theorem follows immediately. Theorem 4.2 ( b, λ, φ ) a.s. (b, λ, φ ) as n. The transformation Jacobian matrix J can be easily calculated. And a theorem then follows. 90

103 Theorem 4.22 ( b b n λ λ ) K +K 2 + d N(0, JΘ(Σ + {W k + W k})θ J ) φ φ k= where W k, Σ, Θ is defined in Corollary 4.8 with m = L Proof of Lemma 4.3 Suppose we have x, x 2, x 3, x 4, x 5 satisfy where j, j 2, j 3, j 4 are jumping points. x < j < x 2 < j 2 < x 3 < j 3 < x 4 < j 4 < x 5 (4.54) Since the order in index l does not matter, we assume that λ < λ 2 if one of λ and λ 2 is the biggest number among all jumping points and that /φ 2 < /φ if one of /φ and /φ 2 is the biggest number among all jumping points. When λ 2 =max(λ, λ 2, /φ, /φ 2 ) we have the following six possible combinations: Λ : φ < λ < φ 2 < λ 2 Λ 2 : φ 2 < λ < φ < λ 2 Λ 3 : φ < φ 2 < λ < λ 2 Λ 4 : φ 2 < φ < λ < λ 2 Λ 5 : λ < φ < φ 2 < λ 2 Λ 6 : λ < φ 2 < φ < λ 2 When /φ =max(λ, λ 2, /φ, /φ 2 ) we have the following six possible combinations: The notation Λ in this section means Φ : λ < λ 2 < φ 2 < φ Φ 2 : λ 2 < λ < φ 2 < φ Φ 3 : λ < φ 2 < λ 2 < φ Φ 4 : λ 2 < φ 2 < λ < φ Φ 5 : φ 2 Φ 6 : φ 2 < λ < λ 2 < φ < λ 2 < λ < φ Λ : φ < λ < φ 2 < λ 2 and similarly for other notations if used. 9

104 When Λ is satisfied, we have which gives q (x ) = b λ K 2 x + x < /φ < x 2 < λ < x 3 < /φ 2 < x 4 < λ 2 < x 5 q (x 2 ) = b [λ K 2 x 2 + (λ K 2 + λ K λ ) + x 2 (φ + + φ K ) + φ K q (x 3 ) = x 3 + b φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 λ K 2 2 x 3 + q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 When Λ 2 is satisfied, we have ] 2 ] which gives x < /φ 2 < x 2 < λ < x 3 < /φ < x 4 < λ 2 < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] q 2 (x 2 ) = b 2 [λ K 2 2 x 2 + (λ K λ K λ 2 ) + x 2 (φ φ K 2 ) + φ K q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 When Λ 3 is satisfied, we have 2 ] 2 ] 2 ] x < /φ < x 2 < /φ 2 < x 3 < λ < x 4 < λ 2 < x 5 92

105 which gives q (x ) = b λ K 2 x + q (x 2 ) = b [λ K 2 x 2 + (λ K 2 + λ K λ ) + x 2 (φ + + φ K ) + φ K q (x 3 ) = b [λ K 2 x 3 + (λ K 2 + λ K λ ) + x 3 (φ + + φ K ) + φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 When Λ 4 is satisfied, we have which gives x < /φ 2 < x 2 < /φ < x 3 < λ < x 4 < λ 2 < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b [λ K 2 x 3 + (λ K 2 + λ K λ ) + x 3 (φ + + φ K ) + φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 [λ K 2 2 x 2 + (λ K λ K λ 2 ) + x 2 (φ φ K 2 ) + φ K q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 When Λ 5 is satisfied, we have which gives q (x ) = b λ K 2 x + x < λ < x 2 < /φ < x 3 < /φ 2 < x 4 < λ 2 < x 5 q (x 2 ) = b [x 2 (λ K 2 + λ K ) + ( + φ + + φ K q (x 3 ) = x 3 + b φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 λ K 2 2 x 3 + q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 93 ] ] 2 ] 2 ] ] 2 ] 2 ] 2 ] 2 ]

106 When Λ 6 is satisfied, we have which gives q (x ) = b λ K 2 x + x < λ < x 2 < /φ 2 < x 3 < /φ < x 4 < λ 2 < x 5 q (x 2 ) = b [x 2 (λ K 2 + λ K ) + ( + φ + + φ K q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = x 4 + b φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] )] q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = b 2 [λ K 2 2 x 4 + (λ K λ K λ 2 ) + x 4 (φ φ K 2 ) + φ K q 2 (x 5 ) = x 5 + b 2 φ K 2 When Φ is satisfied, we have 2 ] 2 ] which gives x < λ < x 2 < λ 2 < x 3 < /φ 2 < x 4 < /φ < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b [x 2 (λ K 2 + λ K ) + ( + φ + + φ K q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] )] )] q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 [x 3 (λ K λ K ) + ( + φ φ K q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K 2 When Φ 2 is satisfied, we have 2 )] x < λ 2 < x 2 < λ < x 3 < /φ 2 < x 4 < /φ < x 5 94

107 which gives q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 [x 2 (λ K λ K ) + ( + φ φ K )] )] 2 )] 2 )] q 2 (x 3 ) = b 2 [x 3 (λ K λ K ) + ( + φ φ K q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K 2 When Φ 3 is satisfied, we have which gives q (x ) = b λ K 2 x + x < λ < x 2 < /φ 2 < x 3 < λ 2 < x 4 < /φ < x 5 q (x 2 ) = b [x 2 (λ K 2 + λ K ) + ( + φ + + φ K q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] )] )] q 2 (x 2 ) = b 2 λ K 2 2 x 2 + q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K 2 When Φ 4 is satisfied, we have which gives x < λ 2 < x 2 < /φ 2 < x 3 < λ < x 4 < /φ < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b λ K 2 x 3 + q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K )] q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 [x 2 (λ K λ K ) + ( + φ φ K q 2 (x 3 ) = x 3 + b 2 φ K 2 q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K )] 2 ]

108 When Φ 5 is satisfied, we have which gives x < /φ 2 < x 2 < λ < x 3 < λ 2 < x 4 < /φ < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b [x 3 (λ K 2 + λ K ) + ( + φ + + φ K q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + )] )] q 2 (x 2 ) = b 2 [λ K 2 2 x 2 + (λ K λ K λ 2 ) + x 2 (φ φ K 2 ) + φ K q 2 (x 3 ) = b 2 [λ K 2 2 x 3 + (λ K λ K λ 2 ) + x 3 (φ φ K 2 ) + φ K q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K 2 When Φ 6 is satisfied, we have 2 ] 2 ] which gives x < /φ 2 < x 2 < λ 2 < x 3 < λ < x 4 < /φ < x 5 q (x ) = b λ K 2 x + q (x 2 ) = b λ K 2 x 2 + q (x 3 ) = b λ K 2 x 3 + q (x 4 ) = b [x 4 (λ K 2 + λ K ) + ( + φ + + φ K )] q (x 5 ) = x 5 + b φ K q 2 (x ) = b 2 λ K 2 2 x + q 2 (x 2 ) = b 2 [λ K 2 2 x 2 + (λ K λ K λ 2 ) + x 2 (φ φ K 2 ) + φ K q 2 (x 3 ) = x 3 + b 2 φ K 2 q 2 (x 4 ) = x 4 + b 2 φ K 2 q 2 (x 5 ) = x 5 + b 2 φ K 2 2 ] We need to show that if q(x) = q (x) for all x then (/φ, /φ 2, λ, λ 2, ) = (/φ, /φ 2, λ, λ 2) (4.55) where q(x) corresponds to (/φ, /φ 2, λ, λ 2, ), while q (x) corresponds to (/φ, /φ 2, λ, λ 2). In other words we need to show every combination from all combinations Λ i and Λ j, Λ i and Φ j, Φ i and Φ j results in a contradiction or an impossible result. The 96

109 total combinations are C C 2 6 = = 66. We start from Λ i and Λ j, then Φ i and Φ j, and finally Λ i and Φ j. For all combinations we have q(x ) = q (x ) b λ K 2 + b 2 λ K 2 2 = b λ K 2 + b 2λ K 2 2, (4.56) q(x 5 ) = q (x 5 ) b φ K + b 2 φ K 2 = b φ K + b 2φ K 2. (4.57) Case Λ and Λ 2: From q(x 3 ) = q (x 3 ), we get 2 l= 2 l= b l [λ K 2 l + ( + φ l + + φ K l )] = b 2 λ K 2 2 +, b l [φ K l + (λ K 2 l + + λ l + )] = b φ K +. But this is not possible since the LHS is less than, while the RHS is greater than. Case Λ and Λ 3: Use the same arguments as in Case Λ and Λ 2. Case Λ and Λ 4: Use the same arguments as in Case Λ and Λ 2. Case Λ and Λ 5: From q(x 2 ) = q (x 2 ), we get 2 l= b l [(λ K 2 l + + λ l ) + φ K l ] = b ( + φ + + φ K ) + which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ and Λ 6: Use the same arguments as in Case Λ and Λ 5. Case Λ 2 and Λ 3: We have φ 2 = φ, λ =, φ φ =, λ 2 λ 2 = λ 2. (4.58) From q(x 2 ) = q (x 2 ), we get b 2 (φ φ K 2 ) = b (φ + + φ K ) (4.59) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b (λ K 2 + λ K λ ) + b φ K (4.60) From (4.58) and (4.59) we get b 2 = b. And from (4.60) we get λ 2 = λ, which is a contradiction to λ < λ 2. Case Λ 2 and Λ 4: We have φ 2 = φ 2, λ =, φ φ =, λ λ 2 = λ 2. 97

110 From q(x 2 ) = q (x 2 ), we get b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2(λ K λ K λ 2) + b 2φ K 2 which implies b 2 = b 2, and q 2 (x) = q 2(x). So q (x) = q (x) which can be viewed as a case of L = which has been shown to have uniqueness and it required λ = φ. Case Λ 2 and Λ 5: We have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ =, φ φ 2 =, λ λ 2 = λ 2. b 2 (φ φ K 2 ) = b 2(φ φ K 2 ) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2(λ K λ K λ 2) + b 2φ K 2 2 From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b 2λ K b 2(φ φ K 2 ) So b 2 = b 2 and φ 2 = φ 2 = φ, which is a contradiction to φ φ 2. Case Λ 2 and Λ 6: From q(x 2 ) = q (x 2 ), we get b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b ( + + φ K ) + b 2(λ K λ K λ 2) + b 2φ K 2 (4.6) From q(x 3 ) = q (x 3 ), we get b ( + φ + + φ K ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) (4.62) By subtracting (4.62) from (4.6) we get b φ K + b 2φ K 2 = b 2 φ K 2 b ( + φ + + φ K ) which is not possible because of (4.57). Case Λ 3 and Λ 4: We have From q(x 2 ) = q (x 2 ), we get φ = φ 2, φ 2 = φ, λ = λ, λ 2 = λ 2. b (φ + + φ K ) = b 2(φ φ K 2 ) 98

111 b (λ K 2 + λ K λ ) + b φ K = b 2(λ K λ K λ 2) + b 2 φ K 2 which imply λ = λ 2 = λ 2. But this is a contradiction to λ < λ 2. Case Λ 3 and Λ 5: From q(x 3 ) = q (x 3 ), then follow the same arguments in Λ and Λ 2. Case Λ 3 and Λ 6: We have φ =, φ λ 2 = φ 2, λ =, λ φ 2 = λ 2. (4.63) From q(x 2 ) = q (x 2 ), we get b (φ + + φ K ) = b (λ K ) (4.64) b (λ K 2 + λ K λ ) + b φ K = b ( + φ + + φ K ) (4.65) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) (4.66) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) (4.67) (4.63), (4.64), (4.66) imply b 2 = b 2, so q 2 (x) = q 2(x). Thus q (x) = q (x) which can be viewed as Case L =. Case Λ 4 and Λ 5: From q(x 3 ) = q (x 3 ), then follow the same arguments in Λ and Λ 2. Case Λ 4 and Λ 6: We have φ = φ 2, φ 2 =, λ λ =, λ φ 2 = λ 2. From q(x 4 ) = q (x 4 ), we get b 2 (λ K λ K λ 2 ) = b 2(λ K λ K λ 2) b 2 (φ K φ K φ 2 ) = b 2(φ K φ K φ 2) which imply b 2 = b 2 and φ 2 = φ 2 = φ, which is a contradiction to φ φ 2. Case Λ 5 and Λ 6: From q(x 3 ) = q (x 3 ), then follow the same arguments in Λ and Λ 2. Case Φ and Φ 2: We have λ = λ 2, λ 2 = λ, φ = φ, φ 2 = φ 2. 99

112 From q(x 2 ) = q (x 2 ), we get b (λ K 2 + λ K ) = b 2(λ K λ K ) b ( + φ + + φ K ) = b 2( + φ φ K 2 ) which imply b = b 2, φ = φ 2 = φ 2. But this is a contradiction to φ < φ 2. Case Φ and Φ 3: We have From q(x 2 ) = q (x 2 ), we get λ = λ, λ 2 = /φ 2, φ = φ, φ 2 = /λ 2. b (λ K 2 + λ K ) = b (λ K 2 + λ K ) b ( + φ + + φ K ) = b ( + φ + + φ K ) which imply b = b, λ = λ = λ 2. But this is a contradiction to λ λ 2. Case Φ and Φ 4: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 (λ K λ K ) = b λ K 2 + which is an impossible result. Case Φ and Φ 5: We have From q(x 2 ) = q (x 2 ), we get λ = /φ 2, λ 2 = λ, φ = φ, φ 2 = /λ 2. (4.68) b (λ K 2 + λ K ) = b 2(φ φ K 2 ) (4.69) b ( + φ + + φ K ) = b 2(λ K λ K λ 2) + b 2φ K 2 (4.70) From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 (λ K λ K ) = b (λ K λ ) + b 2(φ φ K 2 ) (4.7) b ( + φ + + φ K ) + b 2 ( + φ φ K = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) + b 2φ K 2 From (4.69)-(4.72) we get b = b = b 2. From q(x 4 ) = q (x 4 ), we get 2 ) (4.72) b (λ K 2 + λ K ) = b (λ K 2 + λ K ) (4.73) 00

113 which gives λ = λ = λ 2. But this is a contradiction to λ λ 2. Case Φ and Φ 6: From q(x 3 ) = q (x 3 ), we get b (λ K ) + b 2 (λ K ) = b λ K 2 + which is not impossible for the same reason as in the Case Λ and Λ 2. Case Φ 2 and Φ 3: We have From q(x 2 ) = q (x 2 ), we get λ = /φ 2, λ 2 = λ, φ = φ, φ 2 = /λ 2. b 2 (λ K λ K ) = b (λ K 2 + λ K ) b 2 ( + φ φ K 2 ) = b ( + φ + + φ K ) which give b 2 = b and φ 2 = φ = φ, which is a contradiction to φ < φ 2. Case Φ 2 and Φ 4: From q(x 3 ) = q (x 3 ), then follow the same arguments in Case Φ and Φ 4. Case Φ 2 and Φ 5: We have From q(x 4 ) = q (x 4 ), we get λ = λ, λ 2 = /φ 2, φ = φ, φ 2 = /λ 2. b (λ K 2 + λ K ) = b (λ K 2 + λ K ) which gives b = b, so q (x 4 ) = q (x 4 ) and q 2 (x 4 ) = q 2(x 4 ). This is a case of L =. Case Φ 2 and Φ 6: From q(x 3 ) = q (x 3 ), then follow the same arguments in Case Φ and Φ 6. Case Φ 3 and Φ 4: From q(x 3 ) = q (x 3 ), then follow the same arguments in Case Φ and Φ 4. Case Φ 3 and Φ 5: We have From q(x 4 ) = q (x 4 ), we get λ = /φ 2, λ 2 = λ 2, φ = φ, φ 2 = /λ. b (λ K 2 + λ K ) = b (λ K 2 + λ K ) b ( + φ + + φ K ) = b ( + φ + + φ K ) 0

114 which give b = b, λ = λ, φ 2 = φ 2. Case Φ 3 and Φ 6: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = b λ K 2 + which is not possible for the same reason as in the Case Λ and Λ 2. Case Φ 4 and Φ 5: From q(x 3 ) = q (x 3 ), we get b λ K 2 + = b (λ K 2 + λ K ) + b 2λ K b 2 (φ φ K 2 ) which is not possible because the LHS is greater than, but the RHS is less than. Case Φ 4 and Φ 6: We have From q(x 4 ) = q (x 4 ), we get λ = λ, λ 2 = /φ 2, φ = φ, φ 2 = /λ 2. b (λ K 2 + λ K ) = b (λ K 2 + λ K ) which gives b = b, so q (x 4 ) = q (x 4 ) and hence q 2 (x 4 ) = q 2(x 4 ). This is the Case L = and hence λ 2 = λ 2, φ 2 = φ 2. Case Φ 5 and Φ 6: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = b λ K 2 + which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ and Φ : From q(x 3 ) = q (x 3 ), we get 2 l= b l (λ K 2 l + λ K 2 l + + ) = b 2 λ K which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ and Φ 2: Just as the case Λ and Φ. Case Λ and Φ 3: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2λ K b 2(φ φ K 2 ) = b 2 λ K which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ and Φ 4: we have φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.74) 02

115 From q(x 3 ) = q (x 3 ), we get (4.75) and (4.76) give which results in K = K 2. b φ K = b 2φ K 2 b 2 φ K 2 = b φ K (4.75) b 2 λ K 2 2 = b λ K 2 b λ K 2 = b 2λ K 2 2 (4.76) φ K λ K 2 = φ K 2 λ K 2 2 = φk 2 λ K Suppose now K = K 2 = K, From q(x 2 ) = q (x 2 ), we get (4.78) is equivalent to (4.77) and (4.79) give φ + + φ K λ K 2 + λ K 2 b (φ + + φ K ) = b 2(λ K ) (4.77) b (λ K 2 + λ K λ ) + b φ K = b 2( + φ φ K 2 ) (4.78) b (λ K 2 + λ K λ ) = b 2( + φ φ K 2 ) (4.79) + + λ = λ which implies λ = φ, λ 2 = φ 2. K 2 + λ K φ + + φ K = λk φ K Similarly from q(x 4 ) = q (x 4 ), we can get λ 2 = φ 2, λ = φ. + φ + + φ K λ K + λ K Summarize the above arguments, we get down to a symmetric case which we have proved the uniqueness of the parameters. Case Λ and Φ 5: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2λ K b 2(φ φ K 2 ) = b 2 λ K which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ and Φ 6: we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ 2, φ 2 = /λ, λ 2 = /φ b (φ + + φ K ) = b 2(φ φ K 2 ) 03

116 which implies b = b 2. And so q (x 2 ) = q 2(x 2 ), q 2 (x 2 ) = q (x 2 ). And this is the case L =. Case Λ 2 and Φ : we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ 2, φ 2 = /λ, λ 2 = /φ (4.80) b 2 (φ φ K 2 ) = b 2(λ K ) (4.8) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b ( + φ + + φ K ) (4.82) From q(x 3 ) = q (x 3 ), we get b (λ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K ) (4.83) b ( + φ + + φ K ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K 2 ) (4.84) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.85) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.86) (4.80), (4.8) and (4.83) imply b = b 2, then follow the same arguments as in Λ and Φ 6. Case Λ 2 and Φ 2: we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ, φ 2 = /λ 2, λ 2 = /φ (4.87) b 2 (φ φ K 2 ) = b 2(λ K ) (4.88) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2( + φ φ K 2 ) (4.89) From q(x 3 ) = q (x 3 ), we get b (λ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K ) (4.90) 04

117 b ( + φ + + φ K ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K From q(x 4 ) = q (x 4 ), we get 2 ) (4.9) b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.92) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.93) (4.87), (4.9) and (4.93) imply b = b 2. (4.87), (4.88) and (4.90) imply b = b. Then (4.88) and (4.9) imply φ = φ = φ 2, which is a contradiction to φ < φ 2. Case Λ 2 and Φ 3: we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.94) b 2 (φ φ K 2 ) = b (λ K ) (4.95) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b ( + φ + + φ K ) (4.96) From q(x 3 ) = q (x 3 ), we get b (λ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) b ( + φ + + φ K ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) From q(x 4 ) = q (x 4 ), we get (4.97) (4.98) b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.99) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.00) From (4.95) and (4.99), we get From (4.96) and (4.00), we get b 2 λ K 2 2 = b 2λ K 2 2 (4.0) b 2 φ K 2 = b 2φ K 2 (4.02) From (4.0) and (4.02), we get λ K 2 2 φ K = λ K 2 φ K 2 05

118 Suppose K 2 > K, since λ 2 φ = λ φ 2 =, we have λ 2 = λ. So λ 2 = /φ 2, which violates the condition /φ 2 < λ 2. Similar situation for K 2 < K. So K 2 = K. Suppose now K = K 2 = K, (4.95) and (4.97) give (4.98) and (4.00) give (4.03) and (4.04) give b (λ K + + ) = b 2(φ φ K 2 ) (4.03) b ( + φ + + φ K ) = b 2(λ K 2 + λ K λ 2) (4.04) λ K φ + + φ K = φ φ K 2 λ K 2 + λ K λ 2 which implies φ = λ and then φ 2 = λ 2. (4.96) and (4.99) give b 2 (λ K 2 + λ K ) = b ( + φ + + φ K ) b 2 (φ φ K 2 ) = b (λ K + + ) which imply φ 2 = λ 2 and φ = λ. But this is a symmetric case. Case Λ 2 and Φ 4: we have From q(x 3 ) = q (x 3 ), we get φ = /λ, λ = /φ 2, φ 2 = /λ 2, λ 2 = /φ (4.05) b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = + b λ K 2 which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ 2 and Φ 5: we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = λ, φ 2 = φ 2, λ 2 = /φ (4.06) b 2 (φ φ K 2 ) = b 2(φ φ K 2 ) (4.07) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2(λ K λ K λ 2) + b 2φ K 2 (4.08) (4.06) and (4.07) gives b 2 = b 2. Then (4.08) gives λ 2 = λ 2 which violates condition λ 2 < /φ. 06

119 Case Λ 2 and Φ 6: Just as Λ 2 and Φ 4, from q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = + b λ K 2 which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ 3 and Φ : we have From q(x 2 ) = q (x 2 ), we get φ = /λ, λ = /φ 2, φ 2 = /λ 2, λ 2 = /φ (4.09) b (φ + + φ K ) = b (λ K ) (4.0) b (λ K 2 + λ K λ ) + b φ K = b ( + φ + + φ K ) (4.) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K λ 2 + ) (4.2) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K 2 ) (4.3) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.4) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.5) From (4.0) and (4.2) we get From (4.3) and (4.5) we get b 2 (φ φ K 2 ) = b 2(λ K ) (4.6) b (λ K 2 + λ K λ ) = b 2( + φ φ K 2 ) (4.7) From (4.0), (4.), (4.4) and (4.5) we get b (λ K 2 + λ K λ ) + b ( + φ + + φ K ) = 2 (4.8) b 2(λ K λ K λ 2) + b 2( + φ φ K 2 ) = 2 (4.9) 07

120 b (λ K 2 + λ K λ ) + b ( + φ + + φ K ) = 2 b 2 (λ K λ K λ 2 ) + b 2 ( + φ φ K 2 ) = 2 (4.20) (4.2) From (4.20), we know that when λ, φ are given, then b can be determined. From (4.20) and (4.0), we determine b. From (4.5), if additionally λ 2 is also given, then b 2 can be determined. In other words, provided that λ, φ, λ 2 are known, then b and b 2 can be determined, even φ 2 can be determined also. But it is not possible to determine the true values of b 2 and φ 2 based on λ, φ, λ 2 since we can use only equation (4.2) which has two unknown variables. Case Λ 3 and Φ 2: we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.22) b (φ + + φ K ) = b 2(λ K ) (4.23) b (λ K 2 + λ K λ ) + b φ K = b 2( + φ φ K 2 ) (4.24) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K λ 2 + ) (4.25) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K 2 ) (4.26) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.27) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.28) From (4.26) and (4.28), we get b (λ K 2 + λ K λ ) = b 2( + φ φ K 2 ) (4.29) From (4.24) and (4.29), we get b φ K = b 2φ K 2 b 2 φ K 2 = b φ K (4.30) 08

121 From (4.23) and (4.25), we get From (4.27) and (4.3), we get From (4.30) and (4.32), we get b 2 (φ φ K 2 ) = b (λ K ) (4.3) b 2 λ K 2 2 = b λ K 2 b λ K 2 = b 2λ K 2 2 (4.32) φ K λ K 2 = φ K 2 λ K 2 2 K = K 2 = K Suppose now K = K 2 = K, (4.28) and (4.3) give λ K λ K λ 2 φ φ K 2 which implies λ 2 = φ 2 and λ = φ. = + φ φ K 2 λ K Similarly (4.23) and (4.29) imply λ = φ and λ 2 = φ 2. So the cases here correspond to symmetric cases. Case Λ 3 and Φ 3: we have From q(x 2 ) = q (x 2 ), we get φ = /λ, λ = λ 2, φ 2 = φ 2, λ 2 = /φ (4.33) b (φ + + φ K ) = b (λ K ) (4.34) b (λ K 2 + λ K λ ) + b φ K = b ( + φ + + φ K ) (4.35) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) (4.36) b (λ K 2 + λ K λ ) + b 2 (λ K λ K ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) (4.37) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.38) 09

122 b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.39) From (4.37) and (4.39), we get b (λ K 2 + λ K λ ) = b 2(λ K λ K λ 2) (4.40) From (4.34) and (4.36), we get From (4.40) and (4.4), we get b 2 (φ φ K 2 ) = b 2(φ φ K 2 ) (4.4) b 2 = b 2 = b So q (x) = q 2(x), q 2 (x) = q (x). So we get L = case. Case Λ 3 and Φ 4: From q(x 3 ) = q (x 3 ) we have b λ K 2 + b 2 λ K l= (φ l + + φ K l ) = b λ K 2 + which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ 3 and Φ 5: we have From q(x 2 ) = q (x 2 ), we get φ = /φ 2, λ = λ 2, φ 2 = /λ, λ 2 = /φ (4.42) b (φ + + φ K ) = b 2(φ φ K 2 ) (4.43) b (λ K 2 + λ K λ ) + b φ K = b 2(λ K λ K λ 2) + b 2φ K 2 (4.44) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) (4.45) b (λ K 2 + λ K λ ) + b 2 (λ K λ K ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) (4.46) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.47) 0

123 b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.48) From (4.43), (4.45), (4.46) and (4.48), we get b = b 2 which implies q (x) = q 2(x). And then we have the same situation as in the case Λ 3 and Φ 3 Case Λ 3 and Φ 6: From q(x 3 ) = q (x 3 ) we get the same equation as in the Case Λ 3 and Φ 4 which is not possible. Case Λ 4 and Φ : we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.49) b 2 (φ φ K 2 ) = b (λ K ) (4.50) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b ( + φ + + φ K ) (4.5) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K λ 2 + ) (4.52) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K 2 ) (4.53) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K λ + ) (4.54) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.55) From (4.50) and (4.52), we get From (4.53) and (4.55), we get b (φ + + φ K ) = b 2(λ K λ 2 + ) (4.56) b (λ K 2 + λ K λ ) = b 2( + φ φ K 2 ) (4.57) From (4.50), (4.54), (4.5)and (4.55) we get b 2 φ K 2 = b φ K b φ K = b 2φ K 2 (4.58)

124 (4.58) and (4.59) give which results in K = K 2. b 2 λ K 2 2 = b λ K 2 b λ K 2 = b 2λ K 2 2 (4.59) φ K λ K 2 = φ K 2 λ K 2 2 = φk 2 λ K Suppose now K = K 2 = K, From (4.56) and (4.57), we get φ + + φ K λ K 2 + λ K λ = λ which implies λ = φ, λ 2 = φ 2. K 2 + λ K φ + + φ K = λk φ K Similarly from (4.54) and (4.55), we can get λ 2 = φ 2, λ = φ. And then we get symmetric cases. Case Λ 4 and Φ 2: we have From q(x 2 ) = q (x 2 ), we get + φ + + φ K λ K + λ K φ = /λ, λ = /φ 2, φ 2 = /λ 2, λ 2 = /φ (4.60) b 2 (φ φ K 2 ) = b 2(λ K ) (4.6) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2( + φ φ K 2 ) (4.62) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(λ K λ 2 + ) (4.63) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2( + φ φ K 2 ) (4.64) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K λ + ) (4.65) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.66) From (4.6) and (4.63), we get b (φ + + φ K ) = b (λ K ) (4.67) 2

125 From (4.64) and (4.66), we get b (λ K 2 + λ K λ ) = b 2( + φ φ K 2 ) (4.68) From (4.6), (4.62), (4.65) and (4.66) we can get equations (4.20), (4.2), (4.8), (4.9). Then the arguments follow similarly in Case Λ 3 and Φ. And we say (4.6)-(4.66) can not be satisfied. Case Λ 4 and Φ 3: we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ 2, φ 2 = /λ, λ 2 = /φ (4.69) b 2 (φ φ K 2 ) = b (λ K ) (4.70) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b ( + φ + + φ K ) (4.7) From q(x 3 ) = q (x 3 ), we get b (φ + + φ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) (4.72) b (λ K 2 + λ K λ ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) From q(x 4 ) = q (x 4 ), we get (4.73) b 2 λ K b 2 (φ φ K 2 ) = b (λ K λ + ) (4.74) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.75) From (4.70) and (4.72), we get b (φ + + φ K ) = b 2(φ φ K 2 ) (4.76) b (λ K 2 + λ K λ ) = b 2(λ K λ K λ 2) (4.77) From (4.76) and (4.77), we get b = b 2 which implies q (x) = q 2(x) and then q 2 (x) = q (x) which becomes a case of L =. Case Λ 4 and Φ 4: just as Case Λ 3 and Φ 4. Case Λ 4 and Φ 5: we have φ = /λ, λ = λ 2, φ 2 = φ 2, λ 2 = /φ (4.78) 3

126 From q(x 2 ) = q (x 2 ), we get b 2 (φ φ K 2 ) = b 2(φ φ K 2 ) (4.79) b 2 (λ K λ K λ 2 ) + b 2 φ K 2 = b 2(λ K λ K λ 2) + b 2φ K 2 (4.80) (4.79) implies b 2 = b 2. (4.80) implies λ 2 = λ 2 = λ which violates λ < λ 2. Case Λ 4 and Φ 6: just as Case Λ 3 and Φ 4. Case Λ 5 and Φ : From q(x 3 ) = q (x 3 ), we get + b 2 λ K 2 2 = 2 l= b l (λ K 2 l + λ K 2 l + + ) which is not possible for the same reason as in the Case Φ 4 and Φ 5. Case Λ 5 and Φ 2: just as Case Λ 5 and Φ. Case Λ 5 and Φ 3: From q(x 3 ) = q (x 3 ), we get + b 2 λ K 2 2 = b (λ K λ K ) + b 2 λ K b 2 (φ φ K 2 ) which is not possible for the same reason as in the Case Φ 4 and Φ 5. Case Λ 5 and Φ 4: we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ 2, φ 2 = /λ, λ 2 = /φ (4.8) b (λ K ) = b 2(λ K ) (4.82) b ( + φ φ K 2 ) = b 2( + φ φ K 2 ) (4.83) (4.82) and (4.83) imply b = b 2 and q (x) = q 2(x). So this is a case of L =. Case Λ 5 and Φ 5: just as Case Λ 5 and Φ 3. Case Λ 5 and Φ 6: we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.84) b (λ K ) = b 2(λ K ) (4.85) b ( + φ + + φ K ) = b 2( + φ φ K 2 ) (4.86) 4

127 From q(x 3 ) = q (x 3 ), we get b φ K = b 2φ K 2 b 2 φ K 2 = b φ K (4.87) b 2 λ K 2 2 = b λ K 2 b λ K 2 = b 2λ K 2 2 (4.88) (4.87) and (4.88) imply K = K 2. From q(x 4 ) = q (x 4 ), we get b 2 λ K2 + b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.89) b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) (4.90) (4.85) and (4.86) imply λ = φ, λ 2 = φ 2. (4.87), (4.88), (4.89) and (4.90) imply λ 2 = φ 2, λ = φ. So we get the symmetric cases. Case Λ 6 and Φ : we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ, φ 2 = /λ 2, λ 2 = /φ (4.9) b (λ K ) = b (λ K ) (4.92) b ( + φ + + φ K ) = b ( + φ + + φ K ) (4.93) (4.9) and (4.92) imply b = b. (4.93) implies φ = φ = φ 2 which violates the condition φ < φ 2. Case Λ 6 and Φ 2: we have From q(x 2 ) = q (x 2 ), we get φ = φ 2, λ = λ, φ 2 = /λ 2, λ 2 = /φ (4.94) b (λ K ) = b 2(λ K ) (4.95) b ( + φ + + φ K ) = b 2( + φ φ K 2 ) (4.96) (4.94) and (4.95) imply b = b 2. (4.96) implies φ = φ = φ 2 which violates the condition φ < φ 2. Case Λ 6 and Φ 3: we have φ = /λ 2, λ = λ, φ 2 = φ 2, λ 2 = /φ (4.97) 5

128 From q(x 2 ) = q (x 2 ), we get b (λ K ) = b (λ K ) (4.98) b ( + φ + + φ K ) = b ( + φ + + φ K ) (4.99) We get b = b, φ = φ. So q (x) = q (x), q 2 (x) = q 2(x) and the case becomes a case of L =. Case Λ 6 and Φ 4: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = b λ K 2 + which is not possible for the same reason as in the Case Λ and Λ 2. Case Λ 6 and Φ 5: we have From q(x 2 ) = q (x 2 ), we get φ = /λ 2, λ = /φ 2, φ 2 = /λ, λ 2 = /φ (4.200) b (λ K ) = b 2(φ φ K 2 ) (4.20) b ( + φ + + φ K ) = b 2(λ K λ K λ 2) + b 2φ K 2 (4.202) From q(x 3 ) = q (x 3 ), we get b (λ K ) + b 2 (φ φ K 2 ) = b (λ K ) + b 2(φ φ K 2 ) (4.203) b ( + φ + + φ K ) + b 2 (λ K λ K λ 2 ) = b ( + φ + + φ K ) + b 2(λ K λ K λ 2) (4.204) From q(x 4 ) = q (x 4 ), we get b 2 λ K b 2 (φ φ K 2 ) = b (λ K 2 + λ K ) (4.205) b 2 (λ K λ K λ 2 ) = b ( + φ φ K 2 ) (4.206) From (4.20) and (4.20), we get b 2 (φ φ K 2 ) = b (λ K ) (4.207) b ( + φ φ K 2 ) = b 2(λ K λ K λ 2) (4.208) 6

129 From (4.205) and (4.205), we get From (4.202) and (4.208), we get b 2 λ K 2 2 = b λ K 2 b λ K 2 = b 2λ K 2 2 (4.209) b φ K = b 2φ K 2 b 2 φ K 2 = b φ K (4.20) (4.209) and (4.20) imply K = K 2. But (4.20) and (4.208) imply λ = φ and (4.206) and (4.206) imply λ 2 = φ 2. This gets down to a symmetric case. Case Λ 6 and Φ 6: From q(x 3 ) = q (x 3 ), we get b (λ K 2 + λ K ) + b 2 λ K b 2 (φ φ K 2 ) = b λ K 2 + which is not possible for the same reason as in the Case Λ and Λ 2. 7

130 Chapter 5 Multivariate Extreme Value Analysis to Value at Risk 5. Introduction How to manage a portfolio efficiently, with the highest expected return for a given level of risk, or equivalently, the least risk for a given level of expected return, is the key to the success or failure of a financial system. J.P. Morgan s RiskMetrics TM (996) defines risk as the degree of uncertainty of future net return. Financial systems face many risks which may result in financial collapse if not appropriately managed. According to RiskMetrics, a common classification of risks gives four main categories of risk which are Credit Risk (the potential loss because of the inability of a counterparts to meet its obligation), Operational Risk (errors that can be made in instructing payments or setting transactions), Liquidity Risk (inability of a firm to fund its illicit assets), and Market Risk (loss resulting from a change in the value of traceable assets). This work focuses on ways of modeling market risk via multivariate extreme value theories and methods. To be aware of and to understand risks which a manager will face are very important to the modern financial management. Especially, as financial trading systems have been extended broadly and become more sophisticated, there has been increased awareness of the dangers of very large losses. For example, see Smith (2000), large price movements in security markets may cause the failure of financial systems. Examples include the bankruptcy of Baring Bank, Daiwa Bank and Orange County in California. The most spectacular example to date was the near-collapse of the hedge fund Long Term Capital Management in September 998. LTCM was trading a complex mixture of derivatives

131 which, according to some estimates, gave it an exposure to market risk as high as $200 billion. Things started to go wrong after the collapse of the Russian economy in the summer of 998, and to avoid a total collapse of the company, 5 major banks contributed to a $3.75 billion rescue package. From the insurance industry, very large claims can cause insurance companies to go bankrupt. Embrechts et al. (997) lists 30 most costly insurance losses in table (the largest one is 6,000 million dollars) and 30 worst catastrophes in terms of fatalities in table 2 (the largest fatality is 300,000), both taken from Sigma (996). These events are relatively rare, but important. These and other examples have increased awareness of the need to quantify probabilities of large losses, and for risk management systems to control such events. A tool called Value at Risk (henceforth, VaR) has been increasingly employed by many banks. It gained a higher profile in 994 when J.P. Morgan published its RiskMetrics system. The Basle Committee on Banking Supervision has proposed in 996 that internal VaR models may be used in the determination of the capital requirements that banks must fulfill to back their trading activities (cf. Dave and Stahl 999). Books, like Jorion (996), Dowd (998), aimed at financial academics and traders and explained the statistics basis behind VaR. Best (998) is aimed at the risk management practitioners. Dave and Stahl (999) s working paper studied 5 different VaR models with real data performance analysis. Among many applications and models, portfolio returns are assumed normally distributed, or tail normally distributed. Such assumption makes the estimation easy. However this may underestimate the risk of the system which actually has a fat-tailed distribution. Most financial data are actually distributed with fat tail. LTCM and banks have been criticized for not stress-testing risk models against extreme market movements (Embrechts, 999). Also back to November 995, the Director of the Federal Reserve, Mr. A. Greenspan stated work that characterizes the distribution of extreme events would be useful as well (Embrechts, 999). The excellent recent book by Embrechts et al. (997) surveys the mathematical theory of EVT and discusses its applications to both financial and insurance risk management. Although the use of EVT in finance and insurance industries has a considerable literature on the subject, especially there is a much longer history of its use in the insurance industry, most applications are restricted in univariate stochastic process data. Again Embrechts et al. (997) is an excellent literature. Smith (2000) presented 9

a demonstration of the merits of combining established models for extreme values with modern statistical techniques, including Bayesian inference, hierarchical models and Monte Carlo sampling, for real insurance data. Tsay (1999) applied extreme value theory to investigate the occurrence times and excesses over high thresholds of a financial time series (the S&P index). Both works used similar methodology, though they had different kinds of data. As stated in Embrechts et al. (1997), in 1900 Louis Bachelier showed that Brownian motion lies at the heart of any model for asset returns; around the same time, Filip Lundberg showed that the homogeneous Poisson process, after a suitable time transformation, is the key model for insurance liability data.

As to multivariate aspects, not much work has been done. However, it is very natural to consider multivariate extremes, in which a portfolio of asset returns is under high risk due to a combination of various processes at extreme levels: for instance, daily exchange rates of the US dollar against several foreign currencies, or insurance models in which there are several types of claims each day. There exist dependence structures among the various assets in a portfolio. If the composition of the portfolio is held fixed, then it may be enough to assess only the risk of that fixed composition, which can be done by applying univariate EVT. However, to manage a portfolio efficiently, or equivalently to optimize the portfolio, the real rationale for considering multivariate aspects is often to help design the portfolio.

The famous mean-variance approach first introduced by Markowitz is broadly used in financial management (Markowitz 1952, 1987, Korn 1997, Michaud 1998). The approach is based on an assumption of multivariate normality for the joint distribution of assets or securities. One of its formulations is to maximize the expected return subject to a given level of risk (the variance of a linear combination of assets). An alternative is to use VaR as the constraint; this option has drawn much attention in financial management and is the one we investigate further. Conventional VaR theory is highly questionable due to its assumption of a joint multivariate normal distribution for log returns, which may not be appropriate for fat-tailed data and may result in an underestimate of the risk.

One approach to the problem is through multivariate EVT. Resnick (1987) is an excellent source of information on possible approaches. Because there is no standard notion of order in high-dimensional Euclidean space, most approaches to date have focused on the one-dimensional case. The good news is that there has been considerable progress. Coles and Tawn (1991, 1994) made impressive progress on modeling extreme multivariate events and turned multivariate extreme value theory into a very practical method of data

analysis. Embrechts, de Haan and Huang (2000) presented an approach for modeling tail events and showed results in the 2-dimensional case, theoretically and numerically. Smith and Weissman (1996) proposed some alternative representations of extreme value processes to characterize the joint distribution of extremes in multivariate time series. They showed that, under fairly general conditions, extremal properties of a wide class of multivariate time series may be calculated by approximating the process in the tails by one of M4 form.

5.2 VaR methods

One function of risk management is to measure the risk to which the financial system is exposed. Questions like "How much could a bank lose on a normal trading day?" or "What kind of market risk is the bank exposed to?" are very often put to the risk manager by the CEO of the bank. To answer these questions, we seek risk measurements that quantify the risk of all trading positions of the bank. However, traditional risk measurements have difficulty answering such questions. Traditional methods usually only calculate the individual risk of each market variable a financial institution has invested in; the overall market risk cannot be efficiently measured because of the number of market variables (hundreds or thousands) and the very long computing time needed. These methods may be of benefit to traders who manage the trading activities for each financial instrument, but they are not very useful to senior risk managers or regulators. For example, the variance of the portfolio return tells how variable the return is, but does not tell us how likely a loss is or what amount of money the bank could lose.

VaR methods can overcome the difficulties from which the traditional methods suffer. VaR is defined as the maximum possible loss on a portfolio over a given time interval, with a given level of confidence. Statistically, if we let X_t be the loss over time t within a time horizon, say [0, T], and the confidence level is α, then the VaR is just the upper α percentile x_α of the random variable X_T, which satisfies

Pr(X_T < x_α) > α.    (5.1)

In the literature, there are three typical methods to calculate VaR: the variance-covariance approach, the historical simulation approach, and the Monte-Carlo simulation approach. We briefly describe these methods in the following subsections.

5.2.1 Variance-covariance approach

The variance-covariance approach was the first widely used VaR calculation method. References include, among others, Jorion (1997), Chapter 3 of Dowd (1998), and Chapter 2 of Best (1998). This approach assumes that the return on a single asset has a normal distribution, or that the returns on a portfolio of multiple assets have a jointly multivariate normal distribution. The VaR calculation is implemented as follows.

Consider a portfolio of d assets whose returns at time t are R_{1t}, R_{2t}, ..., R_{dt}. We write R_t = (R_{1t}, R_{2t}, ..., R_{dt})'. R_t is assumed to be jointly normally distributed, i.e. R_t ~ N(µ, Σ). Let w = (w_{1t}, ..., w_{dt})' be the weights of the individual positions, which sum to unity. The return of the portfolio at time t is defined as

R_t^p = Σ_{i=1}^{d} w_{it} R_{it} = w'R_t.    (5.2)

So the VaR for the portfolio is calculated from

Pr(R_t^p > c) = α.    (5.3)

Since R_t^p is distributed as N(w'µ, w'Σw), we have

c = w'µ + z_α (w'Σw)^{1/2},    (5.4)

where z_α is the standard normal upper α percentile. Finally the VaR for the portfolio is

VaR_p = V c,    (5.5)

where V is the original position value, i.e. the investment. In practice, however, it is customary to assume that the expected price change is zero over the given time period. The VaR for a portfolio is then calculated simply as

VaR_p = V z_α (w'Σw)^{1/2} = (v'Σv)^{1/2},    (5.6)

where v is the vector of VaRs of the individual positions.
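As a concrete numerical illustration of (5.4)-(5.6), the following sketch estimates Σ by the sample covariance matrix and reads off the portfolio VaR under the zero-mean convention. It is only a sketch: the function name, the simulated returns and the 95% confidence level are illustrative, not part of the methodology above.

```python
import numpy as np
from scipy.stats import norm

def var_covar_var(returns, weights, value, alpha=0.05):
    """Variance-covariance VaR of (5.6): VaR_p = V * z_alpha * sqrt(w' Sigma w),
    taking the expected price change over the horizon to be zero."""
    sigma = np.cov(returns, rowvar=False)   # sample estimate of Sigma from a (T x d) array
    w = np.asarray(weights, dtype=float)
    z_alpha = norm.ppf(1.0 - alpha)         # standard normal upper alpha percentile
    return value * z_alpha * np.sqrt(w @ sigma @ w)

# Simulated returns for d = 3 assets and a $1,000,000 position at the 95% level.
rng = np.random.default_rng(0)
returns = rng.multivariate_normal(np.zeros(3), 1e-4 * np.eye(3), size=5000)
print(var_covar_var(returns, [1/3, 1/3, 1/3], 1_000_000))
```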

5.2.2 Historical simulation approach

The historical simulation approach uses the empirical distribution based on past data. Theoretically, the empirical distribution converges to the true distribution of the random variable of interest, here the portfolio return. The idea is to use the empirical distribution to approximate the true distribution and to read off the VaR from the empirical distribution curve, usually a histogram. The advantages of historical simulation include that no model assumption or distributional form is required, and that the VaR can be read directly from a spreadsheet. It can also capture the fat-tailed behavior of the data, given enough observations including very large observed values. But there are disadvantages. For example, it may not represent very extreme events well, because there is no possibility of extrapolation beyond the observed range of the data. It also assumes stationarity in time.

5.2.3 Monte-Carlo simulation approach

While the historical simulation approach uses historical data to construct the empirical distribution, the Monte-Carlo simulation approach draws data from a random process, uses the drawn values to construct an empirical distribution, and then reads off the VaR from that constructed distribution. Here we need a random process to represent the price change; a continuous-time stochastic process, specified by a stochastic model, is often used in the literature. A simple model of price change may have the form dX_t = σ dW_t, where W_t is a Brownian motion and σ is known or at least estimable. This simple model has the solution X_t = X_0 + σW_t. From this representation, we can draw values of X_t by a Monte-Carlo simulation scheme: draw a random number from a random number generator, transform this number into a normal random variate, and thus obtain a simulated return value. This procedure is repeated until we have enough values to construct an empirical distribution. Note that since the example model here is rather simple, X_t itself is normally distributed, so we could calculate the VaR from the variance-covariance approach directly. The Monte-Carlo simulation approach is usually advantageous when the portfolio contains multiple assets and a relatively complicated structure that cannot easily be handled by the simpler approaches.
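The two simulation approaches just described can be sketched in a few lines; here the t-distributed losses and the parameter values are stand-ins, and the Monte-Carlo part uses the simple model dX_t = σ dW_t of the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Historical simulation: read the 95% VaR off the empirical loss distribution.
losses = 0.01 * rng.standard_t(df=4, size=5000)   # stand-in for observed portfolio losses
var_hist = np.quantile(losses, 0.95)

# Monte-Carlo simulation: X_T = X_0 + sigma * W_T with W_T ~ N(0, T), so each
# draw transforms a normal variate into a simulated price change over [0, T].
sigma, T, n_draws = 0.01, 1.0, 100_000
x_T = sigma * np.sqrt(T) * rng.standard_normal(n_draws)
var_mc = np.quantile(-x_T, 0.95)                  # losses are negative price changes

print(var_hist, var_mc)
```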

5.3 Extreme value approaches

The three methods described in the previous section each have their own disadvantages. In general VaR is regarded as an extreme quantile of the loss distribution, because we are interested in a 95%, 99% or even higher confidence level. For example, Bankers Trust uses 99%; Chemical and Chase use 97.5%; Citibank, a 95.4% level; Bank of America and J.P. Morgan use 95% (Jorion 1997, p. 20). Financial returns are fat-tailed and exhibit a form of clustering usually caused by extreme price movements. The variance-covariance approach may not be suitable because of its normality assumption. Historical simulation may not work due to the lack of extreme observations, and it is hard to use historical simulation to characterize the dependence structure among the assets of a portfolio. The Monte-Carlo simulation approach also carries a normality assumption for the underlying stochastic process.

The extreme value approach has drawn major attention in VaR studies recently. It has advantages in analyzing fat-tailed data, which financial data are, and in extrapolating beyond the range of the observed data. Just as the variance-covariance approach has a distributional assumption, extreme value approaches assume the underlying limiting distributions (for the extreme values) are extreme value distributions. There are three types of extreme value distributions, stated in Chapter 1, and they can be written in the generalized form

H(x; µ, σ, ξ) = exp{−[1 + ξ(x − µ)/σ]^{−1/ξ}},    (5.7)

which is (1.6) in Chapter 1. Calculating the VaR of the underlying loss distribution is now equivalent to computing the extreme quantile in (5.1), in case the portfolio has only one position or is statistically univariate.

It is known that the normal distribution is stable and the extreme value distributions are max-stable. As a result, if R_{it} in (5.2) follows one of the extreme value distributions, R_t^p will not follow any extreme value distribution. But the limiting form of R_t will follow a multivariate extreme value distribution, which is what we are interested in. So the VaR calculation is transformed into the calculation of the critical value c in (5.3) from a multivariate extreme value distribution.
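For the univariate case, computing the extreme quantile in (5.1) under a fitted GEV amounts to inverting (5.7). The following sketch implements (5.7) and its inverse directly, so that the sign conventions match the formula above; the parameter values are made up for illustration.

```python
import numpy as np

def gev_cdf(x, mu, sigma, xi):
    """Generalized extreme value cdf (5.7), defined where 1 + xi*(x - mu)/sigma > 0."""
    t = 1.0 + xi * (x - mu) / sigma
    return np.exp(-t ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Invert (5.7): x_p = mu + sigma * ((-log p)**(-xi) - 1) / xi."""
    return mu + sigma * ((-np.log(p)) ** (-xi) - 1.0) / xi

# A 99% quantile under a fat-tailed (xi > 0) fit, and a round-trip check.
x99 = gev_quantile(0.99, mu=0.0, sigma=1.0, xi=0.3)
print(x99, gev_cdf(x99, 0.0, 1.0, 0.3))   # the cdf value should be ~0.99
```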

Unlike the univariate extreme value distributions, the multivariate extreme value distribution has no explicit form. We need to adopt one of the existing multivariate extreme value models for the data. Moreover, the observed data are not independent, so we also need to estimate the extremal index of the multivariate time series under stationarity conditions. Since financial data exhibit clustering, it may be a better choice to model the extreme values of the data using an M4 process: we have seen that, under mild general conditions, a stationary stochastic process can be approximated in the tails by a max-stable process, and a max-stable process can in turn be approximated very closely by an M4 process. We present applications of M4 processes to the financial data in section 5.5.

5.4 Optimal portfolio theories

Not putting all one's eggs in one basket has long been a basic principle when one suspects the basket is not completely secure. In finance, portfolio diversification has been regarded as an essential component of modern risk management. Minimizing risk leads to investing in those market variables with smaller risk. On the other hand, an investor expects to gain the maximum possible return on his investment, and in general, the higher the risk, the higher the return. These two investment strategies, low risk and high return, oppose each other. Naturally an investor would seek an optimal investment plan which either maximizes the portfolio mean return such that the estimated risk is not higher than an upper risk limit, or minimizes the risk such that the mean return is not lower than a lower mean return limit, within a given time period. This is referred to as the portfolio problem in the literature.

The mean-variance approach pioneered by Markowitz (Markowitz 1952, 1987, Korn 1997, Michaud 1998) is the earliest approach to the portfolio investment problem. Although it is a one-period model, it is still highly valued. In 1990, Harry M. Markowitz, Merton M. Miller and William F. Sharpe received the Nobel Prize in economic sciences for their pioneering work in the theory of financial economics.

Let us now assume a portfolio consists of d assets. At time t = 0, an investor has to decide how many shares of each asset to hold until time t = T. Suppose the proportion of the total money invested in asset i is π_i and the price of asset i at time t is P_i(t); then the return of asset i is R_i = P_i(T)/P_i(0). Finally the portfolio return is

R = Σ_{i=1}^{d} π_i R_i

and the mean return is defined by

µ(π) = E(R) = Σ_{i=1}^{d} π_i E(R_i) = Σ_{i=1}^{d} π_i µ_i,

and the variance of the portfolio return is

σ²(π) = Var(R) = Σ_{i,j=1}^{d} π_i Cov(R_i, R_j) π_j = Σ_{i,j=1}^{d} π_i σ_{ij} π_j.

The mean-variance approach determines the optimal investment plan π by solving one of the following two optimization problems:

min_{π ∈ R^d} σ²(π)  s.t.  µ(π) ≥ µ_low,  π_i ≥ 0,  Σ_{i=1}^{d} π_i = 1,    (5.8)

where µ_low is the lowest acceptable mean return; or

max_{π ∈ R^d} µ(π)  s.t.  σ²(π) ≤ σ²_max,  π_i ≥ 0,  Σ_{i=1}^{d} π_i = 1,    (5.9)

where σ²_max is the maximum risk one can take. (5.8) is a quadratic optimization problem and has a unique solution. Under some conditions (5.8) and (5.9) are equivalent. In practice, (5.9) seems more natural. We adopt (5.9) as the basic model in the present work and extend it to VaR constraints calculated from an extreme value distribution. Traditional risk measurement, for example the variance-covariance approach, may underestimate the risk to which a financial institution is exposed. We therefore model the portfolio returns by a multivariate extreme value distribution, in the present case via M4 process modeling. Our optimization model is thus

max_{π ∈ R^d} µ(π)  s.t.  Pr(R_p > c) = α,  π_i ≥ 0,  Σ_{i=1}^{d} π_i = 1,    (5.10)

where R_p = Σ_{i=1}^{d} π_i R_i = π'R, and the limiting distribution of exceedances over a high threshold of R follows a multivariate extreme value distribution.
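Problem (5.9) can be solved numerically with any constrained optimizer, since the objective is linear and the risk constraint quadratic. A minimal sketch with made-up means and covariances follows; the solver choice (SLSQP) and the starting point are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def max_return_portfolio(mu, sigma, var_max):
    """Solve (5.9): maximize mu(pi) subject to sigma^2(pi) <= var_max,
    pi_i >= 0 and sum_i pi_i = 1."""
    d = len(mu)
    constraints = [
        {"type": "eq",   "fun": lambda p: np.sum(p) - 1.0},
        {"type": "ineq", "fun": lambda p: var_max - p @ sigma @ p},  # kept >= 0
    ]
    res = minimize(lambda p: -(p @ mu), x0=np.full(d, 1.0 / d),
                   bounds=[(0.0, 1.0)] * d, constraints=constraints,
                   method="SLSQP")
    return res.x

mu = np.array([0.05, 0.08, 0.12])      # illustrative mean returns
sigma = np.diag([0.01, 0.04, 0.09])    # illustrative covariance matrix
print(max_return_portfolio(mu, sigma, var_max=0.03))
```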

5.5 Dynamic financial data modeling

In this section we model financial time series data as M4 processes. Stock prices of GE, CITIBANK and Pfizer will be studied. Parameter estimates of the M4 processes are based on a multivariate time series of approximately 5000 days. We first look at the extreme values of the negative returns and check whether an extreme value distribution fit is appropriate, using the mean excess plot, the Z-plot and the W-plot. The data are standardized using a GARCH(1,1) model, which gives the estimated conditional standard deviations. We then check whether an extreme value distribution fit is appropriate for the standardized time series. A generalized Pareto distribution is used to fit the data above a certain threshold (1.02 is used in this study) for each sequence. The data are then transformed to the Fréchet scale using the fitted GPD, and the transformed data are used in the M4 process modeling. We begin by introducing some concepts and basic background.

5.5.1 Mean excess plot

The mean excess plot is a plot of the mean of all excess values over a threshold u against u itself. It usually suggests whether an extreme value distribution fit is appropriate, and it is very useful for initial diagnostics and for selecting the threshold. It is based on the following identity: if Y has a generalized Pareto distribution with scale α and shape ξ, provided ξ < 1, then for threshold u > 0 the mean excess function is

e(u) = E{Y − u | Y > u} = (α + ξu)/(1 − ξ).

Thus, a sample plot of mean excess against threshold should be approximately a straight line if the model is correct.
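The empirical counterpart of e(u) is simply the average excess over each candidate threshold, which is what the mean excess plot displays. A short sketch, with a Pareto sample standing in for the data:

```python
import numpy as np

def mean_excess(y, thresholds):
    """Empirical mean excess e(u): the average of Y - u over observations Y > u.
    Under a GPD with xi < 1, the plot of e(u) against u is roughly linear."""
    y = np.asarray(y)
    return np.array([np.mean(y[y > u] - u) for u in thresholds])

rng = np.random.default_rng(2)
sample = rng.pareto(3.0, size=10_000)                      # a fat-tailed stand-in
u_grid = np.quantile(sample, np.linspace(0.50, 0.99, 25))  # candidate thresholds
print(np.column_stack([u_grid, mean_excess(sample, u_grid)]))
```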

5.5.2 Z-statistics and W-statistics

The underlying idea behind the Z-statistics and W-statistics is the point-process approach to univariate extreme value modeling due to Smith (1989). According to this viewpoint, the exceedance times and excess values over a high threshold are viewed as a two-dimensional point process. If the process is stationary and satisfies a condition that there are asymptotically no clusters among the high-level exceedances, then its limiting form is a non-homogeneous Poisson process. Smith and Shively (1995) introduced a number of diagnostic devices to examine the fit of the generalized extreme value distributions. One idea is based on what they called Z-statistics,

Z_k = ∫_{T_{k−1}}^{T_k} Λ_s(u) ds,

where T_k denotes the time of the k-th exceedance of u, and

Λ_t(x) = {1 + ξ_t (x − µ_t)/ψ_t}_+^{−1/ξ_t}

is the intensity of a nonhomogeneous Poisson process of exceedances of a level x. If the model is correct, then Z_1, Z_2, ... will be independent exponentially distributed random variables with mean 1. The Z-statistics are an indication of how closely the exceedances of a fixed level u are represented by a nonhomogeneous Poisson process, but they do not test the generalized Pareto distribution assumption for the distribution of excesses over the threshold. This can be done via the W-statistics:

W_k = (1/ξ_{T_k}) log[1 + ξ_{T_k}(Y_k − u)/(ψ_{T_k} + ξ_{T_k}(u − µ_{T_k}))].

Then W_1, W_2, ... are also independent exponential random variables with mean 1, if the model is correct. These techniques have been broadly used in model diagnostics, for example in Tsay (1999) and Smith and Goodman (2000).

5.5.3 GARCH(1,1) model

The traditional AR(p) time series model assumes constant variance across time, which experience has shown is not the case. GARCH (generalized autoregressive conditional heteroscedasticity) processes, proposed by Bollerslev (1986), model the residuals of a time series regression without assuming constant variance. Research has shown GARCH models to be quite successful in fitting financial time series. Suppose the time series regression has the form

y_t = φ_0 + φ_1 y_{t−1} + ... + φ_p y_{t−p} + u_t,

where

u_t = √(h_t) v_t,
h_t = κ + δ_1 h_{t−1} + ... + δ_r h_{t−r} + θ_1 u²_{t−1} + ... + θ_s u²_{t−s},
v_t ~ N(0, 1).

These three formulas together define the GARCH(r, s) model.
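The standardization used below only needs the GARCH(1,1) variance recursion. The following sketch assumes the parameters are given (in practice they come from quasi-maximum likelihood, which is not shown); the starting value for h and the parameter values are illustrative.

```python
import numpy as np

def garch11_std(u, kappa, delta1, theta1):
    """Conditional standard deviations from the GARCH(1,1) recursion
    h_t = kappa + delta1 * h_{t-1} + theta1 * u_{t-1}^2  (r = s = 1 above)."""
    u = np.asarray(u, dtype=float)
    h = np.empty_like(u)
    h[0] = np.var(u)                 # one common choice of starting value
    for t in range(1, len(u)):
        h[t] = kappa + delta1 * h[t - 1] + theta1 * u[t - 1] ** 2
    return np.sqrt(h)

# Standardize a (simulated) return series, as is done for the three stock series.
rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_normal(5000)
standardized = returns / garch11_std(returns, kappa=1e-6, delta1=0.90, theta1=0.08)
```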

5.5.4 Initial diagnostics

Figure 5.1 shows time series plots of the three stock returns. We can see that there are extreme observed values in each sequence and that the series are not stationary. A data transformation is needed in order to obtain stationarity and apply M4 process modeling. Before doing that, we check whether extreme value distribution fits are appropriate for the observations above a certain threshold. Figure 5.2 gives initial diagnostics, which suggest an extreme value distribution fit for the Pfizer data but not for the other two data sets. Further diagnostics based on the Z- and W-statistics are therefore used. All the W-plots in Figures 5.3-5.5 suggest that a generalized extreme value distribution fit is appropriate. Some caution should be taken since a few points, partly the result of the October 1987 crash, lie away from the straight line. The Z-plots, however, do not support a generalized extreme value distribution fit.

5.5.5 Data transformation

As mentioned earlier, our goal is to fit M4 processes to the three time series data sets. We use GARCH(1,1) to model the volatility. Figure 5.6 shows the estimated conditional standard deviations, and Figure 5.7 shows the standardized time series, which visually look stationary. Figures 5.8-5.10 suggest that a generalized extreme value distribution fit is now appropriate; notice that the earlier Z-plots were not consistent with the model, but now they are. After fitting the generalized extreme value distribution, the data sets are transformed to the unit Fréchet scale, and the transformed data are plotted in Figure 5.11.

Since an M4 process has a double index, one for signature patterns and one for the moving range, we need to determine the order of the moving range and the number of signature patterns. We apply graphical diagnostic methods to determine the order and propose a criterion to determine the number of signature patterns. Based on the property that an M4 process exhibits clustered observations when an extreme observation occurs, we examine the observed values that are larger than a certain threshold. Empirical counts can tell us both the moving-range order and the dependence range.
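Two of the steps just described, the transformation to the unit Fréchet scale and the empirical exceedance counts, can be sketched as follows. The GPD tail form and the Z = −1/log F(x) map follow Section 5.5.5; the helper names and the treatment of the empirical cdf are choices of this sketch, and (alpha, xi, u) would come from the GPD fit.

```python
import numpy as np

def to_unit_frechet(x, u, alpha, xi):
    """Map a standardized series to unit Frechet scale via Z = -1/log(F(x)),
    using the empirical cdf below the threshold u and the fitted GPD tail
    F(x) = 1 - p_u * (1 + xi*(x - u)/alpha)^(-1/xi) above it, where p_u is
    the empirical probability of exceeding u."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    F = (np.argsort(np.argsort(x)) + 1.0) / (n + 1.0)   # empirical cdf in (0, 1)
    tail = x > u
    p_u = np.mean(tail)
    F[tail] = 1.0 - p_u * (1.0 + xi * (x[tail] - u) / alpha) ** (-1.0 / xi)
    return -1.0 / np.log(F)

def co_exceedance_counts(y1, y2, u):
    """Observed vs. expected-under-independence numbers of days on which two
    Frechet-scale series both exceed u (lag one series by a day for the
    two-day counts, e.g. co_exceedance_counts(z1[1:], z2[:-1], u))."""
    e1, e2 = y1 > u, y2 > u
    return int(np.sum(e1 & e2)), len(y1) * np.mean(e1) * np.mean(e2)
```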

[Figure 5.1: Negative daily log returns for GE, CITI and PFIZER, 01/04/82 to 03/05/01. There are extreme observations in each series, and the greatest drop happened on the same day in all three, October 19, 1987, the date of the Wall Street crash.]

[Figure 5.2: Mean excess plots with confidence bands for (a) Pfizer, (b) GE, (c) Citibank. The mean excess plot usually suggests whether an extreme value distribution fit is appropriate. The plot for the Pfizer data supports an extreme value fit since it stays within its confidence band. The other two are more doubtful since the plots go outside the confidence bands, though further analysis shows that the extreme value approximation is reasonable in these cases also.]

[Figure 5.3: Z- and W-diagnostics for the GE data: Z and W values against time, QQ plots for Z and W, and correlation plots for Z and W. The W-plots show that a generalized extreme value distribution fit is appropriate; some caution is needed since a few points, partly the result of the October 1987 crash, lie away from the straight line.]

[Figure 5.4: Z- and W-diagnostics for the CITI Bank data. The W-plots show that a generalized extreme value distribution fit is appropriate.]

[Figure 5.5: Z- and W-diagnostics for the Pfizer data. The W-plots show that a generalized extreme value distribution fit is appropriate.]

[Figure 5.6: Estimated conditional standard deviations from the GARCH(1,1) model for the GE, CITI Bank and Pfizer data.]

[Figure 5.7: Standardized series: negative daily returns divided by the estimated conditional standard deviations, for GE, CITI and PFIZER.]

[Figure 5.8: Z- and W-diagnostics for the standardized GE data. The W-plots show that a generalized extreme value distribution fit is appropriate; some caution is needed since a few points, partly the result of the October 1987 crash, lie away from the straight line.]

[Figure 5.9: Z- and W-diagnostics for the standardized CITI Bank data. The W-plots show that a generalized extreme value distribution fit is appropriate.]

[Figure 5.10: Z- and W-diagnostics for the standardized Pfizer data. The W-plots show that a generalized extreme value distribution fit is appropriate.]

[Figure 5.11: The negative daily returns after transformation to the unit Fréchet scale: (a) GE, (b) CITIBank, (c) Pfizer.]

We look at the counts of paired negative daily returns on the unit Fréchet scale in several different ways. We count the days on which two different stocks both had price drops over a certain threshold; the days on which a single stock had two consecutive daily drops; and the days, within a two-day range, on which one stock price dropped on the first day and another stock price dropped on the second day. We also calculate the expected counts under the assumption of independence. We observe throughout that the observed counts are larger than the expected ones, and we therefore conclude that the variables are dependent; the choice of M4 processes to model the dependence thus appears appropriate.

5.5.6 Model selection and parameter estimation

All the figures suggest that an M4 process fit may be a good choice for financial time series data with multivariate temporal dependence. Figures 5.11 and 5.12 suggest a model with a time dependence range of order 2 or 1 and at least 3 signature patterns. Some of these patterns have order 2, corresponding to drops on two consecutive days, and one has order 1, corresponding to a single drop. Figure 5.12 shows that strong dependence appears between series on the same day and within each series across two consecutive days. We now use the following model to fit the data:

Y_id = max(a_{1,1,d} Z_{1,i−1}, a_{1,0,d} Z_{1,i}, ..., a_{L−1,1,d} Z_{L−1,i−1}, a_{L−1,0,d} Z_{L−1,i}, a_{L,0,d} Z_{L,i}).    (5.11)

But we need to determine the number of signature patterns L. Define

Q_1(x) = Pr(u + x > Y_{i+1} > u, Y_i > u),
Q_2(x) = Pr(u + x > Y_{i+1} > u, Y_i < u),
A_1(x) = (u, +∞) × (u, u + x),
A_2(x) = (0, u) × (u, u + x),
X_{A_j}(x) = (1/n) Σ_{i=1}^{n−1} I_{A_j(x)}(Y_i, Y_{i+1}),  j = 1, 2.

We extend the Kolmogorov-Smirnov distance into the form

err_l = sup_{x>0} max{ |Q_1(x) − X_{A_1}(x)|, |Q_2(x) − X_{A_2}(x)| },

where Q_1(x) and Q_2(x) are calculated under the estimated parameter values.
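The empirical side of this distance is straightforward to compute; the model side, Q_1 and Q_2, comes from the probabilistic properties of the fitted l-pattern model. A sketch, in which Q1 and Q2 are passed in as functions and the sup over x > 0 is approximated on a finite grid:

```python
import numpy as np

def err_l_empirical(y, u, x_grid, Q1, Q2):
    """Approximate err_l over a grid of x values. Q1(x) and Q2(x) are the model
    probabilities defined above; X_{A_1} and X_{A_2} are computed empirically
    from consecutive pairs (Y_i, Y_{i+1})."""
    y = np.asarray(y)
    n = len(y)
    y0, y1 = y[:-1], y[1:]
    worst = 0.0
    for x in x_grid:
        band = (y1 > u) & (y1 < u + x)
        xa1 = np.sum(band & (y0 > u)) / n    # empirical X_{A_1}(x)
        xa2 = np.sum(band & (y0 < u)) / n    # empirical X_{A_2}(x)
        worst = max(worst, abs(Q1(x) - xa1), abs(Q2(x) - xa2))
    return worst
```

In use, err_l_empirical is evaluated for l = 3, 4, 5, ... with the Q's recomputed under each fitted model, and one stops when the curve of err_l against l stabilizes, as in Figure 5.13 below.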

[Figure 5.12: Pairwise plots of same-day and lagged (two-day) exceedances for GE, CITI and Pfizer. (i) All plots are based on Fréchet-transformed exceedances of a high threshold of negative log returns (so they represent price drops, not price rises); (ii) the purpose of the plots is to look for dependence among neighboring values; (iii) the numbers in parentheses show expected and observed numbers of simultaneous exceedances by the two variables, where the expected number is calculated under an independence assumption.]

The idea here is that we use the given information up to today and compute the probability that the return falls into a certain range on the next day. Instead of computing conditional probabilities, we compute joint probabilities and joint empirical probabilities and compare the approximation errors. For l = 3, 4, 5, ..., we compute err_l at discrete points in the interval [u, 2u]. Figure 5.13 shows the behavior of err_l as l increases: the curve stabilizes at l = 5 for the CITI data, and similarly at l = 6 for the GE and Pfizer data.

[Figure 5.13: Number of signature patterns L versus err_l for GE, CITI and Pfizer.]

We now fit the three transformed time series using the following model:

Y_id = max(a_{1,1,d} Z_{1,i−1}, a_{1,0,d} Z_{1,i}, ..., a_{5,1,d} Z_{5,i−1}, a_{5,0,d} Z_{5,i},
           a_{6,1,d} Z_{6,i−1}, a_{6,0,d} Z_{6,i}, ..., a_{9,1,d} Z_{9,i−1}, a_{9,0,d} Z_{9,i},
           a_{10,1,d} Z_{10,i−1}, a_{10,0,d} Z_{10,i}, ..., a_{14,1,d} Z_{14,i−1}, a_{14,0,d} Z_{14,i},
           a_{15,0,d} Z_{15,i}),    (5.12)

where a_{l,k,d} = 0 for l = 1, ..., 5 when d = 2, 3; a_{l,k,d} = 0 for l = 6, ..., 9 when d = 1, 3; and a_{l,k,d} = 0 for l = 10, ..., 14 when d = 1, 2. We have chosen here to treat drops on two consecutive days as independent processes and single drops as dependent processes.

We could apply a more complex structure, using the arguments about modeling inter-series dependence discussed in section 3.3, but we adopt a relatively simple model here to illustrate M4 process modeling of financial time series data.

Table 5.1 is constructed based on the probabilistic properties of M4 processes. The returns above a certain threshold on two consecutive days are clustered into 4 or 5 groups. Initial estimates are obtained by averaging the points within each group. Then we can solve the system of nonlinear equations or minimize a weighted least squares function. What we actually do here is to use the initial estimates as the basis for selecting the evaluation points x_1, x_2, ..., x_m used in (3.1) and (3.2). Based on the initial estimates, we compute the adjacent parameter ratios. For each ratio value, say r_i, we let x_j = 0.95 r_i and x_{j+1} = 1.25 r_i. The value of m is thus twice the number of ratios. The numbers 0.95 and 1.25 are arbitrary; other numbers can be used as long as one is less than 1 and the other is larger than 1. Intuitively we would choose the numbers as big as possible, because of the nature of the function b(x), which has less variability for large x; but we need to have two points between each pair of adjacent ratios. We do not have a criterion to guide the choice of the x_j's at this time; an optimization over the a's and the x_j's might be useful, but we do not pursue it in the current work. After the x_j's have been chosen, we solve (3.6) under the constraint that the matrix formed on the left-hand side of (3.6) is the same as the one obtained when the initial estimates are used.

Using the estimated parameter values, we can compute the asymptotic covariance matrix. Since we have only about 2200 observed drops but many parameters, the computation of the asymptotic covariance matrix of the joint distribution of the estimates is not efficient, i.e. the standard deviations of the estimators are not as small as they should be. The results in Table 5.1 are simulation results based on 100 replications.

[Table 5.1: Estimates of parameters. For each signature pattern l = 1, ..., 15 and each of GE, CITI and Pfizer, the table reports the estimated pair a_{l,1}/a_{l,0} (a_{15,0} alone for l = 15) together with its standard deviation; the numeric entries are not reproduced here.]

[Figure 5.14: Simulated time series: negative daily returns on the Fréchet scale for (a) GE, (b) CITIBank, (c) Pfizer.]

5.6 VaR calculation and portfolio optimization

So far we have considered modeling the three time series as M4 processes. We now turn to the VaR calculation and portfolio optimization. For comparison, we use both the variance-covariance approach and the extreme value approach.

5.6.1 Using the variance-covariance approach

We compute the VaR of a portfolio containing the three stocks GE, CITI and Pfizer. At time t the returns are R_{1t}, R_{2t}, R_{3t}, and we write R_t = (R_{1t}, R_{2t}, R_{3t})'. R_t is assumed to be jointly normally distributed, i.e. R_t ~ N(µ, Σ). We estimate the covariance matrix Σ by the sample covariance matrix, both for the original series and for the standardized time series. [The two estimated 3 × 3 covariance matrices are not reproduced here.]

The following figures plot the calculated VaRs for different investment combinations, using the original time series and the standardized time series; we also plot VaR versus expected return. In all plots, each point represents a portfolio investment plan or combination. For example, one point may represent (0.25, 0.35, 0.40), which means 25% of the money invested in GE, 35% in CITI and 40% in Pfizer. Figures 5.15 and 5.16 are based on the original data; Figures 5.17 and 5.18 are based on the standardized data. Both Figures 5.15 and 5.17 show risk diversification. In all figures, a green circle means investing all money in GE stock, a red cross means investing all money in CITI stock, and a blue diamond means investing all money in Pfizer stock. A green, red or blue dot means investing 50% or more of the total money in GE, CITI or Pfizer respectively; a black dot means no individual stock received more than 50% of the total. From Figures 5.16 and 5.18 we can pick the portfolio with the highest expected return for a given level of risk: it is the point on the upper curve at the given risk.

[Figure 5.15: VaR for a portfolio of the 3 stocks, calculated from the original time series data ($1,000,000 invested).]

[Figure 5.16: Expected return in dollars versus VaR for a portfolio of the 3 stocks (GE, CITI, Pfizer), original data.]

[Figure 5.17: VaR for a portfolio of the 3 stocks, calculated from the standardized time series data ($1,000,000 invested).]

[Figure 5.18: Expected return in dollars versus VaR for a portfolio of the 3 stocks (GE, CITI, Pfizer), standardized data.]

5.6.2 Using the M4 process approach

The multivariate normal distribution assumption makes most statistical computations easier, but it may give inaccurate results if it does not fit the data, and so lead to a wrong decision. In risk measurement it may underestimate the risk, since the normal distribution best fits the central part of the data. We now introduce a new method based on M4 process modeling and use it to compute the VaR and to optimize a portfolio under VaR constraints.

Suppose c_1, c_2, c_3 are the proportions of the stocks in a portfolio, and VaR_p is the VaR of the portfolio return c_1Y_1 + c_2Y_2 + c_3Y_3, calculated from

P(c_1Y_1 + c_2Y_2 + c_3Y_3 > VaR_p) < α

for a given level of confidence α. Because of the complex transformations applied to the original data, it is not possible to compute the maximal possible loss from VaR_p alone, since each individual stock behaves differently. One way out would be to model the univariate time series c_1Y_{1i} + c_2Y_{2i} + c_3Y_{3i} as an M4 process, but this is not practically applicable since the investment proportions change all the time. We propose the following procedure to determine VaR_p = d and the individual risk factors VaR_1 = d_1, VaR_2 = d_2 and VaR_3 = d_3 simultaneously:

max_{d_1>0, d_2>0, d_3>0} P{c_1Y_1 ≥ d_1, c_2Y_2 ≥ d_2, c_3Y_3 ≥ d_3}
s.t. P(c_1Y_1 + c_2Y_2 + c_3Y_3 > d) < α,    (5.13)
     d = d_1 + d_2 + d_3.

The constraints are very natural since they are the definition of the VaR of a portfolio. The objective is to attach the highest probability to all individual risk factors lying beyond certain values when the portfolio is at its VaR.

Figure 5.19 plots the VaR against the different combinations. In contrast to Figures 5.15 and 5.17, it shows the same overall trend but without the smooth features. Figure 5.20 plots the VaR against expected returns; it is very different from Figures 5.16 and 5.18. The maximal VaR of the candidate portfolios under the normal assumption is less than the minimal VaR of the same candidate portfolios calculated by the extreme value approach. In the extreme value model, once the VaR is beyond a certain value, the expected highest return decreases as the risk increases. From Figure 5.18, one sees that Pfizer gives the highest return if all the money is invested in Pfizer stock; CITI gives the highest risk; GE has the lowest return. From Figure 5.20, the extreme portfolios (all, or almost all, money invested in one stock) behave similarly, but the overall structure is different.
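Since an M4 process is trivial to simulate (each Z_{l,i} is −1/log U for a uniform U), the probabilities and quantiles in (5.13) can be approximated by Monte-Carlo. The sketch below simulates an M4 process and reads off the portfolio level d at a given tail probability; the coefficient array is illustrative and is not the fitted values of Table 5.1, and the full simultaneous search over (d_1, d_2, d_3) in (5.13) is not shown.

```python
import numpy as np

def simulate_m4(a, n, rng):
    """Simulate Y_{id} = max_{l,k} a[l, k, d] * Z_{l, i-k} with Z i.i.d. unit
    Frechet. a has shape (L, K, D); when the a[:, :, d] sum to 1 for each d,
    the margins are unit Frechet."""
    L, K, D = a.shape
    z = -1.0 / np.log(rng.uniform(size=(L, n + K)))      # unit Frechet draws
    y = np.empty((n, D))
    for i in range(n):
        window = z[:, i:i + K]                           # Z_{l, i-k} for k = K-1..0
        y[i] = np.max(a[:, ::-1, :] * window[:, :, None], axis=(0, 1))
    return y

rng = np.random.default_rng(5)
# Random coefficients with, for each d, sum over (l, k) equal to 1 (L=3, K=2, D=3).
a = rng.dirichlet(np.ones(3 * 2), size=3).T.reshape(3, 2, 3)
c = np.array([0.4, 0.3, 0.3])                            # proportions c_1, c_2, c_3
y = simulate_m4(a, n=50_000, rng=rng)
d = np.quantile(y @ c, 0.95)   # level d with P(c'Y > d) ~ 0.05, cf. (5.13)
print(d)
```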

[Figure 5.19: VaR for a portfolio of the 3 stocks, calculated using the M4 process approach.]

[Figure 5.20: Expected return in dollars versus VaR: portfolio optimization using extreme value theory.]

5.6.3 Historical simulation approach

As we have seen, the variance-covariance approach gives a low VaR, while the M4 process approach gives a high VaR. We cannot simply tell which method is better. The goal of this section is to distinguish them further through the results of the historical simulation approach. The variance-covariance approach may be useful in routine risk management, while the extreme value approach should be the method used for extreme risk management.

All the VaR calculations in the previous sections are unconditional, which means we did not use the price movement history. The variance-covariance approach cannot be used to calculate conditional VaRs, since it rests on an independence assumption, but the extreme value approach can. Further comparison between the two approaches can be made using the historical simulation approach, which gives VaRs lying between those obtained from the variance-covariance approach and the extreme value approach. In Figure 5.21, we use the historical data from days on which all three stocks had price drops simultaneously. As can be seen, the VaRs in Figure 5.21 are higher than those in Figure 5.16, but lower than those in Figure 5.20. If we apply thresholds in the historical simulation approach, the VaRs move to the right: the VaRs in Figure 5.22 are very close to, and some even higher than, those in Figure 5.20.

Since the historical simulation approach does not model the dependence structure and makes it very difficult to calculate conditional VaRs, the extreme value approach may be the better method, especially for extreme risk management.

[Figure 5.21: Expected return in dollars versus VaR: portfolio optimization using the historical simulation approach.]

[Figure 5.22: Expected return in dollars versus VaR: portfolio optimization using the historical simulation approach with thresholds.]

Chapter 6

Summary

6.1 General discussion

The methods described here represent completely new approaches to the modeling of financial time series data. The main goal has been to propose an approach which can efficiently model multivariate time series that are both inter-serially and temporally dependent. To achieve that goal, we have extended and proved some probabilistic properties of M4 processes. We have then proposed estimation procedures for the case where the parameter ratios can be determined by a purely probabilistic approach, as well as a practically applicable method of M4 process modeling, whose consistency and asymptotic properties have been proved. We have also proposed a VaR calculation method based on M4 process modeling. The main results proved are Theorem 2.3, Theorem 2.4, Theorem 2.6, Theorem 3.7, Theorem 3.8, Proposition 3.2, Proposition 3.3 and Theorem 3.5.

The results obtained can be used in many ways. For example, they can be used to compute VaR, or to optimize a portfolio under VaR constraints given current information or historical data. Studies have shown that financial data are fat-tailed and not normally distributed, in contrast with the traditional assumption of normality for the underlying distribution. These results therefore provide more information to risk managers, who are most interested in what kind of risk the company is exposed to when an extreme price movement occurs. The methods described can also be used in other fields, such as modeling insurance data, environmental data, etc.

It may be possible to propose variants of the proposed estimators and to relax the conditions imposed on the parameters. The choice of the evaluation points around the jump points and the selection of the model need some further work.

6.2 Directions of future research

In this section, I list some research directions in extreme value settings. Some extreme value theorists believe that the study of multivariate extremes still has a long way to go.

Besides M4 processes, it is worth exploring other dependence structures which have multivariate extreme value distribution representations. It may be worth studying the modeling of time series data through Markov processes and extreme events, or Bernoulli jumps and extreme processes, or Poisson jumps and extreme processes.

In the short term, it is worth studying the models

Y_id = max_{1≤l≤L} max_{−K_1≤k≤K_2} a_{lkd} Z_{l,i−k} + N(0, σ²),  d = 1, ..., D,    (6.1)

where Σ_{l=1}^{L} Σ_{k=−K_1}^{K_2} a_{lkd} = 1 for d = 1, ..., D. Under this model we may be able to reduce the number of signature patterns and obtain more efficient estimates and standard deviations. Similarly,

Y_id = max_{1≤l≤L} max_{−K_1≤k≤K_2} a_{lkd} Z_{l,i−k} − N(0, σ²),  d = 1, ..., D,    (6.2)

where Σ_{l=1}^{L} Σ_{k=−K_1}^{K_2} a_{lkd} = 1 for d = 1, ..., D, or other variants under which we can study both positive and negative returns and allow the model to include short selling. This increases the flexibility and may be more practically useful.

Estimation based on regression over a threshold is worth looking into. It may be worth looking into multivariate max-stable ARMA(p, q) processes for their natural link to ARMA(p, q) processes. It may also be worth studying general volatility modeling in an extreme value setting.

Bibliography

[1] Arnold, S. F. (1990) Mathematical Statistics. Prentice Hall, Englewood Cliffs, New Jersey.

[2] Barnard, G. A. (1967) The use of the likelihood function in statistical practice. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics. Univ. California Press, Berkeley, Calif.

[3] Best, P. (1998) Implementing Value at Risk. John Wiley & Sons, England.

[4] Billingsley, P. (1995) Probability and Measure, 3rd edition. John Wiley & Sons, New York.

[5] Bollerslev, T. (1986) Generalized autoregressive conditional heteroscedasticity. J. Econometrics 31.

[6] Cambanis, S. and Leadbetter, M. R. (1994) Measure and Probability. Lecture Notes, Department of Statistics, University of North Carolina.

[7] Cartwright, D. E. (1958) On estimating the mean energy of sea waves from the highest waves in a record. Proc. Roy. Soc. London Ser. A 247.

[8] Chen, X. R. (1981) Introduction to Mathematical Statistics (in Chinese). Science Press, Beijing.

[9] Coles, S. G. and Tawn, J. A. (1991) Modelling extreme multivariate events. J. R. Statist. Soc. B 53.

[10] Coles, S. G. and Tawn, J. A. (1994) Statistical methods for multivariate extremes: an application to structural design (with discussion). Appl. Statist. 43, No. 1, 1-48.

[11] Dave, R. D. and Stahl, G. (1999) On the accuracy of VaR estimates based on the variance-covariance approach. Working paper.

[12] Davis, R. A. and Resnick, S. I. (1989) Basic properties and prediction of max-ARMA processes. Adv. Appl. Prob. 21.

[13] Davis, R. A. and Resnick, S. I. (1993) Prediction of stationary max-stable processes. Ann. Appl. Prob. 3, No. 2.

[14] Davison, A. C. and Smith, R. L. (1990) Models for exceedances over high thresholds. J. R. Statist. Soc. B 52, No. 3.

[15] Deheuvels, P. (1983) Point processes and multivariate extreme values. J. Multivariate Anal. 13.

[16] Dowd, K. (1998) Beyond Value at Risk. John Wiley & Sons, England.

[17] Embrechts, P. (1999) Extreme value theory in finance and insurance.

[18] Embrechts, P. (1999) Extreme value theory: potential and limitations as an integrated risk management tool.

[19] Embrechts, P., de Haan, L. and Huang, X. (2000) Modelling multivariate extremes. In Extremes and Integrated Risk Management (ed. P. Embrechts). UBS Warburg.

[20] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997) Modelling Extremal Events for Insurance and Finance. Springer.

[21] Fisher, R. A. and Tippett, L. H. C. (1928) Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Cambridge Philos. Soc. 24.

[22] Galambos, J. (1987) Asymptotic Theory of Extreme Order Statistics, 2nd edition. Krieger, Malabar, Florida.

[23] Giesbrecht, F. and Kempthorne, O. (1976) Maximum likelihood estimation in the three-parameter lognormal distribution. J. R. Statist. Soc. B 38.

[24] Gnedenko, B. (1943) Sur la distribution limite du terme maximum d'une série aléatoire (in French). Ann. of Math. (2) 44.

[25] de Haan, L. (1984) A spectral representation for max-stable processes. Ann. Probab. 12, No. 4.

[26] de Haan, L. (1985) Extremes in higher dimensions: the model and some statistics. In Proc. 45th Sess. Int. Statist. Inst. The Hague: International Statistical Institute.

[27] de Haan, L. and Resnick, S. I. (1977) Limit theory for multivariate sample extremes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 40.

[28] Hall, P., Peng, L. and Yao, Q. (2001) Moving-maximum models for extrema of time series. Manuscript.

[29] Hsing, T. (1993) Extremal index estimation for a weakly dependent stationary sequence. Ann. Statist. 21, No. 4.

[30] Joe, H. (1989) Families of min-stable multivariate exponential and multivariate extreme value distributions. Statist. Probab. Lett. 9.

[31] Jorion, P. (1996) Value at Risk. Irwin.

[32] Kempthorne, O. (1966) Some aspects of experimental inference. J. Am. Statist. Ass. 61, 11-34.

[33] Korn, R. (1997) Optimal Portfolios. World Scientific, Singapore.

[34] Leadbetter, M. R. (1983) Extremes and local dependence in stationary sequences. Z. Wahrsch. Verw. Gebiete 65, No. 2.

[35] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer, Berlin.

[36] Leadbetter, M. R., Weissman, I., de Haan, L. and Rootzén, H. (1989) On clustering of high values in statistically stationary series. In Proc. 4th Int. Meet. Statistical Climatology (ed. J. Sanson). Wellington: New Zealand Meteorological Service.

[37] Loynes, R. M. (1965) Extreme values in uniformly mixing stationary stochastic processes. Ann. Math. Statist. 36.

[38] Mardia, K. V. (1970) Families of Bivariate Distributions. Griffin's Statistical Monographs and Courses, No. 27. Hafner Publishing Co., Darien, Conn.

[39] Markowitz, H. (1952) Portfolio selection. J. of Finance 7.

[40] Markowitz, H. (1987) Mean-Variance Analysis in Portfolio Choice and Capital Markets. Blackwell, Cambridge, MA.

[41] Marshall, A. W. and Olkin, I. (1983) Domains of attraction of multivariate extreme value distributions. Ann. Probab. 11, No. 1.

[42] McFadden, D. (1978) Modelling the choice of residential location. In Spatial Interaction Theory and Planning Models (eds A. Karlqvist, L. Lundquist, F. Snickars and J. Weibull). North-Holland, Amsterdam.

[43] Michaud, R. O. (1998) Efficient Asset Management. Harvard Business School Press, Boston.

[44] Nandagopalan, S. (1990) Multivariate extremes and the estimation of the extremal index. Ph.D. dissertation, Dept. of Statistics, University of North Carolina, Chapel Hill.

[45] Nandagopalan, S. (1994) On the multivariate extremal index. J. of Research, National Inst. of Standards and Technology 99.

[46] Newell, G. F. (1964) Asymptotic extremes for m-dependent random variables. Ann. Math. Statist. 35.

[47] O'Brien, G. L. (1974) Limit theorems for the maximum term of a stationary process. Ann. Probab. 2.

[48] O'Brien, G. L. (1987) Extreme values for stationary and Markov sequences. Ann. Probab. 15, No. 1.

[49] Pickands, J. III (1975) Statistical inference using extreme order statistics. Ann. Statist. 3, No. 1, 119-131.

[50] Pickands, J. III (1981) Multivariate extreme value distributions. Proc. 43rd Session I.S.I. (Buenos Aires).

[51] Resnick, S. I. (1987) Extreme Values, Regular Variation, and Point Processes. Springer, New York.

[52] RiskMetrics (1996) Technical Document, fourth edition.

[53] Robinson, M. E. and Tawn, J. A. (1995) Statistics for exceptional athletics records. Appl. Statist. 44, No. 4.

[54] Sigma (1996) Natural catastrophes and major losses in 1995: decrease compared to previous year but continually high level of losses since 1989. Sigma publication No. 2, Swiss Re, Zürich.

[55] Smith, R. L. (1986) Extreme value theory based on the r largest annual events. J. Hydrology 86.

[56] Smith, R. L. (1990) Extreme value theory. In Handbook of Applicable Mathematics (Supplement), ed. W. Ledermann. John Wiley, Chichester.

[57] Smith, R. L. (1997) Statistics for exceptional athletics records. Appl. Statist. 46, No. 1.

[58] Smith, R. L. (2000) Measuring risk with extreme value theory. In Extremes and Integrated Risk Management (ed. P. Embrechts). UBS Warburg.

[59] Smith, R. L. and Goodman, D. J. (2000) Bayesian risk analysis. In Extremes and Integrated Risk Management (ed. P. Embrechts). UBS Warburg.

[60] Smith, R. L. and Shively, T. S. (1995) Point process approach to modeling trends in tropospheric ozone based on exceedances of a high threshold. Atmospheric Environment 29, No. 23.

[61] Smith, R. L. and Weissman, I. (1994) Estimating the extremal index. J. R. Statist. Soc. B 56.

[62] Smith, R. L. and Weissman, I. (1996) Characterization and estimation of the multivariate extremal index. Manuscript.

[63] Smith, R. L. and Weissman, I. (1997) On the estimation of max-stable processes. Manuscript.

[64] Tawn, J. A. (1988) An extreme-value theory model for dependent observations. J. Hydrology 101.

[65] Tawn, J. A. (1990) Modelling multivariate extreme value distributions. Biometrika 77.

[66] Tsay, R. S. (1999) Extreme value analysis of financial data. Manuscript.

[67] Weissman, I. (1978) Estimation of parameters and large quantiles based on the k largest observations. J. Amer. Statist. Assoc. 73, No. 364.

[68] Weissman, I. (1994) On the extremal index of stationary sequences. Technical report, presented at the Conference on Multivariate Extreme Value Estimation with Applications to Economics and Finance, Erasmus University, Rotterdam.


More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Section 1.3 P 1 = 1 2. = 1 4 2 8. P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., = 1 2 4.

Section 1.3 P 1 = 1 2. = 1 4 2 8. P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., = 1 2 4. Difference Equations to Differential Equations Section. The Sum of a Sequence This section considers the problem of adding together the terms of a sequence. Of course, this is a problem only if more than

More information

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1]. Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM TORONTO THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

More information

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication)

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication) Schooling, Political Participation, and the Economy Online Supplementary Appendix: Not for Publication) Filipe R. Campante Davin Chor July 200 Abstract In this online appendix, we present the proofs for

More information

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

More information

Binomial lattice model for stock prices

Binomial lattice model for stock prices Copyright c 2007 by Karl Sigman Binomial lattice model for stock prices Here we model the price of a stock in discrete time by a Markov chain of the recursive form S n+ S n Y n+, n 0, where the {Y i }

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

More information

Representing Reversible Cellular Automata with Reversible Block Cellular Automata

Representing Reversible Cellular Automata with Reversible Block Cellular Automata Discrete Mathematics and Theoretical Computer Science Proceedings AA (DM-CCG), 2001, 145 154 Representing Reversible Cellular Automata with Reversible Block Cellular Automata Jérôme Durand-Lose Laboratoire

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i. Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

More information

LogNormal stock-price models in Exams MFE/3 and C/4

LogNormal stock-price models in Exams MFE/3 and C/4 Making sense of... LogNormal stock-price models in Exams MFE/3 and C/4 James W. Daniel Austin Actuarial Seminars http://www.actuarialseminars.com June 26, 2008 c Copyright 2007 by James W. Daniel; reproduction

More information

Uncertainty quantification for the family-wise error rate in multivariate copula models

Uncertainty quantification for the family-wise error rate in multivariate copula models Uncertainty quantification for the family-wise error rate in multivariate copula models Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange) University of Bremen Institute for

More information

ON LIMIT LAWS FOR CENTRAL ORDER STATISTICS UNDER POWER NORMALIZATION. E. I. Pancheva, A. Gacovska-Barandovska

ON LIMIT LAWS FOR CENTRAL ORDER STATISTICS UNDER POWER NORMALIZATION. E. I. Pancheva, A. Gacovska-Barandovska Pliska Stud. Math. Bulgar. 22 (2015), STUDIA MATHEMATICA BULGARICA ON LIMIT LAWS FOR CENTRAL ORDER STATISTICS UNDER POWER NORMALIZATION E. I. Pancheva, A. Gacovska-Barandovska Smirnov (1949) derived four

More information

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors. 3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with

More information

Estimation and attribution of changes in extreme weather and climate events

Estimation and attribution of changes in extreme weather and climate events IPCC workshop on extreme weather and climate events, 11-13 June 2002, Beijing. Estimation and attribution of changes in extreme weather and climate events Dr. David B. Stephenson Department of Meteorology

More information

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS Sensitivity Analysis 3 We have already been introduced to sensitivity analysis in Chapter via the geometry of a simple example. We saw that the values of the decision variables and those of the slack and

More information

Order Statistics: Theory & Methods. N. Balakrishnan Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada. C. R.

Order Statistics: Theory & Methods. N. Balakrishnan Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada. C. R. Order Statistics: Theory & Methods Edited by N. Balakrishnan Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada C. R. Rao Center for Multivariate Analysis Department

More information

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

Stationary random graphs on Z with prescribed iid degrees and finite mean connections Stationary random graphs on Z with prescribed iid degrees and finite mean connections Maria Deijfen Johan Jonasson February 2006 Abstract Let F be a probability distribution with support on the non-negative

More information

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010 Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

More information

The Dangers of Using Correlation to Measure Dependence

The Dangers of Using Correlation to Measure Dependence ALTERNATIVE INVESTMENT RESEARCH CENTRE WORKING PAPER SERIES Working Paper # 0010 The Dangers of Using Correlation to Measure Dependence Harry M. Kat Professor of Risk Management, Cass Business School,

More information

Corrected Diffusion Approximations for the Maximum of Heavy-Tailed Random Walk

Corrected Diffusion Approximations for the Maximum of Heavy-Tailed Random Walk Corrected Diffusion Approximations for the Maximum of Heavy-Tailed Random Walk Jose Blanchet and Peter Glynn December, 2003. Let (X n : n 1) be a sequence of independent and identically distributed random

More information

Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 210A March 3, 2011 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

n k=1 k=0 1/k! = e. Example 6.4. The series 1/k 2 converges in R. Indeed, if s n = n then k=1 1/k, then s 2n s n = 1 n + 1 +...

n k=1 k=0 1/k! = e. Example 6.4. The series 1/k 2 converges in R. Indeed, if s n = n then k=1 1/k, then s 2n s n = 1 n + 1 +... 6 Series We call a normed space (X, ) a Banach space provided that every Cauchy sequence (x n ) in X converges. For example, R with the norm = is an example of Banach space. Now let (x n ) be a sequence

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Gambling Systems and Multiplication-Invariant Measures

Gambling Systems and Multiplication-Invariant Measures Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously

More information

Chapter 4 One Dimensional Kinematics

Chapter 4 One Dimensional Kinematics Chapter 4 One Dimensional Kinematics 41 Introduction 1 4 Position, Time Interval, Displacement 41 Position 4 Time Interval 43 Displacement 43 Velocity 3 431 Average Velocity 3 433 Instantaneous Velocity

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Determining distribution parameters from quantiles

Determining distribution parameters from quantiles Determining distribution parameters from quantiles John D. Cook Department of Biostatistics The University of Texas M. D. Anderson Cancer Center P. O. Box 301402 Unit 1409 Houston, TX 77230-1402 USA [email protected]

More information

CATASTROPHIC RISK MANAGEMENT IN NON-LIFE INSURANCE

CATASTROPHIC RISK MANAGEMENT IN NON-LIFE INSURANCE CATASTROPHIC RISK MANAGEMENT IN NON-LIFE INSURANCE EKONOMIKA A MANAGEMENT Valéria Skřivánková, Alena Tartaľová 1. Introduction At the end of the second millennium, a view of catastrophic events during

More information

The Exponential Distribution

The Exponential Distribution 21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

More information

Arrangements of Stars on the American Flag

Arrangements of Stars on the American Flag Arrangements of Stars on the American Flag Dimitris Koukoulopoulos and Johann Thiel Abstract. In this article, we examine the existence of nice arrangements of stars on the American flag. We show that

More information

arxiv:1112.0829v1 [math.pr] 5 Dec 2011

arxiv:1112.0829v1 [math.pr] 5 Dec 2011 How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

More information

by Maria Heiden, Berenberg Bank

by Maria Heiden, Berenberg Bank Dynamic hedging of equity price risk with an equity protect overlay: reduce losses and exploit opportunities by Maria Heiden, Berenberg Bank As part of the distortions on the international stock markets

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Markov random fields and Gibbs measures

Markov random fields and Gibbs measures Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

More information

LECTURE 4. Last time: Lecture outline

LECTURE 4. Last time: Lecture outline LECTURE 4 Last time: Types of convergence Weak Law of Large Numbers Strong Law of Large Numbers Asymptotic Equipartition Property Lecture outline Stochastic processes Markov chains Entropy rate Random

More information

So let us begin our quest to find the holy grail of real analysis.

So let us begin our quest to find the holy grail of real analysis. 1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers

More information

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely for keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.

More information

Microeconomic Theory: Basic Math Concepts

Microeconomic Theory: Basic Math Concepts Microeconomic Theory: Basic Math Concepts Matt Van Essen University of Alabama Van Essen (U of A) Basic Math Concepts 1 / 66 Basic Math Concepts In this lecture we will review some basic mathematical concepts

More information

Marshall-Olkin distributions and portfolio credit risk

Marshall-Olkin distributions and portfolio credit risk Marshall-Olkin distributions and portfolio credit risk Moderne Finanzmathematik und ihre Anwendungen für Banken und Versicherungen, Fraunhofer ITWM, Kaiserslautern, in Kooperation mit der TU München und

More information

The Capital Asset Pricing Model (CAPM)

The Capital Asset Pricing Model (CAPM) Prof. Alex Shapiro Lecture Notes 9 The Capital Asset Pricing Model (CAPM) I. Readings and Suggested Practice Problems II. III. IV. Introduction: from Assumptions to Implications The Market Portfolio Assumptions

More information

1 Gambler s Ruin Problem

1 Gambler s Ruin Problem Coyright c 2009 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

Practice with Proofs

Practice with Proofs Practice with Proofs October 6, 2014 Recall the following Definition 0.1. A function f is increasing if for every x, y in the domain of f, x < y = f(x) < f(y) 1. Prove that h(x) = x 3 is increasing, using

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

9 More on differentiation

9 More on differentiation Tel Aviv University, 2013 Measure and category 75 9 More on differentiation 9a Finite Taylor expansion............... 75 9b Continuous and nowhere differentiable..... 78 9c Differentiable and nowhere monotone......

More information

Duality of linear conic problems

Duality of linear conic problems Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least

More information

Asymptotics for ruin probabilities in a discrete-time risk model with dependent financial and insurance risks

Asymptotics for ruin probabilities in a discrete-time risk model with dependent financial and insurance risks 1 Asymptotics for ruin probabilities in a discrete-time risk model with dependent financial and insurance risks Yang Yang School of Mathematics and Statistics, Nanjing Audit University School of Economics

More information

Simulating Stochastic Differential Equations

Simulating Stochastic Differential Equations Monte Carlo Simulation: IEOR E473 Fall 24 c 24 by Martin Haugh Simulating Stochastic Differential Equations 1 Brief Review of Stochastic Calculus and Itô s Lemma Let S t be the time t price of a particular

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

INCIDENCE-BETWEENNESS GEOMETRY

INCIDENCE-BETWEENNESS GEOMETRY INCIDENCE-BETWEENNESS GEOMETRY MATH 410, CSUSM. SPRING 2008. PROFESSOR AITKEN This document covers the geometry that can be developed with just the axioms related to incidence and betweenness. The full

More information

Smooth functions statistics

Smooth functions statistics Smooth functions statistics V. I. rnold To describe the topological structure of a real smooth function one associates to it the graph, formed by the topological variety, whose points are the connected

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information