Estimation of non-stationary GEV model parameters

Transcription

1 Estimation of non-stationary GEV model parameters S. El-Adlouni, T. Ouarda & X. Zhang, R. Roy & B. Bobée Extreme Value Analysis August 2005 Statistical Hydrology Chair (INRS-ETE) 1

2 Outline Problem definition Objectives General Extreme Value Distribution Non-stationary GEV model Parameter estimation Simulation based comparison of estimation methods Case study Conclusions 2

3 Position of the problem In frequency analysis, data must generally be independent and identically distributed (i.i.d) which implies that they must meet the statistical criteria of independence, stationarity and homogeneity In reality, the probability distribution of extreme events can change with time Need to develop frequency analysis models which can handle various types of non-stationarity (trends, jumps, etc.) 3

4 Objectives of the study Develop tools for frequency analysis in a non-stationary framework Include the potential impacts of climate change Explore the case of trends or dependence on covariables Bayesian framework 4

5 GEV distribution Y is GEV (Generalised Extreme Value) distributed if : FGEV y y α κ ( ) ( ) 1/ = exp 1 µ κ κ 1 ( y µ ) > 0 α ( ), ( > 0 ) et ( ) µ α κ are respectively the location, scale and shape parameters. 5

6 Non-stationary GEV model Non-stationary framework: ( ) Y ~,, t t t t GEV µ α κ Parameters are function of time or other covariates. 6

7 Non-stationary GEV model Illustration of the two types of non-stationarity 2000 Tendance 2000 Changement d'échelle

8 Non-stationary GEV model I Xt GEV µ 0, ακ, ( ) Classic model : all parameters are constant. II ( µ = β + β α κ) X ~ GEV Y,, t 1 t 1 2 t The location parameter is a linear function of a covariable. The other two parameters are constant. III ( 2 µ = β + β + β α κ) Xt GEV2 t 1 2Yt 3 Yt,, The location parameter is a quadratic function of a temporal covariable. The other two parameters are constant. 8

9 Parameter Estimation 1. Maximum likelihood method (ML) 2. Bayesian model (Bayes) 3. Generalised maximum likelihood method (GML) 9

10 Maximum Likelihood Method Properties of ML estimators Under some regularity conditions, ML estimators have the desired optimality properties. These regularity conditions are not met when the shape parameter is different from 0, since the support of the distribution depends on parameters (Smith 1985). For small samples, the numerical resolution of the ML system can generate parameter estimators that are physically impossible and leads to very high quantile estimator variances. 10

11 Bayesian estimation : prior distribution Prior distribution of parameter vector θ = ( µακ,, ) Fisher information matrix Iij ( θ ) = E Jeffrey s information prior ( y θ ) 2 ln f θ θ i j J θ = I θ ( ) ( ) 1 2 For the GEV distribution, the Fisher information matrix is given by Jenkinson (1969) 11

12 Bayesian estimation GEV 0 model In the absence of any additional information about the parameters (regional information, historic, expert opinion, etc.), we consider the Jeffrey s noninformative prior. π 0 µακ,, = J µακ,, ( ) ( ) 12

13 Bayesian estimation GEV 1 model π β, β, α, κ = J β, α, κ p β ( ) ( ) ( ) With a vague prior for the parameter β 2 p ( β ) ( 2 ) 2 N 0, σ = and σ =

14 Bayesian estimation GEV 2 model π β, β, β, α, κ = J β, α, κ p β p β ( ) ( ) ( ) ( ) β 2 β with pβ β σ 2 ( ) ( 2 ) 2 = N 0, ( ) ( 2 ) 1 pβ β3 = N 0, σ 2 3 and σ1 = σ 2 =

15 Generalised Maximum Likelihood Method Stationary case The GML method is based on the same principle than the ML method with an additional constraint on the shape parameter to restrain its domain. Martins & Stedinger (2000) presented the GML approach for the case GEV0 with a Beta [-0.5;0.5] prior distribution for the shape parameter : π κ = Beta u = 6, v = 9 κ ( ) ( ) 15

16 Generalised Maximum Likelihood Method 4 Beta pdf on [ ] prior distribution function of the shape parameter E [ κ] = 0.1 and Var[ κ] = ( 0.122) 2 16

17 Generalised Maximum Likelihood Method Non-stationary case The GML method can be generalized to the non-stationary case: 1. Adopt the same prior distribution for the shape parameter, 2. Solve the equation system obtained by the ML method under this constraint. The GML parameter estimators are the solution of the following optimisation problem : ( x θ ) max Ln ; θ κ Beta u (,v) 17

18 Generalised Maximum Likelihood Method Non-stationary case The solution of the optimisation problem is equivalent to the maximisation of the posterior distribution of the parameters conditionally to the data : ( ) ( ) ( ) π θ θ π κ x Ln x κ The GML estimator of the parameter vector is the mode of the posterior distribution. 18

19 Parameter and quantile estimation ML : numerical solution : Newton-Raphson method. GML & Bayes : Monte-Carlo Markov-Chain methods (MCMC) The GML estimator corresponds to the mode of the posterior distribution, The Bayesian estimator corresponds to the posterior mean. 19

20 Parameter and quantile estimation MCMC method adopted For the GML and Bayesian method, the posterior distribution is simulated with the Metropolis-Hastings (M- H) algorithm (Gilks et al. 1996). Chain size and burn-in period Several techniques allow to check convergence of generated Markov Chain to the stationary distribution (El Adlouni et al. 2005). For all cases presented in this work, the convergence of the MCMC methods is obtained with a chain size of N=15000 and with a burn-in period of N0=

21 21

22 Parameter and quantile estimation Quantile estimation Aside from parameter estimators, the MCMC algorithm iterations allow to obtain the conditional distribution of quantiles given an observed value y 0 of the covariate Y t. For each iteration of the MCMC algorithm i=1,,n we ( i) compute the quantile x py, corresponding to a non-exceedance 0 probability p ( i) () i ( () i x ) ( () ( )) p, y 1 log 0 y p κ () i α = µ + 0 i κ Conditional on the value y 0 22

23 Parameter and quantile estimation Quantile estimation (cont.) ( i) µ Is the location parameter conditional to a particular y 0 value y 0 of the covariate Y t. () i ( i) µ = µ y 0 For the GEV0 model () i () i ( i) µ = β + β y For the GEV1 model y ( i) ( i) ( i) ( i) 2 µ y = β1 + β2 y0 + β3 y0 For the GEV2 model 0 23

24 Simulation based comparison The three estimation methods are compared, for all three models, using Monte Carlo simulations. The covariate Y t represents time Y t =t The following values of the shape parameter are considered: κ = 0.1, κ = 0.2 et κ =

25 Simulation based comparison Methodology Performance criteria are the bias and the RMSE of quantile estimates for different non-exceedance probabilities : p = 0.5, 0.8, 0.9, 0.99 and Obtained for R=1000 samples of size n=50. 25

26 Simulation based comparison Bias and RMSE of quantile estimates for the ML, GML and Bayesian approach and for model GEV0 GEV0 Bias RMSE κ = 0.1 κ = 0.2 κ = 0.3 p ML GML Bayes ML GML Bayes

29 Simulation based comparison Results For the ML method : 1. GEV0 & GEV1 models: large RMSE is caused by the high variance. 2. GEV2 model: reduction of the variance for extreme quantiles. Bayesian approach: positive Bias and low RMSE for all quantiles. 29

30 Simulation based comparison Results GML method 1. Best bias performance for all three models. 2. Generally low RMSE. 3. Negative Bias for high skewness since the prior is centered on -0.1 Conclusion The GML method leads to best results for all three models. 30

31 Case study Location of the Randsburg station in California Code : Station Name : Randsburg Latitude : Longitude : Period : Size of the series : n=51 31

32 Case study 90 Station Randsburg Prec. Max. Ann. (mm) Année Annual Max rainfall series at Randsburg 32

33 Case study: characteristics Basic statistics Max. Ann. Prec. at Randsburg Number of observations 51 Minimum 3.00 Maximum 87.0 Mean 27.2 St. dev Median 25.0 Coefficient of variation (Cv) Coefficient of skewness (Cs) 1.14 Coefficient of kurtosis (Ck) 4.52 Parameter estimators for the GEV (ML) µ = α = κ =

34 Case study: ML fitting Fitting of the GEV model / Classic ML 34

35 Case study : Correlation with SOI (mm) Pluie Max. Ann. SOI SOI Annual Max rainfall series and SOI index at Randsburg 35

36 Case study : Correlation with SOI Observed Annual Max rainfall and corresponding SOI value 36

37 Case study: linear dependence model GML method: Outputs of the MCMC algorithm MCMC algorithm iterations for the estimation of the GEV0 model parameters with the GML method. 37

38 Case study: linear dependence model GML method: Outputs of the MCMC algorithm Histogram of the GEV0 parameters posterior distribution, obtained with the last N-N0 iterations of the MCMC algorithm. The GML estimator corresponds to the mode of the posterior distribution for each parameter. 38

39 Case study : model comparison Maximized log-likelihood function and GML parameter estimators for each model l n * β 1 β 2 β 3 α κ GEV GEV GEV

40 2 χ ν Case study : model comparison A simple approach to compare the validity of two models M 1 and M 0 such that M is to use the deviance statistic 0 M1 defined as: * * D = l M l M ( ) ( ) { } n n l n* is the maximized log-likelihood function for each model The statistic is Chi-squared distributed. The parameter of the Chi-deux distribution is the difference of the number of parameters of the two models M 1 et M 0. 40

41 Case study : model comparison Comparison of models GEV0 and GEV1 indicates that the difference is significant, since the statistic value is D = 6.2 D is considered significant at the confidance level 95% ( 2 ) 1 Pr χ 6.2 = The GEV1 model provides a better representation of data variability than model GEV0. 41

42 Case study : model comparison Comparison of models GEV1 and GEV2 leads to a statistic value of D = 4.3 with ( 2 ) 1 Pr χ 4.3 = The difference between the two models is hence significant at the 95% level. The GEV2 model is the most adequate to represent the dependence between the location parameter of the GEV and the SOI index. 42

43 Case study : Correlation with SOI GEV0 and GEV1 Median estimators conditional to SOI values. 43

44 Case study : Correlation with SOI GEV0 AND GEV2 Median estimators conditional to SOI values. 44

45 Case study : Correlation with SOI GEV0, GEV1 and GEV2 Median estimators conditional to particular SOI values. SOI = SOI = 0.04 SOI = 2.04 GEV 0 24 (21-28) GEV 1 54 (51-58) GEV 2 77 (72-82) 24 (21-28) 23 (19-27) 21 (18-24) 24 (21-28) 4 (0.5-7) 17 (15-22) 45

46 Case study : Correlation with SOI Results The difference is more important for lower SOI values which correspond to higher extremes of annual max. precipitations. GEV2 median estimate can be 3 times higher than GEV0 estimate. The linear trend model (GEV1) under-estimates the median for the higher SOI values. 46

47 Conclusions Advantages of the non-stationary GEV model in describing the variance. No assumption of normality as in classical models. Generalisation of the GML model to the non-stationary case provides efficient estimators for real hydrometeorological data series. Use of MCMC methods allows a robust solution to the likelihood equation system and leads to estimates of credibility intervals for parameters and quantiles. 47

48 References Coles G. S. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer, 208 p. El Adlouni, S., Favre, A-C. and Bobée, B. (2005). Comparison of methodologies to assess the convergence of Markov Chain Monte- Carlo Methods. Computational Statistics and Data Analysis (Under Press). Hastings, W. (1970). Monte Carlo sampling methods using Markov Chains and their applications. Biometrika, Martins, E. S., J. R. Stedinger (2000). Generalized Maximum Likelihood GEV Quantile Estimators for Hydrologic Data. Water Resources Research, vol.36., no.(3)., pp Scarf, P.A. (1992). Estimation for a four parameter generalized extreme value distribution, Comm. Stat. Theor. Meth., 21, Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases. Biometrika 72,