Extreme-Value Analysis of Corrosion Data

1 July 16, 2007
Supervisors: Prof. Dr. Ir. Jan M. van Noortwijk, MSc Ir. Sebastian Kuniewski, Dr. Marco Giannitrapani

2 Outline Motivation

3 Motivation
- in the oil industry, hundreds of kilometres of pipes and other equipment can be affected by corrosion
- it is the extreme defect depth/wall loss that influences the system reliability, so extreme-value methods are a sensible choice
- only part of the system can be inspected, so the results must be extrapolated

4 Motivation
- defects/wall losses caused by corrosion are likely to be locally dependent, so the independence assumption is questionable

5
- present statistical methods to model extremes of corrosion data, taking into account local defect dependence
- spatial extrapolation of the results
- examples of application
- framework/guideline for modelling extreme values of corrosion

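6 The Generalised Extreme-Value (GEV) distribution (block maxima data)
$$G(z) = \begin{cases} \exp\left\{-\left[1 + \xi\,\frac{z-\mu}{\sigma}\right]_+^{-1/\xi}\right\}, & \xi \neq 0 \\ \exp\left\{-\exp\left[-\frac{z-\mu}{\sigma}\right]\right\}, & \xi = 0 \end{cases}$$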
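
The distribution function on this slide can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `gev_cdf` and the handling of points outside the support are illustrative choices, not taken from the slides.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z): location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        # Gumbel case (xi = 0)
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0) G = 0,
        # above the upper end point (xi < 0) G = 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For example, `gev_cdf(0.0, 0.0, 1.0, 0.0)` evaluates the Gumbel case at its location parameter.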

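7 The Generalised-Pareto (GP) distribution (excess over threshold data)
$$Y_i = X_i - u, \quad \text{for } X_i > u, \quad i = 1, \dots, n_u$$
$$H(y) = \begin{cases} 1 - \left[1 + \xi\,\frac{y}{\tilde{\sigma}}\right]_+^{-1/\xi}, & \xi \neq 0 \\ 1 - \exp\left(-\frac{y}{\tilde{\sigma}}\right), & \xi = 0 \end{cases}$$
[Figure: Threshold exceedances; observations x_i with threshold u]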
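
As an illustration of the two definitions on this slide, the sketch below computes excesses over a threshold and evaluates the GP distribution function. The names `excesses` and `gp_cdf` are hypothetical, and `sigma` stands for the GP scale parameter.

```python
import math

def excesses(data, u):
    """Excesses y_i = x_i - u for the observations x_i that exceed u."""
    return [x - u for x in data if x > u]

def gp_cdf(y, sigma, xi):
    """GP distribution function H(y) for an excess y, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        # Exponential case (xi = 0)
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        # Only possible for xi < 0: y lies above the upper end point
        return 1.0
    return 1.0 - t ** (-1.0 / xi)
```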

8 Extrapolation (GEV)
- return-level method: $G(z_p) = 1 - p$, i.e. $\Pr\{M > z_p\} = p = \frac{1}{N}$
- implied distribution of the maximum corresponding to the not inspected area: $\Pr\{X_N \le z\} = G_N(z) = G(z)^N$
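
Both extrapolation routes on this slide follow from inverting or powering the GEV distribution function. The sketch below assumes that N counts block-sized units in the not inspected area (the slide does not define N), and all function names are illustrative.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z)."""
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(p, mu, sigma, xi):
    """Level z_p that solves G(z_p) = 1 - p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def max_cdf_not_inspected(z, n_blocks, mu, sigma, xi):
    """Pr{X_N <= z} = G(z)^N; n_blocks plays the role of N (assumed meaning)."""
    return gev_cdf(z, mu, sigma, xi) ** n_blocks
```

With p = 1/N, `gev_return_level(1.0 / n_blocks, mu, sigma, xi)` gives the level exceeded on average by one block maximum out of N.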

9 Extrapolation (GP)
- based on the Poisson frequency of threshold exceedances (Poisson-GP model)
- return-level method: $H(y_p) = 1 - p$, i.e. $\Pr\{Y > y_p\} = p = \frac{1}{N_E}$
- $N_E = \lambda_{GEV}\left(\frac{S}{A} - k\right)$: expected number of exceedances on the not inspected area

10 Extrapolation (GP)
- implied distribution of the maximum corresponding to the not inspected area:
$$\Pr\{X_N \le x\} = \exp\left\{-N\,\lambda_{GEV}\left[1 + \xi\,\frac{x-u}{\tilde{\sigma}}\right]_+^{-1/\xi}\right\}$$
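
The Poisson-GP extrapolation can be sketched in a few lines. Here `n_exceed` is a hypothetical name for the expected number of exceedances on the not inspected area (the Poisson mean), `sigma` is the GP scale, and the maximum distribution is only evaluated at or above the threshold u. This is a sketch under those assumptions, not the author's implementation.

```python
import math

def gp_return_level(u, sigma, xi, n_exceed):
    """Level x_p = u + y_p with H(y_p) = 1 - 1/n_exceed."""
    if n_exceed <= 1.0:
        raise ValueError("n_exceed must be greater than 1")
    if xi == 0.0:
        return u + sigma * math.log(n_exceed)
    return u + (sigma / xi) * (n_exceed ** xi - 1.0)

def poisson_gp_max_cdf(x, u, sigma, xi, n_exceed):
    """Pr{maximum <= x} for x >= u under the Poisson-GP model."""
    if x < u:
        raise ValueError("the formula applies only at or above the threshold")
    y = x - u
    if xi == 0.0:
        tail = math.exp(-y / sigma)
    else:
        t = 1.0 + xi * y / sigma
        # t <= 0 can only occur for xi < 0, above the upper end point
        tail = t ** (-1.0 / xi) if t > 0.0 else 0.0
    return math.exp(-n_exceed * tail)
```

At the return level the GP tail probability equals 1/`n_exceed`, so the maximum distribution evaluated there equals exp(-1).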

11 Summary
- two methods for statistical inference about extreme values of corrosion:
  - the GEV distribution (block maxima)
  - the GP distribution (excess over threshold data)
- two methods for spatial extrapolation of the results:
  - return level
  - distribution of the maximum corresponding to the not inspected area
- the GEV and GP distributions are closely related and, theoretically, should give the same results

12 Modelling extremes of dependent data
- for stationary data with a limited extent of long-range dependence at extreme levels, extreme-value methods can still be applied
- in corrosion applications it is reasonable to assume that pit depths are locally dependent, so extreme-value methods are applicable

13 Modelling extremes of dependent data with the GEV distribution
- assuming local data dependence, block maxima of stationary data (for sufficiently large block sizes) can be considered approximately independent
- the GEV distribution is therefore used in its standard form
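
To make the block-maxima step concrete, the sketch below partitions a rectangular grid of pit depths into non-overlapping blocks and keeps the maximum of each. The grid layout, the function name `block_maxima` and the choice to drop incomplete edge blocks are assumptions made for illustration.

```python
def block_maxima(grid, block_rows, block_cols):
    """Maximum of each non-overlapping block_rows x block_cols block.

    grid is a list of equal-length rows; incomplete blocks at the
    right and bottom edges are dropped.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    maxima = []
    for r0 in range(0, n_rows - block_rows + 1, block_rows):
        for c0 in range(0, n_cols - block_cols + 1, block_cols):
            maxima.append(max(
                grid[r][c]
                for r in range(r0, r0 + block_rows)
                for c in range(c0, c0 + block_cols)
            ))
    return maxima
```

The block size controls the trade-off named on the slide: larger blocks make the maxima closer to independent but leave fewer of them for fitting.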

14 Modelling extremes of dependent data with the GP distribution
- neighbouring exceedances may be dependent, so a change of practice is needed
- one of the most widely adopted methods is data declustering: filtering out dependent observations so that the remaining exceedances can be considered approximately independent

15 Modelling extremes of dependent data with the GP distribution - approach
- define clusters of exceedances
- identify the maximum excess within each cluster
- assuming that the cluster maxima are independent, fit the GP distribution
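
The first two steps of this approach can be sketched as follows, assuming that a cluster label is already available for every observation (how the labels are obtained is a separate question). The function name and the input layout are hypothetical.

```python
def cluster_maxima_excesses(values, labels, u):
    """Maximum excess over u within each cluster.

    values: observations; labels: one cluster id per observation.
    Observations at or below u are ignored. Results are ordered by
    cluster id.
    """
    best = {}
    for x, lab in zip(values, labels):
        if x <= u:
            continue
        excess = x - u
        if lab not in best or excess > best[lab]:
            best[lab] = excess
    return [best[lab] for lab in sorted(best)]
```

The returned cluster maxima are the declustered sample to which the GP distribution would then be fitted.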

16 Estimation of the number of data clusters

17 Extremal index method
- the extent of short-range dependence of extreme events is captured by the parameter θ, called the extremal index:
$$\theta = \frac{1}{\text{limiting mean cluster size}}$$
- the extremal index measures the degree of clustering of the process at extreme levels

18 Extremal index method
$$\theta = \frac{1}{\text{limiting mean cluster size}} \quad\Rightarrow\quad N_c = \theta N_e$$
where $N_c$ is the number of clusters and $N_e$ is the number of exceedances above the threshold $u$
- θ is estimated by the intervals estimator
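
A minimal sketch of the intervals estimator is given below, in the form commonly stated for a one-dimensional sequence and based on the gaps between successive exceedance positions. How the corrosion measurements are arranged into such a sequence is not stated on the slides and is an assumption here, as are the function names and the rounding of $\theta N_e$ to an integer.

```python
def intervals_estimator(series, u):
    """Intervals estimate of the extremal index from a 1-D sequence."""
    idx = [i for i, x in enumerate(series) if x > u]
    n = len(idx)
    if n < 2:
        raise ValueError("at least two exceedances are required")
    gaps = [b - a for a, b in zip(idx, idx[1:])]  # inter-exceedance times
    if max(gaps) <= 2:
        num = 2.0 * sum(gaps) ** 2
        den = (n - 1) * sum(t * t for t in gaps)
    else:
        num = 2.0 * sum(t - 1 for t in gaps) ** 2
        den = (n - 1) * sum((t - 1) * (t - 2) for t in gaps)
    return min(1.0, num / den)

def estimated_cluster_count(theta, n_exceed):
    """N_c = theta * N_e, rounded to an integer (rounding is an assumption)."""
    return max(1, round(theta * n_exceed))
```

Widely separated exceedances give an estimate of 1 (no clustering), while groups of adjacent exceedances pull the estimate below 1.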

19 Finding the number of clusters using a clustering algorithm
- define a validity criterion for the number $N_c$ of clusters found
- run the clustering algorithm for a range of $N_c$
- as the proper number of clusters, choose the one for which the validity criterion is optimised

20 Agglomerative hierarchical clustering algorithm
- starts with single points as clusters
- at each step the two closest clusters are merged
- stops when only one cluster remains
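
The three steps above can be sketched with a naive implementation. The slides do not say which cluster-to-cluster distance was used; single linkage (distance between the closest pair of points) with Euclidean distance is assumed here purely for illustration, and the optional `n_clusters` argument for stopping early is also an addition.

```python
import math

def single_link_distance(a, b, points):
    """Smallest Euclidean distance between a point of a and a point of b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)

def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, heights): clusters as lists of point indices and
    the merge distances in order (the link heights of a dendrogram).
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, heights
```

With the default `n_clusters=1` the function runs until a single cluster remains, as on the slide; a larger value stops the merging at the requested number of clusters.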

21 Agglomerative hierarchical clustering algorithm
[Figure: Dendrogram; horizontal axis: objects, vertical axis: distance. The distance between objects is the height of the link that joins them.]

22 The validation criteria that we used aim at identifying clusters that are compact and well isolated:
- silhouette plot: the maximum value indicates the optimum
- Davies-Bouldin index: the minimum indicates the optimum
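
Both criteria can be computed from a partition of the points. The sketch below uses the usual textbook definitions with Euclidean distance and cluster centroids, which is an assumption about the exact variants behind the slides; it also assumes at least two clusters with distinct centroids and no cluster made only of identical points.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (lower is better)."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

def average_silhouette(clusters):
    """Average silhouette coefficient of a partition (higher is better)."""
    values = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                values.append(0.0)  # convention for singleton clusters
                continue
            # mean distance to the other members of the own cluster
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            # smallest mean distance to the members of another cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            values.append((b - a) / max(a, b))
    return sum(values) / len(values)
```

Evaluating both functions for each candidate number of clusters and picking the minimum Davies-Bouldin index or the maximum average silhouette coefficient reproduces the selection rule stated above.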

23 Summary
- two methods to estimate the number of data clusters:
  - the extremal index method
  - clustering algorithm + validation criteria (Davies-Bouldin index, silhouette plot)
- prior to data declustering, perform a clustering tendency test

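24 Simulated corroded surface
- application of the gamma-process model
- dependence in terms of the product moment correlation:
$$\rho(X_k, X_l) = \exp\left\{-d\left(\sum_{i=1}^{2} \mathrm{dist}_i^{\,p}\right)^{q/p}\right\}$$
where $X_k = (x_k, y_k)$, $X_l = (x_l, y_l)$, $\mathrm{dist}_1 = x_k - x_l$, $\mathrm{dist}_2 = y_k - y_l$, with parameters $d = 0.3$, $p = 2$ and $q$
[Figure: plot with axis label "distance"]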
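
The correlation function can be evaluated as sketched below. The value of q is not recoverable from the slide, so the default `q=1.0` is a placeholder assumption only; taking absolute values of the coordinate differences is also an assumption (it makes no difference for p = 2).

```python
import math

def correlation(xk, xl, d=0.3, p=2.0, q=1.0):
    """Product moment correlation between locations xk and xl.

    d and p follow the slide; q = 1.0 is a placeholder, not a value
    taken from the source.
    """
    s = sum(abs(a - b) ** p for a, b in zip(xk, xl))
    return math.exp(-d * s ** (q / p))
```

With p = 2 and the placeholder q = 1 the correlation decays exponentially in the Euclidean distance between the two locations.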

25 Simulated corroded surface
[Figure: panels Matrix 1 to Matrix 20; axis label: correlation coefficient]

26 Simulated corroded surface

27 Simulated corroded surface - clustering algorithm
[Figure, two panels: Davies-Bouldin index vs number of clusters; silhouette plot (average silhouette coefficient vs number of clusters)]

28 Simulated corroded surface - results
Table: The estimate of the extremal index and the determined number of clusters (columns: $\hat{\theta}$, $n_c$, $n_c^{alg}$)
Table: Goodness-of-fit test results for different numbers of clusters (columns: number of clusters, $AD^2_{up}$, p-value, KS, p-value)

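29
Table: GP fit to excess of dependent data (columns: $\hat{\xi}$, $\hat{\tilde{\sigma}}$, $AD^2_{up}$, p-value, KS, p-value)
Table: GP fit to excess of declustered data (columns: $\hat{\xi}$, $\hat{\tilde{\sigma}}$, $AD^2_{up}$, p-value, KS, p-value)
Table: GEV fit to block maxima data (columns: $\hat{\xi}$, $\hat{\sigma}$, $\hat{\mu}$, $AD^2_{up}$, p-value, KS, p-value)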

30 Real data example

31 Real data example

32 Real data example
[Figure, two panels: Davies-Bouldin index vs number of clusters; silhouette plot (average silhouette coefficient vs number of clusters)]

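33
Table: GEV fit to block maxima data (columns: $\hat{\xi}$, $\hat{\sigma}$, $\hat{\mu}$, $AD^2_{up}$, p-value, KS, p-value)
Table: GP fit to excess of dependent data (columns: $\hat{\xi}$, $\hat{\tilde{\sigma}}$, $AD^2_{up}$, p-value, KS, p-value)
Table: GP fit to excess of declustered data (columns: $\hat{\xi}$, $\hat{\tilde{\sigma}}$, $AD^2_{up}$, p-value, KS, p-value)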

34
[Figure, two panels: extrapolated probability density function, GP-based vs GEV; horizontal axis: extreme pit depth [mm], vertical axis: density]

35 with the GEV distribution

36 with the GP distribution

37
- the two applied distributions are closely related and lead to consistent inference about extreme values of corrosion
- data declustering improves the results given by the GP distribution
- the performance of other clustering algorithms could be checked
- to take into account corrosion nonstationarity due to space-varying environmental conditions, covariate-dependent extreme-value models with trends could be considered

38 Thank you for your attention. Questions?

Extreme Value Analysis of Corrosion Data. Master Thesis

Extreme Value Analysis of Corrosion Data Master Thesis Marcin Glegola Delft University of Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, The Netherlands July 2007 Abstract