Climate change impact on hydrological extremes along rivers and urban drainage systems in Belgium


CCI-HYDR project (contract SD/CP/3A) for: Programme SSD «Science for a Sustainable Development»

TECHNICAL REPORT, MAY 2008

Climate change impact on hydrological extremes along rivers and urban drainage systems in Belgium

III. Statistical analysis of historical rainfall, ETo and river flow series trends and cycles

Faculty of Engineering, Department of Civil Engineering, Hydraulics Division, CCI-HYDR project
Royal Meteorological Institute of Belgium, Meteorological Research and Development Department, Risk Analysis and Sustainable Development Section

Faculty of Engineering, Department of Civil Engineering, Hydraulics Section
Kasteelpark Arenberg 40, BE-3001 Leuven, Belgium
Patrick.Willems@bwk.kuleuven.be

Meteorological Research and Development Department, Risk Analysis and Sustainable Development Section
Avenue Circulaire 3, BE-1180 Brussels, Belgium
Emmanuel.Roulin@oma.be

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without indicating the reference: Ntegeka V., Willems P., 2008. Climate change impact on hydrological extremes along rivers and urban drainage systems. III. Statistical analysis of historical rainfall, ETo and river flow series trends and cycles, Belgian Science Policy SSD Research Programme, Technical report CCI-HYDR project by K.U.Leuven Hydraulics Section & Royal Meteorological Institute of Belgium, May 2008, 37 p.

CCI-HYDR III. Statistical analysis trends and cycles

Table of contents

1. Introduction
2. Statistical analysis of trends and cycles in the historical Uccle rainfall series
   2.1 Introduction
   2.2 Methodology based on extreme value analysis
       2.2.1 Peaks-Over-Threshold extremes
       2.2.2 Aggregation levels and time scales
       2.2.3 Frequency analysis of extremes
       2.2.4 Rainfall distributions
             The two-component exponential distribution
             The Weibull distribution
             Calibration of the distributions
             Random sample generations
       2.2.5 Monte Carlo confidence intervals
       2.2.6 Quantile-perturbation approach
   2.3 Results
       2.3.1 Quantile perturbations for extreme rainfall conditions
       2.3.2 Slopes of quantile perturbations versus return period
       2.3.3 Mean POT perturbations
       2.3.4 Number of events
       2.3.5 Statistical hypothesis testing on the clustering of rainfall extremes
             Slope hypothesis testing
             Hypothesis testing average perturbations
   2.4 Conclusions
3. Statistical trend analysis for evapotranspiration
4. Trend analysis of historical river flow series
   4.1 Flow perturbation analysis
       Summer flow perturbations
       Autumn flow perturbations
       Winter flow perturbations
       Spring flow perturbations
   4.2 Conclusions from the flow perturbations


1. Introduction

There is a general perception that the climate of the most recent decades has changed. In Europe, evidence shows increasing temperature trends in the late 20th and early 21st centuries (Luterbacher et al., 2004; Xoplaki et al., 2005). Various statistical methods exist for studying the variability of extremes, and such techniques provide a basis for understanding and evaluating changes within hydro-meteorological time series. While long-term temperature records are comparatively easy to interpret, patterns of precipitation are less clear because precipitation is inherently intermittent. Frequency and perturbation methods can be combined to reveal trends and cycles that would otherwise be missed. This section covers the statistical approach for studying rainfall extremes in the historical Uccle precipitation series. The approach combines aspects of frequency analysis and perturbation analysis and thus provides an insightful temporal assessment of the trends and cycles of the extremes.

2. Statistical analysis of trends and cycles in the historical Uccle rainfall series

2.1 Introduction

Climate changes can be detected using physically based methods and/or empirical methods. Physically based methods detect changes using climate models, while empirical methods detect changes using statistical techniques. This section covers the empirical statistical analysis of the long-term high-frequency homogeneous rainfall series at the climatological station of the Royal Meteorological Institute of Belgium at Uccle, which starts in 1898 and is continued to date (Demarée et al., 1998; Demarée, 2003). Vaes et al. (2002) earlier carried out a preliminary trend analysis on the rainfall extremes in this long-term Uccle series; the most recent period (the last 7 years) was, however, not included. Blanckaert and Willems (2006) conducted a spectral analysis based on Fast and Windowed Fourier Analysis and a wavelet analysis of the hourly series. This research is based on the extended long-term historical series of 10 minutes rainfall intensities for the period 1898-2004 and makes use of an alternative method based on quantile perturbations. The method allows a consistency check of the results from the climate model simulations against the recent empirical trends. It is noteworthy that the data, provided by the Royal Meteorological Institute of Belgium, is of high quality. The general aim of the statistical analysis is to investigate whether the recent historical changes in frequency and amplitude of the rainfall extremes can be considered statistically significant in comparison with the natural temporal variability of rainfall intensities (as observed in the full available series since 1898). The objective is not to predict future trends, but to detect trends and cycles in retrospect. The analysis is carried out for different aggregation levels (time spans over which the rainfall intensities are averaged) spread over the range of concentration times of Belgian rural and urban catchments. These are the relevant time scales of the rainfall that determine the peak flow downstream of a catchment, and for rural and urban hydrology in Belgium they need to cover the range from 10 minutes to the seasonal scale. The trend analysis also takes into account the effect of clustering in time on the temporal variability of the frequency and amplitude of the rainfall extremes. The statistical analysis is based on the application of frequency and perturbation techniques.
While frequency techniques focus on how often an event may occur, perturbation techniques determine the relative magnitudes of events with respect to a certain baseline. The frequency- or quantile-perturbation analysis combines the two concepts, thereby making it possible to study the changes in extremes.

2.2 Methodology based on extreme value analysis

2.2.1 Peaks-Over-Threshold extremes

Extreme value theory has long been applied in the analysis of extremes. Extremes are selected from a series by applying a threshold, which means that the analysis is valid only for values above a certain return period. The selection of the threshold is, however, subjective: there is no universal technique for selecting it. Lang et al. (1999) proposed that the selection of the threshold should be based on the distribution of the Peaks-Over-Threshold values and on the independence hypothesis. For this research the threshold is taken as the value above which a distribution can be reasonably fitted, given the intended use of the distributions in the Monte Carlo calculations. The identification of extremes also requires an independence criterion. Extreme value theory assumes full statistical independence of the sampled extremes (Willems, 2000), thereby providing a theoretical basis for distribution fitting. The independence criterion for extracting the Peaks-Over-Threshold (POT) values for rainfall is similar to that for extracting Peaks-Over-Threshold values for discharges. The independence criterion for discharge events states that two consecutive events are independent if the occurrence of one event does not affect the occurrence of the other. The separation of two consecutive flood peaks is subjective due to the uncertainty associated with their physical independence, because the occurrence of the later peak may be partially explained by the occurrence of the previous peak (Lang, 1999). Rainfall independence is less uncertain, however, because two consecutive rainfall peaks are independent if the rainfall between them drops to zero. Thus the main criterion for peak extraction from rainfall series is the inter-event time.
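As a minimal sketch of such a peak extraction (the function names and the numerical values are illustrative assumptions, not taken from the report's own software), the moving-average aggregation and the inter-event independence criterion could look like:

```python
import numpy as np

def moving_average(series, window):
    """Aggregate a rainfall series by a simple moving average."""
    return np.convolve(series, np.ones(window) / window, mode="valid")

def extract_pot(values, threshold, inter_event_steps):
    """Peaks-Over-Threshold extraction: exceedances closer together
    than `inter_event_steps` are treated as one event, of which only
    the largest peak is kept."""
    peaks = []          # retained (index, value) pairs
    last_idx = None
    for i, v in enumerate(values):
        if v <= threshold:
            continue
        if last_idx is not None and i - last_idx < inter_event_steps:
            if v > peaks[-1][1]:          # same event: keep the larger peak
                peaks[-1] = (i, v)
                last_idx = i
            continue
        peaks.append((i, v))
        last_idx = i
    return [v for _, v in peaks]
```

For the 10 minutes series, for instance, a 12-hour inter-event time would correspond to inter_event_steps = 72.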
Willems (2000) proposed a minimum inter-event time of 12 hours, because two events occurring within the same day or night are considered as one event. The Peaks-Over-Threshold extremes are extracted from the simple moving average series. Moving average series are preferred to the original series for trend analysis because they capture intrinsic trends within a series that would otherwise go unnoticed. They also allow the rainfall intensities to be investigated at the time scales that are relevant for the hydrological applications. The time span of the moving window is called the aggregation level and covers the range of concentration times of the river and sewer catchments of interest. The sampling of the Peaks-Over-Threshold values is based on a 12-hour independence criterion. For the different aggregation levels, the independence inter-event time is taken equal to the aggregation level for those levels above the minimum (i.e., 12 hours).

2.2.2 Aggregation levels and time scales

Different aggregation levels and time scales are used in the statistical analysis. Aggregation levels of 10 minutes, 60 minutes, 180 minutes and 1440 minutes, as well as monthly seasonal volumes, are used. In addition to the aggregations, the data is also grouped in blocks of years ranging from 5 to 15 years. The analysis is therefore based on a particular aggregation level for a particular block of years. For instance, given an aggregation level of 10 minutes and 10-year blocks, the analysis involves studying the statistical properties of the 10 minutes POT extremes grouped in 10-year blocks for the period 1898-2004. Note that the total number of non-overlapping blocks of years for a given period can easily be calculated if the initial year and final year are known. For the period 1898-2004 the decades are 1898-1907, 1908-1917, ..., 1988-1997, plus the final block 1995-2004. These decades alone are, however, not sufficient for a complete temporal analysis. Therefore a sliding window is used in the analysis.
The sliding window shifts the blocks one year to the right, which leads to a new set of 10-year blocks: 1899-1908, 1909-1918, ..., 1989-1998, 1995-2004. Note that the last block does not shift, because 2004 is the last available year in the series. The sliding window is applied n times, where n is the number of years in a block. For the previous example, the shift of one year is applied 10 times (for a 10-year block). The last shift includes the blocks 1907-1916, 1917-1926, ..., 1987-1996, 1995-2004. Table 1 gives an overview of the complete set of year blocks considered. It is on the basis of the Peaks-Over-Threshold values grouped according to these blocks that the statistical trend and cycle analysis is applied.
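A minimal sketch of this sliding-window enumeration (assuming the 1898-2004 series; the function name is illustrative):

```python
def year_blocks(first_year=1898, last_year=2004, block_len=10):
    """Enumerate the sliding blocks described above: non-overlapping
    blocks starting at `first_year`, shifted one year at a time for
    `block_len` shifts; a block that would run past `last_year` is
    replaced by the fixed final block ending in `last_year`."""
    blocks = set()
    for shift in range(block_len):
        start = first_year + shift
        while start + block_len - 1 <= last_year:
            blocks.add((start, start + block_len - 1))
            start += block_len
        blocks.add((last_year - block_len + 1, last_year))  # fixed last block
    return sorted(blocks)
```

With the default arguments this yields (1898, 1907), (1899, 1908), ... up to the fixed final block (1995, 2004).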

Table 1: year blocks considered in the analysis (summer and winter windows).

The analysis is also carried out for the four seasons: winter (December, January, February), spring (March, April, May), summer (June, July, August) and autumn (September, October, November).

2.2.3 Frequency analysis of extremes

Frequency analysis deals with how often an event occurs; based on frequency analysis, extremes may thus be identified. The decision on what makes an event extreme depends on the intended use in design or future planning. The extremes for this study partially conform to the definition offered by Pickands (1975), who stated that the extremes extracted from a series after applying a threshold can be fitted to a Generalized Pareto distribution (GPD). The threshold needs to be high enough to ensure that the extremes can be fitted to a distribution. Due to the nature of the study, the selection of an optimal threshold (the threshold that most accurately fits the distribution) is not pursued. Applying an optimal threshold would in some cases reduce the number of events and thus affect the number of events in the extracted blocks of years. Also, considering the number of series to be fitted, selecting an optimal threshold for each series would involve computational constraints: for the full period there are on the order of a hundred block series for each season. Fitting each series as accurately as possible would require a different threshold per series, whereas a constant threshold for all series greatly simplifies the distribution fitting computation. An initial analysis reveals that although the fits are not accurate for some series, they are reasonable approximations as long as the threshold is selected high enough. However, the definition of what constitutes an extreme event is still debated.
An extreme event may be selected based on frequency, intensity or threshold exceedance, and on its expected physical impacts. The thresholds used in this study are selected using the criterion of having at least 5 Peaks-Over-Threshold values per year in a particular season, while also giving a reasonable distribution fit.

2.2.4 Rainfall distributions

Distribution fitting is an essential component of this study. It is on the basis of the distribution that the hypothesis testing on the significance of the historical trends and variations (based on Monte Carlo calculations) is done. For Monte Carlo calculations it is crucial that the variate can be generated through inversion of the distribution. The Normal and Gamma distributions are among the distributions without analytic inverse transforms (Charles, 1986). Fortunately, this

study is based on rainfall extremes, which according to previous studies can be fitted by Weibull and exponential distributions, for which analytic inverse transforms exist. The probability distribution of point precipitation intensities has been examined for the Uccle series in many previous studies, e.g. Demarée (1985), Willems (1998, 2000) and Mohymont (2005). Willems (1998) presented a systematic methodology which derived the type of the distribution and the optimal threshold. The exponential distribution has been suggested as a good approximation of the underlying precipitation process: more specifically, a two-component distribution to represent storms of two different types (air mass thunderstorms and cyclonic/frontal storms). This was done for durations in the range from 10 minutes to 15 days.

The two-component exponential distribution

The two-component exponential distribution is defined as:

G(x) = p_a G_a(x) + (1 - p_a) G_b(x)   (1)

in which G_a(x) and G_b(x) are two different exponential distributions, with subscripts a and b representing the thunderstorms and the frontal storms respectively:

G_a(x) = 1 - exp(-(x - x_t)/β_a)   (2)

G_b(x) = 1 - exp(-(x - x_t)/β_b)   (3)

Equations (2) and (3) represent cumulative probability distributions, where β_a and β_b are the scale parameters, x the rainfall variable and x_t the threshold. The probability distribution G(x) is the combined distribution of two exponentially distributed populations a and b, in which p_a represents the proportion of population a. The two distributions arise from the fact that there are two different types of storms: thunderstorms in summer, and cyclonic and frontal storms. The first storm type is associated with population a (the largest scale parameter β_a), because the extreme precipitation intensities are known to be, on average, larger for this storm type.
The parameters p_a, β_a and β_b were determined for the Uccle rainfall intensities in the range of aggregation levels between 10 minutes and 15 days by Willems (2000).

The Weibull distribution

Willems (2000) found that the two-component exponential distribution was valid for aggregation levels up to about 2 days, while a one-component distribution was valid up to 15 days. However, Willems (2000) based his analysis on aggregated values up to 15 days only. One of the temporal scales included in this study is the monthly scale. Due to the independence criterion of a minimum inter-event time of at least one month, the number of monthly extracted Peaks-Over-Threshold data would be limited, leading to more uncertainty. Therefore, for the monthly scale, the aggregation and the independence criterion are not applied; only a threshold is applied to the series after calculating the cumulative monthly volumes. With this adjustment, the fitted exponential distribution suggested by Willems (2000) is graphically found to be suspect. This could be explained by the fact that his analysis is based on aggregated values for time scales varying from 10 minutes to 15 days. The Weibull distribution is found to fit the monthly seasonal volumes better than the exponential distribution. The Weibull 3-parameter distribution is characterised by the following equation:

F(x) = 1 - exp(-((x - x_t)/β)^α)   (4)

where F(x) is the cumulative distribution function, x the rainfall volume, x_t the threshold, β the scale parameter and α the shape parameter. Note that Equation (2) is indeed a special case of Equation (4) with α equal to 1. Since the distribution can easily be transformed, as will be shown later, the fitting is consequently less complicated, allowing for easier manipulation in the Monte Carlo calculations.

Calibration of the distributions

One of the prerequisites for performing uncertainty analysis by Monte Carlo simulation is distribution fitting. As previously stated, the distribution should have an inverse transform which can easily be computed; some probability distributions are computationally expensive because they require numerical integration. Fortunately, this study is based on two easily transformable distributions, namely the Weibull 3-parameter distribution and the two-component exponential distribution. Various methods exist for fitting distributions, including least squares estimation, the method of maximum likelihood and the method of moments. The fitting here is primarily based on linear and non-linear least squares techniques: the two-component exponential distribution is fitted using the non-linear approach, while the Weibull distribution is fitted using the linear approach. For the two-component exponential distribution, Equations (1), (2) and (3) can be combined to form Equation (5):

1 - G(x) = p_a exp(-(x - x_t)/β_a) + (1 - p_a) exp(-(x - x_t)/β_b)   (5)

The probability of exceedance 1 - G(x) can also be calculated using order statistics based on certain plotting position formulas. The Weibull plotting position formula is preferred in most cases because of its minimum variance (Ghosh, 1999):

1 - G(x) = i/(n + 1)   (6)

where i is the rank of the series sorted in descending order and n is the number of data points in the series. Thus, by minimising the mean square error while adjusting β_a, β_b and p_a, it is possible to calibrate the parameters of the two-component exponential distribution. Note that since Equation (5) has not been linearised, non-linear least squares are used, which usually requires initial estimates for the parameters.
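A minimal sketch of this calibration, in which a crude grid search stands in for the report's non-linear least-squares routine (the function name, the grids and the constraint that population a keeps the larger scale are illustrative assumptions):

```python
import numpy as np
from itertools import product

def fit_two_comp_exp(pot, x_t, beta_grid, p_grid):
    """Calibrate p_a, beta_a, beta_b of the two-component exponential
    by minimising the mean square error between the empirical
    exceedance probabilities i/(n+1) (ranks in descending order) and
    the model 1 - G(x) = p_a*exp(-(x-x_t)/beta_a)
                         + (1-p_a)*exp(-(x-x_t)/beta_b)."""
    x = np.sort(np.asarray(pot, dtype=float))[::-1]       # descending
    emp = np.arange(1, len(x) + 1) / (len(x) + 1.0)       # i/(n+1)
    best = None
    for p_a, b_a, b_b in product(p_grid, beta_grid, beta_grid):
        if b_a <= b_b:            # population a has the larger scale
            continue
        model = (p_a * np.exp(-(x - x_t) / b_a)
                 + (1 - p_a) * np.exp(-(x - x_t) / b_b))
        mse = np.mean((model - emp) ** 2)
        if best is None or mse < best[0]:
            best = (mse, p_a, b_a, b_b)
    return best[1:]
```

In practice a proper non-linear least-squares solver seeded with the initial estimates of Equations (7) and (8) would replace the grid search; the sketch only illustrates the objective being minimised.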
Willems (2000) developed regression relations for estimating initial values of the parameters of the two-component exponential distribution as linear functions of the aggregation level D (the regression coefficients a_0, a_1, b_0 and b_1 below stand for the values calibrated by Willems (2000)):

log(β_a [mm/h]) = a_0 + a_1 log(D [days])   (7)

log(β_b [mm/h]) = b_0 + b_1 log(D [days])   (8)

Based on these relations, initial guesses can be made for β_a and β_b. Note that the threshold x_t may be considered constant and that p_a is a value between 0 and 1. Starting from the initial guesses, the parameters with the minimum mean square error of the residuals between the empirical and the theoretical distribution are found; these represent the calibrated distribution for the series. The approach for calibrating the Weibull distribution is similar to that of the two-component exponential distribution, although it is based on linear regression techniques. Equation (4) can be transformed linearly by taking logarithms twice and rearranging, giving:

ln(-ln(1 - F(x))) = α ln(x - x_t) - α ln(β)   (9)

By replacing 1 - F(x) with Equation (6), Equation (9) changes to:

ln(-ln(i/(n + 1))) = α ln(x - x_t) - α ln(β)   (10)

Both α and β can now be estimated by fitting a straight line using least squares regression, with the ordinate taken as ln(-ln(i/(n + 1))) and the abscissa taken as ln(x - x_t). The slope of the line of best fit gives the shape parameter α, from which the scale parameter β is estimated via the intercept. The regression, however, requires a prior estimate of the threshold x_t. Without such a prior estimate, replacing ln(x - x_t) by ln(x) in the previous equations, the shape parameter α can be

determined as the asymptotic slope towards the higher observations in a plot with the ordinate taken as ln(-ln(i/(n + 1))) and the abscissa taken as ln(x).

Random sample generations

The approach adopted in this study for confidence interval calculation (see section 2.2.5) requires random samples (or a number of random values) to be generated from the rainfall distributions. A random value is usually thought of as a value selected such that each value in the population has an equal chance of being selected (Charles, 1986). A random value can be selected from any probability distribution as long as the values are independent of each other. Random numbers can be generated for distributions by making use of the fact that the cumulative probability of any continuous variate is uniformly distributed over the interval from 0 to 1. Therefore, with a continuous uniform variable U on [0, 1] and an invertible distribution F, the random variable X = F^(-1)(U) can be generated. This is called inverse transform sampling: a random number is selected from the interval [0, 1], followed by computation of the variate from the inverse cumulative distribution.

2.2.5 Monte Carlo confidence intervals

Climate change relates to statistically significant variations from the natural variability that persist for a long period, typically decades or longer. It involves shifts in the frequency and magnitude of sporadic weather events (IPCC, 2001). Confidence intervals can be used to define a region of natural variability or randomness. Thus they are also used for testing hypotheses of significant deviations under the hypothesis of no trend or temporal clustering of rainfall extremes.
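The inverse transform sampling described in the previous subsection can be sketched as follows for the two rainfall distributions used here (the function names are illustrative; sampling the mixture by first choosing the storm population is a standard composition trick, not a routine spelled out in the report):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_weibull(n, x_t, beta, alpha):
    """Inverse transform sampling of F(x) = 1 - exp(-((x - x_t)/beta)**alpha):
    with U uniform on [0, 1), X = x_t + beta * (-ln(1 - U))**(1/alpha)."""
    u = rng.random(n)
    return x_t + beta * (-np.log1p(-u)) ** (1.0 / alpha)

def sample_two_comp_exp(n, x_t, p_a, beta_a, beta_b):
    """Sampling of the two-component exponential mixture by composition:
    choose population a with probability p_a, then invert the selected
    exponential component."""
    u = rng.random(n)
    scale = np.where(rng.random(n) < p_a, beta_a, beta_b)
    return x_t + scale * (-np.log1p(-u))
```

Setting alpha = 1 in sample_weibull reproduces a single exponential component, consistent with Equation (2) being a special case of Equation (4).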
Since a confidence interval defines a region of expected variability, any value outside the confidence bounds is considered statistically significant (the hypothesis of no significant trends and oscillations is rejected), while values within the bounds are statistically insignificant (the hypothesis is accepted). Using this criterion, one can ascertain the statistical significance of a given hypothesis. Monte Carlo methods are statistical sampling methods used for generating many random outcomes from the derived distributions: given a distribution, a Monte Carlo algorithm derives a large number of random outcomes from which a statistic is calculated. The approach used in this study is based on the parametric bootstrap method, which requires that a sample is first fitted to a distribution, after which random samples are generated from that distribution. One of the advantages of bootstrapping is that it allows confidence intervals to be estimated. The approach was originally introduced for independent data (Efron, 1979) but has evolved over time to allow the analysis of dependent data by means of block bootstrapping (Vogel and Shallcross, 1996). Block bootstrapping groups data in blocks from which the resampling is made, thus preserving the time structure within the series. Here, however, since the Peaks-Over-Threshold values are selected using an independence criterion, the data is assumed to be independent. Even though frequency analysis eliminates the time aspect within a particular block, the use of several blocks restores it, since the statistics (e.g. 95% confidence intervals) can be calculated for each block; by connecting the statistics of the successive blocks, the temporal evolution of the statistic is obtained. The analysis uses the Peaks-Over-Threshold (POT) series, as the major focus of this study is on the extremes. The parametric bootstrap Monte Carlo procedure is described below:
1. POT series for the entire period are extracted from the available series.
2. The series are then further separated into seasonal blocks for different block lengths.
3. For a particular block, a distribution is fitted to the POT data and the parameters of the distribution are stored, together with the total number of POT values n.
4. Using the Monte Carlo methodology of random number generation, p samples are generated from the fitted distribution, each sample containing n POT values.

5. Each of the p samples is ranked in descending or ascending order, and the confidence interval can be calculated for each rank number. The rank numbers can easily be converted to return periods using empirical plotting positions. Note that for each rank number or return period there are p possible values. Based on these values, the confidence interval is estimated from the rank range [p·α/2, p·(1 - α/2)], where α is the significance level. For example, for p = 1000 and a 95% confidence interval (α = 0.05), the confidence interval is given by the 25th and 975th values after ranking.

2.2.6 Quantile-perturbation approach

The proposed method investigates the historical changes in the ranked extremes. The method combines aspects of frequency, used in extreme value analysis, and perturbation, used in climate change impact studies. The technique is analogous to the frequency-perturbation approach applied by Harrold et al. (2005) and Chiew (2006) for deriving climate change scenarios from climate models. For climate change impact analysis on a daily rainfall series, instead of applying one factor (e.g., a monthly change) to the entire daily time series (e.g., for the same month), they applied different factors based on the ranked daily values. The perturbation factors were calculated as ratios of two similarly ranked values obtained from the future (climate model scenario) and the observed time series. The method proposed here, however, is solely based on historical data. Since the perturbation is a relative change, it requires two series. For the climate-model based approach, one of the series is taken as the reference or baseline series while the other is a future scenario series. In the present study, one of the series is derived from the long-term historical distribution while the other series is taken from a particular block (subseries) of interest.
For example, given a particular block of 10 years, one of the series contains the actual POT values within the block, while the other series is derived from the distribution of long-term historical values (from the entire period of 107 years). The POT values within the block are ranked (with i the rank of each POT event), such that they can be related to empirical return periods L/i for a block series of L years length. After ranking, the POT values correspond with quantiles x(L), x(L/2), ..., x(L/i), ..., where x(L/i) is the quantile with empirical return period L/i. The same procedure is applied to the full 107 years series, leading to quantiles x_g(107), x_g(107/2), ..., x_g(107/i), ... The perturbation factors then correspond to the ratios x(L)/x_g(L), x(L/2)/x_g(L/2), ... It is clear that the return periods L, L/2, ... do not necessarily coincide with the empirical return periods of the POT events of the full 107 years series; in that case the x_g(L/i) values are derived by linear interpolation between the closest (higher and lower) POT events. Figure 1 illustrates the estimation of the first 3 values in the reference series for a 10-year summer block. The curve contains all the summer POT values in the 107 years period.
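The ranking, return-period matching and interpolation just described can be sketched as follows (the function and argument names are illustrative assumptions):

```python
import numpy as np

def quantile_perturbations(block_pot, longterm_pot, block_years, total_years):
    """Perturbation factors x(L/i) / x_g(L/i) of a block of L years
    against the long-term series. Long-term quantiles at the block's
    empirical return periods are obtained by linear interpolation
    between the closest long-term POT events."""
    xb = np.sort(np.asarray(block_pot, dtype=float))[::-1]     # descending
    xg = np.sort(np.asarray(longterm_pot, dtype=float))[::-1]
    t_block = block_years / np.arange(1, len(xb) + 1)          # L/i
    t_long = total_years / np.arange(1, len(xg) + 1)
    # np.interp needs an increasing abscissa, hence the reversals
    xg_at_block_t = np.interp(t_block, t_long[::-1], xg[::-1])
    return xb / xg_at_block_t
```

Factors above 1 indicate a block with heavier extremes than the long-term baseline; factors below 1 indicate a relatively quiet block.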

Figure 1: Precipitation quantiles (rainfall intensity versus return period in years) in the long-term baseline calculation.

2.3 Results

2.3.1 Quantile perturbations for extreme rainfall conditions

The calculation of the confidence intervals is based on the historical aggregated Uccle 10 minutes rainfall series (1898-2004). The selection of the number of samples for the confidence intervals depends on the available computing resources, the available time for analysis and the required level of accuracy. This study opts for 100 samples, given the many blocks of years that are investigated: seasonal blocks of 5 to 15 years (winter, spring, summer and autumn) over 1898-2004 are required for the analysis, which involves a large volume of data generation. For any given block length of seasonal series there are on the order of a hundred series, each with a different number of POT extremes. The POT extraction criterion is that each year has at least 5 POT values. Each block is assigned a confidence interval using the parametric bootstrap Monte Carlo technique. After fitting a distribution (two-component exponential, or Weibull for the monthly time scale) to the data of a particular block, 100 random samples are generated, each containing the same number of events as the parent data. Different statistics can then be obtained for each sample, and it is from this ensemble of statistics that a confidence interval is defined. For the perturbations, for example, average quantile perturbations are calculated (averaged over the higher return periods) for each sample separately. This gives 100 perturbation factors, from which the 3rd and 98th ranked values define the 95% confidence interval. The same procedure is repeated for all blocks, each time defining the confidence interval. The upper and lower confidence interval points are then superimposed on the same plot with the historical factors. The resulting plot shows the factors and the confidence intervals, which can then be used to check the hypothesis.
A similar procedure is also used for the other confidence intervals, albeit with a few alterations depending on the statistic. Figure 2 and Figure 3 show the 10 minutes perturbations for each separate rank number for the summer period and all 10-year blocks (decades). The perturbations represent the changes with respect to the long-term historical data: the perturbation is taken as the ratio of similarly ranked data from the two series, where the base or reference series is the long-term expected series and the other series is the actual series within a particular block. A single perturbation factor for a particular block of years is calculated as the average of all the perturbations above a particular threshold. The threshold selection is based on a criterion which

for this study is taken as having 5 events per year. This means that the perturbation can be seen as a quantile perturbation for extreme rainfall conditions. The mean perturbation is assigned to a year approximately in the middle of the block. Repeating the averaging over the different blocks assigns one factor to each block, which eventually leads to a temporal variation of the perturbation factor. Figure 4 shows the temporal variation obtained from Figure 2 and Figure 3, with each point representing the centre of a 10-year block. After evaluating the temporal evolution of the perturbation, the confidence interval is also evaluated and superimposed on the same plot. It is then graphically possible to identify periods that depict significant variations under the hypothesis of no trend or temporal clustering of rainfall extremes (see Figure 8 and section 2.3.5).
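The block-wise confidence interval construction described above can be sketched as follows (the function names are illustrative; p = 100 and the 3rd/98th ranked bounds follow the text's example, and the exponential sampler in the usage note merely stands in for a fitted rainfall distribution):

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_ci(sampler, n_events, statistic, p=100, level=0.95):
    """Parametric bootstrap for one block: draw p synthetic samples of
    n_events values from the fitted distribution, evaluate the chosen
    statistic on each, and take the empirical (alpha/2, 1 - alpha/2)
    quantiles -- the 3rd and 98th ranked values for p = 100 and a 95%
    interval."""
    stats = np.sort([statistic(sampler(n_events)) for _ in range(p)])
    lo_idx = int(np.floor(p * (1.0 - level) / 2.0))          # 0-based
    hi_idx = int(np.ceil(p * (1.0 + level) / 2.0)) - 1
    return stats[lo_idx], stats[hi_idx]
```

For instance, bootstrap_ci(lambda n: rng.exponential(2.0, n), 250, np.mean) would bracket the mean of the fitted distribution; repeating this per block and connecting the bounds yields the temporal confidence band superimposed on the perturbation plot.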

Figure 2: Quantile perturbations for 10-minute rainfall extremes and 10-year blocks for summer periods (panels plotted against exceedance probability [-]).

Figure 3: Quantile perturbations for 10-minute rainfall extremes and 10-year blocks for summer periods (cont'd).

Figure 4: Estimates of average quantile perturbations for 10-minute rainfall extremes and 10-year blocks for summer periods.

2.3.2 Slopes of quantile perturbations versus return period

Calculation of the average quantile perturbations assumes these perturbations to be independent of the return period; in other words, the slope of the quantile perturbations versus the return period or exceedance probability is assumed to be not significantly different from zero. To test this, the slope is explicitly calculated and analyzed. Figure 5 shows some of the expected outcomes of the slopes for selected periods. There is a positive slope for two of the blocks (among them 1995-2004) and a negative slope for two others (among them 1898-1907). The slope tests the hypothesis that the perturbation factor does not vary with exceedance probability, i.e. with the severity of the extremes. In other words, if the slope is significantly different from zero, there is a trend in the perturbations, which implies that the higher extreme events have significantly different perturbations from the lower extreme events. Conversely, if the slope is not significantly different from zero (it is nearly horizontal), the perturbations for the low and high extremes are the same. Again, the confidence interval aids the analysis, as it defines zones of significance. If the zero reference lies within the confidence interval for a particular period (year 93 in Figure 5), the slope is not significantly different from zero and the perturbation can be assumed constant for that period. However, if the zero lies outside the confidence interval, the slope is significantly different from zero (year 943 in Figure 5) and the perturbation cannot be assumed constant for the whole range of extremes.
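The slope test can be sketched as below. One illustrative assumption is made, since the report does not specify the fitting details: an ordinary least-squares fit of the perturbation factors against the empirical exceedance probability on a log scale.

```python
import numpy as np

def perturbation_slope(perturbations):
    """Least-squares slope of the perturbation factor versus log10 of
    the empirical exceedance probability.

    perturbations are ordered from the largest extreme (rank 1) down;
    rank i out of n gets empirical exceedance probability i / (n + 1).
    A slope not significantly different from zero supports the use of
    a constant perturbation over the whole range of extremes."""
    n = len(perturbations)
    prob = np.arange(1, n + 1) / (n + 1)
    slope, _intercept = np.polyfit(np.log10(prob), perturbations, deg=1)
    return slope

flat = perturbation_slope(np.full(50, 1.1))              # constant factors
varying = perturbation_slope(np.linspace(1.4, 0.9, 50))  # amplified high extremes
```

A constant set of factors yields a slope of (numerically) zero, while factors that grow towards the rarest extremes yield a negative slope against exceedance probability; the bootstrap confidence interval around zero then decides significance, as in Figure 5.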

Figure 5: Hypothesis testing for slopes of quantile perturbations versus return period.

2.3.3 Mean POT perturbations

The mean POT represents the average Peaks-Over-Threshold value for a given dataset. In this study, the mean POT is calculated from the values within a block. The calculation is done for all the blocks of years, which eventually leads to an evolution of mean POT values over the long-term period. The confidence interval for the mean POT is based on the mean POT values calculated from the randomly generated samples of the parent data for a particular block of years. From, say, 100 samples for a particular block, the mean POT is calculated for each sample. From the resulting mean POT values it is possible to define the 95% confidence interval. The confidence interval is used to test the hypothesis that the long-term mean POT does not vary significantly from the reference level. The reference level is taken as the long-term mean POT value over the entire period, e.g. over all the summers in the long-term series.

2.3.4 Number of events

There is a general perception that the frequency of extreme events has increased in recent years (Prudhomme et al., 2003; Beniston et al., 2004). Thanks to the availability of the long-term series, it is now possible to examine this perception using statistical hypothesis testing. The hypothesis that the recent number of events is significantly higher than expected can be tested for the most recent years. The significance of the number of events can be tested using the non-parametric bootstrapping Monte Carlo technique. The technique differs in some respects from the previous bootstrapping applications. One notable difference is that there is no distribution fitting.
Instead, the parent data is randomly redistributed in time. For instance, for the summer period, all the summer POT values are randomly redistributed in time, each POT value being assigned one of the possible summer dates in the period 1898-2004. Thus, for each sampling, the number of events above a particular threshold for a particular block of years is noted. The counting of events is repeated for all the random time samples. Each block therefore has 100 possible numbers of events, from which the 95% confidence interval can be estimated. The analysis of mean POT perturbations (section 2.3.3) and of the number of events (this section) allows the quantile perturbations to be further explained by their contributing trends and variations in both the number of events (time frequency of rainfall events) and the amplitude of each event.

2.3.5 Statistical hypothesis testing on the clustering of rainfall extremes

The statistical hypothesis testing is based on the 95% confidence interval. The interval defines a region of acceptance and indicates the boundary of expected randomness. Outside this boundary the hypothesis is rejected. Based on the hypothesis tests, periods of statistically significant behaviour can be identified. As discussed before, this statistical investigation is based on Peaks-Over-Threshold extremes for different aggregation levels and different block lengths, using the sliding window technique for block lengths ranging from 5 to 15 years. The investigation covers aggregation levels of 10 minutes (no aggregation), 60 minutes (hourly), 1440 minutes (daily) and 10080 minutes (weekly), together with monthly seasonal volumes. However, the discussion has been limited to a 10-year block length, aggregation levels of 10 minutes and 1440 minutes, and monthly seasonal volumes. For this assessment, monthly volumes are preferred to aggregated monthly values because of a limitation of the POT extraction method, which produces few monthly extremes for the analysis. With varying block lengths, periods of clustering of extremes may be identified.
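The no-clustering null hypothesis for the number of events can be sketched as follows. This is illustrative only: uniform random redistribution of the event dates and 100 resamples are assumptions, and the names and synthetic record are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def event_count_ci(event_years, record, block, n_samples=100):
    """Non-parametric bootstrap for the number of POT events in a block.

    The observed events are redistributed uniformly at random over the
    whole record; for every resample the count falling inside the block
    is noted, and the 3rd and 98th ranked counts give an approximate
    95% confidence interval under the no-clustering hypothesis."""
    event_years = np.asarray(event_years, dtype=float)
    t0, t1 = record
    b0, b1 = block
    counts = np.sort([
        np.sum((sample >= b0) & (sample < b1))
        for sample in (rng.uniform(t0, t1, len(event_years))
                       for _ in range(n_samples))
    ])
    observed = int(np.sum((event_years >= b0) & (event_years < b1)))
    return observed, int(counts[2]), int(counts[97])

# synthetic record: 500 events spread evenly over 1898-2004
events = np.linspace(1898.0, 2004.0, 500)
obs, lo, hi = event_count_ci(events, (1898.0, 2004.0), (1990.0, 2000.0))
```

An observed count above the upper bound for a recent block would support the perception of an increased event frequency; a count inside the interval is consistent with random timing.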
For instance, if a particular period shows an indication of higher perturbations (suggesting clustering of rainfall extremes) for a particular block length, then other block lengths can be used to check the persistence of the perturbation. If the period with high perturbation is consistent across all the block lengths, then clustering of events for that period is plausible. The block length may also be linked to the objective of the analysis. For example, the Intergovernmental Panel on Climate Change (IPCC, 2001) uses decadal analysis for climate change. The 15-year block length may be used to test the perception that the most recent years, since the 1990s, have experienced stronger climate change effects than the previous periods. The variability of the perturbations of the historical precipitation shows some attributes of trends and oscillations. The temporal variability of the perturbation is obtained by using average perturbations for each block of years. In other words, one perturbation factor represents the mean of the perturbations within a particular block of years and above a rainfall threshold (thus for all rainfall extremes). Table 2 contains the thresholds. The thresholds are selected using a criterion of having at least 5 POT values per block.

Table 2: Thresholds above which average quantile perturbations were derived.

  Statistic               Season   10 min   1440 min   Month
  Threshold (mm)          Summer
                          Winter
  Return period (years)   Summer
                          Winter

The hypothesis of a constant perturbation above these thresholds can be tested using statistical hypothesis testing on the slope of the perturbation-exceedance probability plots. If the zero value lies within the confidence interval, the slope is not significantly different from zero, i.e. it is nearly horizontal, and the assumption of a constant perturbation factor above a threshold is justified.
Note that each slope is centred in the middle of each block (there are 7 blocks for the entire period).

Slope hypothesis testing

Figure 6 shows the slopes of the perturbations above the selected thresholds for the different aggregation levels and for the monthly seasonal volumes.

The hypothesis that the slope is not significantly different from zero can generally be accepted. The slope is significantly different from zero only for short periods, e.g. the 1940s, 1960s and 1990s for 10-minute rainfall in summer, and the 1900s, 1960s and 1970s for 10-minute rainfall in winter. Note that the alternative hypothesis of the slope being significantly different from zero (the zero value lies outside the confidence interval) holds for these significant periods. However, there is still a 5% chance that the slope is in fact almost horizontal. Since these periods are short, this may be a reasonable compromise. On the other hand, significant slopes are indications of high variability of the extremes, which matches the clustering of rainfall extremes (see section 2.3.5).

Figure 6: Perturbation-exceedance probability slope estimates for 10-minute, daily and monthly rainfall extremes and 10-year blocks for summer and winter periods, together with 95% confidence intervals.

Hypothesis testing average perturbations

The perturbation time series analysis aims to investigate whether the most recent changes can be considered statistically significant in comparison with the natural temporal variability. Identifying statistically significant trends and oscillations enables one to assess the likelihood of climate change effects during the most recent periods. Figure 7, Figure 8 and Figure 9 summarize the results obtained for the 5-, 10- and 15-year blocks. These figures show the results for 10-minute, daily (1440 minutes aggregated from 10 minutes) and monthly volumes.
The perturbations, number of events and mean Peaks-Over-Threshold values have been included for both summer and winter.

Figure 7: Estimates of average quantile perturbations, number of events and mean POT values for 10-minute, daily and monthly rainfall extremes and 5-year blocks for summer and winter periods, together with 95% confidence intervals.

Figure 8: Estimates of average quantile perturbations, number of events and mean POT values for 10-minute, daily and monthly rainfall extremes and 10-year blocks for summer and winter periods, together with 95% confidence intervals.


More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

More information

Review of Transpower s. electricity demand. forecasting methods. Professor Rob J Hyndman. B.Sc. (Hons), Ph.D., A.Stat. Contact details: Report for

Review of Transpower s. electricity demand. forecasting methods. Professor Rob J Hyndman. B.Sc. (Hons), Ph.D., A.Stat. Contact details: Report for Review of Transpower s electricity demand forecasting methods Professor Rob J Hyndman B.Sc. (Hons), Ph.D., A.Stat. Contact details: Telephone: 0458 903 204 Email: robjhyndman@gmail.com Web: robjhyndman.com

More information

Implications of Alternative Operational Risk Modeling Techniques *

Implications of Alternative Operational Risk Modeling Techniques * Implications of Alternative Operational Risk Modeling Techniques * Patrick de Fontnouvelle, Eric Rosengren Federal Reserve Bank of Boston John Jordan FitchRisk June, 2004 Abstract Quantification of operational

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

by Maria Heiden, Berenberg Bank

by Maria Heiden, Berenberg Bank Dynamic hedging of equity price risk with an equity protect overlay: reduce losses and exploit opportunities by Maria Heiden, Berenberg Bank As part of the distortions on the international stock markets

More information

Measurement with Ratios

Measurement with Ratios Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

LDA at Work: Deutsche Bank s Approach to Quantifying Operational Risk

LDA at Work: Deutsche Bank s Approach to Quantifying Operational Risk LDA at Work: Deutsche Bank s Approach to Quantifying Operational Risk Workshop on Financial Risk and Banking Regulation Office of the Comptroller of the Currency, Washington DC, 5 Feb 2009 Michael Kalkbrener

More information

The Rational Method. David B. Thompson Civil Engineering Deptartment Texas Tech University. Draft: 20 September 2006

The Rational Method. David B. Thompson Civil Engineering Deptartment Texas Tech University. Draft: 20 September 2006 The David B. Thompson Civil Engineering Deptartment Texas Tech University Draft: 20 September 2006 1. Introduction For hydraulic designs on very small watersheds, a complete hydrograph of runoff is not

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Chapter G08 Nonparametric Statistics

Chapter G08 Nonparametric Statistics G08 Nonparametric Statistics Chapter G08 Nonparametric Statistics Contents 1 Scope of the Chapter 2 2 Background to the Problems 2 2.1 Parametric and Nonparametric Hypothesis Testing......................

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Introduction to time series analysis

Introduction to time series analysis Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

More information

Quality Assurance for Hydrometric Network Data as a Basis for Integrated River Basin Management

Quality Assurance for Hydrometric Network Data as a Basis for Integrated River Basin Management Quality Assurance for Hydrometric Network Data as a Basis for Integrated River Basin Management FRANK SCHLAEGER 1, MICHAEL NATSCHKE 1 & DANIEL WITHAM 2 1 Kisters AG, Charlottenburger Allee 5, 52068 Aachen,

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Measurement and Modelling of Internet Traffic at Access Networks

Measurement and Modelling of Internet Traffic at Access Networks Measurement and Modelling of Internet Traffic at Access Networks Johannes Färber, Stefan Bodamer, Joachim Charzinski 2 University of Stuttgart, Institute of Communication Networks and Computer Engineering,

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

Global Seasonal Phase Lag between Solar Heating and Surface Temperature

Global Seasonal Phase Lag between Solar Heating and Surface Temperature Global Seasonal Phase Lag between Solar Heating and Surface Temperature Summer REU Program Professor Tom Witten By Abstract There is a seasonal phase lag between solar heating from the sun and the surface

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

More information

9. Model Sensitivity and Uncertainty Analysis

9. Model Sensitivity and Uncertainty Analysis 9. Model Sensitivity and Uncertainty Analysis 1. Introduction 255 2. Issues, Concerns and Terminology 256 3. Variability and Uncertainty In Model Output 258 3.1. Natural Variability 259 3.2. Knowledge

More information

Climate Extremes Research: Recent Findings and New Direc8ons

Climate Extremes Research: Recent Findings and New Direc8ons Climate Extremes Research: Recent Findings and New Direc8ons Kenneth Kunkel NOAA Cooperative Institute for Climate and Satellites North Carolina State University and National Climatic Data Center h#p://assessment.globalchange.gov

More information

Rainfall generator for the Meuse basin

Rainfall generator for the Meuse basin KNMI publication; 196 - IV Rainall generator or the Meuse basin Description o 20 000-year simulations R. Leander and T.A. Buishand De Bilt, 2008 KNMI publication = KNMI publicatie; 196 - IV De Bilt, 2008

More information

Forecasting in supply chains

Forecasting in supply chains 1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

Chapter 11 Monte Carlo Simulation

Chapter 11 Monte Carlo Simulation Chapter 11 Monte Carlo Simulation 11.1 Introduction The basic idea of simulation is to build an experimental device, or simulator, that will act like (simulate) the system of interest in certain important

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

Comparison of resampling method applied to censored data

Comparison of resampling method applied to censored data International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

More information