An Introduction to Extreme Value Analysis

Similar documents
An Introduction to Extreme Value Theory

An introduction to the analysis of extreme values using R and extremes

An application of extreme value theory to the management of a hydroelectric dam

Statistical methods to expect extreme values: Application of POT approach to CAC40 return index

Extreme Value Analysis of Corrosion Data. Master Thesis

Operational Risk Management: Added Value of Advanced Methodologies

CATASTROPHIC RISK MANAGEMENT IN NON-LIFE INSURANCE

Applying Generalized Pareto Distribution to the Risk Management of Commerce Fire Insurance

Statistical Software for Weather and Climate

Estimation and attribution of changes in extreme weather and climate events

Extreme Value Theory with Applications in Quantitative Risk Management

COMPARISON BETWEEN ANNUAL MAXIMUM AND PEAKS OVER THRESHOLD MODELS FOR FLOOD FREQUENCY PREDICTION

Extreme Value Theory for Heavy-Tails in Electricity Prices

Contributions to extreme-value analysis

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS040) p.2985

EVIM: A Software Package for Extreme Value Analysis in MATLAB

An analysis of the dependence between crude oil price and ethanol price using bivariate extreme value copulas

Order Statistics. Lecture 15: Order Statistics. Notation Detour. Order Statistics, cont.

CHAPTER 2 Estimating Probabilities

Contributions to Modeling Extreme Events on Financial and Electricity Markets

Basics of Statistical Machine Learning

The R programming language. Statistical Software for Weather and Climate. Already familiar with R? R preliminaries

Fairfield Public Schools

A Basic Introduction to Missing Data

The Extremes Toolkit (extremes)

Working Paper: Extreme Value Theory and mixed Canonical vine Copulas on modelling energy price risks

Life Table Analysis using Weighted Survey Data

Uncertainty quantification for the family-wise error rate in multivariate copula models

A Simple Formula for Operational Risk Capital: A Proposal Based on the Similarity of Loss Severity Distributions Observed among 18 Japanese Banks

From value at risk to stress testing: The extreme value approach

A Comparative Software Review for Extreme Value Analysis

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Probabilistic Real-Time Systems

Multivariate Extremes, Max-Stable Process Estimation and Dynamic Financial Modeling

Semi-Supervised Support Vector Machines and Application to Spam Filtering

A software review for extreme value analysis

Introduction to Markov Chain Monte Carlo

Inference on the parameters of the Weibull distribution using records

Maximum Likelihood Estimation

Climate Extremes Research: Recent Findings and New Direc8ons

Corrected Diffusion Approximations for the Maximum of Heavy-Tailed Random Walk

Generating Random Samples from the Generalized Pareto Mixture Model

BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA

15 Limit sets. Lyapunov functions

Statistical Machine Learning

Linear Classification. Volker Tresp Summer 2015

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Reliability estimators for the components of series and parallel systems: The Weibull model

Hypothesis Testing for Beginners

GENERALIZED PARETO FIT TO THE SOCIETY OF ACTUARIES LARGE CLAIMS DATABASE

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

Quantitative Models for Operational Risk: Extremes, Dependence and Aggregation

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

1 Maximum likelihood estimation

Asymptotics for ruin probabilities in a discrete-time risk model with dependent financial and insurance risks

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

ON LIMIT LAWS FOR CENTRAL ORDER STATISTICS UNDER POWER NORMALIZATION. E. I. Pancheva, A. Gacovska-Barandovska

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

L3: Statistical Modeling with Hadoop

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:

Multivariate Normal Distribution

MODELLING, ESTIMATING AND VALIDATING MULTIDIMENSIONAL DISTRIBUTION FUNCTIONS -WITH APPLICATIONS TO RISK MANAGEMENT- By Markus Junker D 368

Message-passing sequential detection of multiple change points in networks

Chapter 6: Point Estimation. Fall Probability & Statistics

Lecture 8: Gamma regression

Linda Staub & Alexandros Gekenidis

Value-at-Risk for Fixed Income portfolios A comparison of alternative models

Non Parametric Inference

Un point de vue bayésien pour des algorithmes de bandit plus performants

Financial Time Series Analysis (FTSA) Lecture 1: Introduction

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

Transcription:

An Introduction to Extreme Value Analysis Graduate student seminar series Whitney Huang Department of Statistics Purdue University March 6, 2014 Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 1 / 31

Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 2 / 31

Outline 1 Motivation 2 Extreme Value Theorem 3 Example: Fort Collins Precipitation Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 3 / 31

Usual vs Extremes Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 4 / 31

Why study extremes? Although infrequent, extremes usually have large impact. Goal: to quantify the tail behavior often requires extrapolation. Applications: hydrology: flooding climate: temperature, precipitation, wind, finance insurance/reinsurance engineering: structural design, reliability Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 5 / 31

Outline 1 Motivation 2 Extreme Value Theorem 3 Example: Fort Collins Precipitation Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 6 / 31

Probability Framework Let X 1,, X n iid F and define Mn = max{x 1,, X n } Then the distribution function of M n is P(M n x) = P(X 1 x,, X n x) = P(X 1 x) P(X n x) = F n (x) Remark { F n (x) === n 0 if F (x) < 1 1 if F (x) = 1 the limiting distribution is degenerate. Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 7 / 31

Asymptotic: Classical Limit Laws Recall the Central Limit Theorem: S n nµ nσ d N(0, 1) rescaling is the key to obtain a non-degenerate distribution Question: Can we get the limiting distribution of M n b n a n for suitable sequence {a n } > 0 and {b n }? Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 8 / 31

Asymptotic: Classical Limit Laws Recall the Central Limit Theorem: S n nµ nσ d N(0, 1) rescaling is the key to obtain a non-degenerate distribution Question: Can we get the limiting distribution of M n b n a n for suitable sequence {a n } > 0 and {b n }? Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 8 / 31

Theorem (Fisher Tippett Gnedenko theorem) If there exist sequences of constants a n > 0 and b n such that, as n ( ) Mn b n d P x G(x) a n for some non-degenerate distribution G, then G belongs to either the Gumbel, the Fréchet or the Weibull family Gumbel: G(x) = { exp(exp( x)) < x < ; 0 x 0, Fréchet: G(x) = exp( x α ) x > 0, α > 0; { exp( ( x) Weibull: G(x) = α ) x < 0, α > 0, 1 x 0; Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 9 / 31

Generalized Extreme Value Distribution (GEV) This family encompasses all three extreme value limit families: where x + = max(x, 0) { [ G(x) = exp 1 + ξ( x µ σ ] 1 } ) ξ + µ and σ are location and scale parameters ξ is a shape parameter determining the rate of tail decay, with ξ > 0 giving the heavy-tailed (Fréchet) case ξ = 0 giving the light-tailed (Gumbel) case ξ < 0 giving the bounded-tailed (Weibull) case Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 10 / 31

Max-Stability and GEV Definition A distribution G is said to be max-stable if G k (a k x + b k ) = G(x), k N for some constants a k > 0 and b k Taking powers of a distribution function results only in a change of location and scale A distribution is max-stable it is a GEV distribution Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 11 / 31

Quantiles and Return Levels Quantiles of Extremes { [ G(x p ) = exp x p = µ σ ξ 1 + ξ( x p µ ) σ ] 1 ξ + } = 1 p [ 1 { log(1 p) ξ }] 0 < p < 1 In the extreme value terminology, x p is the return level associated with the return period 1 p Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 12 / 31

Block Maximum Approach 1 Determine the block size and compute maximal for blocks 2 Fit the GEV to the maximal and assess fit 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 13 / 31

Block Maximum Approach 1 Determine the block size and compute maximal for blocks 2 Fit the GEV to the maximal and assess fit 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 13 / 31

Block Maximum Approach 1 Determine the block size and compute maximal for blocks 2 Fit the GEV to the maximal and assess fit 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 13 / 31

Threshold Exceedance Approach Motivation: The block maximum method ignores much of the data which may also relevant to extreme we would like to use the data more efficient. Alternatives: Threshold Exceedance Approach 1 Select an appropriate threshold u, extract the exceedances from the dataset 2 Fit an appropriate model to exceedances 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 14 / 31

Threshold Exceedance Approach Motivation: The block maximum method ignores much of the data which may also relevant to extreme we would like to use the data more efficient. Alternatives: Threshold Exceedance Approach 1 Select an appropriate threshold u, extract the exceedances from the dataset 2 Fit an appropriate model to exceedances 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 14 / 31

Threshold Exceedance Approach Motivation: The block maximum method ignores much of the data which may also relevant to extreme we would like to use the data more efficient. Alternatives: Threshold Exceedance Approach 1 Select an appropriate threshold u, extract the exceedances from the dataset 2 Fit an appropriate model to exceedances 3 Perform inference for return levels, probabilities, etc Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 14 / 31

Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 15 / 31

GPD for Exceedances P(X i > x + u X i > u) = np(x i > x + u) np(x i > u) ( 1 + ξ x+u b n a n = ( 1 + 1 + ξ u bn a n ) 1 ξ ξx a n + ξ(u b n ) Survival function of generalized Pareto distribution ) 1 ξ Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 16 / 31

Theorem (Pickands Balkema de Haan theorem) Let X 1, iid F, and let F u be their conditional excess distribution function. Pickands (1975), Balkema and de Haan (1974) posed that for a large class of underlying distribution functions F, and large u, F u is well approximated by the generalized Pareto distribution GPD. That is: F u (y) GPD ξ,σ (y) u where GPD ξ,σ (y) = { 1 (1 + ξy σ ) 1 ξ ξ 0, 1 exp( y σ ) ξ = 0; Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 17 / 31

Threshold Selection Bias variance trade off: threshold too low bias because of the model asymptotics being invalid; threshold too high variance is large due to few data points Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 18 / 31

Outline 1 Motivation 2 Extreme Value Theorem 3 Example: Fort Collins Precipitation Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 19 / 31

Example: Fort Collins Precipitation Spike corresponds to 1997 event, recording station not at center of storm. Question: How unusual was event? Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 20 / 31

How unusual was the Fort Collins event? Measured value for 1997 event is 6.18 inches. Take a look at data preceding the event (1948-1990) and try to estimate the return period associated with this event It is equivalent to ask What is the probability the annual maximum event is larger than 6.18 inches? Requires extrapolation into the tail. Largest observation (1948-1990) is 4.09 inches inches. We will approach this problem in two ways: 1 Model all (non-zero) data 2 Model only extreme data Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 21 / 31

Modeling all precipitation data Let Y t be the daily summer (April October) precipitation for Fort Collins, CO. { Yt > 0 with probability p Assume ˆp = 0.218 Y t = 0 with probability 1 p Model: Y t Y t > 0 Gamma(α, β) Maximum likelihood estimates: ˆα = 0.784, ˆβ = 3.52 Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 22 / 31

Modeling all precipitation data cont d P(Y t > 6.18) = P(Y t > 6.18 Y t > 0)P(Y t > 0) = (1 F gamma(ˆα, ˆβ) (6.18))(0.218) = (1.47 10 10 )(0.218) = 3.20 10 11 Let M be the annual maximum precipitation P(M > 6.18) = 1 P(M < 6.18) = 1 P(Y t < 6.18) 214 = 1 (1 P(Y t > 6.18)) 214 = 1 ( 1 3.20 10 11) 214 = 6.86 10 9 Return period estimate = (6.86 10 9 ) 1 = 145, 815, 245 years Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 23 / 31

Modeling all precipitation data: Diagnostics Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 24 / 31

Modeling Annual Maximal Let M n = max X t. Assume M n GEV (µ, σ, ξ) t=1,,n P(M n x) = exp { [ 1 + ξ( x µ σ ) ] 1 ξ Maximum Likelihood estimates: ˆµ = 1.11, ˆσ = 0.46, ˆξ = 0.31. P(ann max > 6.18) = 1 P(M n 6.18) = 0.008 Return period estimate = 1 = 121 years 0.008 + } Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 25 / 31

Modeling Annual Maximal: Diagnostics Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 26 / 31

Modeling Annual Maximal: Diagnostics Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 27 / 31

Temporal Dependence Question: Is the GEV still the limiting distribution for block maxima of a stationary (but not independent) sequence {X i }? Answer: Yes, so long as mixing conditions hold. (Leadbetter et al., 1983) What does this mean for inference? Block maximum approach: GEV still correct for marginal. Since block maximum data likely have negligible dependence, proceed as usual Threshold exceedance approach: GPD is correct for the marginal. If extremes occur in clusters, estimation affected as likelihood assumes independence of threshold exceedances Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 28 / 31

Temporal Dependence Question: Is the GEV still the limiting distribution for block maxima of a stationary (but not independent) sequence {X i }? Answer: Yes, so long as mixing conditions hold. (Leadbetter et al., 1983) What does this mean for inference? Block maximum approach: GEV still correct for marginal. Since block maximum data likely have negligible dependence, proceed as usual Threshold exceedance approach: GPD is correct for the marginal. If extremes occur in clusters, estimation affected as likelihood assumes independence of threshold exceedances Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 28 / 31

Temporal Dependence Question: Is the GEV still the limiting distribution for block maxima of a stationary (but not independent) sequence {X i }? Answer: Yes, so long as mixing conditions hold. (Leadbetter et al., 1983) What does this mean for inference? Block maximum approach: GEV still correct for marginal. Since block maximum data likely have negligible dependence, proceed as usual Threshold exceedance approach: GPD is correct for the marginal. If extremes occur in clusters, estimation affected as likelihood assumes independence of threshold exceedances Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 28 / 31

Temporal Dependence Question: Is the GEV still the limiting distribution for block maxima of a stationary (but not independent) sequence {X i }? Answer: Yes, so long as mixing conditions hold. (Leadbetter et al., 1983) What does this mean for inference? Block maximum approach: GEV still correct for marginal. Since block maximum data likely have negligible dependence, proceed as usual Threshold exceedance approach: GPD is correct for the marginal. If extremes occur in clusters, estimation affected as likelihood assumes independence of threshold exceedances Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 28 / 31

Temporal Dependence Question: Is the GEV still the limiting distribution for block maxima of a stationary (but not independent) sequence {X i }? Answer: Yes, so long as mixing conditions hold. (Leadbetter et al., 1983) What does this mean for inference? Block maximum approach: GEV still correct for marginal. Since block maximum data likely have negligible dependence, proceed as usual Threshold exceedance approach: GPD is correct for the marginal. If extremes occur in clusters, estimation affected as likelihood assumes independence of threshold exceedances Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 28 / 31

Remarks on Univariate Extremes To estimate the tail, EVT uses only extreme observations Tail parameter ξ is extremely important but hard to estimate Threshold exceedance approaches allow the user to retain more data than block-maximum approaches, thereby reducing the uncertainty with parameter estimates Temporal dependence in the data is more of an issue in threshold exceedance models. One can either decluster, or alternatively, adjust inference Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 29 / 31

Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 30 / 31

For Further Reading S. Coles An Introduction to Statistical Modeling of Extreme Values. Springer, 2001. J. Beirlant, Y Goegebeur, J. Segers, and J Teugels Statistics of Extremes: Theory and Applications. Wiley, 2004. L. de Haan, and A. Ferreira Extreme Value Theory: An Introduction. Springer, 2006. S. I. Resnick Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, 2007. Whitney Huang (Purdue University) An Introduction to Extreme Value Analysis March 6, 2014 31 / 31