Extreme Value Modeling for Detection and Attribution of Climate Extremes


Jun Yan, Yujing Jiang
Joint work with Zhuo Wang, Xuebin Zhang
Department of Statistics, University of Connecticut
February 2, 2016 @ IDAG, Boulder, CO

Outline
1. Introduction
2. Combined Score Equations (CSE)
3. Illustrations
4. Outlook

Introduction: Method of Zwiers, Zhang, and Feng (2011)
- Data: multiple years over a collection of sites; extremes from climate model simulations (multiple models, multiple ensembles) plus observed extremes.
- Signal estimation from simulation data: piecewise constant location parameter $\hat{\mu}_{ts}$ in a GEV fit at each site.
- Detection analysis for the observed data: GEV fit with location $\mu_{ts} = \alpha_s + \hat{\mu}_{ts} \beta$ and site-specific $\sigma_s$, $\xi_s$.
- Profile independence likelihood estimation of $\beta$.
- Uncertainty assessment via nested block bootstrap (32x32) to account for uncertainty in $\hat{\mu}_{ts}$.
- Goodness-of-fit test: KS test at each site with a field significance check.

Introduction: Departure from Zwiers et al. (2011)
- Spatial dependence is discarded: can efficiency in estimating $\beta$ be improved by incorporating spatial dependence?
  - Max-stable processes for spatial extremes with composite likelihood estimation (e.g., Davison et al., 2012).
  - Misspecification of spatial dependence may ruin inferences on marginal parameters: bias can be serious with strong dependence (Wang et al., 2014); goodness-of-fit testing is difficult (Kojadinovic et al., 2015).
  - In some applications like D&A, the primary interest is inference about marginal parameters; the spatial dependence is a nuisance.
  - Combining marginal GEV score equations: no dependence assumptions beyond marginal GEV.
- Profiling is computationally intensive and its accuracy depends on grid resolution: can we compute more efficiently? (Needed for multiple forcings.)
- Goal: move toward a closer analog to standard optimal fingerprinting (e.g., Allen and Stott, 2003).

Combined Score Equations (CSE): Setup
- Idea: combine the score equations of the marginal GEV distribution at each monitoring site in some optimal way, improving efficiency by accounting for the spatial correlation among them.
- $Y_{ts}$: extreme observation of interest at site $s$ in year $t$, with density $f(\cdot; \theta_{ts})$, $s = 1, \ldots, m$, $t = 1, \ldots, n$, and scalar parameter $\theta_{ts}$ (other parameters assumed known for the moment).
- $X_{ts}$: $p \times 1$ covariate vector (signal) for $\theta_{ts}$.
- $g(\theta_{ts}) = \eta_{ts} = X_{ts}^\top \beta$, where $g$ is a known link function.
- Assume data from year to year are independent, while spatial dependence exists within the same year.
- Only assume that the marginal distribution $f$ is the correctly specified GEV distribution.

Combined Score Equations (CSE): Combining the Score Equations
- Score function: $S_{ts} = d \log f(Y_{ts}; \theta_{ts}) / d\theta_{ts}$.
- Score equation for $\beta$ at site $s$:
  $$\sum_{t=1}^{n} X_{ts} \frac{d\theta_{ts}}{d\eta_{ts}} S_{ts} = 0.$$
- Combined score equation:
  $$\sum_{t=1}^{n} X_t^\top A_t W_t^{-1} S_t = 0,$$
  where $X_t = (X_{t1}, \ldots, X_{tm})^\top$, $A_t = \mathrm{diag}(d\theta_{t1}/d\eta_{t1}, \ldots, d\theta_{tm}/d\eta_{tm})$, $W_t^{-1}$ is the weight matrix, and $S_t = (S_{t1}, \ldots, S_{tm})^\top$.
- When $W_t$ is the identity matrix, it reduces to the derivative of the independence likelihood (Zwiers et al., 2011).
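The combined score equation can be sketched numerically for the GEV location parameter. The following is a minimal illustration, not the authors' code: it assumes an identity link (so $A_t = I$), a scalar $\beta$ ($p = 1$), a shape parameter $\xi \neq 0$, and a single inverse weight matrix shared by all years. The function names (`gev_logpdf`, `gev_score_mu`, `combined_score`) are hypothetical.

```python
import math

def gev_logpdf(y, mu, sigma, xi):
    """Log-density of the GEV distribution (xi != 0 case only)."""
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        raise ValueError("y is outside the GEV support")
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)

def gev_score_mu(y, mu, sigma, xi):
    """Score S = d log f / d mu for the GEV location parameter (xi != 0)."""
    t = 1.0 + xi * (y - mu) / sigma
    return (1.0 + xi - t ** (-1.0 / xi)) / (sigma * t)

def combined_score(beta, y, x, alpha, sigma, xi, w_inv):
    """Sum over years t of X_t' W^{-1} S_t, with identity link (A_t = I).

    y[t][s], x[t][s]: observation and signal for year t, site s.
    alpha, sigma, xi: per-site parameter lists (treated as known here).
    w_inv: m x m inverse weight matrix, assumed the same for every year.
    """
    m = len(alpha)
    total = 0.0
    for y_t, x_t in zip(y, x):
        # Site-level scores evaluated at location alpha_s + x_ts * beta.
        s_t = [gev_score_mu(y_t[s], alpha[s] + x_t[s] * beta, sigma[s], xi[s])
               for s in range(m)]
        for j in range(m):
            total += x_t[j] * sum(w_inv[j][k] * s_t[k] for k in range(m))
    return total
```

With `w_inv` set to the identity matrix, `combined_score` is the derivative of the independence log-likelihood with respect to $\beta$, which can be checked against a finite difference of summed `gev_logpdf` values.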

Combined Score Equations (CSE): Optimal Weight
- Optimal $W_t$ (Nikoloulopoulos et al., 2011):
  $$W_t = \Omega_t \Delta_t^{-1},$$
  where $\Omega_t = \mathrm{cov}(S_t)$ and
  $$\Delta_t = \mathrm{diag}\left\{ -E\left( \frac{d^2 \log f_{t1}(y_{t1}, \theta_{t1})}{d\theta_{t1}^2} \right), \ldots, -E\left( \frac{d^2 \log f_{tm}(y_{tm}, \theta_{tm})}{d\theta_{tm}^2} \right) \right\}.$$
- $\Omega_t$ plays the role of the variance matrix representing internal variability in standard optimal fingerprinting.
- Approximate the covariance matrix $\Omega_t$ of the score functions $S_t$:
  - Apply the idea of generalized estimating equations (GEE): use a simple form of working spatial correlation structure.
  - Assume all the clusters (years) share the same correlation matrix, $R$, of the score function: $\Omega_t = \Delta_t^{1/2} R \Delta_t^{1/2}$.
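Under the working assumption $\Omega_t = \Delta_t^{1/2} R \Delta_t^{1/2}$, the inverse weight simplifies to $W_t^{-1} = \Delta_t \Omega_t^{-1} = \Delta_t^{1/2} R^{-1} \Delta_t^{-1/2}$. The sketch below works this out for the smallest nontrivial case of two sites with working correlation $\rho$; the function name and the restriction to $m = 2$ are illustrative assumptions, chosen so that $R^{-1}$ has a closed form.

```python
import math

def weight_inverse_two_sites(delta, rho):
    """Return W^{-1} = Delta^{1/2} R^{-1} Delta^{-1/2} for m = 2 sites.

    delta: the two diagonal entries of Delta_t (expected information, > 0).
    rho: working correlation between the two site-level scores.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    det = 1.0 - rho * rho
    # Closed-form inverse of the 2 x 2 correlation matrix [[1, rho], [rho, 1]].
    r_inv = [[1.0 / det, -rho / det], [-rho / det, 1.0 / det]]
    root = [math.sqrt(d) for d in delta]
    return [[root[j] * r_inv[j][k] / root[k] for k in range(2)]
            for j in range(2)]
```

With `rho = 0` the result is the identity matrix, matching the reduction to the independence likelihood; with `rho != 0` the off-diagonal entries let each site's score borrow information from the other.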

Combined Score Equations (CSE): Approximation of Optimal Weight
[Figure: two panels, $\mu$ and $\sigma$; x-axis: Euclidean distance; y-axis: correlation; fitted curves: Exp, Sph, Gau.]
Figure: The empirical correlation of the standardized score function of $\mu$ (points), and the corresponding nonlinear least squares fitted correlation curves from the exponential (red), spherical (blue), and Gaussian (green) correlation functions. Data generated from an isotropic Smith model with $m = 20$, $n = 1000$, and moderate dependence level in region $[-10, 10]$.

Combined Score Equations (CSE): Approximation of Optimal Weight
- It would be nice to know the pairwise correlation $\rho_{jk}$ between site $j$ and site $k$, but an approximation is good too.
- Exponential correlation: $\rho_{jk} = \exp(-d_{jk}/r)$, where $d_{jk}$ is the pairwise distance and $r$ is the parameter to be estimated through the empirical correlation of the standardized score function.
- Spherical correlation: $\rho_{jk} = \left[ 1 - 1.5 (d_{jk}/r) + 0.5 (d_{jk}/r)^3 \right] I(d_{jk} < r)$, which leads to a sparse correlation matrix and can be exploited computationally when the number of sites is large.
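The two working correlation functions, and the estimation of the range parameter $r$ from empirical score correlations, might look as follows. This is a sketch under assumptions: the slides fit $r$ by nonlinear least squares, whereas this example substitutes a plain grid search over candidate values to stay dependency-free, and all function names are hypothetical.

```python
import math

def exp_corr(d, r):
    """Exponential correlation at distance d with range parameter r."""
    return math.exp(-d / r)

def sph_corr(d, r):
    """Spherical correlation; exactly zero at and beyond distance r."""
    if d >= r:
        return 0.0
    u = d / r
    return 1.0 - 1.5 * u + 0.5 * u ** 3

def fit_range(dists, emp_corr, corr_fn, grid):
    """Pick the r in `grid` minimizing the squared error between
    corr_fn(d, r) and the empirical correlations (grid-search stand-in
    for nonlinear least squares)."""
    def sse(r):
        return sum((corr_fn(d, r) - c) ** 2 for d, c in zip(dists, emp_corr))
    return min(grid, key=sse)
```

For example, `fit_range(dists, emp, exp_corr, [0.5 * k for k in range(1, 41)])` searches $r$ over 0.5, 1.0, ..., 20.0. Because `sph_corr` returns exact zeros beyond $r$, the resulting correlation matrix is sparse when $r$ is small relative to the study region.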

Combined Score Equations (CSE): Coordinate Descent Approach
- GEV for observed extremes in detection analysis:
  - location $\mu_{ts} = \alpha_s + X_{ts}^\top \beta$, where the signals $X$ can incorporate $p$ forcings;
  - site-specific scale $\sigma_s$ and shape $\xi_s$;
  - a total of $3m + p$ unknown parameters.
- Coordinate descent approach: a two-step iterative process.
  1. Given the current estimate $\hat{\beta}$ of $\beta$, obtain the likelihood estimate $\hat{\zeta}_s$ of $\zeta_s = (\alpha_s, \sigma_s, \xi_s)$ separately at each grid box $s \in \{1, \ldots, m\}$.
  2. Given the current estimate $\hat{\zeta}_s$, obtain the CSE estimate $\hat{\beta}$ of $\beta$ by solving the estimating equation with an appropriately chosen working correlation structure.
- The two steps iterate until $\hat{\beta}$ converges.
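The alternating structure of the two-step iteration can be sketched on a deliberately simplified stand-in. The example below is not a GEV fit: it assumes a Gaussian (least squares) model $y_{ts} = \alpha_s + x_{ts} \beta + \text{noise}$ with a scalar $\beta$ and only a site intercept in place of $\zeta_s$, so that both updates have closed forms. It illustrates only the loop and the convergence check; the function name is hypothetical.

```python
def coordinate_descent(y, x, beta0=0.0, tol=1e-10, max_iter=1000):
    """Alternate site-wise and shared-parameter updates until beta stabilizes.

    y[t][s], x[t][s]: response and signal for year t, site s.
    Returns (beta, alpha) where alpha is the list of site intercepts.
    """
    n, m = len(y), len(y[0])
    beta = beta0
    for _ in range(max_iter):
        # Step 1: given beta, update each site's own parameter separately.
        alpha = [sum(y[t][s] - beta * x[t][s] for t in range(n)) / n
                 for s in range(m)]
        # Step 2: given the site parameters, update the shared beta
        # (least squares here; the slides use the CSE estimate instead).
        num = sum(x[t][s] * (y[t][s] - alpha[s])
                  for t in range(n) for s in range(m))
        den = sum(x[t][s] ** 2 for t in range(n) for s in range(m))
        new_beta = num / den
        if abs(new_beta - beta) < tol:
            return new_beta, alpha
        beta = new_beta
    raise RuntimeError("coordinate descent did not converge")
```

In the GEV setting, step 1 would be a three-parameter likelihood fit per grid box and step 2 a root-finding problem for the combined score equation, but the loop and stopping rule would keep this shape.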

Illustrations: Simulation Study in Fingerprinting Setting
- Mimic the daily maximum temperature setting in Australia ($n = 140$, $m = 29$).
- Recall the detection model: $\mu_{ts} = \alpha_s + X_{ts} \beta$, $\sigma_{ts} = \sigma_s$, $\xi_{ts} = \xi_s$.
- Estimated signals $\mu_{d(t),s}$ were used as input $X_{ts}$ to generate data.
- Parameters $\alpha$, $\sigma$, and $\xi$ are the estimates based on the Australia data.
- $\beta \in \{0, 0.5, 1\}$.
- Dependence model: a mixture of a GG model (proportion $p$) and a GA model (proportion $1 - p$).
- CSE method with an exponential correlation structure.

Illustrations

                   Estimate                RMSE                  RE
p    Dep  True   IL     PL     CSE      IL     PL     CSE      PL    CSE
0    M    0      0.001  0.001  0.001    0.120  0.114  0.103    1.10  1.37
          0.5    0.503  0.503  0.502    0.118  0.111  0.097    1.12  1.49
          1      1.005  1.005  1.005    0.119  0.112  0.098    1.13  1.48
     S    0      0.001  0.002  0.004    0.153  0.140  0.104    1.19  2.14
          0.5    0.503  0.502  0.501    0.147  0.133  0.102    1.21  2.05
          1      1.007  1.007  1.002    0.146  0.134  0.103    1.19  1.99
0.5  M    0      0.004  0.003  0.001    0.116  0.112  0.094    1.08  1.51
          0.5    0.507  0.507  0.502    0.115  0.111  0.096    1.07  1.42
          1      0.997  0.997  1.000    0.115  0.112  0.097    1.06  1.40
     S    0      0.004  0.004  0.000    0.138  0.131  0.098    1.12  2.00
          0.5    0.500  0.500  0.499    0.144  0.136  0.100    1.13  2.08
          1      1.005  1.005  1.006    0.138  0.131  0.097    1.12  2.02
1    M    0      0.001  0.001  0.002    0.110  0.108  0.091    1.04  1.46
          0.5    0.504  0.504  0.502    0.110  0.108  0.092    1.04  1.44
          1      0.997  0.997  1.000    0.112  0.110  0.093    1.04  1.44
     S    0      0.001  0.000  0.002    0.132  0.128  0.098    1.07  1.83
          0.5    0.505  0.505  0.502    0.133  0.129  0.099    1.07  1.80
          1      0.996  0.996  1.000    0.135  0.131  0.100    1.07  1.82

(The relative efficiency (RE) was based on the MSE, with the IL estimate as reference.)
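The RE columns can be approximately reproduced from the RMSE columns. The sketch below assumes that RE is the ratio of the reference (IL) MSE to the method's MSE, which is the reading consistent with RE values above 1 for the more accurate methods; the function name is hypothetical. Because the tabulated RMSEs are rounded to three decimals, the recomputed values agree with the table only to about two significant digits.

```python
def relative_efficiency(rmse_reference, rmse_method):
    """MSE-based relative efficiency: MSE of the reference over MSE of the method."""
    return (rmse_reference / rmse_method) ** 2

# First row of the table (p = 0, Dep = M, True = 0): IL 0.120, PL 0.114, CSE 0.103.
re_pl = relative_efficiency(0.120, 0.114)
re_cse = relative_efficiency(0.120, 0.103)
print(round(re_pl, 2), round(re_cse, 2))
```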

Illustrations: Applications on Extreme Temperatures
- Extreme temperatures in Northern Europe (NEU):
  - Annual maximum of daily maximum (TXx): warmest day
  - Annual maximum of daily minimum (TNx): warmest night
  - Annual minimum of daily maximum (TXn): coldest day
  - Annual minimum of daily minimum (TNn): coldest night
- Data period 1951-2010 ($n = 60$, $m = 67$).
- CSE method with an exponential correlation structure.

Illustrations
Results for the annual maximum of daily minimum temperature (TNx) for illustration.

Forcing   Method  Parameter  Estimate  90% CI          Length
ALL       IL      $\beta$    1.10      (0.73, 1.48)    0.75
          CSE     $\beta$    0.69      (0.46, 0.95)    0.49
ANT       IL      $\beta$    1.19      (0.77, 1.62)    0.85
          CSE     $\beta$    0.52      (0.31, 0.74)    0.43
ANT&NAT   IL      $\beta_A$  1.12      (0.75, 1.50)    0.76
                  $\beta_N$  0.91      (-0.28, 2.07)   2.35
          CSE     $\beta_A$  0.70      (0.47, 0.95)    0.48
                  $\beta_N$  0.59      (0.17, 1.01)    0.84

Outlook
Summary
- CSE improves estimation efficiency without specifying spatial dependence.
- The coordinate descent algorithm is reasonably fast and reliable.
- Application to climate extremes increases the power of detection and attribution of changes, with possibly multiple forcings.
Outlook (thesis of Yujing Jiang)
- Measurement error may cause bias, especially when it is high relative to the signal.
- A joint modeling approach similar to Hannart et al. (2014) for extremes: both simulated and observed data depend on a latent signal.
- Different climate models may have different sensitivity to the latent signal, but the average of the scaling factors is restricted to be 1.

Outlook: Departure from Z. Wang's Thesis
- PhD thesis in Statistics: Yujing Jiang (joint with Zhuo Wang, Jun Yan, and Xuebin Zhang).
- The work reported earlier is a 2-step approach:
  1. Estimate the signal from the climate simulation data.
  2. Estimate the scaling factor of the signal with observed data.
- Possible drawback: uncertainty in the estimated signals acts like error in covariates, which is known to attenuate covariate effects in regression models with measurement error.
- Goal: remove the bias from measurement error but retain the efficiency from CSE.
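The attenuation effect mentioned above can be seen in a small simulation. This sketch uses ordinary linear regression rather than a GEV model, purely to show the mechanism: when the covariate is observed with independent error, the fitted slope shrinks toward zero by roughly the factor var(signal) / (var(signal) + var(error)). All names and parameter values are illustrative assumptions.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def attenuation_demo(beta=1.0, signal_sd=1.0, error_sd=1.0, n=20000, seed=1):
    """Return (slope using the true signal, slope using the noisy signal)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, signal_sd) for _ in range(n)]      # true signal
    w = [xi + rng.gauss(0.0, error_sd) for xi in x]        # signal estimated with error
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]      # response driven by true signal
    return ols_slope(x, y), ols_slope(w, y)

clean, noisy = attenuation_demo()
print(clean, noisy)
```

With equal signal and error variances, the slope on the noisy signal lands near half the true scaling factor, which mirrors the downward bias the 2-step approach can show when only a few climate model runs are available to estimate the signal.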

Outlook: Joint D&A Model for Observed and Simulated Extremes
- The signal (characterized by a few parameters) is shared by the location parameters of the GEV models for both.
- Illustration with one forcing:
  - GEV model for observed extremes: $\mu_{ts} = \alpha_s + \beta_{obs} \tilde{\mu}_{ts}$, $\sigma_{ts} = \sigma_s$, $\xi_{ts} = \xi_s$.
  - GEV model for simulated extremes from climate model $c$, $c = 1, \ldots, K$: $\mu_{cts} = \alpha_{cs} + \beta_c \tilde{\mu}_{ts}$, $\sigma_{cts} = \sigma_{cs}$, $\xi_{cts} = \xi_{cs}$.
- The signal appears in the model as $\tilde{\mu}_{ts}$.
- $\beta_c$ allows model-specific sensitivity.
- The average of $\beta_c$ over $c$ is restricted to be 1.
- $\sigma_{cs}$ and $\xi_{cs}$ could be restricted to be the same as $\sigma_s$ and $\xi_s$, respectively, if desired.

Outlook: Parameter Estimation
- Assume independence between observed data and simulated data.
- Block coordinate descent (the observed data treated as if from the $(K+1)$th climate model):
  1. $\{\tilde{\mu}_{ts}, t = 1, \ldots, 10D\}$, $s = 1, \ldots, m$
  2. $\{\sigma_s, \xi_s\}$, $s = 1, \ldots, m$
  3. $\{\alpha_{cs}\}$, $c = 1, \ldots, K + 1$
  4. $\{\beta_c\}$, $c = 1, \ldots, K + 1$
- The average-to-1 restriction is enforced at each iteration for identifiability.
- When updating each $\beta$, CSE can be used for efficiency.
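The need for the average-to-1 restriction comes from the fact that the scaling factors and the latent signal enter the location parameters only through their product, so multiplying every scaling factor by a constant and dividing the signal by the same constant leaves the model unchanged. A minimal sketch of one way to enforce the restriction at each iteration is shown below; it assumes the average is taken over the climate model factors only (the slides do not spell out whether the observed-data factor is included), and the function name is hypothetical.

```python
def enforce_average_to_one(model_betas, beta_obs, signal):
    """Rescale so the climate model scaling factors average to 1.

    All scaling factors are divided by k = mean(model_betas) and the latent
    signal is multiplied by k, so every product beta * signal is unchanged.
    """
    k = sum(model_betas) / len(model_betas)
    if k == 0.0:
        raise ValueError("scaling factors average to zero; cannot normalize")
    new_betas = [b / k for b in model_betas]
    new_signal = [k * mu for mu in signal]
    return new_betas, beta_obs / k, new_signal

betas, beta_obs, signal = enforce_average_to_one([0.8, 1.6], 1.2, [0.1, 0.2])
print(betas, beta_obs, signal)
```

After the call, the model factors average to 1 while the fitted location parameters, and hence the likelihood, are exactly what they were before the rescaling.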

Outlook: A Simulation Study
- Regional D&A study for extreme temperature: 29 grid boxes in Australia.
- A single climate model under one forcing.
- Same generating model for observed and simulated data.
- Dependence structure was a geometric Gaussian process with a Gaussian correlation function with $\phi = 12$ and $18$.
- True marginal parameter values were set to be the estimates from 10 runs under ALL forcing from HadCM3.
- Signal: 0.1 degree/10 years. $\beta_{obs} = 1$.
- Number of years: 100.
- Number of runs from the climate model: 2, 5, 10.
- Four methods: 2-step (2S) versus joint modeling (JM); independence likelihood (IL) versus CSE.

Outlook
Table: Mean, standard deviation (SD), and root mean squared error (RMSE) of estimates of $\beta_{obs}$ from 1000 replicates.

               2 runs               5 runs               10 runs
Dep  Method    Mean  SD    RMSE     Mean  SD    RMSE     Mean  SD    RMSE
M    2S.IL     0.94  0.12  0.13     0.98  0.11  0.11     0.99  0.11  0.11
     2S.CSE    0.72  0.12  0.30     0.88  0.11  0.16     0.94  0.11  0.13
     JM.IL     1.00  0.11  0.11     1.01  0.10  0.10     1.00  0.11  0.11
     JM.CSE    1.00  0.09  0.09     1.00  0.09  0.09     1.01  0.10  0.10
S    2S.IL     0.95  0.15  0.16     0.98  0.14  0.15     0.99  0.14  0.14
     2S.CSE    0.65  0.13  0.38     0.83  0.13  0.22     0.91  0.13  0.16
     JM.IL     1.01  0.14  0.14     1.00  0.14  0.14     1.00  0.13  0.13
     JM.CSE    1.00  0.10  0.10     1.00  0.11  0.11     1.00  0.11  0.11
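The three columns for each setting are linked by the decomposition RMSE^2 = bias^2 + SD^2, with bias measured against the true value $\beta_{obs} = 1$. The short sketch below recomputes the RMSE from the tabulated mean and SD; the function name is hypothetical, and because the inputs are rounded to two decimals the result can differ from the table in the last digit.

```python
import math

def rmse_from_mean_sd(mean_est, sd_est, truth=1.0):
    """RMSE implied by the mean and SD of an estimator around a known truth."""
    bias = mean_est - truth
    return math.sqrt(bias ** 2 + sd_est ** 2)

# 2S.CSE with 2 runs under dependence M: mean 0.72, SD 0.12, tabulated RMSE 0.30.
print(round(rmse_from_mean_sd(0.72, 0.12), 2))
# JM.CSE in the same setting: mean 1.00, SD 0.09, so RMSE equals the SD.
print(round(rmse_from_mean_sd(1.00, 0.09), 2))
```

The decomposition makes the pattern in the table explicit: for the 2-step CSE estimates the RMSE is dominated by bias, while for the joint modeling estimates it is essentially all variance.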

References
- Allen, M. R. and P. A. Stott (2003). Estimating signal amplitudes in optimal fingerprinting, Part I: Theory. Climate Dynamics 21, 477-491.
- Davison, A. C., S. A. Padoan, and M. Ribatet (2012). Statistical modeling of spatial extremes. Statistical Science 27(2), 161-186.
- Hannart, A., A. Ribes, and P. Naveau (2014). Optimal fingerprinting under multiple sources of uncertainty. Geophysical Research Letters 41(4), 1261-1268.
- Kojadinovic, I., H. Shang, and J. Yan (2015). A class of goodness-of-fit tests for spatial extremes models based on max-stable processes. Statistics and Its Interface 8(1), 45-62.
- Nikoloulopoulos, A. K., H. Joe, and N. R. Chaganty (2011). Weighted scores method for regression models with dependent data. Biostatistics 12, 653-665.
- Wang, Z., J. Yan, and X. Zhang (2014). Incorporating spatial dependence in regional frequency analysis. Water Resources Research 50(12), 9570-9585.
- Zwiers, F. W., X. Zhang, and Y. Feng (2011). Anthropogenic influence on long return period daily temperature extremes at regional scales. Journal of Climate 24(3), 881-892.