Extreme Value Modeling for Detection and Attribution of Climate Extremes

Size: px
Start display at page:

Download "Extreme Value Modeling for Detection and Attribution of Climate Extremes"

Transcription

1 Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, IDAG, Boulder, CO Jun Yan February 2, IDAG, Boulder, CO 1 / 21

2 Outline 1 Introduction 2 Combined Score Equations (CSE) 3 Illustrations 4 Outlook Jun Yan February 2, IDAG, Boulder, CO 2 / 21

3 Introduction Method of Zwiers, Zhang, and Feng (2011) Data: Multiple years over a collection of sites extremes from climate model simulation (multiple model, multiple ensemble) + observed extremes Signal estimation from simulation data: piecewise constant location parameter ˆµ ts in GEV fit at each site. Detection analysis for the observed data GEV fit with location µ ts = α s + ˆµ ts β, site specific σ s, ξ s. Profile independence likelihood estimation of β Uncertainty assessment via nested block bootstrap (32x32) to account for uncertainty in ˆµ ts. Goodness-of-fit test: KS test at each site with field significance check Jun Yan February 2, IDAG, Boulder, CO 3 / 21

4 Introduction Departure from Zwiers et al (2011) Spatial dependence is discarded: Can efficiency in estimating β be improved by incorporating spatial dependence? Max-stable process for spatial extremes with composite likelihood estimation (e.g., Davison et al., 2012). Misspecification of spatial dependence may ruin inferences on marginal parameters: bias can be serious with strong dependence (Wang et al., 2014); goodness-of-fit test is difficult (Kojadinovic et al., 2015). In some applications like D&A, the primary interest is the inference about marginal parameters; the spatial dependence is a nuisance. Combining marginal GEV score equations: no dependence assumptions beyond marginal GEV. Profiling is computing intensive and accurary depends on grid resolution: Can we compute more efficiently? (needed by multiple forcing) Goal: toward a closer analog to standard optimal fingerprinting (e.g., Allen and Stott, 2003). Jun Yan February 2, IDAG, Boulder, CO 4 / 21

5 Combined Score Equations (CSE) Setup Idea: Combine the score equation of the marginal GEV distribution at each monitoring sites in some optimal way to improve efficiency by accounting the spatial correlation among them. Y ts : extreme observation of interest at site s in year t with density f ( ; θ ts ), s = 1,..., m, t = 1,..., n, and scalor parameter θ ts (other paramers assumed known for the moment). X ts : p 1 covariate vector (signal) for θ ts. g(θ ts ) = η ts = X ts β, where g is a known link function. Assume data from year to year are independent while spatial dependence exists within the same year. Only assume marginal distribution f is the correctly specified GEV distribution. Jun Yan February 2, IDAG, Boulder, CO 5 / 21

6 Combined Score Equations (CSE) Combining the Score Equations Score function: S ts = d log f (Y ts ; θ ts )/dθ ts. Score equation for β at site s: Combined score equation: n t=1 n t=1 X ts dθ ts dη ts S ts = 0. X t A t W 1 t S t = 0, where X t = (X t1,..., X tm ), A t = diag(dθ t1 /dη t1,..., dθ tm /dη tm ), W 1 t is the weight matrix, and S t = (S t1,..., S tm ). When W t is the identity matrix, it reduces to the derivative of the independence likelihood (Zwiers et al., 2011). Jun Yan February 2, IDAG, Boulder, CO 6 / 21

7 Combined Score Equations (CSE) Optimal Weight Optimal W t (Nikoloulopoulos et al., 2011): dθ 2 t1 W t = Ω t 1 t, where Ω t = cov(s t ) and { ( d 2 ) log f t1 (y t1, θ t1 ) t = diag E,..., E ( d 2 log f tm (y tm, θ tm ) dθ 2 tm )}. Ω t plays the role of variance matrix representing internal variability in standard optimal fingerprinting Approximate the covariance matrix Ω t of the score functions S t : Apply the idea of generalized estimating equations (GEE) use simple form of working spatial correlation structure. Assume all the clusters (years) share a same correlation matrix, R, of the score function: Ω t = 1/2 t R 1/2 t. Jun Yan February 2, IDAG, Boulder, CO 7 / 21

8 Combined Score Equations (CSE) Approximation of Optimal Weight Euclidean Distance correlation µ Exp Sph Gau correlation σ Exp Sph Gau Figure: The empirical correlation of the standardized score function of µ (points), and the corresponding non-linear least square fitted correlation curves from exponential (red), spherical (blue) and gaussian (green) correlation function. Data generated from an isotropic Smith model with m = 20, n = 1000, and moderate dependence level in region [ 10, 10]. Jun Yan February 2, IDAG, Boulder, CO 8 / 21

9 Combined Score Equations (CSE) Approximation of Optimal Weight It would be nice to know the pairwise correlation between site j and site k, ρ jk, but approximation is good too. Exponential correlation ρ jk = exp( d jk /r), where d jk is the pairwise distance and r is the parameter to be estimated through the empirical correlation of the standardized score function. Spherical correlation ρ jk = [ 1 1.5(r/d jk ) + 0.5(r/d jk ) 3] I dij<r, which leads to sparse correlation matrix and can be exploited computationally when the number of sites is big. Jun Yan February 2, IDAG, Boulder, CO 9 / 21

10 Combined Score Equations (CSE) Coordinate Descent Approach GEV for observed extremes in detection analysis location µ ts = α s + X T ts β, where the signals X can incorporate p forcings. site specific scale σ s and shape ξ s. a total of 3m + p unknown parameters. Coordinate descent approach: a two-step iterative process. 1 Given current estimate ˆβ of β, obtain the likelihood estimate ˆζ s of ζ s = (α s, σ s, ξ s ) separately at each grid box s {1,..., m}. 2 Given current estimate ˆζ s, obtain the CSE estimate ˆβ of β from solving the estimating equation with an appropriately chosen working correlation structure. The two steps iterate until ˆβ converges. Jun Yan February 2, IDAG, Boulder, CO 10 / 21

11 Illustrations Simulation Study in Fingerprinting Setting Mimic the daily maximum temperature setting in Australia (n = 140, m = 29). Recall detection model: µ ts = α s + X ts β, σ ts = σ s, ξ ts = ξ s. Estimated signals µ d(t),s were used as input X ts to generate data. Parameters α, σ and ξ are the estimates based on Australia data. β {0, 0.5, 1}. Dependence model: a mixture of a GG model (proportion p) and a GA model (proportion 1 p). CSE method with an exponential correlation structure. Jun Yan February 2, IDAG, Boulder, CO 11 / 21

12 Illustrations Estimate RMSE RE p Dep True IL PL CSE IL PL CSE PL CSE 0 M S M S M S (The relative efficiency (RE) was based on the MSE, with the IL estimate as reference.) Jun Yan February 2, IDAG, Boulder, CO 12 / 21

13 Illustrations Applications on Extreme Temperatures Extreme temperatures in Northern Europe (NEU) Annual maximum of daily maximum (TXx) warmest day Annual maximum of daily minimum (TNx) warmest night Annual minimum of daily maximum (TXn) coldest day Annual minimum of daily minimum (TNn) coldest night Data period (n = 60, m = 67). CSE method with an exponential correlation structure. Jun Yan February 2, IDAG, Boulder, CO 13 / 21

14 Illustrations Results for the annual maximum of daily minimum temperature (TNx) for illustration. Forcing Me Par est 90% CI len ALL IL β 1.10 (0.73, 1.48) 0.75 CSE β 0.69 (0.46, 0.95) 0.49 ANT IL β 1.19 (0.77, 1.62) 0.85 CSE β 0.52 (0.31, 0.74) 0.43 ANT&NAT IL β A 1.12 (0.75, 1.50) 0.76 β N 0.91 ( 0.28, 2.07) 2.35 CSE β A 0.70 (0.47, 0.95) 0.48 β N 0.59 (0.17, 1.01) 0.84 Jun Yan February 2, IDAG, Boulder, CO 14 / 21

15 Outlook Summary CSE improves estimation efficiency without specifying spatial dependence. Coordinate descent algorithm is reasonably fast and reliable. Application to climate extremes increases power of detection and attribution of changes, with possibly multiple forcing. Outlook (thesis of Yujing Jiang) Measurement error may cause bias, especially when it is high relative to the signal. A joint modeling approach similar to Hannart et al. (2014) for extremes: both simulated and observed data depend on a latent signal. Different climate models may have different sensitivity to the latent signal, but the average of the scaling factors is restricted to be 1. Jun Yan February 2, IDAG, Boulder, CO 15 / 21

16 Outlook Departure from Z. Wang s Thesis PhD thesis in Statistics: Yujing Jiang (joint with Zhuo Wang, Jun Yan, and Xuebin Zhang) The work reported earlier is a 2-step approach 1 Estimate the signal from the climate simulation data. 2 Estimate the scaling factor of the signal with observed data. Possible drawback: uncertainty in estimated signals has an effect like error-in-covariates, which is known to attenuate covariate effects in regression models with measurement error. Goal: remove bias from measurement error but retain efficiency from CSE. Jun Yan February 2, IDAG, Boulder, CO 16 / 21

17 Outlook Joint D&A Model for Observed and Simulated Extremes The signal (characterized by a few parameters) is shared by the location parameters of the GEV models for both. Illustration with one forcing GEV model for observed extremes: µ ts = α s + β obs µ ts, σ ts = σ s, ξ ts = ξ s, GEV model for simulated extremes from climate model c, c = 1,..., K, µ cts = α cs + β c µ ts, σ cts = σ cs, ξ cts = ξ cs Signal appears in the model as µ ts β c allows model specific sensitivity the average of β c over c is restricted to be 1 σ cs and ξ cs could be restricted to be the same as σ s and ξ s, respectively, if desired. Jun Yan February 2, IDAG, Boulder, CO 17 / 21

18 Outlook Parameter Estimation Assume independence between observed data and simulated data, Block coordinate descent (the observed data treated as if from the K + 1th climate model) 1 { µ ts, t = 1,..., 10D}, s = 1,..., m 2 {σ s, ξ s }, s = 1,..., m 3 {α cs }, c = 1,..., K + 1} 4 {β c }, c = 1,..., K + 1} Average-to-1 restriction is enforced at each iteration for identifiability. When updating each β, CSE can be used for efficiency. Jun Yan February 2, IDAG, Boulder, CO 18 / 21

19 Outlook A Simulation Study Regional D&A study for extreme temperature: 29 grid boxes in Australia. A single climate model under one forcing. Same generating model for observed and simulated data, Dependence structure was a geometric Gaussian process with a Gaussian correlation function with φ = 12 and 18. True marginal parameter values were set to be the estimates from 10 runs under ALL forcing from HadCM3. Signal: 0.1 degree/10 years. β obs = 1. Number of years: 100. Number of runs from the climate model: 2, 5, 10. Four methods: 2-step (2S) versus joint modeling (JM); independence likelihood (IL) versus CSE. Jun Yan February 2, IDAG, Boulder, CO 19 / 21

20 Outlook Table: Mean, standard deviation (SD) and root mean squared error (RMSE) of estimates of β obs from 1000 replicates. Run Dep Mean SD RMSE Mean SD RMSE Mean SD RMSE M 2S.IL S.CSE JM.IL JM.CSE S 2S.IL S.CSE JM.IL JM.CSE Jun Yan February 2, IDAG, Boulder, CO 20 / 21

21 Outlook References Allen, M. R. and P. A. Stott (2003). Estimating signal amplitudes in optimal fingerprinting, part i: theory. Climate Dynamics 21, Davison, A. C., S. A. Padoan, and M. Ribatet (2012). Statistical modeling of spatial extremes. Statistical Science 27(2), Hannart, A., A. Ribes, and P. Naveau (2014). Optimal fingerprinting under multiple sources of uncertainty. Geophysical Research Letters 41(4), Kojadinovic, I., H. Shang, and J. Yan (2015). A class of goodness-of-fit tests for spatial extremes models based on max-stable processes. Statistics and Its Interfaces 8(1), Nikoloulopoulos, A. K., H. Joe, and N. R. Chaganty (2011). Weighted scores method for regression models with dependent data. Biostatistics 12, Wang, Z., J. Yan, and X. Zhang (2014). Incorporating spatial dependence in regional frequency analysis. Water Resources Research 50(12), Zwiers, F. W., X. Zhang, and Y. Feng (2011). Anthropogenic influence on long return period daily temperature extremes at regional scales. Journal of Climate 24(3), Jun Yan February 2, IDAG, Boulder, CO 21 / 21

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour. Patrick Lam Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data

Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data Modeling the Distribution of Environmental Radon Levels in Iowa: Combining Multiple Sources of Spatially Misaligned Data Brian J. Smith, Ph.D. The University of Iowa Joint Statistical Meetings August 10,

More information

Lecture 8: Signal Detection and Noise Assumption

Lecture 8: Signal Detection and Noise Assumption ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Package EstCRM. July 13, 2015

Package EstCRM. July 13, 2015 Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

More information

Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

A General Approach to Variance Estimation under Imputation for Missing Survey Data

A General Approach to Variance Estimation under Imputation for Missing Survey Data A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey

More information

Orthogonal Distance Regression

Orthogonal Distance Regression Applied and Computational Mathematics Division NISTIR 89 4197 Center for Computing and Applied Mathematics Orthogonal Distance Regression Paul T. Boggs and Janet E. Rogers November, 1989 (Revised July,

More information

Nonlinear Regression:

Nonlinear Regression: Zurich University of Applied Sciences School of Engineering IDP Institute of Data Analysis and Process Design Nonlinear Regression: A Powerful Tool With Considerable Complexity Half-Day : Improved Inference

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Monte Carlo Simulation

Monte Carlo Simulation 1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

More information

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

More information

INTRODUCTION TO GEOSTATISTICS And VARIOGRAM ANALYSIS

INTRODUCTION TO GEOSTATISTICS And VARIOGRAM ANALYSIS INTRODUCTION TO GEOSTATISTICS And VARIOGRAM ANALYSIS C&PE 940, 17 October 2005 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and other resources available

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Introduction to Path Analysis

Introduction to Path Analysis This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Efficiency and the Cramér-Rao Inequality

Efficiency and the Cramér-Rao Inequality Chapter Efficiency and the Cramér-Rao Inequality Clearly we would like an unbiased estimator ˆφ (X of φ (θ to produce, in the long run, estimates which are fairly concentrated i.e. have high precision.

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.

Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil. Steven J Zeil Old Dominion Univ. Fall 200 Discriminant-Based Classification Linearly Separable Systems Pairwise Separation 2 Posteriors 3 Logistic Discrimination 2 Discriminant-Based Classification Likelihood-based:

More information

From Sparse Approximation to Forecast of Intraday Load Curves

From Sparse Approximation to Forecast of Intraday Load Curves From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43 Electrical Consumption Time series

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Comparison of Estimation Methods for Complex Survey Data Analysis

Comparison of Estimation Methods for Complex Survey Data Analysis Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.

More information

Extreme-Value Analysis of Corrosion Data

Extreme-Value Analysis of Corrosion Data July 16, 2007 Supervisors: Prof. Dr. Ir. Jan M. van Noortwijk MSc Ir. Sebastian Kuniewski Dr. Marco Giannitrapani Outline Motivation Motivation in the oil industry, hundreds of kilometres of pipes and

More information

Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

More information

Web-based Supplementary Materials for. Modeling of Hormone Secretion-Generating. Mechanisms With Splines: A Pseudo-Likelihood.

Web-based Supplementary Materials for. Modeling of Hormone Secretion-Generating. Mechanisms With Splines: A Pseudo-Likelihood. Web-based Supplementary Materials for Modeling of Hormone Secretion-Generating Mechanisms With Splines: A Pseudo-Likelihood Approach by Anna Liu and Yuedong Wang Web Appendix A This appendix computes mean

More information

Analyzing Structural Equation Models With Missing Data

Analyzing Structural Equation Models With Missing Data Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.

More information

Estimation and attribution of changes in extreme weather and climate events

Estimation and attribution of changes in extreme weather and climate events IPCC workshop on extreme weather and climate events, 11-13 June 2002, Beijing. Estimation and attribution of changes in extreme weather and climate events Dr. David B. Stephenson Department of Meteorology

More information

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: SIMPLIS Syntax Files

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: SIMPLIS Syntax Files Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng LISREL for Windows: SIMPLIS Files Table of contents SIMPLIS SYNTAX FILES... 1 The structure of the SIMPLIS syntax file... 1 $CLUSTER command... 4

More information

Note on the EM Algorithm in Linear Regression Model

Note on the EM Algorithm in Linear Regression Model International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

ZHIYONG ZHANG AND LIJUAN WANG

ZHIYONG ZHANG AND LIJUAN WANG PSYCHOMETRIKA VOL. 78, NO. 1, 154 184 JANUARY 2013 DOI: 10.1007/S11336-012-9301-5 METHODS FOR MEDIATION ANALYSIS WITH MISSING DATA ZHIYONG ZHANG AND LIJUAN WANG UNIVERSITY OF NOTRE DAME Despite wide applications

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Gaussian Processes to Speed up Hamiltonian Monte Carlo Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

OPTIMAL PORTFOLIO ALLOCATION WITH CVAR: A ROBUST

OPTIMAL PORTFOLIO ALLOCATION WITH CVAR: A ROBUST OPTIMAL PORTFOLIO ALLOCATION WITH CVAR: A ROBUST APPROACH Luigi Grossi 1, Fabrizio Laurini 2 and Giacomo Scandolo 1 1 Dipartimento di Scienze Economiche Università di Verona (e-mail: luigi.grossi@univr.it,

More information

Variations of Statistical Models

Variations of Statistical Models 38. Statistics 1 38. STATISTICS Revised September 2013 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Comparison of resampling method applied to censored data

Comparison of resampling method applied to censored data International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

More information

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

More information

Classification Problems

Classification Problems Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

More information

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,

More information

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

More information

A Study on the Comparison of Electricity Forecasting Models: Korea and China

A Study on the Comparison of Electricity Forecasting Models: Korea and China Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison

More information

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago Recent Developments of Statistical Application in Finance Ruey S. Tsay Graduate School of Business The University of Chicago Guanghua Conference, June 2004 Summary Focus on two parts: Applications in Finance:

More information

Centre for Central Banking Studies

Centre for Central Banking Studies Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

More information

MAN-BITES-DOG BUSINESS CYCLES ONLINE APPENDIX

MAN-BITES-DOG BUSINESS CYCLES ONLINE APPENDIX MAN-BITES-DOG BUSINESS CYCLES ONLINE APPENDIX KRISTOFFER P. NIMARK The next section derives the equilibrium expressions for the beauty contest model from Section 3 of the main paper. This is followed by

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Credit Risk Models: An Overview

Credit Risk Models: An Overview Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Topic 3b: Kinetic Theory

Topic 3b: Kinetic Theory Topic 3b: Kinetic Theory What is temperature? We have developed some statistical language to simplify describing measurements on physical systems. When we measure the temperature of a system, what underlying

More information

Revenue Management with Correlated Demand Forecasting

Revenue Management with Correlated Demand Forecasting Revenue Management with Correlated Demand Forecasting Catalina Stefanescu Victor DeMiguel Kristin Fridgeirsdottir Stefanos Zenios 1 Introduction Many airlines are struggling to survive in today's economy.

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:

More information

Estimating an ARMA Process

Estimating an ARMA Process Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

More information

CS229 Lecture notes. Andrew Ng

CS229 Lecture notes. Andrew Ng CS229 Lecture notes Andrew Ng Part X Factor analysis Whenwehavedatax (i) R n thatcomesfromamixtureofseveral Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Sales forecasting # 1

Sales forecasting # 1 Sales forecasting # 1 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Christfried Webers. Canberra February June 2015

Christfried Webers. Canberra February June 2015 c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1

More information

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

More information

Item Response Theory in R using Package ltm

Item Response Theory in R using Package ltm Item Response Theory in R using Package ltm Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl Department of Statistics and Mathematics

More information