Uncertainty quantification for the familywise error rate in multivariate copula models


1 Uncertainty quantification for the familywise error rate in multivariate copula models
Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange)
University of Bremen, Institute for Statistics
Adaptive Designs and Multiple Testing Procedures Workshop 2015, University of Cologne
2 Outline
Simultaneous test procedures in terms of p-value copulae
Asymptotic behavior of empirically calibrated multiple tests
Estimation of an unknown copula
Application: Exchange rate risks
References:
Dickhaus, T., Gierl, J. (2013): Simultaneous test procedures in terms of p-value copulae. CMCGS 2013 Proceedings.
Stange, J., Bodnar, T., Dickhaus, T. (2014): Uncertainty quantification for the familywise error rate in multivariate copula models. AStA Adv. Stat. Anal., online first.
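3 Notational setup
Given:
$(\Omega, \mathcal{F}, (P_\vartheta)_{\vartheta \in \Theta})$: statistical model
$\mathcal{H}_m = (H_i)_{i=1,\dots,m}$: family of null hypotheses with $H_i \subset \Theta$ and alternatives $K_i = \Theta \setminus H_i$
$(\Omega, \mathcal{F}, (P_\vartheta)_{\vartheta \in \Theta}, \mathcal{H}_m)$: multiple test problem
$\varphi = (\varphi_i : i = 1,\dots,m)$: multiple test for $\mathcal{H}_m$
Decision pattern (rows: hypotheses, columns: test decision):
Hypotheses | Decision 0 | Decision 1 | Total
true       | $U_m$      | $V_m$      | $m_0$
false      | $T_m$      | $S_m$      | $m_1$
Total      | $W_m$      | $R_m$      | $m$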
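4 Local significance level
(Strong) control of the familywise error rate (FWER):
$$\forall \vartheta \in \Theta: \quad \mathrm{FWER}_\vartheta(\varphi) = P_\vartheta(V_m > 0) \overset{!}{\le} \alpha.$$
Bonferroni correction: Carry out each individual test $\varphi_i$ at local level $\alpha_{\mathrm{loc.}} := \alpha/m$. Let $I_0(\vartheta)$ denote the index set of true hypotheses in $\mathcal{H}_m$ under $\vartheta$. Then
$$\mathrm{FWER}_\vartheta(\varphi) = P_\vartheta\Big(\bigcup_{i \in I_0(\vartheta)} \{\varphi_i = 1\}\Big) \le \sum_{i \in I_0(\vartheta)} P_\vartheta(\{\varphi_i = 1\}) \le m_0\, \alpha_{\mathrm{loc.}} \le m\, \alpha_{\mathrm{loc.}} = \alpha.$$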
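The chain of inequalities above can be checked numerically. The following is a minimal sketch, not part of the original slides: it computes the Bonferroni local level, the union bound $m_0 \alpha_{\mathrm{loc.}}$, and, under the additional assumption of independent p-values (made here only for illustration), the exact FWER $1 - (1 - \alpha_{\mathrm{loc.}})^{m_0}$. All function names are hypothetical.

```python
def bonferroni_local_level(alpha: float, m: int) -> float:
    """Local significance level alpha/m used by the Bonferroni correction."""
    return alpha / m


def union_bound(alpha_loc: float, m0: int) -> float:
    """Upper bound m0 * alpha_loc on the FWER (Bonferroni inequality)."""
    return m0 * alpha_loc


def fwer_if_independent(alpha_loc: float, m0: int) -> float:
    """Exact FWER when the m0 true-null p-values are independent and uniform.

    Independence is an illustrative assumption, not a claim of the slides.
    """
    return 1.0 - (1.0 - alpha_loc) ** m0


if __name__ == "__main__":
    alpha, m = 0.05, 10
    a_loc = bonferroni_local_level(alpha, m)
    print(a_loc, fwer_if_independent(a_loc, m), union_bound(a_loc, m))
```

In this independent case the exact FWER stays slightly below the union bound, which illustrates that the Bonferroni level does not exhaust $\alpha$.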
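5 Simultaneous test procedures
K. R. Gabriel (1969), Hothorn et al. (2008)
Definition: Define the (global) intersection hypothesis by $H_0 = \bigcap_{i=1}^m H_i$. Consider the extended problem $(\Omega, \mathcal{F}, (P_\vartheta)_{\vartheta \in \Theta}, \mathcal{H}_{m+1})$ with $\mathcal{H}_{m+1} = \{H_i,\ i \in I := \{0, 1, \dots, m\}\}$. Assume real-valued test statistics $T_i$, $i \in I$, which tend to larger values under alternatives. Then we call
(a) $(\mathcal{H}_{m+1}, \mathcal{T})$ with $\mathcal{T} = \{T_i,\ i \in I\}$ a testing family.
(b) $\varphi = (\varphi_i,\ i \in I)$ a simultaneous test procedure (STP), if
$$\forall\, 0 \le i \le m: \quad \varphi_i = \begin{cases} 1, & \text{if } T_i > c_\alpha, \\ 0, & \text{if } T_i \le c_\alpha, \end{cases}$$
such that
$$\forall \vartheta \in H_0: \quad P_\vartheta(\{\varphi_0 = 1\}) = P_\vartheta(\{T_0 > c_\alpha\}) \le \alpha.$$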
6 FWER control with STPs
Assumptions (for the moment):
1. There exists a $\vartheta^* \in H_0$ which is a least favorable parameter configuration (LFC) for the FWER of the STP $\varphi$ based on $T_1, \dots, T_m$.
2. $\forall\, 1 \le i \le m$: $H_i: \{\theta_i(\vartheta) = \theta_i^*\}$, where $\theta: \Theta \to \Theta'$.
3. $\mathcal{L}(T_i)$ is continuous under $H_i$ with known cdf $F_i$.
Exemplary model classes:
ANOVA1: all pairs comparisons (Tukey contrasts), multiple comparisons with a control group (Dunnett contrasts). Assumptions are fulfilled ($\theta$: difference operator).
Multiple association tests in contingency tables, genetic association studies. Assumptions are fulfilled, at least asymptotically (for large sample sizes).
8 Copulae
Theorem: (Sklar (1959, 1996)) Let $X = (X_1, \dots, X_m)$ be a random vector with values in $\mathbb{R}^m$, with joint cdf $F_X$ and marginal cdfs $F_{X_1}, \dots, F_{X_m}$. Then there exists a function $C: [0, 1]^m \to [0, 1]$ such that
$$\forall x = (x_1, \dots, x_m) \in \mathbb{R}^m: \quad F_X(x) = C(F_{X_1}(x_1), \dots, F_{X_m}(x_m)).$$
If all $m$ marginal cdfs are continuous, the copula $C$ is unique.
Obviously, it holds: If all $X_i$, $1 \le i \le m$, are marginally distributed as UNI[0, 1], then $F_X = C$!
9 p-values, distributional transforms
Under our general assumptions 1.-3., appropriate p-values corresponding to the $T_i$ are given by
$$\forall\, 1 \le i \le m: \quad p_i = 1 - F_i(T_i).$$
Properties of $p_i$ under assumptions 1.-3.:
$T_i > c_\alpha \iff p_i < 1 - F_i(c_\alpha)$, if $F_i$ is strictly isotone. We may think of $\alpha^{(i)}_{\mathrm{loc.}} := 1 - F_i(c_\alpha)$ as a multiplicity-adjusted local significance level.
$1 - p_i$ is equal to Rüschendorf's distributional transform.
Under $H_i$, we have $p_i \sim \mathrm{UNI}[0, 1]$ and $1 - p_i \sim \mathrm{UNI}[0, 1]$.
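To make the equivalence between the critical-value form and the p-value form of a marginal test concrete, here is a small sketch. It assumes, purely for illustration, a standard normal null distribution $F_i = \Phi$ and a hypothetical critical value; neither choice comes from the slides.

```python
from statistics import NormalDist
from typing import Callable


def p_value(t: float, null_cdf: Callable[[float], float]) -> float:
    """p_i = 1 - F_i(T_i) for a test that rejects for large T_i."""
    return 1.0 - null_cdf(t)


def local_level(c_alpha: float, null_cdf: Callable[[float], float]) -> float:
    """Multiplicity-adjusted local level alpha_loc = 1 - F_i(c_alpha)."""
    return 1.0 - null_cdf(c_alpha)


if __name__ == "__main__":
    phi = NormalDist().cdf      # assumed null cdf (illustration only)
    c = 2.5                     # hypothetical critical value
    a_loc = local_level(c, phi)
    for t in (1.0, 2.4, 2.6, 4.0):
        # Both decision rules agree: T > c  <=>  p < alpha_loc
        print(t, t > c, p_value(t, phi) < a_loc)
```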
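10 A simple calculation
Let us construct an STP $\varphi$ in terms of p-values. Due to the above, we only have to consider multiple tests of the form $\varphi = (\varphi_i : 1 \le i \le m)$ with
$$\varphi_i = \mathbf{1}_{[0, \alpha^{(i)}_{\mathrm{loc.}})}(p_i).$$
For arbitrary $\vartheta \in \Theta$ and the LFC $\vartheta^* \in H_0$, we get:
$$\mathrm{FWER}_\vartheta(\varphi) = P_\vartheta\Big(\bigcup_{i \in I_0(\vartheta)} \{p_i < \alpha^{(i)}_{\mathrm{loc.}}\}\Big) \le P_{\vartheta^*}\Big(\bigcup_{i=1}^m \{p_i < \alpha^{(i)}_{\mathrm{loc.}}\}\Big) = 1 - P_{\vartheta^*}\Big(\bigcap_{i=1}^m \{1 - p_i \le 1 - \alpha^{(i)}_{\mathrm{loc.}}\}\Big) = 1 - C_{\vartheta^*}(1 - \alpha^{(1)}_{\mathrm{loc.}}, \dots, 1 - \alpha^{(m)}_{\mathrm{loc.}}),$$
with $C_{\vartheta^*}$ denoting the copula of $(1 - p_i : 1 \le i \le m)$ under $\vartheta^*$.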
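The identity $\mathrm{FWER} = 1 - C(1 - \alpha^{(1)}_{\mathrm{loc.}}, \dots, 1 - \alpha^{(m)}_{\mathrm{loc.}})$ under the LFC can be evaluated for any copula that is available in closed form. The sketch below uses two textbook copulas chosen only for illustration (independence and comonotonicity); they are assumptions of this example, not models discussed on the slide.

```python
import math
from typing import Callable, Sequence


def independence_copula(u: Sequence[float]) -> float:
    """Product copula: C(u) = u_1 * ... * u_m."""
    return math.prod(u)


def comonotonicity_copula(u: Sequence[float]) -> float:
    """Upper Frechet-Hoeffding bound: C(u) = min(u_1, ..., u_m)."""
    return min(u)


def fwer_under_lfc(copula: Callable[[Sequence[float]], float],
                   local_levels: Sequence[float]) -> float:
    """FWER = 1 - C(1 - alpha_loc^(1), ..., 1 - alpha_loc^(m))."""
    return 1.0 - copula([1.0 - a for a in local_levels])


if __name__ == "__main__":
    alpha, m = 0.05, 5
    bonferroni = [alpha / m] * m
    print(fwer_under_lfc(independence_copula, bonferroni))
    print(fwer_under_lfc(comonotonicity_copula, bonferroni))
```

With Bonferroni levels, the FWER is close to $\alpha$ under independence but only $\alpha/m$ under perfect positive dependence, which is the slack that copula-based calibration aims to recover.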
11 Projection method, Hothorn et al. (2008)
Assume that an (asymptotically) jointly normal vector of test statistics $T = (T_1, \dots, T_m)$ is at hand. For control of the FWER by an STP based on $T$, determine the equicoordinate (two-sided) $(1 - \alpha)$-quantile of the joint normal distribution of $T$ and project onto the axes.
R: vcov() + mvtnorm
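The slide points to R (vcov() and the mvtnorm package) for this computation. As a rough illustration of what an equicoordinate two-sided quantile is, the following sketch approximates it by plain Monte Carlo for a bivariate standard normal vector with an assumed correlation `rho`. This is a swapped-in simulation approach using only the Python standard library, not the numerical integration used by mvtnorm; the function names, the seed, and the number of draws are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def equicoordinate_quantile_mc(rho: float, alpha: float,
                               n_draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of c with P(|T1| <= c, |T2| <= c) = 1 - alpha
    for a bivariate standard normal (T1, T2) with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    maxima = []
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        t1, t2 = z1, rho * z1 + s * z2      # correlated pair
        maxima.append(max(abs(t1), abs(t2)))
    maxima.sort()
    k = min(n_draws - 1, math.ceil((1.0 - alpha) * n_draws) - 1)
    return maxima[k]                         # empirical (1 - alpha)-quantile


def independent_two_sided_quantile(alpha: float, m: int) -> float:
    """Exact equicoordinate quantile for m independent N(0,1) statistics."""
    return NormalDist().inv_cdf((1.0 + (1.0 - alpha) ** (1.0 / m)) / 2.0)


if __name__ == "__main__":
    print(equicoordinate_quantile_mc(0.0, 0.05), independent_two_sided_quantile(0.05, 2))
    print(equicoordinate_quantile_mc(0.9, 0.05))
```

Projecting onto the axes then means using the same critical value $c$ for every $|T_i|$; stronger correlation yields a smaller $c$ than in the independent case.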
12 FWER control at level $\alpha = 0.3$ via contour lines of the copula $C_{\vartheta^*}$ (figure)
16 FWER control at level $\alpha = 0.3$ via contour lines of the copula $C_{\vartheta^*}$
We obtain the local level $\alpha_{\mathrm{loc.}}$. Crosscheck: $\Phi^{-1}(1 - \alpha_{\mathrm{loc.}}/2)$ is equal to the tabulated normal quantile for the chosen parameters.
The structural information provided by $C_{\vartheta^*}$ increases power!
If one hypothesis is more important than the other, just change the slope of the blue straight line.
17 Unknown copula $C_{\vartheta^*}$
In the case that we are willing to assume 1.-3., but do not know the copula $C_{\vartheta^*}$, we propose:
Parametric copula estimation (e.g., via Spearman's $\rho$ and/or Kendall's $\tau$ and/or Hoeffding's lemma)
Nonparametric copula estimation (e.g., with Bernstein copulae)
Modeling with structured (hierarchical) copulae (e.g., for block dependencies)
Approximating contour lines by resampling or statistical learning techniques
These are research topics within our Research Unit FOR 1735 "Structural Inference in Statistics: Adaptation and Efficiency".
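18 Extended model setup with copula parameter
Extended model for the family of probability measures: $\mathcal{P} = (P_{\vartheta,\eta} : \vartheta \in \Theta,\ \eta \in \Xi)$
$\vartheta \in \Theta$: parameter of interest ($H_j \subset \Theta$, $1 \le j \le m$)
$\eta \in \Xi$: nuisance (copula) parameter representing the dependency structure
Fundamental assumption: $\eta$ does not depend on $\vartheta$.
FWER control in the extended model:
$$\sup_{\vartheta \in \Theta,\ \eta \in \Xi} \mathrm{FWER}_{\vartheta,\eta}(\varphi) \overset{!}{\le} \alpha.$$
For the LFC $\vartheta^* \in H_0$: Put $P^*_\eta = P_{\vartheta^*,\eta}$ and $\mathrm{FWER}^*_\eta(\varphi) = \mathrm{FWER}_{\vartheta^*,\eta}(\varphi)$.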
19 Empirical calibration of critical values
We recall for a multiple test $\varphi$ with test statistics $T_1, \dots, T_m$ and critical values $c_1, \dots, c_m$ under our general assumptions 1.-3.:
$$\mathrm{FWER}_{\vartheta,\eta}(\varphi) \le \mathrm{FWER}^*_\eta(\varphi) = P^*_\eta\Big(\bigcup_{j=1}^m \{T_j > c_j\}\Big) = 1 - C_\eta(F_1(c_1), \dots, F_m(c_m)).$$
Empirical calibration of $\varphi$: Assume that the dependence structure of $T$ is determined by the copula function $C_{\eta_0}$, $\eta_0 \in \Xi$. Utilization of an estimate $\hat\eta$ for $\eta_0$ leads to the empirically calibrated critical values $\hat c = c(\hat\eta)$ and the calibrated test $\hat\varphi$.
Calibrated local significance levels: Take $u(\hat\eta)$ from the set $C_{\hat\eta}^{-1}(1 - \alpha)$ and put $\alpha^{(j)}_{\mathrm{loc.}} = 1 - u_j(\hat\eta)$, $1 \le j \le m$.
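One way to pick a point from the level set $C_{\hat\eta}^{-1}(1 - \alpha)$ is to search along the diagonal $u_1 = \dots = u_m$, which gives equal local levels. The sketch below does this by bisection for an arbitrary copula passed in as a function. Restricting to the diagonal is an assumption of this example (the slides allow any point of the level set), and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def product_copula(u: Sequence[float]) -> float:
    """Independence copula, used here only as a test case."""
    return math.prod(u)


def calibrate_equal_local_level(copula: Callable[[Sequence[float]], float],
                                m: int, alpha: float, tol: float = 1e-12) -> float:
    """Find u on the diagonal with C(u, ..., u) = 1 - alpha by bisection
    and return the common local level alpha_loc = 1 - u.

    Relies on C(u, ..., u) being nondecreasing in u, which holds for any copula.
    """
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if copula([mid] * m) < target:
            lo = mid
        else:
            hi = mid
    # hi satisfies C(hi, ..., hi) >= 1 - alpha, so the FWER is at most alpha
    return 1.0 - hi


if __name__ == "__main__":
    print(calibrate_equal_local_level(product_copula, 4, 0.05))  # Sidak-type level
    print(calibrate_equal_local_level(min, 4, 0.05))             # comonotone case
```

For the product copula this reproduces the Šidák level $1 - (1 - \alpha)^{1/m}$; plugging in an estimated copula $C_{\hat\eta}$ instead yields the empirically calibrated level.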
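21 Regard $\mathrm{FWER}^*_{\eta_0}(\varphi)$ as a derived parameter of the copula model for $T$.
Theorem: Assume that $C_{\eta_0} \in \{C_\eta \mid \eta \in \Xi \subseteq \mathbb{R}^p\}$, $p \in \mathbb{N}$. Suppose an estimator $\hat\eta_n : \Omega \to \Xi$ of $\eta_0$ fulfilling
$$\sqrt{n}\,(\hat\eta_n - \eta_0) \xrightarrow{d} N_p(0, \Sigma_0) \quad \text{as } n \to \infty.$$
Then, under standard regularity assumptions, it holds:
a) Asymptotic normality (Delta method):
$$\sqrt{n}\,\big(\mathrm{FWER}^*_{\eta_0}(\hat\varphi) - \alpha\big) \xrightarrow{d} N(0, \sigma^2_{\eta_0}).$$
b) Asymptotic confidence region ($\hat\sigma^2_n$ consistent for $\sigma^2_{\eta_0}$):
$$\lim_{n \to \infty} P^*_{\eta_0}\Big(\sqrt{n}\,\frac{\mathrm{FWER}^*_{\eta_0}(\hat\varphi) - \alpha}{\hat\sigma_n} \le z_{1-\delta}\Big) = 1 - \delta.$$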
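Rearranging the event in part b) gives an asymptotic upper confidence bound for the realized FWER of the calibrated test: $\mathrm{FWER}^*_{\eta_0}(\hat\varphi) \le \alpha + z_{1-\delta}\,\hat\sigma_n / \sqrt{n}$ with asymptotic probability $1 - \delta$. The sketch below only evaluates this bound; how $\hat\sigma_n$ is obtained is not specified here, so the value used in the example is hypothetical.

```python
import math
from statistics import NormalDist


def fwer_upper_confidence_bound(alpha: float, sigma_hat: float,
                                n: int, delta: float) -> float:
    """Asymptotic (1 - delta) upper bound alpha + z_{1-delta} * sigma_hat / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - delta)
    return alpha + z * sigma_hat / math.sqrt(n)


if __name__ == "__main__":
    # sigma_hat = 0.4 is a made-up value for illustration only
    for n in (100, 1000, 10000):
        print(n, fwer_upper_confidence_bound(0.05, 0.4, n, 0.05))
```

The bound shrinks towards the nominal level $\alpha$ at rate $1/\sqrt{n}$, which quantifies the price of estimating the copula parameter.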
22 Three inversion formulas
Lemma: Let $X$ and $Y$ be real-valued random variables with marginal cdfs $F_X$ and $F_Y$ and bivariate copula $C_\eta$, depending on a copula parameter $\eta$.
$\sigma_{X,Y}$: covariance of $X$ and $Y$
$\rho_{X,Y}$: Spearman's rank correlation coefficient (population version)
$\tau_{X,Y}$: Kendall's tau (population version)
Then it holds:
$$\sigma_{X,Y} = f_1(\eta) = \iint_{\mathbb{R}^2} \big[C_\eta\{F_X(x), F_Y(y)\} - F_X(x) F_Y(y)\big]\, dx\, dy,$$
$$\rho_{X,Y} = f_2(\eta) = 12 \iint_{[0,1]^2} C_\eta(u, v)\, du\, dv - 3,$$
$$\tau_{X,Y} = f_3(\eta) = 4 \iint_{[0,1]^2} C_\eta(u, v)\, dC_\eta(u, v) - 1.$$
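The second formula can be checked numerically for copulas whose Spearman coefficient is known. The sketch below approximates the double integral by a midpoint rule and evaluates it for the independence copula ($\rho = 0$) and the two Fréchet-Hoeffding bounds ($\rho = \pm 1$). These test copulas and the grid size are choices made for this example only.

```python
from typing import Callable


def spearman_rho_from_copula(copula: Callable[[float, float], float],
                             grid: int = 200) -> float:
    """Approximate rho = 12 * int int C(u, v) du dv - 3 by a midpoint rule."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v)
    return 12.0 * total * h * h - 3.0


if __name__ == "__main__":
    print(spearman_rho_from_copula(lambda u, v: u * v))                  # independence
    print(spearman_rho_from_copula(lambda u, v: min(u, v)))              # comonotone
    print(spearman_rho_from_copula(lambda u, v: max(u + v - 1.0, 0.0)))  # countermonotone
```

For a parametric family $C_\eta$, inverting $\eta \mapsto f_2(\eta)$ at the sample version of $\rho$ gives a method-of-moments estimate of the copula parameter.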
23 Example: Gumbel-Hougaard copulae
(One-parametric Archimedean copula)
$$C_\eta(u_1, \dots, u_m) = \exp\Big(-\Big(\sum_{j=1}^m (-\ln(u_j))^\eta\Big)^{1/\eta}\Big), \quad \eta \ge 1.$$
Taking $m = 2$, we obtain
$$\tau_\eta = \frac{\eta - 1}{\eta}$$
and, consequently,
$$\eta = (1 - \tau)^{-1}. \quad (1)$$
Thus, $\eta$ can easily be calibrated by a method of moments (plug-in of an augmented sample version of $\tau$ into (1)).
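The calibration step can be sketched end to end for the Gumbel-Hougaard family. The code below evaluates the copula, applies formula (1), and computes an equal local level from the diagonal identity $C_\eta(u, \dots, u) = u^{m^{1/\eta}}$, which follows directly from the definition above. The closed-form local level is a derivation made for this example rather than a formula stated on the slides, and the value of $\tau$ is hypothetical.

```python
import math
from typing import Sequence


def gumbel_copula(u: Sequence[float], eta: float) -> float:
    """Gumbel-Hougaard copula C_eta(u_1, ..., u_m) for eta >= 1."""
    if any(x <= 0.0 for x in u):
        return 0.0
    s = sum(max(0.0, -math.log(x)) ** eta for x in u)
    return math.exp(-(s ** (1.0 / eta)))


def eta_from_kendall_tau(tau: float) -> float:
    """Formula (1): eta = 1 / (1 - tau), valid for 0 <= tau < 1."""
    return 1.0 / (1.0 - tau)


def gumbel_equal_local_level(alpha: float, m: int, eta: float) -> float:
    """Solve C_eta(u, ..., u) = u ** (m ** (1 / eta)) = 1 - alpha for u
    and return alpha_loc = 1 - u."""
    return 1.0 - (1.0 - alpha) ** (m ** (-1.0 / eta))


if __name__ == "__main__":
    tau_hat = 0.5                          # hypothetical estimate of Kendall's tau
    eta_hat = eta_from_kendall_tau(tau_hat)
    a_loc = gumbel_equal_local_level(0.05, 4, eta_hat)
    print(eta_hat, a_loc, 1.0 - gumbel_copula([1.0 - a_loc] * 4, eta_hat))
```

For $\eta = 1$ the local level reduces to the Šidák level, and it increases towards $\alpha$ as the dependence (and hence $\eta$) grows.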
24 Gumbel-Hougaard copulae and max-stability
Proposition: (max-stability of Gumbel-Hougaard copulae) For all $\eta \ge 1$ and $(u_1, \dots, u_m) \in [0, 1]^m$, it holds:
1. $C_\eta$ is a max-stable copula, i.e.,
$$\forall n \in \mathbb{N}: \quad C_\eta(u_1, \dots, u_m)^n = C_\eta(u_1^n, \dots, u_m^n).$$
2. There exists a family of copulas such that for any member $C$, it holds
$$\lim_{n \to \infty} \big(C(u_1^{1/n}, \dots, u_m^{1/n})\big)^n = C_\eta(u_1, \dots, u_m).$$
Consequence: Applications of Gumbel-Hougaard copulae in multivariate extreme value statistics
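Part 1 of the proposition is easy to verify numerically. The following sketch evaluates both sides of the max-stability identity for an arbitrarily chosen point and parameter; the specific numbers are illustrative only.

```python
import math
from typing import Sequence


def gumbel_copula(u: Sequence[float], eta: float) -> float:
    """Gumbel-Hougaard copula C_eta(u_1, ..., u_m) for eta >= 1."""
    if any(x <= 0.0 for x in u):
        return 0.0
    s = sum(max(0.0, -math.log(x)) ** eta for x in u)
    return math.exp(-(s ** (1.0 / eta)))


def max_stability_gap(u: Sequence[float], eta: float, n: int) -> float:
    """Absolute difference between C_eta(u)^n and C_eta(u_1^n, ..., u_m^n)."""
    lhs = gumbel_copula(u, eta) ** n
    rhs = gumbel_copula([x ** n for x in u], eta)
    return abs(lhs - rhs)


if __name__ == "__main__":
    # The gap should vanish up to floating-point rounding
    print(max_stability_gap([0.3, 0.7, 0.9], eta=2.5, n=5))
```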
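25 Example: Multiple support tests
$\mathbf{X}_1, \dots, \mathbf{X}_n$: sample of iid. random vectors with values in $[0, \infty)^m$, each of which is distributed as $X = (X_1, \dots, X_m)$ with
$$\forall\, 1 \le j \le m: \quad X_j \overset{d}{=} \vartheta_j Z_j, \quad \vartheta_j > 0,$$
where $Z_j$ has cdf $F_j : [0, 1] \to [0, 1]$.
Parameter of interest: $\vartheta = (\vartheta_1, \dots, \vartheta_m) \in \Theta = (0, \infty)^m$.
Multiple test problem ($\vartheta_j^*$, $1 \le j \le m$: given constants):
$$H_j: \{\vartheta_j \le \vartheta_j^*\} \quad \text{versus} \quad K_j: \{\vartheta_j > \vartheta_j^*\}, \quad j = 1, \dots, m.$$
Test statistics: $T_j = \max_{1 \le i \le n} X_{i,j} / \vartheta_j^*$, $1 \le j \le m$.
If the copula of $X$ is in the domain of attraction of some $C_\eta$, our theory applies, at least asymptotically.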
26 An application to exchange rate risks
Consider daily exchange rates: EUR/CNY, EUR/HKD, EUR/MXN, and EUR/USD.
Data from 01/07/2010 to 30/06/2014 (http://sdw.ecb.europa.eu) were transformed into log-returns.
The entire sample was split into two subsamples, where the first subsample consists of the data for the first three years.
Research question: For which of the four time series does the tail behavior of the returns remain stable during the fourth year of analysis?
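The log-return transformation mentioned above is $r_t = \ln(P_t / P_{t-1})$ for consecutive daily rates $P_{t-1}, P_t$. A minimal sketch follows; the rates in the example are made-up numbers, not ECB data.

```python
import math
from typing import List, Sequence


def log_returns(rates: Sequence[float]) -> List[float]:
    """Log-returns r_t = ln(P_t / P_{t-1}) of a series of positive rates."""
    return [math.log(curr / prev) for prev, curr in zip(rates, rates[1:])]


if __name__ == "__main__":
    hypothetical_rates = [1.30, 1.31, 1.29, 1.32]   # illustrative values only
    print(log_returns(hypothetical_rates))
```

A convenient property is that log-returns add up over time: their sum equals the log of the ratio of the last to the first rate.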
27 Stochastic model for extreme returns
It is common practice to model excesses over large thresholds $u$ by generalized Pareto distributions (GPDs) with cdf
$$G_{\xi,\vartheta}(x) = \begin{cases} 1 - (1 + \xi x/\vartheta)^{-1/\xi}, & \xi \ne 0, \\ 1 - \exp(-x/\vartheta), & \xi = 0, \end{cases}$$
where $x \ge 0$ for $\xi \ge 0$ and $0 \le x \le -\vartheta/\xi$ if $\xi < 0$.
Table: Maximum likelihood estimates of the GPD parameters based on data from 01/07/2010 until 30/06/2013
Parameter                 | EUR/CNY | EUR/HKD | EUR/MXN | EUR/USD
$\xi$                     | ( )     | ( )     | ( )     | ( )
$\vartheta$               | ( )     | ( )     | ( )     | ( )
$x_0 = u - \vartheta/\xi$ |         |         |         |
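The GPD cdf above translates directly into code. The sketch below implements both branches and the finite right endpoint $-\vartheta/\xi$ for $\xi < 0$; returning 0 for negative arguments is a convention added here, since the slide defines the cdf only on its support. The parameter values in the example are arbitrary.

```python
import math


def gpd_cdf(x: float, xi: float, theta: float) -> float:
    """Generalized Pareto cdf G_{xi,theta}(x) with shape xi and scale theta > 0."""
    if x < 0.0:
        return 0.0                       # convention outside the support
    if xi == 0.0:
        return 1.0 - math.exp(-x / theta)
    if xi < 0.0 and x >= -theta / xi:
        return 1.0                       # at or beyond the right endpoint
    return 1.0 - (1.0 + xi * x / theta) ** (-1.0 / xi)


if __name__ == "__main__":
    print(gpd_cdf(1.0, 0.0, 1.0))        # exponential case
    print(gpd_cdf(1.0, -0.5, 1.0))       # bounded support, endpoint at 2
    print(gpd_cdf(1.0, 0.2, 1.0))        # heavy-tailed case
```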
28 Results of the data analysis on second subsample
Table: Lower confidence limits for $\vartheta_j$ and $x_{0,j}$, $1 \le j \le 4$, for the second time period from 01/07/2013 until 30/06/2014
$\vartheta_j$            | EUR/CNY | EUR/HKD | EUR/MXN | EUR/USD
Bonferroni               |         |         |         |
Šidák                    |         |         |         |
Gumbel $G_{\hat\eta}$    |         |         |         |
$x_{0,j}$                | EUR/CNY | EUR/HKD | EUR/MXN | EUR/USD
Bonferroni               |         |         |         |
Šidák                    |         |         |         |
Gumbel $G_{\hat\eta}$    |         |         |         |
29 References
Gabriel, K. R. (1969). Simultaneous test procedures - some theory of multiple comparisons. Ann. Math. Stat., Vol. 40.
Hothorn, T., Bretz, F., Westfall, P. (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal, Vol. 50, No. 3.
Rüschendorf, L. (2009). On the distributional transform, Sklar's theorem, and the empirical copula process. J. Stat. Plann. Inference, Vol. 139, No. 11.
Sklar, A. (1996). Random variables, distribution functions, and copulas - a personal look backward and forward. In: Distributions with Fixed Marginals and Related Topics. Institute of Mathematical Statistics, Hayward, CA, 1-14.