Generalization of Chisquare Goodness-of-Fit Test for Binned Data Using Saturated Models, with Application to Histograms


Robert D. Cousins (cousins@physics.ucla.edu)
Dept. of Physics and Astronomy
University of California, Los Angeles, California, USA

April 9, 2010; revised March 3, 2013

Abstract

This note is a quick review of a generalization of the chisquare goodness-of-fit test for the situation when the data are not Gaussian (as for example histogram bin contents). The generalization, already in use for many years, is based on the likelihood ratio test in which one uses in the denominator a saturated model, i.e., a model that fits the data exactly. As with the Gaussian test (and in fact any goodness-of-fit test), the power of the more general test can depend strongly on the alternative hypothesis.

1 Introduction

A goodness-of-fit (GOF) test is a test of the null hypothesis when the alternative hypothesis is not specified. Since Neyman and Pearson taught us that (even for simple hypotheses) the best test of the null hypothesis depends on the alternative, there is no universally best GOF test. Nonetheless, the ubiquity of the chisquare GOF test attests to its utility, at least for picking up certain departures from the null. In its usual form for uncorrelated Gaussian (normal) distributed data, one has

    χ² = Σ_i (d_i − f_i)² / σ_i² ,    (1)

where d_i ± σ_i is the ith measured data point with rms deviation σ_i (each assumed to be a known constant), and f_i is the model prediction, perhaps with parameters. (If σ_i is not a known constant at each i, but depends on the unknown true value of the model at i, then there are subtleties beyond the scope of this short note.)

Considered as a random variable (since it is a function of the random data), in many applications this test statistic χ² has a probability density [1] which is also frequently called the chisquare function, with the potentially confusing ambiguity in multiple meanings usually resolvable by context. This can/should also be avoided by using another name, such as S or Q, for the left-hand side of Eqn. (1), but I yield to common practice in this note. In HEP, we can also encounter situations in which the so-called regularity conditions are not met, so that the distribution of the test statistic in Eqn. (1) is not a chisquare function; two common cases are when the true or best-fit values of parameters are on the boundary (a physical constraint such as non-negativity), and when there are issues with degrees of freedom not being well-defined. Again, these issues are beyond the scope of this note, which focuses on a single issue, that concerning the generalization of this usual chisquare to the case where the data are not Gaussian.

2 Likelihood ratios and the saturated model

For the same data and model as above, the likelihood is

    L = Π_i [1/(√(2π) σ_i)] exp( −(d_i − f_i)² / (2σ_i²) ),    (2)

leading to the common statement that −2 ln L is equal to χ²; this is of course not correct, since the former quantity has extra constants. Understanding why the extra terms disappear in the GOF test is an important point of this note.

Wilks [2] taught us that certain likelihood ratios obeying important regularity conditions have asymptotic probability densities which also follow the chisquare probability density. Likelihoods have the appealing property of being independent of the metric in which parameters are described. Likelihood ratios inherit this property and furthermore are invariant under change of the metric in which the data are described. For deep reasons such as the Neyman–Pearson Lemma, likelihood ratios are generally useful for comparing two hypotheses.
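The "extra constants" can be made explicit: from Eqn. (2), −2 ln L = Σ_i [(d_i − f_i)²/σ_i² + ln(2π σ_i²)], i.e., the χ² of Eqn. (1) plus a term that does not depend on the data. A minimal numerical sketch (all values hypothetical):

```python
import numpy as np

# Hypothetical measured points d_i with known rms deviations sigma_i,
# and model predictions f_i.
d     = np.array([4.8, 10.1, 14.6])
sigma = np.array([1.0, 1.2, 1.5])
f     = np.array([5.0, 10.0, 15.0])

chi2 = np.sum((d - f) ** 2 / sigma ** 2)                # Eqn. (1)
neg2_ln_L = np.sum((d - f) ** 2 / sigma ** 2
                   + np.log(2 * np.pi * sigma ** 2))    # -2 ln L from Eqn. (2)
offset = np.sum(np.log(2 * np.pi * sigma ** 2))         # data-independent constant
```

The offset depends only on the (fixed) σ_i, so it is the same for every data set; this is why it is harmless in the GOF test once a ratio is formed.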
In this note, likelihood ratios are denoted by λ, and are the basis for a generalization of Eqn. (1). Given only a null hypothesis and the data, one can invent an alternative hypothesis for which f_i is equal to the data d_i at every measured value. Such a model, which typically needs as many parameters as there are data points, is called a saturated model [3]. In some circumstances, saturated models can be useful for comparisons with the null hypothesis, and in particular for providing a denominator in the likelihood ratio. For the Gaussian data above, the saturated model sets f_i = d_i, so that the likelihood of the data in the saturated model is (since exp(0) = 1)

    L_saturated = Π_i 1/(√(2π) σ_i).    (3)
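As a quick numerical check (all values hypothetical), forming the ratio of the likelihoods of Eqns. (2) and (3) cancels the constant factors, leaving exactly the χ² of Eqn. (1):

```python
import numpy as np

# Hypothetical Gaussian measurements; f_i is the model under test,
# and the saturated model sets f_i = d_i.
d     = np.array([4.8, 10.1, 14.6])
sigma = np.array([1.0, 1.2, 1.5])
f     = np.array([5.0, 10.0, 15.0])

norm  = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
L     = np.prod(norm * np.exp(-(d - f) ** 2 / (2.0 * sigma ** 2)))  # Eqn. (2)
L_sat = np.prod(norm)                                               # Eqn. (3)
neg2_ln_lambda = -2.0 * np.log(L / L_sat)
chi2 = np.sum((d - f) ** 2 / sigma ** 2)                            # Eqn. (1)
```

Note that L_sat is the largest value the likelihood can take for this data, so 0 < λ ≤ 1 and −2 ln λ ≥ 0.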

The ratio of the two likelihoods above is then

    λ = Π_i exp( −(d_i − f_i)² / (2σ_i²) ),    (4)

and thus, importantly,

    χ² = −2 ln λ.    (5)

From this point of view (there are other paths to Eqn. (1) not as relevant to this note), the constants in −2 ln L were not just ignored; they were canceled when a ratio was formed. Since the saturated model does not depend on the parameters of the original model, the maximum of λ is of course at those parameters that maximize the original L.

In 1983, Baker and Cousins [4] reviewed the use of the saturated model (although we did not call it that, instead citing a 1928 paper by Neyman and Pearson) when applied to multinomial and Poisson data in histograms. The result is a GOF statistic for each case, which we called (following some literature of the time) χ²_λ,m and χ²_λ,p, respectively. The Poisson form is mentioned in the PDG's Review of Particle Properties [1]; some time ago we decided that it is best just to denote it by −2 ln λ, as some felt that calling it χ² might encourage people to forget that it only asymptotically follows the χ² distribution (and only if conditions are satisfied). As we said in our paper, probably the safest thing to do is to study the distribution by Monte Carlo. Heinrich [5] has studied the distribution and moments of λ for small statistics, and makes the point that for the asymptotic formulas to be valid, the contents of all bins must each be large.

I think that in some of the cases that we have had before the statistics committee, the expanded use of the saturated model might be useful. For example, it might be applied when comparing simultaneous predictions of a model to a set of several binomial problems (asymmetries or efficiencies). One needs to keep in mind that even the ungeneralized GOF statistic is insensitive to certain departures from the null (since it throws away sign and order information). The generalized version needs to be understood in the same spirit, in addition to checking the distribution under the null by M.C.
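For Poisson-distributed bin contents n_i with expectations f_i, the Baker–Cousins statistic takes the form χ²_λ,p = −2 ln λ = 2 Σ_i [f_i − n_i + n_i ln(n_i/f_i)], with the logarithmic term set to zero for empty bins. A sketch of the computation and of a Monte Carlo study of its null distribution, with hypothetical bin expectations:

```python
import numpy as np

def chi2_lambda_P(n, f):
    """Poisson -2 ln(lambda): 2 * sum_i [f_i - n_i + n_i ln(n_i/f_i)],
    with the n_i ln(n_i/f_i) term taken as 0 for empty bins."""
    n = np.asarray(n, dtype=float)
    f = np.asarray(f, dtype=float)
    log_term = np.where(n > 0, n * np.log(np.where(n > 0, n, 1.0) / f), 0.0)
    return 2.0 * np.sum(f - n + log_term)

# Study the null distribution by Monte Carlo, as the note advises.
rng = np.random.default_rng(42)
f = np.array([20.0, 35.0, 50.0, 35.0, 20.0])   # hypothetical expectations
samples = np.array([chi2_lambda_P(rng.poisson(f), f) for _ in range(20000)])
```

With bin contents this large, the sample mean of the statistic comes out close to the number of bins (five here), consistent with the asymptotic χ² behavior; for small bin contents the Monte Carlo distribution, not the asymptotic χ², should be used for p-values.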
But with these cautions, I would encourage the trial and study of GOF test statistics based on saturated models.

3 Caution against absolute likelihood as goodness of fit statistic

Occasionally one finds the recommendation to use as a goodness-of-fit test statistic the absolute likelihood at its maximum, as opposed to a ratio of likelihood maxima; typically Monte Carlo simulation is recommended as the way to get the null distribution of the test statistic. Although this might at first seem plausible, it is a flawed concept: unlike the likelihood ratio, such a GOF statistic is without foundation, and its power can vary arbitrarily with the metric. Simple examples can make clear that the value of the likelihood at its maximum must be compared to something. For example, for Poisson counts, the probability of observing 100 events when µ = 100.0 is much less than the probability of observing 1 event when µ = 1.0, even though in both cases the data perfectly fit the theory. Thus the saturated model provides the reference of the largest value that the likelihood can be for that data (for any model), and hence provides a reasonable normalization for the maximum observed for a more constraining model. Heinrich [6] discusses the pitfalls of using the absolute likelihood as a GOF statistic.

For unbinned likelihoods, which are common in HEP, the problem is exacerbated. One might hope that one could just take the binned GOF in the limit of small bins, but the answer depends on the way the limit is taken, i.e., on the metric in which the bin boundaries are equally spaced. Goodness of fit for unbinned data in one dimension has a variety of tests to choose among [7], depending on roughly what sort of alternatives one wants power against. In higher dimensions, the problem is yet more difficult [8].

Acknowledgments

I thank Louis Lyons and other members of the CMS Statistics Committee for corrections and comments on earlier drafts. This work was partially supported by the U.S. Department of Energy and the National Science Foundation.

References

[1] C. Amsler et al. (Particle Data Group), Review of Particle Physics, Physics Letters B667 (2008) 1. http://pdg.lbl.gov/2009/reviews/rpp2009-rev-statistics.pdf, Eqn. 32.12.

[2] S.S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Annals of Math. Stat. 9 (1938) 60.

[3] J.K. Lindsey, Parametric Statistical Inference (New York: Oxford University Press), 1996.

[4] S. Baker and R.D. Cousins, Clarification of the use of chi-square and likelihood functions in fits to histograms, Nucl. Instrum. Meth. 221 (1984) 437.

[5] Joel G. Heinrich, The Log Likelihood Ratio of the Poisson Distribution for Small µ, CDF/MEMO/CDF/CDFR/5718, Version 2, http://www-cdf.fnal.gov/physics/statistics/notes/cdf5718_loglikeratv2.ps.gz

[6] Joel Heinrich, Pitfalls of Goodness-of-Fit from Likelihood, talk at PhyStat2003 (Stanford, CA), arXiv:physics/0310167

[7] Goodness-of-Fit Techniques, edited by Ralph B. D'Agostino and Michael A. Stephens, Vol. 68 of Statistics: Textbooks and Monographs (New York: Marcel Dekker, 1986).

[8] Mike Williams, How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics, JINST 5 P09004 (2010), arXiv:1006.3019 [hep-ex].