Statistics, Lent Term 2015, Prof. Mark Thomson. Lecture 3: Fitting and Hypothesis Testing


Statistics, Lent Term 2015, Prof. Mark Thomson
Lecture 3: Fitting and Hypothesis Testing

Course outline:
- Lecture 1: Back to basics. Introduction, probability distribution functions, binomial distribution, Poisson distribution
- Lecture 2: The Gaussian Limit. The central limit theorem, Gaussian errors, error propagation, combination of measurements, multidimensional Gaussian errors, the error matrix
- Lecture 3: Fitting and Hypothesis Testing. The χ² test, likelihood functions, fitting, binned maximum likelihood, unbinned maximum likelihood
- Lecture 4: The Dark Arts, Part I. Bayesian inference, credible intervals
- Lecture 5: The Dark Arts, Part II. The frequentist approach, confidence intervals, limits near physical boundaries, systematic uncertainties

Introduction

- Given some data (event counts, distributions) and a particular theoretical model:
  - are the data consistent with the model? Hypothesis testing / goodness of fit.
  - in the context of the model, what are our best estimates of its parameters? Fitting.
- In both cases we need a measure of the consistency of the data with our model.
- Start with a discussion of the chi-squared statistic.

The Chi-Squared Statistic

- Suppose we measure a parameter, $x$, which a theorist says should have the value $\mu$.
- Within this simple model, we can write down the probability of obtaining the value $x$ given the prediction $\mu$ (assume a Gaussian measurement with uncertainty $\sigma$):
  $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
- To express the consistency of the data, ask the question: if the model is correct, what is the probability of obtaining a result at least as far from the prediction as the observed value?
- This is simply the fraction of the area under the Gaussian with $|x-\mu|$ greater than that observed.
- e.g. if the measurement lies 1.5σ from the prediction, this two-sided tail area is about 13%.
- We only care about the degree of consistency, not whether we are on the +ve or -ve side, so equivalently we want the probability $P(\chi^2 > \chi^2_{\rm obs})$, where
  $\chi^2 = \frac{(x-\mu)^2}{\sigma^2}$
- For Gaussian distributed variables, $\chi^2$ forms the basis of our consistency test.
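As a minimal illustration (not from the slides; function and variable names are my own), the two-sided Gaussian p-value can be computed directly with the complementary error function:

#include <cmath>
#include <cstdio>

// Two-sided p-value for a Gaussian measurement x with uncertainty sigma,
// compared to a predicted value mu: P(|x' - mu| >= |x - mu|).
double gaussianPValue(double x, double mu, double sigma) {
    double nSigma = std::fabs(x - mu) / sigma;
    // erfc(nSigma / sqrt(2)) is the area in the two Gaussian tails.
    return std::erfc(nSigma / std::sqrt(2.0));
}

int main() {
    // A measurement 1.5 sigma from the prediction, as in the slide.
    std::printf("p-value at 1.5 sigma: %.4f\n", gaussianPValue(1.5, 0.0, 1.0));
    // prints ~0.1336, i.e. about a 13% chance of a deviation at least this large
    return 0;
}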

- The probability distribution for $\chi^2$ can be obtained easily from the 1D Gaussian distribution; a factor of two arises because the +ve and -ve parts of the Gaussian map onto the same value of $\chi^2$. For one degree of freedom:
  $P(\chi^2) = \frac{1}{\sqrt{2\pi\chi^2}}\, e^{-\chi^2/2}$

The Chi-Squared Statistic in Higher Dimensions

- So far, this isn't particularly useful. But now extend this to two dimensions (ignoring correlations for the moment, although we now know how to include them):
  $\chi^2 = \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}$
- Lines of equal probability are equivalent to lines of equal $\chi^2$.
- What if I measure a particular $(x, y)$? How consistent is this with the expected values?
- ANSWER: the probability of obtaining a smaller probability than that observed,
  - i.e. integrate the 2D PDF over the region where $\chi^2 > \chi^2_{\rm obs}$.

- It is worth seeing how this works mathematically:
  - First transform the error ellipse into circular contours: $x' = (x-\mu_x)/\sigma_x$, $y' = (y-\mu_y)/\sigma_y$, so that $\chi^2 = x'^2 + y'^2$.
  - We are only interested in the radius, $r^2 = \chi^2$, so transform to polar coordinates and integrate over the angle.
- Therefore the probability distribution in chi-squared for two degrees of freedom is
  $P(\chi^2) = \frac{1}{2}\, e^{-\chi^2/2}$
- For two Gaussian distributed variables, we now have an expression for the chi-squared probability distribution.
- Problem: show that $\langle\chi^2\rangle = 2$ and $\mathrm{Var}(\chi^2) = 4$ for this distribution.

[Figure: the chi-squared distribution $P(\chi^2; n)$ for n = 1, 2, 3, 4, 5, 10, 20, 40, 60, 80 degrees of freedom]

- For any number of variables (degrees of freedom) $n$ we can now obtain $P(\chi^2 > \chi^2_{\rm obs})$.
- This has already been done for you, e.g. in tables, or more conveniently TMath::Prob(χ², n) in ROOT.

Properties of Chi-Squared

- For n degrees of freedom: $\langle\chi^2\rangle = n$.
- For n degrees of freedom: $\mathrm{Var}(\chi^2) = 2n$, i.e. $\sigma = \sqrt{2n}$.
(The proofs follow directly from the moments of the underlying Gaussians.)
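For a quick self-contained check without ROOT, one can estimate $P(\chi^2 \geq \chi^2_{\rm obs})$ for n degrees of freedom by Monte Carlo, sampling sums of squares of standard Gaussians. This sketch (my own; the observed value is illustrative) also verifies the mean n and variance 2n:

#include <cstdio>
#include <random>

int main() {
    const int    ndof = 20;      // degrees of freedom
    const double chi2Obs = 31.4; // an observed chi-squared value (illustrative)
    const int    nTrials = 1000000;

    std::mt19937 gen(12345);
    std::normal_distribution<double> gaus(0.0, 1.0);

    double sum = 0.0, sum2 = 0.0;
    int nAbove = 0;
    for (int t = 0; t < nTrials; ++t) {
        // chi-squared with ndof d.o.f. = sum of ndof squared standard Gaussians
        double chi2 = 0.0;
        for (int i = 0; i < ndof; ++i) {
            double z = gaus(gen);
            chi2 += z * z;
        }
        if (chi2 >= chi2Obs) ++nAbove;
        sum += chi2;
        sum2 += chi2 * chi2;
    }
    double mean = sum / nTrials;
    double var  = sum2 / nTrials - mean * mean;
    std::printf("mean = %.2f (expect %d), var = %.2f (expect %d)\n",
                mean, ndof, var, 2 * ndof);
    std::printf("P(chi2 >= %.1f) = %.4f  (cf. TMath::Prob)\n",
                chi2Obs, double(nAbove) / nTrials);
    return 0;
}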

- For large n, the chi-squared distribution tends to a Gaussian with mean $n$ and $\sigma = \sqrt{2n}$ (large here means roughly n > 40). Useful for quick estimates.

Example

- Suppose we have an absolute prediction for a distribution, e.g. a differential cross section, and we can account perfectly for experimental effects (efficiency, background, bin-to-bin migration).
- Measure the number of events $n_i$ in bins of $\cos\theta$ and compare to the prediction $\nu_i$.
- If the prediction is correct, expect the observed number of events in a given bin to follow a Poisson distribution of mean $\nu_i$.
- If the expectations in all bins are sufficiently large, the Poisson distribution can be approximated as a Gaussian with mean $\nu_i$ and variance $\nu_i$, i.e. expected fluctuations of $\sqrt{\nu_i}$ around the mean.
- In this limit we can use chi-squared to test consistency with the hypothesis.
- For N bins, we have N independent, approximately Gaussian distributed variables.
- The overall consistency of the data with the prediction is assessed using
  $\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$
- If the hypothesis (prediction) is correct, expect $\chi^2 \approx N$.
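A minimal sketch of this binned consistency test (my own; the counts are invented, not real data):

#include <cstdio>
#include <vector>

// Chi-squared between observed counts n_i and predicted means nu_i,
// using the Gaussian approximation variance = nu_i (valid for large nu_i).
double binnedChi2(const std::vector<double>& observed,
                  const std::vector<double>& predicted) {
    double chi2 = 0.0;
    for (size_t i = 0; i < observed.size(); ++i) {
        double diff = observed[i] - predicted[i];
        chi2 += diff * diff / predicted[i];
    }
    return chi2;
}

int main() {
    // Illustrative counts in 5 bins of cos(theta).
    std::vector<double> obs  = {95, 112, 88, 105, 99};
    std::vector<double> pred = {100, 100, 100, 100, 100};
    std::printf("chi2 = %.2f for %zu bins\n", binnedChi2(obs, pred), obs.size());
    return 0;
}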

" Quick estimates: " But Ν =20, not very large, these estimates only give an indication of the agreement. " Correct numbers (integral of expected χ 2 distribution) TMath::Prob(χ 2,20) Perfectly consistent: 10 % probability of getting a χ 2 worse than observed value by chance A bit dodgy: only 1 % probability of getting a χ 2 worse than observed value by chance Prof. M.A. Thomson Lent 2015 79 " What about a very small value of chi-squared, e.g. " Expected " Observed value is much smaller " " " Conclude 1/1000000 chance of getting such a small value: highly suspicious " What could be wrong: Errors not estimated correctly Prof. M.A. Thomson Lent 2015 80

Log-Likelihood

- The chi-squared distribution for n degrees of freedom is just a re-expression of a Gaussian for N variables.
- If the expected numbers of events are low, we have to base the consistency of data and prediction on Poisson statistics.
  - For the ith bin, with $n_i$ observed and $\nu_i$ expected:
    $P(n_i; \nu_i) = \frac{e^{-\nu_i}\,\nu_i^{n_i}}{n_i!}$
  - Therefore the joint probability of obtaining exactly the observed numbers of events, i.e. the likelihood, is
    $L = \prod_i \frac{e^{-\nu_i}\,\nu_i^{n_i}}{n_i!}$
  - It is convenient to take the natural logarithm (hence log-likelihood):
    $\ln L = \sum_i \left( n_i \ln\nu_i - \nu_i - \ln n_i! \right)$
- The likelihood is often very small: it is the probability of obtaining exactly the observed numbers of events in each bin.
- What constitutes a good value of log-likelihood? i.e. for the distribution shown, does the observed $\ln L$ imply good agreement?
- There is no simple answer to this question. Unlike for the chi-squared distribution, there is no general analytic form.
- One practical way of assessing the consistency is to generate many toy MC distributions according to the expected distribution:

Double_t logLikelihood = 0.0;                    // sum of log(prob) starts at zero
TRandom2* r = new TRandom2();
for (Int_t i = 1; i <= nBins; i++) {
    Double_t expected = hist->GetBinContent(i);  // expectation in bin i
    Int_t nObs = r->Poisson(expected);           // toy observation for this bin
    Double_t prob = TMath::PoissonI(nObs, expected);
    logLikelihood += TMath::Log(prob);
}

- Hence we obtain the expected $\ln L$ distribution for the particular problem, separating good agreement from bad agreement.

Relationship Between Chi-Squared and Likelihood

- For Gaussian distributed variables:
  $\ln L = -\frac{1}{2}\chi^2 + \text{const}$
  i.e. $-\chi^2/2$ is $\ln L$ for Gaussian distributed variables.

Chi-Squared Fitting: Gaussian Errors

- Given some data and a prediction which depends on a set of parameters, we want to determine our best estimate for the parameters.
- Construct the probability of the data given the parameters; the best estimate is the set of parameters that maximises it.
- Simple example:
  - Two measurements of a quantity x, using different methods (assume independent Gaussian errors): $x_1 \pm \sigma_1$ and $x_2 \pm \sigma_2$.
  - What is our best estimate of the true value of x?
  - Maximum probability corresponds to minimum chi-squared: for Gaussian distributed variables, fitting means minimising
    $\chi^2 = \frac{(x - x_1)^2}{\sigma_1^2} + \frac{(x - x_2)^2}{\sigma_2^2}$
  - Here require $\mathrm{d}\chi^2/\mathrm{d}x = 0$.

" Which is the formula we found previously for averaging two measurements " Note: chi-squared is a quadratic function of the parameter x " Taylor expansion about minimum with and gives therefore Prof. M.A. Thomson Lent 2015 85 Goodness of Fit " Started with two measurements of a quantity, x " Best estimate of the true value of x, is that which maximises the likelihood, i.e. minimises chi-squared, giving a chi-squared at the minimum of: " What is the probability that our data are consistent with a common mean? i.e. how do we interpret this chi-squared minimum " Expanding and substituting for with " Here the minimum chi-squared corresponds is distributed as chi-squared for a single Gaussian measurement, i.e. 1 degree of freedom In general, the fitted minimum chi-squared is distributed as chi-squared for (number of measurements number of fitted parameters) degrees of freedom Prof. M.A. Thomson Lent 2015 86

Example: Straight-Line Fitting

- Given a set of points $(x_i, y_i \pm \sigma_i)$, find the best-fit straight line $y = mx + c$ and the uncertainties on the fitted parameters.
- First define the chi-squared:
  $\chi^2 = \sum_i \frac{(y_i - m x_i - c)^2}{\sigma_i^2}$
- Minimise chi-squared with respect to the 2 parameters describing the model, $m$ and $c$; this gives two simultaneous linear equations, where the sums are represented by $[y] = \sum_i y_i/\sigma_i^2$, $[x] = \sum_i x_i/\sigma_i^2$, etc.
- For the errors, first make a Taylor expansion around the minimum chi-squared (since the function is quadratic there are no other terms), which gives elliptical contours in the $(m, c)$ plane.
- The coefficients of the quadratic terms form the inverse error matrix:
  $(V^{-1})_{jk} = \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_j\, \partial a_k}$
- Inverting gives the error matrix, and hence the uncertainties on, and the correlation between, the fitted parameters.
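A self-contained sketch of the analytic solution (my own illustration, assuming uncorrelated Gaussian errors; the data values are invented):

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Invented data points (x_i, y_i) with Gaussian errors sigma_i.
    std::vector<double> x = {1, 2, 3, 4, 5};
    std::vector<double> y = {2.1, 3.9, 6.2, 7.8, 10.1};
    std::vector<double> s = {0.3, 0.3, 0.3, 0.3, 0.3};

    // Weighted sums [1], [x], [y], [xx], [xy], in the lecture's bracket notation.
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        double w = 1.0 / (s[i] * s[i]);
        S += w; Sx += w * x[i]; Sy += w * y[i];
        Sxx += w * x[i] * x[i]; Sxy += w * x[i] * y[i];
    }

    // Solve the two normal equations for gradient m and intercept c.
    double det = S * Sxx - Sx * Sx;   // determinant of the inverse error matrix
    double m = (S * Sxy - Sx * Sy) / det;
    double c = (Sxx * Sy - Sx * Sxy) / det;

    // Error matrix = inverse of [[Sxx, Sx], [Sx, S]] for parameters (m, c).
    double varM = S / det, varC = Sxx / det, covMC = -Sx / det;

    std::printf("m = %.3f +/- %.3f, c = %.3f +/- %.3f, cov(m,c) = %.4f\n",
                m, std::sqrt(varM), c, std::sqrt(varC), covMC);
    return 0;
}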

- Suppose we want to calculate the error on y as a function of x based on our fit; propagating the errors on $c$ and $m$:
  $\sigma_y^2(x) = \sigma_c^2 + x^2\sigma_m^2 + 2x\,\mathrm{cov}(c, m)$
- It is worth noting that the correlation coefficient is proportional to $[x]$; hence one could fit after making the transformation $x \to x' = x - \bar{x}$ such that $[x'] = 0$, and the uncertainties on the new intercept and gradient become uncorrelated.

Binned Maximum Likelihood Fits

- So far we have only considered chi-squared fitting (i.e. assumed Gaussian errors).
- In particle physics we are often dealing with low numbers of events, and need to account for the Poisson nature of the data.
- In general we can write down the joint probability:
  - e.g. if we predict a distribution should follow a first-order polynomial, $f(x) = a + bx$,
  - and measure $n_i$ events in bins of $x$,
  - define the likelihood based on Poisson statistics, with $\nu_i$ the prediction for bin i:
    $L = \prod_i \frac{e^{-\nu_i}\,\nu_i^{n_i}}{n_i!}$
  - the best estimates of the parameters are defined by the maximum likelihood or, equivalently, the minimum of the negative log-likelihood.

- Maximising the likelihood corresponds to solving the set of simultaneous equations $\partial\ln L/\partial a_j = 0$.
- To estimate the errors on the parameters, expand $\ln L$ about its maximum.
- Unlike the case of Gaussian errors, the higher-order terms in the expansion do not vanish, hence the resulting likelihood surface will not in general have a quadratic form.
- For the moment, restrict the discussion to a single variable: the expansion has a Gaussian (quadratic) part plus higher-order terms.
- If the resulting likelihood distribution is sufficiently Gaussian, we could assign an estimate of the error on our measurement from the second derivative:
  $\frac{1}{\sigma^2} = -\frac{\partial^2\ln L}{\partial a^2}$
- This is OK in the Gaussian limit, but in general it is not very useful; usually we adopt a Gaussian-inspired procedure.
- For Gaussian distributed variables, we have a parabolic chi-squared curve, which gives a Gaussian likelihood, with the 1 sigma errors defined by the points where
  $\chi^2 = \chi^2_{\rm min} + 1$
- Or, EQUIVALENTLY, where the relative likelihood compared to the maximum likelihood decreases to $e^{-1/2}$, i.e. $\Delta\ln L = -\frac{1}{2}$. A scan of this prescription is sketched after the next slide's example.
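As an illustration of the Δln L = -1/2 prescription (my own sketch, not from the slides): for a single Poisson observation n, $\ln L(\nu) = n\ln\nu - \nu$ up to a constant, and the asymmetric 1 sigma interval can be found by a simple scan:

#include <cmath>
#include <cstdio>

// Poisson log-likelihood for one observed count n, up to a constant.
double lnL(double nu, int n) { return n * std::log(nu) - nu; }

int main() {
    const int n = 9;                      // observed count (illustrative)
    double nuHat = n;                     // ML estimate: lnL is maximal at nu = n
    double target = lnL(nuHat, n) - 0.5;  // 1 sigma: lnL drops by 1/2

    // Scan outwards from the maximum to find the two crossing points.
    double lo = nuHat, hi = nuHat;
    const double step = 1e-4;
    while (lnL(lo, n) > target) lo -= step;
    while (lnL(hi, n) > target) hi += step;

    std::printf("nu = %.2f  +%.2f / -%.2f  (cf. Gaussian +/- sqrt(n) = %.2f)\n",
                nuHat, hi - nuHat, nuHat - lo, std::sqrt((double)n));
    return 0;
}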

Example

- e.g. assume the likelihood takes a non-Gaussian (asymmetric) form.
- It is not unreasonable to use as the estimate of the 1 sigma uncertainty the values where $\ln L = \ln L_{\rm max} - \frac{1}{2}$, and similarly for the 2 sigma uncertainty the values where $\ln L = \ln L_{\rm max} - 2$.
- At these points, the likelihoods are $e^{-1/2}$ and $e^{-2}$ relative to the maximum.
- BUT: there is no guarantee that 68% of the PDF lies within the interval. Interpretation requires some care, a dark art (see later).

Binned Maximum Likelihood: Goodness of Fit

- The safest way is to generate toy MC experiments, perform the fit, and thus obtain the expected $\ln L$ distribution.
- However, there is an invaluable trick:
  - For Poisson errors, we minimised the function $-\ln L = \sum_i \left(\nu_i - n_i\ln\nu_i + \ln n_i!\right)$.
  - We are free to add a constant to this; it doesn't affect the result, since the data are fixed and we vary the expectation.
  - Add the $\ln L$ of observing $n_i$ given an expectation of $n_i$; this gives the likelihood ratio
    $-2\ln\lambda = 2\sum_i \left[ \nu_i - n_i + n_i \ln\frac{n_i}{\nu_i} \right]$

- In the limit where the $n_i$ are not too small, $\delta_i = n_i - \nu_i$ is small in the region of the best fit, with
  $\ln\frac{n_i}{\nu_i} \approx \frac{\delta_i}{\nu_i} - \frac{\delta_i^2}{2\nu_i^2}$
  for Poisson distributed variables.
- Hence
  $-2\ln\lambda \approx \sum_i \frac{(n_i - \nu_i)^2}{\nu_i}$
  i.e. $-2\ln\lambda$ is distributed as $\chi^2$ in the limit of large $n_i$.
- This is a very useful trick. When fitting a histogram with Poisson errors, ALWAYS:
  - Perform a maximum likelihood fit, not a chi-squared fit.
  - Use the likelihood ratio $-2\ln\lambda$ to assess the goodness of fit.
- NOTE:
  - The best fit parameters are determined by minimising $-2\ln\lambda$ (equivalent to maximising $\ln L$).
  - At the best fit point, $-2\ln\lambda$ tends to a chi-squared distribution for n - m degrees of freedom (n bins, m fitted parameters).
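A self-contained sketch of this likelihood-ratio statistic (my own; the counts are invented, and the zero-count special case is my own handling, since the $n\ln(n/\nu)$ term vanishes for n = 0):

#include <cmath>
#include <cstdio>
#include <vector>

// -2 ln(lambda) for Poisson-distributed bin contents n_i with expectations nu_i.
// Tends to a chi-squared distribution for large n_i.
double minusTwoLnLambda(const std::vector<double>& n,
                        const std::vector<double>& nu) {
    double sum = 0.0;
    for (size_t i = 0; i < n.size(); ++i) {
        sum += nu[i] - n[i];
        if (n[i] > 0.0) sum += n[i] * std::log(n[i] / nu[i]);
    }
    return 2.0 * sum;
}

int main() {
    // Illustrative low-statistics histogram: observed counts vs prediction.
    std::vector<double> n  = {0, 2, 5, 3, 1};
    std::vector<double> nu = {0.5, 1.8, 4.2, 3.6, 1.4};
    std::printf("-2 ln lambda = %.3f\n", minusTwoLnLambda(n, nu));
    return 0;
}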

Unbinned Maximum Likelihood Fits

- For some applications, binning the data results in a loss of precision,
  - e.g. sparse data and a rapidly varying prediction.
- In other cases there is simply no need to bin the data: unbinned maximum likelihood.
- NOTE: this is a shape-only fit: the normalisation doesn't enter.
- Suppose we can construct the predicted PDF for the data as a function of the parameter of interest,
  - e.g. make N measurements of a decay time, $t_i$, and want to estimate the lifetime $\tau$.
- Write down the NORMALISED PDF:
  $P(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}$
- We can now write down the likelihood of obtaining our set of data:
  $L = \prod_{i=1}^{N} \frac{1}{\tau}\, e^{-t_i/\tau}$
- Obtain the lifetime by maximising the likelihood (or equivalently $\ln L$):
  $\ln L = \sum_{i=1}^{N}\left(-\ln\tau - \frac{t_i}{\tau}\right)$, giving the expected, but not entirely obvious, result $\hat{\tau} = \frac{1}{N}\sum_i t_i$.
- For the error estimate, take the second derivative of $\ln L$, which gives $\sigma_{\hat\tau} = \hat{\tau}/\sqrt{N}$.
- Since we now know what we are doing, it is immediately obvious from the shape of the likelihood function that the error is not symmetric: it is usual to quote asymmetric errors, e.g. $\tau = \hat{\tau}\,^{+\sigma_+}_{-\sigma_-}$.
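A minimal sketch of this lifetime fit (my own; the toy data are generated internally with an assumed true lifetime of 2.0), including the asymmetric Δln L = -1/2 errors:

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// ln L(tau) for exponential decay times, up to a constant.
double lnL(double tau, const std::vector<double>& t) {
    double sum = 0.0;
    for (double ti : t) sum += -std::log(tau) - ti / tau;
    return sum;
}

int main() {
    // Generate toy decay times with a true lifetime of 2.0 (arbitrary units).
    std::mt19937 gen(42);
    std::exponential_distribution<double> expo(1.0 / 2.0);
    std::vector<double> t(100);
    for (double& ti : t) ti = expo(gen);

    // The ML estimate is the sample mean.
    double tauHat = 0.0;
    for (double ti : t) tauHat += ti;
    tauHat /= t.size();

    // Asymmetric 1 sigma errors: scan for lnL = lnL_max - 1/2.
    double target = lnL(tauHat, t) - 0.5, lo = tauHat, hi = tauHat;
    const double step = 1e-4;
    while (lnL(lo, t) > target) lo -= step;
    while (lnL(hi, t) > target) hi += step;

    std::printf("tau = %.3f +%.3f -%.3f (Gaussian estimate +/- %.3f)\n",
                tauHat, hi - tauHat, tauHat - lo,
                tauHat / std::sqrt((double)t.size()));
    return 0;
}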

Another Example: the Forward-Backward Asymmetry at LEP

- The expected angular distribution for $e^+e^- \to \mu^+\mu^-$ has the form
  $\frac{\mathrm{d}N}{\mathrm{d}\cos\theta} \propto 1 + \cos^2\theta + \frac{8}{3}A_{\rm FB}\cos\theta$
- But the measured angular distribution depends on the efficiency $\epsilon(\cos\theta)$ over the range $-1 < \cos\theta < +1$.
- However, $\epsilon(\cos\theta)$ is not known (at least not precisely), but it is known to be symmetric.
- This is sufficient to construct an unbinned maximum likelihood fit.
- First write down the PDF in terms of the unknown efficiency; for an unbinned maximum likelihood fit the PDF must be normalised.
- Since $\epsilon(\cos\theta)$ is symmetric, the normalisation integral of the $\cos\theta$ term vanishes, i.e. the normalisation is independent of $A_{\rm FB}$.
- Hence the normalised PDF is of the form
  $P(\cos\theta; A_{\rm FB}) = \frac{\epsilon(\cos\theta)\left(1 + \cos^2\theta + \frac{8}{3}A_{\rm FB}\cos\theta\right)}{\int_{-1}^{+1}\epsilon(c)\,(1 + c^2)\,\mathrm{d}c}$
- For the N observed values the log-likelihood function becomes
  $\ln L = \sum_{i=1}^{N}\ln\left(1 + \cos^2\theta_i + \frac{8}{3}A_{\rm FB}\cos\theta_i\right) + \text{terms independent of } A_{\rm FB}$
- For a maximum, $\partial\ln L/\partial A_{\rm FB} = 0$, which can be solved (preferably minimised numerically by MINUIT), despite the fact that we don't know the precise form of the PDF.
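A sketch of this shape-only fit (my own; the cosθ values are invented, not real LEP data), maximising the efficiency-independent part of ln L by a coarse scan rather than MINUIT:

#include <cmath>
#include <cstdio>
#include <vector>

// The A_FB-dependent part of lnL; the efficiency terms are constant in A_FB.
double lnL(double A, const std::vector<double>& c) {
    double sum = 0.0;
    for (double ci : c)
        sum += std::log(1.0 + ci * ci + (8.0 / 3.0) * A * ci);
    return sum;
}

int main() {
    // A handful of illustrative cos(theta) values.
    std::vector<double> c = {0.9, 0.7, 0.4, 0.1, -0.2, 0.6, -0.5, 0.8, 0.3, -0.1};

    // Coarse scan over a physically sensible range of A_FB.
    double bestA = 0.0, bestLnL = lnL(0.0, c);
    for (double A = -0.3; A <= 0.3; A += 0.001) {
        double v = lnL(A, c);
        if (v > bestLnL) { bestLnL = v; bestA = A; }
    }
    std::printf("A_FB estimate = %.3f\n", bestA);
    return 0;
}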

Extended Maximum Likelihood Fits

- Unbinned maximum likelihood uses only shape information; in the extended maximum likelihood fit we also include the normalisation.
- Suppose you observe N events with values $x_i$ and expect a total of $\nu$, with a PDF $P(x; a)$ which is a function of some parameter $a$ you wish to measure. The extended likelihood is the Poisson probability for N multiplied by the unbinned ML product:
  $L = \frac{e^{-\nu}\,\nu^N}{N!}\prod_{i=1}^{N} P(x_i; a)$
- Just for fun, suppose our PDF is binned, with an expectation of $\nu_i$ in each bin and $n_i$ events observed in each bin. Up to a factor that is constant for a given data set, the extended likelihood reduces to the binned Poisson likelihood:
  $L \propto \prod_i \frac{e^{-\nu_i}\,\nu_i^{n_i}}{n_i!}$
  Can you see where this comes from?
- Hence our previous expression for the binned maximum likelihood is just an extended maximum likelihood fit with a binned PDF.
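One way to see it (a sketch of the algebra, my own filling-in, writing $p_i$ for the PDF probability of bin i, so that $\nu_i = \nu p_i$, $\sum_i p_i = 1$ and $\sum_i n_i = N$):

\begin{align}
L &= \frac{e^{-\nu}\,\nu^{N}}{N!}\prod_i p_i^{\,n_i}
   = \frac{e^{-\sum_i \nu_i}}{N!}\prod_i \nu^{n_i}\, p_i^{\,n_i}
   = \frac{1}{N!}\prod_i e^{-\nu_i}\,\nu_i^{\,n_i} \\
  &= \underbrace{\left(\frac{\prod_i n_i!}{N!}\right)}_{\text{constant for a given data set}}
     \;\prod_i \frac{e^{-\nu_i}\,\nu_i^{\,n_i}}{n_i!}
\end{align}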

Fitting Summary

- We have covered the main fitting/goodness-of-fit issues:
  - definition of chi-squared
  - chi-squared fitting
  - definition of likelihood functions and their relation to chi-squared
  - likelihood fitting techniques
- Next time we will consider the interpretation more carefully.

Appendix: The Error Function

- For a single variable, the chi-squared probability distribution is
  $P(\chi^2) = \frac{1}{\sqrt{2\pi\chi^2}}\, e^{-\chi^2/2}$
- Change variable to $z$ with $\chi^2 = z^2$, then change variable again and integrate over positive values only; this gives the error function:
  $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,\mathrm{d}t$

- The probability of obtaining a value of $\chi^2$ greater than some observed value by chance is the complement of the error function:
  $P(\chi^2 > \chi^2_{\rm obs}) = 1 - \mathrm{erf}\!\left(\sqrt{\chi^2_{\rm obs}/2}\right) = \mathrm{erfc}\!\left(\sqrt{\chi^2_{\rm obs}/2}\right)$
- This is nothing more than a different way of expressing a 1D Gaussian distribution (or more correctly its two-sided integral).