Covariance & Correlation


Covariance & Correlation

The covariance between two variables is defined by:

cov(x,y) = <(x − <x>)(y − <y>)> = <xy> − <x><y>

This is the most useful thing they never tell you in most lab courses! Note that cov(x,x) = V(x). The correlation coefficient is a unitless version of the same thing:

ρ = cov(x,y) / (σ_x σ_y)

If x and y are independent variables (P(x,y) = P(x)P(y)), then

cov(x,y) = ∫dx dy P(x,y) xy − ∫dx dy P(x,y) x · ∫dx dy P(x,y) y = ∫dx P(x) x · ∫dy P(y) y − ∫dx P(x) x · ∫dy P(y) y = 0

Physics 509
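The definitions above are easy to check numerically. A minimal sketch (not part of the original slides), using a simulated correlated pair where y = x + independent noise, so the true cov(x,y) = V(x) = 1 and ρ = 1/√2:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 100_000)
y = x + rng.normal(0.0, 1.0, 100_000)   # correlated with x by construction

# cov(x,y) = <xy> - <x><y>
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# unitless correlation coefficient: rho = cov(x,y) / (sigma_x * sigma_y)
rho = cov_xy / (np.std(x) * np.std(y))

print(cov_xy, rho)
```

Since σ_y = √2 here, the sample values come out near cov(x,y) ≈ 1 and ρ ≈ 0.707.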

More on Covariance

Correlation coefficients for some simulated data sets. Note the bottom right: while independent variables must have zero correlation, the reverse is not true! Correlation is important because it is part of the error propagation equation, as we'll see.
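A quick illustration of the "the reverse is not true" point (my example, not the slide's data set): take y = x² with x symmetric about zero. Then y is completely determined by x, yet the correlation vanishes because <x³> = <x> = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200_000)
y = x**2                      # fully dependent on x, but uncorrelated with it

# cov(x,y) = <x^3> - <x><x^2> = 0 for a symmetric distribution of x
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)                 # consistent with zero
```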

Variance and Covariance of Linear Combinations of Variables

Suppose we have two random variables X and Y (not necessarily independent), and that we know cov(X,Y). Consider the linear combinations W = aX + bY and Z = cX + dY. It can be shown that

cov(W,Z) = cov(aX+bY, cX+dY)
= cov(aX,cX) + cov(aX,dY) + cov(bY,cX) + cov(bY,dY)
= ac cov(X,X) + (ad+bc) cov(X,Y) + bd cov(Y,Y)
= ac V(X) + bd V(Y) + (ad+bc) cov(X,Y)

A special case is V(X+Y):

V(X+Y) = cov(X+Y, X+Y) = V(X) + V(Y) + 2 cov(X,Y)

Very special case: the variance of a sum of independent random variables is the sum of their individual variances!
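Because the sample covariance is bilinear, the cov(W,Z) identity holds exactly (to floating point) on any data set, not just in expectation. A sketch with arbitrary coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)     # correlated with x

a, b, c, d = 2.0, -1.0, 0.5, 3.0
w = a * x + b * y
z = c * x + d * y

def cov(u, v):
    return np.mean(u * v) - np.mean(u) * np.mean(v)

lhs = cov(w, z)
rhs = a * c * cov(x, x) + b * d * cov(y, y) + (a * d + b * c) * cov(x, y)
print(lhs, rhs)                      # identical up to rounding
```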

Gaussian Distributions

By far the most useful distribution is the Gaussian (normal) distribution:

P(x|µ,σ) = (1/(σ√(2π))) exp(−(x−µ)²/2σ²)

Mean = µ, Variance = σ². Note that the width scales with σ. The area out on the tails is important: use lookup tables or the cumulative distribution function. In the plot to the left, the red area (>2σ) is 2.3%.

68.27% of the area lies within ±1σ
95.45% of the area within ±2σ
99.73% of the area within ±3σ
90% of the area within ±1.645σ
95% of the area within ±1.960σ
99% of the area within ±2.576σ

Why are Gaussian distributions so critical?

They occur very commonly: the reason is that the average of several independent random variables often approaches a Gaussian distribution in the limit of large N. They have nice mathematical properties: infinitely differentiable, symmetric. The sum or difference of two Gaussian variables is always itself Gaussian in its distribution. Many complicated formulas simplify to linear algebra, or even simpler, if all variables have Gaussian distributions. The Gaussian distribution is often used as a shorthand for discussing probabilities: a "5 sigma" result means a result with a chance probability that is the same as the tail area of a unit Gaussian:

2 ∫₅^∞ dt P(t|µ=0,σ=1)

This way of speaking is used even for non-Gaussian distributions!

Why you should be very careful with Gaussians...

The major danger of Gaussians is that they are overused. Although many distributions are approximately Gaussian, they often have long non-Gaussian tails. While 99% of the time a Gaussian distribution will correctly model your data, many foul-ups result from that other 1%. It's usually good practice to simulate your data to see if the distributions of quantities you think are Gaussian really follow a Gaussian distribution. Common example: the ratio of two numbers with Gaussian distributions is itself often not very Gaussian (although in certain limits it may be).

Review of covariances of joint PDFs

Consider some multidimensional PDF p(x_1 ... x_n). We define the covariance between any two variables by:

cov(x_i, x_j) = ∫dx p(x) (x_i − <x_i>)(x_j − <x_j>)

The set of all possible covariances defines a covariance matrix, often denoted by V_ij. The diagonal elements of V_ij are the variances of the individual variables, while the off-diagonal elements are related to the correlation coefficients:

        [ σ_1²           ρ_12 σ_1 σ_2   ...   ρ_1n σ_1 σ_n ]
V_ij =  [ ρ_21 σ_2 σ_1   σ_2²           ...   ρ_2n σ_2 σ_n ]
        [ ...                                               ]
        [ ρ_n1 σ_n σ_1   ...                  σ_n²          ]

Properties of covariance matrices

Covariance matrices are always symmetric and square, and are invertible (a very important requirement!). The most common use of a covariance matrix is to invert it and then use it to calculate a χ²:

χ² = Σ_ij (y_i − f(x_i)) (V⁻¹)_ij (y_j − f(x_j))

If the covariances are zero, then V_ij = δ_ij σ_i², and this reduces to:

χ² = Σ_i (y_i − f(x_i))² / σ_i²

Warning: do NOT use the simplified formula if the data points are correlated!
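A sketch of the full χ² calculation (the data, model, and covariance values below are made up for illustration):

```python
import numpy as np

y = np.array([1.1, 2.0, 2.9])          # measured values (assumed numbers)
f = np.array([1.0, 2.0, 3.0])          # model prediction f(x_i) at each point
V = np.array([[0.04, 0.01, 0.00],      # covariance matrix (assumed numbers)
              [0.01, 0.04, 0.01],
              [0.00, 0.01, 0.04]])

r = y - f
chi2 = r @ np.linalg.inv(V) @ r        # chi^2 = sum_ij r_i (V^-1)_ij r_j
print(chi2)

# With the off-diagonal terms zeroed, this reduces to sum_i r_i^2 / sigma_i^2:
chi2_diag = np.sum(r**2 / np.diag(V))
```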

Approximating the peak of a PDF with a multidimensional Gaussian

Suppose we have some complicated-looking PDF in 2D that has a well-defined peak. How might we approximate the shape of this PDF around its maximum?

Taylor Series expansion

Consider a Taylor series expansion of the logarithm of the PDF around its maximum at (x_0, y_0):

log P(x,y) = P_0 + A(x−x_0) + B(y−y_0) + C(x−x_0)² + D(y−y_0)² + E(x−x_0)(y−y_0) + ...

Since we are expanding around the peak, the first derivatives must equal zero, so A = B = 0. The remaining terms can be written in matrix form:

log P(x,y) ≈ P_0 − (1/2) (x−x_0, y−y_0) [ C  E ; E  D ] (x−x_0, y−y_0)ᵀ

In order for (x_0, y_0) to be a maximum of the PDF (and not a minimum or saddle point), the above matrix must be positive definite, and therefore invertible.

Taylor Series expansion (continued)

Let me now suggestively denote the inverse of the above matrix by V_ij. It's a positive definite matrix with three parameters. In fact, I might as well call these parameters σ_x, σ_y, and ρ. Exponentiating, we see that around its peak the PDF can be approximated by a multidimensional Gaussian. The full formula, including normalization, is

P(x,y) = (1/(2π σ_x σ_y √(1−ρ²))) exp{ −1/(2(1−ρ²)) [ (x−x_0)²/σ_x² + (y−y_0)²/σ_y² − 2ρ(x−x_0)(y−y_0)/(σ_x σ_y) ] }

This is a good approximation as long as the higher-order terms in the Taylor series are small.

Interpretation of the multidimensional Gaussian

P(x,y) = (1/(2π σ_x σ_y √(1−ρ²))) exp{ −1/(2(1−ρ²)) [ (x−x_0)²/σ_x² + (y−y_0)²/σ_y² − 2ρ(x−x_0)(y−y_0)/(σ_x σ_y) ] }

Can I directly relate the free parameters to the covariance matrix? First calculate P(x) by marginalizing over y: completing the square in y and integrating out the resulting Gaussian in y, the ρ-dependent pieces cancel against the normalization and we are left with

P(x) ∝ exp{ −(x−x_0)²/2σ_x² }

So we get a Gaussian with width σ_x. The calculation of σ_y is similar, and one can also show that ρ is the correlation coefficient.

P(x|y)

P(x,y) = (1/(2π σ_x σ_y √(1−ρ²))) exp{ −1/(2(1−ρ²)) [ (x−x_0)²/σ_x² + (y−y_0)²/σ_y² − 2ρ(x−x_0)(y−y_0)/(σ_x σ_y) ] }

Note: if you view y as a fixed parameter, then the PDF P(x|y) is a Gaussian with a width of

σ_x √(1−ρ²)

and a mean value of

x_0 + ρ (σ_x/σ_y)(y − y_0)

(It makes sense that the width of P(x|y) is always narrower than the width of the marginalized PDF P(x) (integrated over y). If you know the actual value of y, you have additional information and so a tighter constraint on x.)

[Figure: error ellipses for σ_x = 2, σ_y = 1, ρ = 0.8. Red ellipse: contour with the argument of the exponential set to equal −1/2. Blue ellipse: contour containing 68% of the 2D probability content.]

Contour ellipses

The contour ellipses are defined by setting the argument of the exponent equal to a constant. The exponent equals −1/2 on the red ellipse from the previous graph. The parameters of this ellipse are given by the tilt angle θ of the major axis,

tan 2θ = 2ρ σ_x σ_y / (σ_x² − σ_y²)

and the widths along the rotated principal axes (u, v):

σ_u² = σ_x² cos²θ + σ_y² sin²θ + 2ρ σ_x σ_y cosθ sinθ
σ_v² = σ_x² sin²θ + σ_y² cos²θ − 2ρ σ_x σ_y cosθ sinθ

Probability content inside a contour ellipse

For a 1D Gaussian exp(−x²/2σ²), the ±1σ limits occur when the argument of the exponent equals −1/2, and there's a 68% chance of the measurement falling within ±1σ of the mean. But for a 2D Gaussian this is not the case. It is easiest to see this for the simple case of σ_x = σ_y = 1:

(1/2π) ∫∫ dx dy exp[−(x² + y²)/2] = ∫₀^{r_0} dr r exp(−r²/2) = 0.68

Evaluating this integral and solving gives r_0² = 2.3. So 68% of the probability content is contained within a radius of σ√2.3. We call this the 2D 68% contour. Note that it's bigger than the 1D version: if you pick points inside the 2D 68% contour and plot their x coordinates, they'll span a wider range than those picked from the 68% contour of the 1D marginalized PDF!
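The radial integral above evaluates in closed form, P(r < r_0) = 1 − exp(−r_0²/2), so the 68% radius can be checked directly (a sketch):

```python
import numpy as np

# Containment probability inside radius r0 for a symmetric 2D unit Gaussian
def contained(r0):
    return 1.0 - np.exp(-0.5 * r0**2)

# Solve 1 - exp(-r0^2/2) = 0.68  =>  r0 = sqrt(-2 ln 0.32)
r0 = np.sqrt(-2.0 * np.log(1.0 - 0.68))
print(r0, r0**2)     # r0^2 comes out near the slide's 2.3
```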

[Figure: σ_x = 2, σ_y = 1, ρ = 0.8. Red ellipse: contour with the argument of the exponential set to equal −1/2. Blue ellipse: contour containing 68% of the 2D probability content.]

Marginalization by minimization

Normal marginalization procedure: integrate over y. For a multidimensional Gaussian, this gives the same answer as finding the extrema of the ellipse: for every x, find the value of y that maximizes the likelihood. For example, at x = ±2 the value of y which maximizes the likelihood is just where the dashed line touches the ellipse. The value of the likelihood at that point then is the value of P(x).

Two marginalization procedures

Normal marginalization procedure: integrate over the nuisance variables:

P(x) = ∫dy P(x,y)

Alternate marginalization procedure: maximize the likelihood as a function of the nuisance variables, and return the result:

P(x) ∝ max_y P(x,y)

(It is not necessarily the case that the resulting PDF is normalized.) I can prove for Gaussian distributions that these two marginalization procedures are equivalent, but cannot prove it for the general case (in fact they give different results). Bayesians always follow the first prescription. Frequentists most often use the second. Sometimes it will be computationally easier to apply one, sometimes the other, even for PDFs that are approximately Gaussian.

Maximum likelihood estimators

By far the most useful estimator is the maximum likelihood method. Given your data set x_1 ... x_N and a set of unknown parameters α, calculate the likelihood function

L(x_1 ... x_N | α) = Π_{i=1}^N P(x_i | α)

It's more common (and easier) to calculate −ln L instead:

−ln L(x_1 ... x_N | α) = −Σ_{i=1}^N ln P(x_i | α)

The maximum likelihood estimator is that value of α which maximizes L as a function of α. It can be found by minimizing −ln L over the unknown parameters.

Simple example of an ML estimator

Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

P_A(x) = e^(−x) / (1 − e^(−1))      P_B(x) = 3x²

P_tot(x) = f P_A(x) + (1 − f) P_B(x)

Form for the log likelihood and the ML estimator

Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

P_tot(x) = f P_A(x) + (1 − f) P_B(x)

Form the negative log likelihood:

−ln L(f) = −Σ_{i=1}^N ln P_tot(x_i | f)

Minimize −ln(L) with respect to f. Sometimes you can solve this analytically by setting the derivative equal to zero. More often you have to do it numerically.
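A sketch of the numerical minimization, assuming for illustration the component PDFs P_A(x) = e^(−x)/(1 − e^(−1)) and P_B(x) = 3x² on [0,1] (one reading of the slide's formulas) with a true fraction f = 0.3; a larger sample than the slide's 20 points is used so the estimate is stable:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

def p_a(x):                       # truncated exponential on [0, 1]
    return np.exp(-x) / (1.0 - np.exp(-1.0))

def p_b(x):                       # 3 x^2 on [0, 1]
    return 3.0 * x**2

# Draw a mixed sample with true fraction f_true from A, via inverse CDFs
f_true, n = 0.3, 5000
from_a = rng.random(n) < f_true
x = np.where(from_a,
             -np.log(1.0 - rng.random(n) * (1.0 - np.exp(-1.0))),  # inverse CDF of P_A
             rng.random(n) ** (1.0 / 3.0))                         # inverse CDF of P_B

def neg_log_like(f):
    return -np.sum(np.log(f * p_a(x) + (1.0 - f) * p_b(x)))

res = minimize_scalar(neg_log_like, bounds=(0.001, 0.999), method="bounded")
print(res.x)                      # ML estimate of f, close to 0.3
```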

Graph of the log likelihood

The graph to the left shows the shape of the negative log likelihood function vs. the unknown parameter f. The minimum is at f = 0.415. This is the ML estimate. As we'll see, the 1σ error range is defined by Δln(L) = 0.5 above the minimum. The data set was actually drawn from a distribution with a true value of f = 0.3.

Errors on ML estimators

In the limit of large N, the log likelihood becomes parabolic (by the CLT). Comparing to −ln(L) for a simple Gaussian:

−ln L = L_0 + (1/2) ((f − f̂)/σ_f)²

it is natural to identify the 1σ range on the parameter by the points at which Δln(L) = ½.
2σ range: Δln(L) = ½(2)² = 2
3σ range: Δln(L) = ½(3)² = 4.5
This is done even when the likelihood isn't parabolic (although at some peril).

Parabolicity of the log likelihood

In general the log likelihood becomes more parabolic as N gets larger. The graphs at the right show the negative log likelihoods for our example problem for N = 20 and N = 500. The red curves are parabolic fits around the minimum. How large does N have to be before the parabolic approximation is good? That depends on the problem: try graphing −ln(L) vs your parameter to see how parabolic it is.

Asymmetric errors from ML estimators

Even when the log likelihood is not Gaussian, it's nearly universal to define the 1σ range by Δln(L) = ½. This can result in asymmetric error bars, such as: 0.41 +0.17 / −0.15. The justification often given for this is that one could always reparameterize the estimated quantity into one which does have a parabolic likelihood. Since ML estimators are supposed to be invariant under reparameterizations, you could then transform back to get asymmetric errors. Does this procedure actually work?

Coverage of ML estimator errors

What do we really want the ML error bars to mean? Ideally, the 1σ range would mean that the true value has a 68% chance of being within that range.

N     Fraction of time the 1σ range includes the true value
5     56.7%
10    64.8%
20    68.0%
500   67.0%

Distribution of ML estimators for two N values.

Errors on ML estimators

Simulation is the best way to estimate the true error range on an ML estimator: assume a true value for the parameter, simulate a few hundred experiments, then calculate ML estimates for each.

N=20: Range from likelihood function: −0.16 / +0.17. RMS of simulation: 0.16
N=500: Range from likelihood function: −0.030 / +0.035. RMS of simulation: 0.030

Likelihood functions of multiple parameters

Often there is more than one free parameter. To handle this, we simply minimize the negative log likelihood over all free parameters:

∂[−ln L(x_1 ... x_N | a_1 ... a_m)]/∂a_j = 0

Errors are determined by (in the Gaussian approximation):

(cov⁻¹)(a_i, a_j) = −∂² ln L/∂a_i ∂a_j, evaluated at the minimum.

Error contours for multiple parameters

We can also find the errors on parameters by drawing contours on Δln L. 1σ range on a single parameter a: the smallest and largest values of a that give Δln L = ½, minimizing ln L over all other parameters. But to get joint error contours, you must use different values of Δln L (see Numerical Recipes Sec. 15.6):

CL       m=1    m=2    m=3
68.00%   0.5    1.15   1.77
90.00%   1.36   2.31   3.13
95.40%   2.0    3.09   4.01
99.00%   3.32   4.61   5.65
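The table entries are just half the χ² quantiles for m degrees of freedom, so they can be regenerated with scipy (a sketch):

```python
from scipy.stats import chi2

# Delta ln L threshold = (1/2) * chi2.ppf(CL, m) for m jointly-estimated parameters
for cl in (0.683, 0.90, 0.954, 0.99):
    print(cl, [round(0.5 * chi2.ppf(cl, m), 2) for m in (1, 2, 3)])
```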

Maximum Likelihood with Gaussian Errors

Suppose we want to fit a set of points (x_i, y_i) to some model y = f(x|α), in order to determine the parameter(s) α. Often the measurements will be scattered around the model with some Gaussian error. Let's derive the ML estimator for α:

L = Π_{i=1}^N (1/(σ_i √(2π))) exp[ −(y_i − f(x_i|α))² / 2σ_i² ]

The log likelihood is then

−ln L = (1/2) Σ_{i=1}^N (y_i − f(x_i|α))²/σ_i² + Σ_{i=1}^N ln(σ_i √(2π))

Maximizing this is equivalent to minimizing

χ² = Σ_{i=1}^N (y_i − f(x_i|α))² / σ_i²

The Least Squares Method

Taken outside the context of the ML method, the least squares method is the most commonly known estimator.

χ² = Σ_{i=1}^N (y_i − f(x_i|α))² / σ_i²

Why?
1) Easily implemented.
2) Graphically motivated (see title slide!)
3) Mathematically straightforward: often an analytic solution.
4) The extension of LS to correlated uncertainties is straightforward:

χ² = Σ_{i=1}^N Σ_{j=1}^N (y_i − f(x_i|α)) (V⁻¹)_ij (y_j − f(x_j|α))

Least Squares Straight Line Fit

The most straightforward example is a linear fit: y = mx + b.

χ² = Σ_i (y_i − m x_i − b)² / σ_i²

Least squares estimators for m and b are found by differentiating χ² with respect to m & b:

dχ²/dm = −2 Σ_i (y_i − m x_i − b) x_i / σ_i² = 0
dχ²/db = −2 Σ_i (y_i − m x_i − b) / σ_i² = 0

This is a linear system of simultaneous equations with two unknowns.

Solving for m and b

Setting the two derivatives to zero gives the normal equations:

Σ (x_i y_i / σ_i²) = m Σ (x_i² / σ_i²) + b Σ (x_i / σ_i²)
Σ (y_i / σ_i²) = m Σ (x_i / σ_i²) + b Σ (1 / σ_i²)

Solving this system in the special case of equal σ's, with <·> denoting the sample average:

m = (<xy> − <x><y>) / (<x²> − <x>²)      b = <y> − m<x>

Solution for least squares m and b

There's a nice analytic solution: rather than trying to numerically minimize a χ², we can just plug values into the formulas! This worked out nicely because of the very simple form of the likelihood, due to the linearity of the problem and the assumption of Gaussian errors. (Special case of equal errors:)

m = (<xy> − <x><y>) / (<x²> − <x>²)      b = <y> − m<x>
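The closed-form estimators can be cross-checked against a library fit (a sketch on simulated data with equal errors; the true line here is y = 2x + 1):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)   # equal Gaussian errors

# Closed-form least-squares estimators (equal-sigma special case):
#   m = (<xy> - <x><y>) / (<x^2> - <x>^2),   b = <y> - m <x>
m = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x)**2)
b = np.mean(y) - m * np.mean(x)

# Cross-check against numpy's degree-1 polynomial fit
m_np, b_np = np.polyfit(x, y, 1)
print(m, b, m_np, b_np)                            # the two solutions agree
```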

Errors in the Least Squares Method

What about the errors on and correlations between m and b? The simplest way to derive them is to look at the chi-squared, and remember that this is a special case of the ML method:

−ln L = (1/2) χ² = (1/2) Σ_i (y_i − m x_i − b)² / σ_i²

In the ML method, we define the 1σ error on a parameter by the minimum and maximum value of that parameter satisfying Δln L = ½. In the LS method, this corresponds to Δχ² = +1 above the best-fit point. The two sigma error range corresponds to Δχ² = +4, 3σ is Δχ² = +9, etc. But notice one thing about the dependence of the χ²: it is quadratic in both m and b, and generally includes a cross-term proportional to mb. Conclusion: Gaussian uncertainties on m and b, with a covariance between them.

Formulas for Errors in the Least Squares Method

We can also derive the errors by relating the χ² to the negative log likelihood, and using the error formula:

(cov⁻¹)(a_i, a_j) = −∂² ln L/∂a_i ∂a_j |_{a=â} = (1/2) ∂²χ²/∂a_i ∂a_j |_{a=â}

For the case of equal errors σ:

σ_m² = (σ²/N) · 1/(<x²> − <x>²)
σ_b² = (σ²/N) · <x²>/(<x²> − <x>²)
cov(m,b) = −(σ²/N) · <x>/(<x²> − <x>²)   (intuitive when <x> = 0)

Nonlinear least squares

The derivation of the least squares method doesn't depend on the assumption that your fitting function is linear in the parameters. Nonlinear fits, such as A + B sin(Ct + D), can be tackled with the least squares technique as well. But things aren't nearly as nice:
- No closed-form solution: you have to minimize the χ² numerically.
- Estimators are no longer guaranteed to have zero bias and minimum variance.
- Contours generated by Δχ² = +1 no longer are ellipses, and the tangents to these contours no longer give the standard deviations. (However, we can still interpret them as giving 1σ errors, although since the distribution is non-Gaussian, this error range isn't the same thing as a standard deviation.)
- Be very careful with minimization routines: depending on how badly non-linear your problem is, there may be multiple solutions, local minima, etc.

Goodness of fit for least squares

By now you're probably wondering why I haven't discussed the use of χ² as a goodness of fit parameter. Partly this is because parameter estimation and goodness of fit are logically separate things: if you're CERTAIN that you've got the correct model and error estimates, then a poor χ² can only be bad luck, and tells you nothing about how accurate your parameter estimates are. Carefully distinguish between:
1) The value of χ² at the minimum: a measure of goodness of fit.
2) How quickly χ² changes as a function of the parameter: a measure of the uncertainty on the parameter.
Nonetheless, a major advantage of the χ² approach is that it does automatically generate a goodness of fit parameter as a byproduct of the fit. As we'll see, the maximum likelihood method doesn't. How does this work?

χ² as a goodness of fit parameter

Remember that the sum of N Gaussian variables with zero mean and unit RMS, when squared and added, follows a χ² distribution with N degrees of freedom. Compare to the least squares formula:

χ² = Σ_ij (y_i − f(x_i|α)) (V⁻¹)_ij (y_j − f(x_j|α))

If each y_i is distributed around the function according to a Gaussian, and f(x|α) is a linear function of the m free parameters α, and the error estimates don't depend on the free parameters, then the best-fit least squares quantity we call χ² actually follows a χ² distribution with N − m degrees of freedom. People usually ignore these various caveats and assume this works even when the parameter dependence is non-linear and the errors aren't Gaussian. Be very careful with this, and check with simulation if you're not sure.

Goodness of fit: an example

Does the data sample, known to have Gaussian errors, fit acceptably to a constant (flat line)? 6 data points − 1 free parameter = 5 d.o.f. χ² = 8.85/5 d.o.f. The chance of getting a larger χ² is 11.5%: an acceptable fit by almost anyone's standard. The flat line is a good fit.
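The tail probability for the example can be checked directly from the χ² survival function (a sketch):

```python
from scipy.stats import chi2

# The slide's example: chi^2 = 8.85 with 6 points - 1 free parameter = 5 d.o.f.
chi2_min, ndof = 8.85, 5
p_value = chi2.sf(chi2_min, ndof)   # probability of a chi^2 at least this large
print(p_value)                      # about 0.115
```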

Distinction between goodness of fit and parameter estimation

Now if we fit a sloped line to the same data, is the slope consistent with flat? The χ² is obviously going to be somewhat better. But the slope is 3.5σ different from zero! The chance probability of this is 0.0002. How can we simultaneously say that the same data set is acceptably fit by a flat line and has a slope that is significantly larger than zero???

Distinction between goodness of fit and parameter estimation

Goodness of fit and parameter estimation are answering two different questions.
1) Goodness of fit: is the data consistent with having been drawn from a specified distribution?
2) Parameter estimation: which of the following limited set of hypotheses is most consistent with the data?
One way to think of this is that a χ² goodness of fit compares the data set to all the possible ways that random Gaussian data might fluctuate. Parameter estimation chooses the best of a more limited set of hypotheses. Parameter estimation is generally more powerful, at the expense of being more model-dependent. Complaint of the statistically illiterate: "Although you say your data strongly favours solution A, doesn't solution B also have an acceptable χ²/dof close to 1?"

What is an error bar?

Someone hands you a plot like this. What do the error bars indicate? Answer: you can never be sure, unless it's specified! Most common: vertical error bars indicate ±1σ uncertainties. Horizontal error bars can indicate the uncertainty on the X coordinate, or can indicate binning. Correlations unknown!

Relation of an error bar to PDF shape

The error bar on a plot is most often meant to represent the ±1σ uncertainty on a data point. Bayesians and frequentists will disagree on what that means. If the data is distributed normally around the true value, it's clear what is intended: exp[−(x−µ)²/2σ²]. But for asymmetric distributions, different things are sometimes meant...

An error bar is a shorthand approximation to a PDF!

In an ideal Bayesian universe, error bars don't exist. Instead, everyone would use the full prior PDF and the data to calculate the posterior PDF, and then report the shape of that PDF (preferably as a graph or table). An error bar is really a shorthand way to parameterize a PDF. Most often this means pretending the PDF is Gaussian and reporting its mean and RMS. Many sins with error bars come from assuming Gaussian distributions when there aren't any.

An error bar as a confidence interval

Frequentist techniques don't directly answer the question of what the probability is for a parameter to have a particular value. All you can calculate is the probability of observing your data given a value of the parameter. The confidence interval construction is a dodge to get around this. The starting point is the PDF for the estimator, for a fixed value of the parameter. The estimator has probability 1 − α − β to fall in the white region.

The ln(l) rule It s not trval to construct proper frequentst confdence ntervals. Most often an approxmaton s used: the confdence nterval for a sngle parameter s defned as the range n whch ln(l max )-ln(l)<0.5 Ths s only an approxmaton, and does not gve exactly the rght coverage when N s small. More generally, f you have d free parameters, then the quantty ω = χ = [ln(l max )-ln(l)] approxmates a χ wth d degrees of freedom. For experts: there do exst correctons to the ln(l) rule that more accurately approxmate coverage---see Bartlett's correcton. Often MC s better way to go. Physcs 509 7

Error-weighted averages

Suppose you have N independent measurements of a quantity. You average them. The proper error-weighted average is:

<x> = Σ (x_i/σ_i²) / Σ (1/σ_i²)        V(<x>) = 1 / Σ (1/σ_i²)

If all of the uncertainties are equal, then this reduces to the simple arithmetic mean, with V(<x>) = V(x)/N.
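A sketch of the weighted average for three hypothetical measurements (the numbers are made up for illustration):

```python
import numpy as np

# Three independent measurements of the same quantity, with 1-sigma errors
x = np.array([10.2, 9.8, 10.5])
sigma = np.array([0.3, 0.4, 0.6])

w = 1.0 / sigma**2                 # inverse-variance weights
x_avg = np.sum(w * x) / np.sum(w)  # <x> = sum(x_i/sigma_i^2) / sum(1/sigma_i^2)
var_avg = 1.0 / np.sum(w)          # V(<x>) = 1 / sum(1/sigma_i^2)
print(x_avg, np.sqrt(var_avg))
```

Note the combined error is smaller than the smallest individual error, as it should be.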

Averaging correlated measurements II

The obvious generalization for correlated uncertainties is to form the χ² including the covariance matrix:

χ² = Σ_ij (x_i − µ)(V⁻¹)_ij (x_j − µ)

We find the best value of µ by minimizing this χ², and can then find the 1σ uncertainties on µ by finding the values of µ for which χ² = χ²_min + 1. This is really parameter estimation with one variable. The best-fit value is easy enough to find:

µ̂ = Σ_ij x_j (V⁻¹)_ij / Σ_ij (V⁻¹)_ij
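A sketch of the correlated average for two hypothetical measurements (the covariance values are made up for illustration):

```python
import numpy as np

# Two correlated measurements of the same quantity mu
x = np.array([10.0, 10.6])
V = np.array([[0.09, 0.045],      # positive correlation between the two
              [0.045, 0.16]])

Vinv = np.linalg.inv(V)
mu = np.sum(Vinv @ x) / np.sum(Vinv)   # mu-hat = sum_ij x_j Vinv_ij / sum_ij Vinv_ij
var_mu = 1.0 / np.sum(Vinv)            # 1/V(mu) = sum_ij Vinv_ij
print(mu, np.sqrt(var_mu))
```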

Averaging correlated measurements III

Recognizing that the χ² really just is the argument of an exponential defining a Gaussian PDF for µ...

χ² = Σ_ij (x_i − µ)(V⁻¹)_ij (x_j − µ)

...we can in fact read off the coefficient of µ², which will be 1/V(µ):

1/V(µ) = Σ_ij (V⁻¹)_ij

In general this can only be computed by inverting the matrix, as far as I know.

The error propagation equation

Let f(x,y) be a function of two variables, and assume that the uncertainties on x and y are known and small. Then:

σ_f² = (df/dx)² σ_x² + (df/dy)² σ_y² + 2 (df/dx)(df/dy) cov(x,y)

The assumptions underlying the error propagation equation are:
- the covariances are known
- f is an approximately linear function of x and y over the span of x ± σ_x or y ± σ_y.
The most common mistake in the world: ignoring the third term. Intro courses ignore its existence entirely!
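A sketch of the equation in use, for the illustrative (not from the slides) function f(x,y) = x·y with assumed inputs and covariance:

```python
import numpy as np

x, sx = 4.0, 0.2          # assumed value and 1-sigma error on x
y, sy = 3.0, 0.1          # assumed value and 1-sigma error on y
cov_xy = 0.01             # assumed covariance between x and y

dfdx = y                  # partial derivative of f = x*y with respect to x
dfdy = x                  # partial derivative with respect to y

# Full propagation formula, INCLUDING the cross-term
var_f = dfdx**2 * sx**2 + dfdy**2 * sy**2 + 2.0 * dfdx * dfdy * cov_xy
print(np.sqrt(var_f))
```

Dropping the cross-term here would change the variance from 0.76 to 0.52, a sizeable error.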

Example: interpolating a straight line fit

Straight line fit y = mx + b. Reported values from a standard fitting package:
m = 0.658 ± 0.056
b = 6.81 ± 2.57
Estimate the value and uncertainty of y when x = 45.5:
y = 0.658·45.5 + 6.81 = 36.75
dy = √(2.57² + (45.5·0.056)²) = 3.62
UGH! NONSENSE!

Example: straight line fit, done correctly

Here's the correct way to estimate y at x = 45.5. First, I find a better fitter, which reports the actual covariance matrix of the fit:
m = 0.658 ± 0.056
b = 6.81 ± 2.57
ρ = −0.9981
dy = √(2.57² + (0.056·45.5)² + 2(−0.9981)(0.056·45.5)(2.57)) = 0.16
(Since the uncertainty on each individual data point was 0.5, and the fitting procedure effectively averages out their fluctuations, we expect that we could predict the value of y in the meat of the distribution to better than 0.5.)
Food for thought: if the correlations matter so much, why don't most fitting programs report them routinely???
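The slide's arithmetic can be reproduced with the full error propagation equation (df/dm = x, df/db = 1), a sketch:

```python
import numpy as np

# Fitted line y = m*x + b with strongly anti-correlated m and b (slide's numbers)
m, sm = 0.658, 0.056
b, sb = 6.81, 2.57
rho = -0.9981
x = 45.5

y = m * x + b
# sigma_y^2 = (x*sm)^2 + sb^2 + 2*rho*(x*sm)*sb   (cross-term included)
var_y = (x * sm)**2 + sb**2 + 2.0 * rho * (x * sm) * sb
print(y, np.sqrt(var_y))    # near 36.75 +/- 0.16, not +/- 3.6
```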

Reducing correlations in the straight line fit

The strong correlation between m and b results from the long lever arm: since you must extrapolate the line to x = 0 to determine b, a big error on m makes a big error on b. You can avoid strong correlations by using more sensible parameterizations: for example, fit the data to y = b' + m(x − 45.5):
b' = 36.77 ± 0.16
m = 0.658 ± 0.085
ρ = 0.43
dy at x = 45.5 = 0.16