Dealing with systematics for chi-square and for log likelihood goodness of fit statistics
1 Statistical Inference Problems in High Energy Physics (06w5054), Banff International Research Station, July 15-20, 2006. Dealing with systematics for chi-square and for log likelihood goodness of fit statistics. Volker Blobel, Universität Hamburg. 1. Global analysis of data from HEP experiments 2. Normalization errors (multiplicative) 3. Additive systematic errors 4. Correlated statistical errors 5. Parameter uncertainties: profile likelihoods and χ² 6. Outliers and their influence on the fit
2 1. Global analysis of data from HEP experiments Systematic errors are at the origin of the unsatisfactory situation that arises when data from many experiments are used by theoreticians in a global analysis and parameter estimation, and when attempts are made to determine e.g. uncertainties of predictions from parton distribution functions. It is difficult to estimate systematic errors correctly, and the estimated errors are only as good as the model used for the systematic errors. It is difficult to construct a correct chi-square expression for the estimation of parameters that takes the systematic contributions into account. The construction of the chi-square expression is often done incorrectly and, if published in a respected journal, it often becomes the model for the next generation of scientists. There is a tendency to allow an extremely large Δχ² in the error analysis. The situation is described e.g. in the paper: D. Stump et al., Phys. Rev. D65. Volker Blobel Banff: Statistical Inference Problems in High Energy Physics and Astronomy page 2
3 Examples with large Δχ² The large, artificial and arbitrary magnification of errors is hardly acceptable; the procedure points to a deep problem in the whole data analysis. Two examples from parton distribution fits: W production at the Tevatron, and α_S(M_Z²). Both curves are parabolas to a very good approximation over a large range of Δχ², while usually one would consider only a range of Δχ² ≤ 4, corresponding to two standard deviations. "... determine the increase in χ²_global that corresponds to our estimated uncertainty Δσ_W in the σ_W prediction ... corresponds to Δχ²_global ≈ 180."
4 Data uncertainties in HEP experiments Statistical errors: The statistical errors are given by the Poisson statistics of event numbers (n ± √n), without correlations between bins; but numbers corrected for finite resolution are correlated ⇒ correlated statistical errors. Normalization error: There is an uncertainty in the factor used to convert event numbers to cross sections. This so-called normalization error applies to cross sections, and to the statistical errors of cross sections as well. Because of its origin (a product of many factors, each with some uncertainty) the factor perhaps follows a log-normal distribution, due to the multiplicative central limit theorem. Systematic errors: There are uncertainties in the detector behaviour, e.g. energy measurements by a calorimeter may have a general relative error of a few %. Often the experiment's analysis is repeated with a relative change of ± a few % of e.g. the calorimeter data; from the change in the result, error contributions for all data points are related to the single systematic uncertainty. A single error contribution is a rank-1 contribution to the covariance matrix of the data. Correlated statistical errors, unfolding error: The smearing effect of the detector is usually corrected for in a way which, due to the smoothing aspect, introduces at least positive correlations between neighbouring points. Several different error contributions are reported by the experiments, but some may be missing; the unfolding error (with positive correlations) usually remains unpublished.
5 2. Normalization errors (multiplicative) Normalization error: There is an uncertainty in the factor used to convert event numbers to cross sections; because of its multiplicative origin the factor perhaps follows a log-normal distribution (see page 4). From a paper: "... Then, in addition, the fully correlated normalization error of the experiment is usually specified separately. For this reason, it is natural to adopt the following definition for the effective χ² (as done in previous ... analyses): χ²_global = Σ_n w_n χ²_n(a) (n labels the different experiments), with χ²_n(a) = ((1 − N_n)/σ_n^N)² + Σ_l ((N_n D_nl − T_nl(a))/σ_nl^D)². For the nth experiment, D_nl, σ_nl^D, and T_nl(a) denote the data value, measurement uncertainty (statistical and systematic combined) and theoretical value (dependent on {a}) for the lth data point; σ_n^N is the experimental normalization uncertainty, and N_n is an overall normalization factor (with default value 1) for the data of experiment n. ..." The nuisance parameter N_n should be applied as a factor to T_nl(a), instead of D_nl, or alternatively to both D_nl and σ_nl^D; otherwise a normalization bias is introduced.
6 Common normalisation errors: example from a publication Data are y₁ = 8.0 ± 2% and y₂ = 8.5 ± 2%, with a common (relative) normalisation error of ε = 10%. The mean value (constraint ŷ₁ = ŷ₂) resulting from a χ² minimisation of χ² = Δᵀ V⁻¹ Δ, with Δ = (y₁ − ȳ, y₂ − ȳ), is ȳ = 7.87 ± 0.81, i.e. smaller than both y₁ and y₂; this is apparently wrong. The method used was to define a full covariance matrix for the correlated data by V_a = [[σ₁², 0], [0, σ₂²]] + ε² [[y₁², y₁y₂], [y₁y₂, y₂²]] = [[σ₁² + ε²y₁², ε²y₁y₂], [ε²y₁y₂, σ₂² + ε²y₂²]]. Conclusion in the paper: "... that including normalisation errors in the correlation matrix will produce a fit which is biased towards smaller values ... the effect is a direct consequence of the hypothesis used to estimate the empirical covariance matrix, namely the linearisation on which the usual error propagation relies." But the matrix V_a is wrong! Correct model: the normalisation errors apply to the identical (common) value y, V_b = [[σ₁², 0], [0, σ₂²]] + ε² [[y², y²], [y², y²]] = [[σ₁² + ε²y², ε²y²], [ε²y², σ₂² + ε²y²]], which will give the correct result with y₁ < ȳ < y₂.
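The two results can be reproduced directly: minimising χ² = Δᵀ V⁻¹ Δ under the constraint ŷ₁ = ŷ₂ gives the generalized weighted mean ȳ = (1ᵀ V⁻¹ y)/(1ᵀ V⁻¹ 1). A minimal numpy sketch; inserting the plain average for the common value y in V_b is an assumption of this sketch, since the common true value is not known before the fit:

```python
import numpy as np

y = np.array([8.0, 8.5])
sig = 0.02 * y          # 2% point-to-point errors
eps = 0.10              # common 10% normalisation error

def weighted_mean(V):
    """Generalized weighted mean: minimises (y - m 1)^T V^-1 (y - m 1) over m."""
    Vinv = np.linalg.inv(V)
    w = Vinv.sum(axis=0)           # 1^T V^-1
    return w @ y / w.sum()

# Wrong model V_a: normalisation error scaled with the measured values
Va = np.diag(sig**2) + eps**2 * np.outer(y, y)

# Correct model V_b: identical normalisation error for both points,
# evaluated at a common value (here approximated by the plain average)
ybar = y.mean()
Vb = np.diag(sig**2) + eps**2 * ybar**2 * np.ones((2, 2))

ma, mb = weighted_mean(Va), weighted_mean(Vb)
print(ma, mb)   # ma lies below both measurements, mb lies between them
```

Running this reproduces the published biased value ȳ ≈ 7.87 for V_a, while V_b yields a mean between the two measurements.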
7 Covariance matrix comparison Plot of one measured value vs. the other measured value, with the assumed covariance ellipse; the mean value lies on the diagonal. Covariance ellipse for V_a: the axis of the ellipse is tilted w.r.t. the diagonal, and the ellipse touches the diagonal at a biased point. Covariance ellipse for V_b: the axis of the ellipse is at 45° and the ellipse touches the diagonal at the correct point. The result of a χ² minimisation may depend critically on details of the model implementation!
8 The method with a nuisance parameter Another method often used is to define χ²_a = Σ_k (f y_k − ȳ)²/σ_k² + (f − 1)²/ε², which will again produce a biased result. The χ² definition appropriate for this problem, χ²_b = Σ_k (y_k − f ȳ)²/σ_k² + (f − 1)²/ε², will give the correct result (data unchanged, and the fitted value scaled according to the model), as seen from the blue curve. [Figure: χ²_a and χ²_b as functions of the mean ȳ; the minimum of χ²_b lies at the correct mean value.]
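On the same two-point example (y = 8.0 ± 2%, 8.5 ± 2%, ε = 10%) the two definitions can be compared numerically. A brute-force grid scan over (f, ȳ) is used here instead of a proper minimiser, purely as an illustrative sketch:

```python
import numpy as np

y = np.array([8.0, 8.5])
sig = 0.02 * y
eps = 0.10

f = np.linspace(0.90, 1.05, 601)[:, None]   # normalisation nuisance factor
m = np.linspace(7.5, 8.8, 1301)[None, :]    # candidate mean value

# chi2_a scales the data: biased
chi2_a = ((f * y[0] - m) / sig[0])**2 + ((f * y[1] - m) / sig[1])**2 \
         + (f - 1.0)**2 / eps**2
# chi2_b scales the expectation: unbiased
chi2_b = ((y[0] - f * m) / sig[0])**2 + ((y[1] - f * m) / sig[1])**2 \
         + (f - 1.0)**2 / eps**2

ia = np.unravel_index(chi2_a.argmin(), chi2_a.shape)
ib = np.unravel_index(chi2_b.argmin(), chi2_b.shape)
mean_a, mean_b = m[0, ia[1]], m[0, ib[1]]
print(mean_a, mean_b)   # mean_a biased low, mean_b between the measurements
```

The scan lands at ȳ ≈ 7.87 for χ²_a (the bias of the covariance-matrix model V_a) and at the correct weighted mean ≈ 8.24 for χ²_b, with f ≈ 1.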
9 The log-normal distribution and the normalisation error The normalisation factor determined in an experiment is more the product (luminosity, detector acceptance, efficiency) than the sum of random variables. According to the multiplicative central limit theorem the product of positive random variables follows the log-normal distribution, i.e. the logarithm of the normalisation factor follows the normal distribution. For a log-normal distribution of a random variable α with E[α] = 1 and standard deviation ε, the density is pdf(α) = 1/(α √(2π ln(1+ε²))) · exp[−(ln α + ½ ln(1+ε²))² / (2 ln(1+ε²))], and the contribution to the χ²-function S(a, α) is (up to a constant) S_norm = (ln α + ½ ln(1+ε²))² / ln(1+ε²) + 2 ln α ≈ (α − 1)²/ε² for small ε. [Figure: the normal and the log-normal distribution, both with mean 1 and standard deviation ε = 0.5.]
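The quoted parametrisation can be checked numerically: a log-normal with μ = −½ ln(1+ε²) and σ² = ln(1+ε²) for ln α indeed has mean 1 and standard deviation ε. A small sketch, integrating the density on a grid:

```python
import numpy as np

eps = 0.5
L = np.log(1.0 + eps**2)             # variance of ln(alpha)
mu = -0.5 * L                        # mean of ln(alpha), chosen so that E[alpha] = 1

a = np.linspace(1e-6, 30.0, 400000)  # alpha grid (tail beyond 30 is negligible here)
dx = a[1] - a[0]
pdf = np.exp(-(np.log(a) - mu)**2 / (2.0 * L)) / (a * np.sqrt(2.0 * np.pi * L))

norm = (pdf * dx).sum()              # should be 1 (normalisation)
mean = (a * pdf * dx).sum()          # should be 1
var = ((a - mean)**2 * pdf * dx).sum()   # should be eps^2
print(norm, mean, np.sqrt(var))
```

The numerical integrals confirm normalisation 1, mean 1 and standard deviation ε to the accuracy of the grid.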
10 Asymmetric normalisation errors Proposed method to take the normalisation error ε into account, if data from more than one experiment are combined: introduce one additional factor α for each experiment as a nuisance parameter, which has been measured to be α = 1 ± ε; modify the expectation according to f_i = α f(x_i, a), and make the fit with S(a) = Σ_i (y_i − α f(x_i, a))²/σ_i² + S_norm, with S_norm = (α − 1)²/ε² (Gaussian) or S_norm = (ln α + ½ ln(1+ε²))²/ln(1+ε²) + 2 ln α (log-normal distribution).
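A small sketch of the proposed method on invented numbers: two hypothetical datasets measure the same straight line f(x, a) = a₀ + a₁x, the second one with an unknown normalisation factor (true value 1.1) and a Gaussian prior α = 1 ± ε with ε = 5%. For each trial α the linear least-squares problem in a is solved in closed form, and the penalty (α − 1)²/ε² is added; all numbers are illustrative:

```python
import numpy as np

x = np.arange(5.0)
truth = 2.0 + 3.0 * x
yA = truth.copy()          # experiment A: correctly normalised (noise-free, for clarity)
yB = 1.10 * truth          # experiment B: 10% too high
sigma, eps = 0.1, 0.05

def S_of_alpha(alpha):
    # Given alpha, the model is f for A and alpha*f for B: linear in a -> lstsq
    X = np.vstack([np.column_stack([np.ones_like(x), x]),
                   alpha * np.column_stack([np.ones_like(x), x])]) / sigma
    z = np.concatenate([yA, yB]) / sigma
    a, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ a
    return resid @ resid + (alpha - 1.0)**2 / eps**2

alphas = np.linspace(0.95, 1.20, 1001)
S = np.array([S_of_alpha(al) for al in alphas])
alpha_fit = alphas[S.argmin()]
print(alpha_fit)   # close to the true factor 1.1, pulled slightly toward 1
```

The fitted α absorbs most of the true 10% offset; the prior pulls it only slightly toward 1, because the data constrain the normalisation much more strongly than the 5% prior does.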
11 3. Additive systematic errors Systematic errors: There are uncertainties in the detector behaviour, e.g. energy measurements by a calorimeter may have a general relative error of a few %. Often the experiment's analysis is repeated with a relative change of ± a few % of e.g. the calorimeter data; from the change in the result, error contributions for all data points are related to the single systematic uncertainty. A single error contribution is a rank-1 contribution to the covariance matrix of the data. Experimental method: e.g. run the MC for each systematic ("Unisim") with the constant varied by ±1σ and redetermine the result ⇒ determine signed shifts s_i of the data values y_i. Method 1: Modify the covariance matrix to include the contribution(s) due to systematic errors, V_a = V_stat + V_syst with V_syst = s sᵀ (rank-1 matrix) or Σ_k s_k s_kᵀ, and use the inverse matrix V⁻¹ as weight matrix in the χ² function (⇒ simplified calculation of the inverse). Method 2 (recommended): Introduce one nuisance parameter β, assumed to be measured as 0 ± 1, for each systematic error source, and make the fit with S(a) = Σ_i (y_i + β s_i − f(x_i, a))²/σ_i² + β². Other method ("Multisim"): vary all systematic parameters randomly using their assumed probability distribution and redetermine the result.
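Method 2 can be sketched on invented numbers: five measurements of a constant a_true = 10 carry a common calibration shift with signed per-point 1σ shifts s_i, and the data were generated with the full shift applied (true β = 1). Because both the model and the shifts enter linearly, the fit for (a, β), including the β² penalty, is an augmented linear least-squares problem; all numbers here are illustrative:

```python
import numpy as np

s = np.array([0.1, 0.2, 0.3, 0.4, 0.5])   # signed 1-sigma shifts of the data points
y = 10.0 - 1.0 * s                         # data: constant 10, shifted with beta_true = 1
sigma = 0.1

# Minimise sum_i (y_i + beta*s_i - a)^2 / sigma^2 + beta^2
# as an augmented linear least-squares problem in (a, beta):
rows = np.column_stack([np.ones_like(s) / sigma, -s / sigma])  # a - beta*s_i = y_i
rows = np.vstack([rows, [0.0, 1.0]])                           # penalty row: beta = 0 +- 1
rhs = np.append(y / sigma, 0.0)
(a_fit, beta_fit), *_ = np.linalg.lstsq(rows, rhs, rcond=None)

print(a_fit, beta_fit)   # a_fit close to 10; the naive mean would give 9.7
```

The nuisance parameter absorbs most of the common shift (β ≈ 0.91 here), and the fitted constant moves from the naive mean 9.7 back close to the true value 10.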
12 Simplified calculation of the inverse Assume that the inverse A⁻¹ of the n-by-n matrix A is known. If A is changed in one of the two forms below, the corresponding change of A⁻¹ can be calculated faster by the following formulas. Sherman-Morrison formula: u and v are n-vectors and the change u vᵀ of A is of rank 1: (A + u vᵀ)⁻¹ = A⁻¹ − (1/(1 + λ)) (A⁻¹ u)(vᵀ A⁻¹) with λ = vᵀ A⁻¹ u. Woodbury formula: U and V are n-by-k matrices with k < n and usually k ≪ n: (A + U Vᵀ)⁻¹ = A⁻¹ − A⁻¹ U (1 + Vᵀ A⁻¹ U)⁻¹ Vᵀ A⁻¹. Only a k-by-k matrix has to be inverted; inversion of a 100-by-100 matrix takes a few 10⁻³ seconds. W. H. Press et al., Numerical Recipes, The Art of Scientific Computing, Cambridge University Press. For larger k the direct methods may be faster and more accurate because of the stabilizing advantages of pivoting.
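The rank-1 update formula is easy to verify numerically; a minimal check against direct inversion on a fixed, well-conditioned test matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
u = np.array([1.0, 2.0, 0.5])
v = np.array([0.5, 1.0, 1.0])

Ainv = np.linalg.inv(A)

# Sherman-Morrison: (A + u v^T)^-1 = A^-1 - (A^-1 u)(v^T A^-1) / (1 + v^T A^-1 u)
lam = v @ Ainv @ u
sm_inv = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + lam)

direct = np.linalg.inv(A + np.outer(u, v))
print(np.abs(sm_inv - direct).max())   # difference at machine-precision level
```

The update costs only matrix-vector products, O(n²), instead of the O(n³) of a fresh inversion; the formula fails only when 1 + λ = 0, i.e. when the update makes the matrix singular.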
13 Example from an experimental paper Systematic uncertainties (shown are typical, small values for the error contributions): Correlated systematic uncertainties, e.g. 0.2 %, 0.7 %, +1.0 %, +0.9 %, 0.6 %, ... = 1.6 % in total, from electron energy, electron angle, hadronic calibration, calorimeter noise contribution, photoproduction background. Uncorrelated systematic uncertainties, e.g. 0.4 %, 0.8 %, ... = 2.1 % in total, from Monte Carlo statistics, trigger efficiency, detector efficiency, radiative corrections. Total cross section uncertainty e.g. 2-3 %.
14 Example from analysis papers "All the experiments included in our analysis provide full correlated systematics, as well as normalization errors. The covariance matrix can be computed from these as cov_ij = Σ_{k=1}^{N_sys} σ_{i,k} σ_{j,k} + F_i F_j σ_N² + δ_ij σ_{i,t}², where F_i, F_j are central experimental values, the σ_{i,k} are the N_sys correlated systematics, σ_N is the total normalization uncertainty, and the uncorrelated uncertainty σ_{i,t} is the sum of the statistical uncertainty σ_{i,s} and the N_u uncorrelated systematic uncertainties (when present) ... The inverse of the covariance matrix cov above is used as a weight matrix for a χ² calculation. ... However ... correlations between measurement errors, and correlated theoretical errors, are not included in its definition. ... Instead, the evaluation of likelihoods and estimation of global uncertainty will be carried out ... after sets of optimal sample PDFs for the physical variable of interest have been obtained." Comments: The quadratic combination of statistical and systematic measurement uncertainties neglects the known correlations inherent in the systematic effects. Neither unbiased optimal parameter values nor a usable χ² or parameter errors can be expected from this χ² function.
15 4. Correlated statistical errors Statistical deviations per bin of quantities measured from event numbers (n ± √n) are independent; the covariance matrix is diagonal. But: "... the selected event samples are corrected for detector acceptance and migration using the simulation and are converted to bin-centred cross sections. ... The bins used in the measurement are required to have stability and purity larger than 30 %. ... The stability (purity) is defined as the number of simulated events which originate from a bin and which are reconstructed in it, divided by the number of generated (reconstructed) events in that bin ..." A stability/purity of 30 % corresponds to a bin size narrower than 1 standard deviation of the measurement accuracy (for a flat distribution)! Up to 70 % of the events in a bin come from the neighbouring bins. There must be a strong correlation of the corrected data in neighbouring bins: the covariance matrix for corrected data is non-diagonal, and the variances are magnified. This is visible in the eigenvalue spectrum of the migration matrix. ... but the published covariance matrices of statistical errors are diagonal.
16 Why the cov matrix is non-diagonal [Left figure: stability/purity as a function of the ratio σ/binwidth for a uniform distribution, with the points binwidth = σ and binwidth = 2.4 σ marked. Right figure: spectrum of inverse eigenvalues for a 100-bin histogram with σ = binwidth.] Assuming a uniform distribution, the stability/purity is plotted as a function of the ratio of σ to the binwidth. Consider the bin [2, 3] with σ = binwidth: only the red contribution originates from this bin; the neighbouring bins contribute significantly to the bin [2, 3]. Assuming σ = binwidth, the eigenvalue spectrum is shown for a 100-bin histogram. The inverse eigenvalue is a magnification factor for the variance of linear combinations.
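The variance magnification can be illustrated with a toy migration matrix: Gaussian smearing with σ = binwidth on a periodic histogram (the periodicity is a simplifying assumption of this sketch, chosen to keep the matrix symmetric). The smallest eigenvalue of the smearing matrix is tiny, so its inverse, the magnification factor for the variance of the corresponding linear combination of bins, is large:

```python
import numpy as np

n = 20                      # number of bins
sigma = 1.0                 # resolution in units of the binwidth

# Symmetric (periodic) migration matrix: Gaussian smearing between bins
i = np.arange(n)
d = np.minimum(np.abs(i[:, None] - i[None, :]),
               n - np.abs(i[:, None] - i[None, :]))
M = np.exp(-0.5 * (d / sigma)**2)
M /= M.sum(axis=1, keepdims=True)   # rows sum to 1: events are conserved

eig = np.linalg.eigvalsh(M)
magnification = 1.0 / eig.min()
print(eig.min(), magnification)
```

Even for this modest 20-bin example the smallest eigenvalue is of order 10⁻², i.e. unfolding by matrix inversion magnifies the variance of some linear combinations by more than an order of magnitude.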
17 5. Parameter uncertainties: profile likelihoods and χ² Covariance matrix: parameter uncertainties and correlations are given by the covariance matrix V of the fit, obtained by inversion of the matrix of second derivatives (Hessian) of the negative log-likelihood function (Fisher's information I = V⁻¹). The covariance matrix is usually assumed to be sufficient to describe the parameter uncertainties. χ² contour: the surface of the error ellipsoid corresponds to the contour at χ²_minimum + 1. Single-parameter uncertainty: the information is supplemented by the profile likelihood, obtained by minimizing, for many fixed values of one of the parameters in a range of ± several σ, with respect to all other parameters. This is used e.g. in Minuit in the Minos option to check the uncertainties and possibly define asymmetric errors. Function of parameters: the uncertainty of a function g(a) of the parameters is determined by the error propagation formula (derivatives of g(a) w.r.t. the parameters, and covariance matrix V). The information is supplemented by the profile likelihood for the function g(a).
18 Profile likelihood for a function The profile likelihood for a function g(a) of the parameters, e.g. the W production cross section σ_W at the Tevatron, or α_S(M_Z²), can be calculated by use of the Lagrange multiplier method: for many fixed values g_fix of the function g(a), the likelihood function is minimized. The standard method is to define a Lagrange function L(a) = S(a) + λ (g(a) − g_fix) and to find the stationary point w.r.t. the parameters a and the Lagrange multiplier λ, given g_fix. The constraint defines a set of parameter values for each value of g_fix, e.g. of σ_W. An alternative method, used in a recent paper, is to assume, by trial and error, fixed values of the Lagrange multiplier λ, to minimize S(a) + λ g(a), and after the minimization to calculate the corresponding fixed g(a) (this allows the use of MINUIT).
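The construction can be sketched on a toy χ²-function. In this sketch the constraint g(a) = g_fix is handled by direct elimination (a₁ = g_fix/a₀) rather than an explicit Lagrange multiplier, which is equivalent for this simple g; all numbers are invented:

```python
import numpy as np

def S(a0, a1):
    # Toy chi-square function with minimum at a = (1, 2)
    return ((a0 - 1.0) / 0.1)**2 + ((a1 - 2.0) / 0.2)**2

# Profile of g(a) = a0 * a1: for each fixed g, minimise S over the remaining freedom.
# Eliminating the constraint, a1 = g / a0, leaves a 1-D minimisation over a0.
g_scan = np.linspace(1.0, 3.0, 201)
a0_grid = np.linspace(0.5, 2.0, 1501)

profile = np.array([S(a0_grid, g / a0_grid).min() for g in g_scan])

g_best = g_scan[profile.argmin()]
print(g_best, profile.min())   # minimum of the profile at g = 1*2 = 2, with S = 0
```

The resulting curve profile(g) plays the role of the profile χ²-function for g(a); its width around g_best gives the uncertainty of g, including the full correlation between the parameters.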
19 Comments to a recent paper Global fit: The overall ("global") fit is done taking into account the normalization errors, but neglecting certain systematic error contributions (non-diagonal) in the experimental data. Why are the non-diagonal error contributions neglected here, but used later? Profile χ²-function: In the determination of the profile χ²-function, however, the normalization is fixed at the previously fitted value. Fixing certain parameters will not result in a correct profile. Single-experiment analysis: Afterwards a χ²-analysis is done separately for each experiment, now taking into account the systematic error contributions (non-diagonal) in the experimental data. For each experiment the profile χ²-function is evaluated, however with the parameters from the global fit. Each experiment has a different parameter covariance matrix, depending on the kinematical region and the accuracy. Evaluating the χ²-function using parameter sets from the global fit will not result in a correct profile.
20 From papers... "Notice that the covariance matrix V_ij = ⟨Δa_i Δa_j⟩ = Δχ² (H⁻¹)_ij depends on the choice of Δχ², which usually, but not always, is taken to be Δχ² = 1. This choice ... corresponds to the definition of the width of a Gaussian distribution." "In a full global fit there is art in choosing the correct Δχ², given the complication of errors. Ideally Δχ² = 1, but unrealistic." "... and Δχ² is the allowed variation in χ² and a suitable choice of Δχ² ..." "... and Δχ² is the allowed deterioration in fit quality for the error determination." "... Our standard PDF set S₀ is a parametrized fit to 1295 data points with 16 fitting parameters. The minimum of χ²_global is approximately 1200. Naively, it seems that an increase of χ²_global by merely 1, say from 1200 to 1201, could not possibly represent a standard deviation of the fit. Naively one might suppose that a standard deviation would correspond to Δχ² much larger than 1. However this is a misconception. If the errors are uncorrelated (or if the correlations are incorporated into χ²), then indeed Δχ² = 1 would represent a standard deviation. But this theorem is irrelevant to our problem, because the large correlations of systematic errors are not taken into account in χ²_global. ..." (Phys. Rev.)
21 Use of redundant parameters An example from parton density fits: the gluon parametrization is xg(x, Q₀²) = ... − A (1 − x)^η x^δ, where A ≈ 0.2, δ ≈ 0.3 and η is fixed at 10. A change of δ changes both shape and normalisation. "... we notice that a certain amount of redundancy in parameters leads to potentially disastrous departures ... For example, in the negative term in the gluon parameterization very small changes in the value of δ can be compensated almost exactly by a change in A and in the other gluon parameters ... We found our input parameterization was sufficiently flexible to accommodate data, and indeed there is a certain redundancy evident." In the case of highly correlated, redundant parameters the Hessian will be (almost) singular, its inversion may be impossible, and the convergence of the fit is doubtful. Redundant parameters have to be avoided!
22 6. Outliers and their influence on the fit "Everyone believes in the normal law of errors: the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact." [Poincaré] Outliers, single unusually large or small values in a sample, are dangerous and will usually, because of their large influence, introduce a bias in the result: in the final χ² value, in the values of the fitted parameters, and in the parameter uncertainties. A method for outlier treatment: M-estimation, closely related to the maximum-likelihood method. For data with a probability density pdf(z) the method of maximum likelihood requires to minimize S(a) = −Σ_{i=1}^{n} ln pdf(z_i) = Σ_{i=1}^{n} ρ(z_i) with ρ(z) = −ln pdf(z). For a Gaussian distribution ρ(z) = ½ z² (up to a constant). The function ρ(z) is modified in M-estimation by down-weighting.
23 M-estimates A generalization of least squares, following from maximum-likelihood arguments. Abbreviation: z_i = (y_i − f(x_i; a))/σ_i (distributed as N(0, 1) for Gaussian measurements). Least squares: minimize Σ_i ½ z_i², i.e. solve Σ_i (y_i − f(x_i; a))/σ_i² · ∂f/∂a_j = 0 for j = 1, 2, ..., p. M-estimates: minimize Σ_i ρ(z_i), i.e. solve Σ_i (y_i − f(x_i; a))/σ_i² · w(z_i) · ∂f/∂a_j = 0 for j = 1, 2, ..., p, with influence function ψ(z) = dρ/dz and additional weight w(z) = ψ(z)/z. The case of least squares is ρ(z) = ½ z², ψ(z) = z, w(z) = 1. M-estimation requires iteration (non-linearity!), e.g. with the weight w(z) calculated from the previous parameter values.
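The iteration can be sketched for the simplest model, f(x; a) = a (a constant), with one gross outlier. Tukey's biweight with the conventional cutoff c = 4.685 (95% asymptotic efficiency on the normal distribution) is used; starting from the median keeps the outlier from dominating the first iteration. The data are invented:

```python
import numpy as np

y = np.array([4.8, 4.9, 5.0, 5.1, 5.2, 50.0])   # last point is a gross outlier
sigma = 1.0
c = 4.685                                        # Tukey cutoff, 95% efficiency

def tukey_weight(z):
    # w(z) = [1 - (z/c)^2]^2 inside |z| <= c, zero outside
    w = (1.0 - (z / c)**2)**2
    w[np.abs(z) > c] = 0.0
    return w

a = np.median(y)                 # robust starting value
for _ in range(20):              # iterate: weights from the previous estimate
    z = (y - a) / sigma
    w = tukey_weight(z)
    a = (w * y).sum() / w.sum()  # weighted mean = M-estimate for a constant model

plain_mean = y.mean()
print(plain_mean, a)   # 12.5 for the plain mean, close to 5.0 for the M-estimate
```

After the first iteration the outlier has |z| ≫ c and receives weight zero, so the fit converges to the mean of the inliers, while the unweighted mean is pulled to 12.5.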
24 Influence functions and weights as a function of z, with z = (y − f(x))/σ. [Figure, three panels: the negative log of the probability density ρ(z), the influence function ψ(z) = dρ(z)/dz, and the weight w(z) = ψ(z)/z, each shown for least squares, Cauchy and Tukey.]
25 Commonly used M-estimators, with ρ(z) = −ln pdf(z), influence function ψ(z) = dρ(z)/dz and weight w(z) = ψ(z)/z: Least squares: ρ = ½ z², ψ = z, w = 1. Cauchy: ρ = (c²/2) ln(1 + (z/c)²), ψ = z/(1 + (z/c)²), w = 1/(1 + (z/c)²). Tukey, |z| ≤ c: ρ = (c²/6)(1 − [1 − (z/c)²]³), ψ = z [1 − (z/c)²]², w = [1 − (z/c)²]²; for |z| > c: ρ = c²/6, ψ = 0, w = 0. Huber, |z| ≤ c: ρ = z²/2, ψ = z, w = 1; for |z| > c: ρ = c(|z| − c/2), ψ = c · sign(z), w = c/|z|. An asymptotic efficiency of 95 % on the normal distribution is obtained with c = ... (Cauchy), c = ... (Tukey) and c = ... (Huber). M-estimation reduces the influence of outliers and improves the fitted parameter values and uncertainties ... but the final value of the χ²-function does not follow the standard χ² distribution.
26 Conclusion Systematic errors of experimental data: Recent experiments provide a lot of information on systematic error contributions; this information should be used in a global analysis. Statistical errors of experimental data: The published statistical errors are often too optimistic, because correlations (especially between neighbouring bins) are neglected. Construction of the χ²-function: Each error contribution should be taken into account with the correct underlying model of that error contribution. Parametrization: Redundant parameters have to be avoided in a global fit.
27 Contents 1. Global analysis of data from HEP experiments 2; Examples with large χ² 3; Data uncertainties in HEP experiments 4. 2. Normalization errors (multiplicative) 5; Common normalisation errors 6; Covariance matrix comparison 7; The method with a nuisance parameter 8; The log-normal distribution 9; Asymmetric normalisation errors 10. 3. Additive systematic errors 11; Simplified calculation of inverse 12; Example from an experimental paper 13; Example from analysis papers 14. 4. Correlated statistical errors 15; Why the cov matrix is non-diagonal 16. 5. Parameter uncertainties: profile likelihoods and χ² 17; Profile likelihood for a function 18; Comments to a recent paper 19; From papers 20; Use of redundant parameters 21. 6. Outliers and their influence on the fit 22; M-estimates 23; Influence functions and weights 24; Commonly used M-estimators 25. Conclusion 26. Table of contents 27.
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
MATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
Christfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
CHI-SQUARE: TESTING FOR GOODNESS OF FIT
CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity
Data Preparation and Statistical Displays
Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability
Tutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls [email protected] MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes Simcha Pollack, Ph.D. St. John s University Tobin College of Business Queens, NY, 11439 [email protected]
Compensation Basics - Bagwell. Compensation Basics. C. Bruce Bagwell MD, Ph.D. Verity Software House, Inc.
Compensation Basics C. Bruce Bagwell MD, Ph.D. Verity Software House, Inc. 2003 1 Intrinsic or Autofluorescence p2 ac 1,2 c 1 ac 1,1 p1 In order to describe how the general process of signal cross-over
Department of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
GLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - [email protected] Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
Algebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
Statistical Distributions in Astronomy
Data Mining In Modern Astronomy Sky Surveys: Statistical Distributions in Astronomy Ching-Wa Yip [email protected]; Bloomberg 518 From Data to Information We don t just want data. We want information from
Chapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Chapter 2 Portfolio Management and the Capital Asset Pricing Model
Chapter 2 Portfolio Management and the Capital Asset Pricing Model In this chapter, we explore the issue of risk management in a portfolio of assets. The main issue is how to balance a portfolio, that
How to assess the risk of a large portfolio? How to estimate a large covariance matrix?
Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk
An introduction to OBJECTIVE ASSESSMENT OF IMAGE QUALITY. Harrison H. Barrett University of Arizona Tucson, AZ
An introduction to OBJECTIVE ASSESSMENT OF IMAGE QUALITY Harrison H. Barrett University of Arizona Tucson, AZ Outline! Approaches to image quality! Why not fidelity?! Basic premises of the task-based approach!
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm
Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the
SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
GRADES 7, 8, AND 9 BIG IDEAS
Table 1: Strand A: BIG IDEAS: MATH: NUMBER Introduce perfect squares, square roots, and all applications Introduce rational numbers (positive and negative) Introduce the meaning of negative exponents for
Experiment #1, Analyze Data using Excel, Calculator and Graphs.
Physics 182 - Fall 2014 - Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring
Normality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected]
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
9.2 Summation Notation
9. Summation Notation 66 9. Summation Notation In the previous section, we introduced sequences and now we shall present notation and theorems concerning the sum of terms of a sequence. We begin with a
PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU
PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard
Simple Second Order Chi-Square Correction
Simple Second Order Chi-Square Correction Tihomir Asparouhov and Bengt Muthén May 3, 2010 1 1 Introduction In this note we describe the second order correction for the chi-square statistic implemented
Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005
Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Philip J. Ramsey, Ph.D., Mia L. Stephens, MS, Marie Gaudard, Ph.D. North Haven Group, http://www.northhavengroup.com/
Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Nonlinear Regression:
Zurich University of Applied Sciences School of Engineering IDP Institute of Data Analysis and Process Design Nonlinear Regression: A Powerful Tool With Considerable Complexity Half-Day : Improved Inference
Monte Carlo Simulation
1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: [email protected] web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging
Tutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
Observability Index Selection for Robot Calibration
Observability Index Selection for Robot Calibration Yu Sun and John M Hollerbach Abstract This paper relates 5 observability indexes for robot calibration to the alphabet optimalities from the experimental
Dynamic data processing
Dynamic data processing recursive least-squares P.J.G. Teunissen Series on Mathematical Geodesy and Positioning Dynamic data processing recursive least-squares Dynamic data processing recursive least-squares
Dimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
Towards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by
Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter
Linear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
Logistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
Data Mining and Visualization
Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research
Least-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
Precalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES
Content Expectations for Precalculus Michigan Precalculus 2011 REVERSE CORRELATION CHAPTER/LESSON TITLES Chapter 0 Preparing for Precalculus 0-1 Sets There are no state-mandated Precalculus 0-2 Operations
How To Calculate A Multiiperiod Probability Of Default
Mean of Ratios or Ratio of Means: statistical uncertainty applied to estimate Multiperiod Probability of Default Matteo Formenti 1 Group Risk Management UniCredit Group Università Carlo Cattaneo September
How to report the percentage of explained common variance in exploratory factor analysis
UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
NOV - 30211/II. 1. Let f(z) = sin z, z C. Then f(z) : 3. Let the sequence {a n } be given. (A) is bounded in the complex plane
Mathematical Sciences Paper II Time Allowed : 75 Minutes] [Maximum Marks : 100 Note : This Paper contains Fifty (50) multiple choice questions. Each question carries Two () marks. Attempt All questions.
The CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
