Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma
|
|
|
- Brent White
- 10 years ago
- Views:
Transcription
1 Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu 4-6-7, Tokyo , Japan b Department of Statistics, University of Warwick, Coventry CV4 7AL, UK Abstract Kullback-Leibler divergence and the Neyman-Pearson lemma are two fundamental concepts in statistics. Both are about likelihood ratios: Kullback-Leibler divergence is the expected log-likelihood ratio, and the Neyman-Pearson lemma is about error rates of likelihood ratio tests. Exploring this connection gives another statistical interpretation of the Kullback-Leibler divergence in terms of the loss of power of the likelihood ratio test when the wrong distribution is used for one of the hypotheses. In this interpretation, the standard non-negativity property of the Kullback-Leibler divergence is essentially a restatement of the optimal property of likelihood ratios established by the Neyman-Pearson lemma. Keywords: Kullback-Leibler divergence; Neyman-Pearson lemma; Information geometry; Testing hypotheses; Maximum likelihood. Corresponding author. address: [email protected] (S. Eguchi) 1
2 1. The Kullback-Leibler Divergence, D Let P and Q be two probability distributions of a random variable X. Ifx is continuous, with P and Q described by probability density functions p(x) and q(x) respectively, the Kullback-Leibler (KL) divergence from P to Q is defined by { D(P, Q) = log p(x) } p(x)dx. q(x) See Kullback and Leibler (1951). More generally, D(P, Q) is defined by { D(P, Q) = log P } Q (x) dp (x), where P / Q(x) is the density function of P relative to Q. Another way of writing D is where D(P, Q) =L(P, P) L(P, Q) (1) L(P, Q) =E P {log q(x)}, and the symbol E P means taking the expectation over the distribution P. Two fundamental properties of D are non-negativity : D(P, Q) 0 with equality if and only if P = Q. asymmetry : D(P, Q) D(Q, P ). KL divergence plays a central role in the theory of statistical inference. Properties of D, in particular its asymmetry, are exploited through the concept of dual affine connections in the information geometry of Amari (1985) and Amari and Nagaoka (2001). A well known application is Akaike s information criterion (AIC) used to control over-fitting in statistical modeling (Akaike, 1973). In this note we suggest some statistical interpretations of KL divergence which can help our understanding of this important theoretical concept. Section 2 reviews the well known connections with maximum likelihood and expected log likelihood ratios. Section 3 shows how the Neyman-Pearson lemma gives a new interpretation of D and gives an alternative 2
3 proof of the non-negativity property. Section 4. An extension of the argument is mentioned in 2. Well Known Statistical Interpretations of D 2.1 Maximum likelihood Given a random sample x 1,x 2,,x n from the underlying distribution P, let P n be the empirical distribution, which puts probability 1/n on each sample value x i. Now let Q θ be a statistical model f(x, θ) with unknown parameter θ. Then the empirical version of L(P, Q θ )is L(P n,q θ )= 1 n log f(x i,θ). n Apart from the factor 1/n, this is just the log likelihood function. Note that the empirical version L(P n,q θ ) reduces to the population version L(P, Q θ ) for any n by taking its expectation, i.e., E P {L(P n,q θ )} = L(P, Q θ ). Hence, from (1), maximizing the likelihood to find the maximum likelihood estimate (MLE) is analogous to finding θ which minimizes D(P, Q θ ). It is obvious that the best possible model is the one that fits the data exactly, i.e. when P = Q θ, so for any general model Q θ L(P, Q θ ) L(P, P) which is just the same as the non-negativity property D(P, Q θ ) 0. Asymmetry is equally clear because of the very distinct roles of P (the data) and Q θ (the model). Just as likelihood measures how well a model explains the data, so we can think of D as measuring the lack of fit between model and data relative to a perfect fit. It follows from the law of large numbers that the MLE converges almost surely to θ(p ) = arg min θ D(P, Q θ ). If P = Q θ then θ(p )=θ. This outlines the elegant proof of the consistency of the MLE by Wald (1949). In this way, KL divergence helps us to understand the theoretical properties of the maximum likelihood method. 2.2 Likelihood ratios Now suppose that P and Q are possible models for data (vector) X, which we now think of as being a null hypothesis H and an alternative hypothesis A. Suppose that X 3 i=1
4 has probability density or mass function f H (x) under H and f A (x) under A. Then the log likelihood ratio is λ = λ(x) = log f A(x) f H (x). (2) If these two hypotheses are reasonably well separated, intuition suggests that λ(x) will tend to be positive if A is true (the correct model f A fits the data better than wrong model f H ), and will tend to be negative if H is true (then we would expect f H to fit the data better than f A ). The expected values of λ(x) are E A {λ(x)} = D(P A,P H ), E H {λ(x)} = D(P H,P A ). Thus D(P A,P H ) is the expected log likelihood ratio when the alternative hypothesis is true. The larger is the likelihood ratio, the more evidence we have for the alternative hypothesis. In this interpretation, D is reminiscent of the power function in hypothesis testing, measuring the degree to which the data will reveal that the null hypothesis is false when the alternative is in fact true. Both this, and its dual when H is true, are zero when the two hypotheses coincide so that no statistical discrimination is possible. The asymmetry property of D corresponds to the asymmetric roles that the null and alternative hypotheses play in the theory of hypothesis testing. 3. Interpreting D through the Neyman-Pearson Lemma In terms of the set-up of Section 2.2, the Neyman-Pearson lemma establishes the optimality of the likelihood ratio critical region W = {x : λ(x) u}, in terms of Type I and Type II error rates. If we choose u such that P H (W ) = α, the likelihood ratio test offers W as a critical region with size α. For any other critical region W with the same size, i.e. with P H (W )=P H (W ), the lemma claims that P A (W ) P A (W ). (3) 4
5 The proof is easily shown by noting that and λ(x) u on W W λ(x) <u on W W, and then integrating separately over W W and W W leads to P A (W ) P A (W ) e u {P H (W ) P H (W )}. (4) This yields (3) from the size α condition on W and W. The lemma and proof are covered in most text books on mathematical statistics, for example Cox and Hinkley (1974) and Lindsey (1996). However, to use the optimal test we need to know both f H (x) and f A (x) so that we have the correct log likelihood ratio λ(x). Suppose we mis-specified the alternative hypothesis as f Q (x) instead of f A (x). We would then use the incorrect log likelihood ratio λ = λ(x) = log f Q(x) f H (x). The same argument as that used in arriving at (4) now gives P A (λ>u) P A ( λ >u) e u {P H (λ>u) P H ( λ >u)}. (5) The left hand side of this inequality is the loss of power when our test is based on λ(x) instead of λ(x). To measure the overall loss of power for different thresholds u we can integrate this over all possible values of u from to +. Integration by parts gives power = {P A (λ>u) P A ( λ >u)}du = u{p A (λ>u) P A ( λ >u)} + = (λ λ) f A (x)dx = log f A(x) f Q (x) f A(x)dx u u {P A(λ u) P A ( λ u)}du 5
6 by the definition of the expectations of λ and λ. Doing the same thing to the right hand side of (5) gives e u {P H (λ>u) P H ( λ >u)}du = e u {P H (λ>u) P H ( λ >u)} + = (e λ e λ) f H (x)dx =1 1= 0. Hence we get e u u {P H(λ u) P H ( λ u)}du power = D(P A,P Q ) 0. (6) We now have another interpretation of D: the KL divergence from P to Q measures how much power we lose with the likelihood ratio test if we mis-specify the alternative hypothesis P as Q. The non-negativity of D in (6) is essentially a restatement of the Neyman-Pearson lemma. Interestingly, this argument is independent of the choice of null hypothesis. We get a dualistic version of this if we imagine that it is the null hypothesis that is mis-specified. If we mistakenly take the null as f Q (x) instead of f H (x), we would now use the log likelihood ratio λ = λ(x) = log f A(x) f Q (x). Multiply both sides of the inequality (5) by e u to give e u {P A (λ>u) P A ( λ >u)} P H (λ>u) P H ( λ >u) and integrate both sides with respect to u as before. This gives (e λ e λ)f A (x)dx (λ λ)f H (x)dx which leads to power = D(P H,P Q ) 0. KL divergence now corresponds to loss of power if we mis-specify the null hypothesis. Note the essential asymmetry in these arguments: power is explicitly about the alternative 6
7 hypothesis, and in arriving at the second version we have integrated the power difference using a different weight function. 4. Comment The receiver operating characteristic (ROC) curve of a test statistic is the graph of the power (one minus the Type II error) against the size (Type I error) as the threshold u takes all possible values. Thus the two sides of inequality (5) are about the differences in coordinates of two ROC curves, one for the optimum λ(x) and the other for the suboptimum statistic λ(x). Integrating each side of (5) over u is finding a weighted area between these two curves. The smaller is this area, the closer are the statistical properties of λ(x) to those of λ(x) in the context of discriminant analysis. Eguchi and Copas (2002) exploit this interpretation by minimizing this area over a parametric family of statistics λ(x). In this way we can find a discriminant function which best approximates the true, but generally unknown, log likelihood ratio statistic. References Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd Int. Symp. on Inference Theory (eds B. N. Petrov and F. Csáki), Budapest: Académiai Kiadó. Amari, S. (1985). Statistics, vol. Differential-Geometrical Methods in Statistics. Lecture Notes in 28. New York: Springer. Amari, S. and Nagaoka, H. (2001). Methods of Information Geometry. Translations of Mathematical Monographs, vol New York: American Mathematical Society. Cox, D. R. and Hinkley, D. V. (1974). London. Theoretical Statistics. Chapman and Hall, Eguchi, S and Copas, J. B. (2002). Biometrika, 89, A class of logistic-type discriminant functions. Kullback, S and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statist., 55,
8 Lindsey, J. K. (1996). Parametric Statistical Inference. Oxford University Press, Oxford. Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Math. Statist., 20, Ann. 8
171:290 Model Selection Lecture II: The Akaike Information Criterion
171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information
Maximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Introduction to Detection Theory
Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf. Ch. 10 in Wasserman. EE 527, Detection
The CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.
Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric
Differentiating under an integral sign
CALIFORNIA INSTITUTE OF TECHNOLOGY Ma 2b KC Border Introduction to Probability and Statistics February 213 Differentiating under an integral sign In the derivation of Maximum Likelihood Estimators, or
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
Why is Insurance Good? An Example Jon Bakija, Williams College (Revised October 2013)
Why is Insurance Good? An Example Jon Bakija, Williams College (Revised October 2013) Introduction The United States government is, to a rough approximation, an insurance company with an army. 1 That is
Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory
Outline Signal Detection M. Sami Fadali Professor of lectrical ngineering University of Nevada, Reno Hypothesis testing. Neyman-Pearson (NP) detector for a known signal in white Gaussian noise (WGN). Matched
Time Series Analysis
Time Series Analysis [email protected] Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
Testing theory an introduction
Testing theory an introduction P.J.G. Teunissen Series on Mathematical Geodesy and Positioning Testing theory an introduction Testing theory an introduction P.J.G. Teunissen Delft University of Technology
Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
UNDERSTANDING INFORMATION CRITERIA FOR SELECTION AMONG CAPTURE-RECAPTURE OR RING RECOVERY MODELS
UNDERSTANDING INFORMATION CRITERIA FOR SELECTION AMONG CAPTURE-RECAPTURE OR RING RECOVERY MODELS David R. Anderson and Kenneth P. Burnham Colorado Cooperative Fish and Wildlife Research Unit Room 201 Wagar
Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 Proofs Intuitively, the concept of proof should already be familiar We all like to assert things, and few of us
Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725
Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T
Network quality control
Network quality control Network quality control P.J.G. Teunissen Delft Institute of Earth Observation and Space systems (DEOS) Delft University of Technology VSSD iv Series on Mathematical Geodesy and
JUST THE MATHS UNIT NUMBER 1.8. ALGEBRA 8 (Polynomials) A.J.Hobson
JUST THE MATHS UNIT NUMBER 1.8 ALGEBRA 8 (Polynomials) by A.J.Hobson 1.8.1 The factor theorem 1.8.2 Application to quadratic and cubic expressions 1.8.3 Cubic equations 1.8.4 Long division of polynomials
FALSE ALARMS IN FAULT-TOLERANT DOMINATING SETS IN GRAPHS. Mateusz Nikodem
Opuscula Mathematica Vol. 32 No. 4 2012 http://dx.doi.org/10.7494/opmath.2012.32.4.751 FALSE ALARMS IN FAULT-TOLERANT DOMINATING SETS IN GRAPHS Mateusz Nikodem Abstract. We develop the problem of fault-tolerant
Lecture 8: Signal Detection and Noise Assumption
ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,
On using numerical algebraic geometry to find Lyapunov functions of polynomial dynamical systems
Dynamics at the Horsetooth Volume 2, 2010. On using numerical algebraic geometry to find Lyapunov functions of polynomial dynamical systems Eric Hanson Department of Mathematics Colorado State University
On Adaboost and Optimal Betting Strategies
On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: [email protected] Fabrizio Smeraldi School of
6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives
6 EXTENDING ALGEBRA Chapter 6 Extending Algebra Objectives After studying this chapter you should understand techniques whereby equations of cubic degree and higher can be solved; be able to factorise
Linear Programming Notes V Problem Transformations
Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material
Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test
Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Bayesian Information Criterion The BIC Of Algebraic geometry
Generalized BIC for Singular Models Factoring through Regular Models Shaowei Lin http://math.berkeley.edu/ shaowei/ Department of Mathematics, University of California, Berkeley PhD student (Advisor: Bernd
Likelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
Nonparametric adaptive age replacement with a one-cycle criterion
Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: [email protected]
Penalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Inference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
A linear algebraic method for pricing temporary life annuities
A linear algebraic method for pricing temporary life annuities P. Date (joint work with R. Mamon, L. Jalen and I.C. Wang) Department of Mathematical Sciences, Brunel University, London Outline Introduction
arxiv:1112.0829v1 [math.pr] 5 Dec 2011
How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly
Notes from February 11
Notes from February 11 Math 130 Course web site: www.courses.fas.harvard.edu/5811 Two lemmas Before proving the theorem which was stated at the end of class on February 8, we begin with two lemmas. The
Linear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
Penalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
Continuity of the Perron Root
Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North
Non Parametric Inference
Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
A Model of Optimum Tariff in Vehicle Fleet Insurance
A Model of Optimum Tariff in Vehicle Fleet Insurance. Bouhetala and F.Belhia and R.Salmi Statistics and Probability Department Bp, 3, El-Alia, USTHB, Bab-Ezzouar, Alger Algeria. Summary: An approach about
1 3 4 = 8i + 20j 13k. x + w. y + w
) Find the point of intersection of the lines x = t +, y = 3t + 4, z = 4t + 5, and x = 6s + 3, y = 5s +, z = 4s + 9, and then find the plane containing these two lines. Solution. Solve the system of equations
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
MINIMIZATION OF ENTROPY FUNCTIONALS UNDER MOMENT CONSTRAINTS. denote the family of probability density functions g on X satisfying
MINIMIZATION OF ENTROPY FUNCTIONALS UNDER MOMENT CONSTRAINTS I. Csiszár (Budapest) Given a σ-finite measure space (X, X, µ) and a d-tuple ϕ = (ϕ 1,..., ϕ d ) of measurable functions on X, for a = (a 1,...,
AP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne
Improving predictive inference under covariate shift by weighting the log-likelihood function
Journal of Statistical Planning and Inference 90 (2000) 227 244.elsevier.com/locate/jspi Improving predictive inference under covariate shift by eighting the log-likelihood function Hidetoshi Shimodaira
Dongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year.
This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra
MOP 2007 Black Group Integer Polynomials Yufei Zhao. Integer Polynomials. June 29, 2007 Yufei Zhao [email protected]
Integer Polynomials June 9, 007 Yufei Zhao [email protected] We will use Z[x] to denote the ring of polynomials with integer coefficients. We begin by summarizing some of the common approaches used in dealing
Testing predictive performance of binary choice models 1
Testing predictive performance of binary choice models 1 Bas Donkers Econometric Institute and Department of Marketing Erasmus University Rotterdam Bertrand Melenberg Department of Econometrics Tilburg
The Kelly criterion for spread bets
IMA Journal of Applied Mathematics 2007 72,43 51 doi:10.1093/imamat/hxl027 Advance Access publication on December 5, 2006 The Kelly criterion for spread bets S. J. CHAPMAN Oxford Centre for Industrial
The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
4.6 Linear Programming duality
4.6 Linear Programming duality To any minimization (maximization) LP we can associate a closely related maximization (minimization) LP. Different spaces and objective functions but in general same optimal
Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and
Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study
INTRODUCTION TO RATING MODELS
INTRODUCTION TO RATING MODELS Dr. Daniel Straumann Credit Suisse Credit Portfolio Analytics Zurich, May 26, 2005 May 26, 2005 / Daniel Straumann Slide 2 Motivation Due to the Basle II Accord every bank
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Exact Nonparametric Tests for Comparing Means - A Personal Summary
Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola
OPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
Separation Properties for Locally Convex Cones
Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam
Lecture 14: Section 3.3
Lecture 14: Section 3.3 Shuanglin Shao October 23, 2013 Definition. Two nonzero vectors u and v in R n are said to be orthogonal (or perpendicular) if u v = 0. We will also agree that the zero vector in
Review Horse Race Gambling and Side Information Dependent horse races and the entropy rate. Gambling. Besma Smida. ES250: Lecture 9.
Gambling Besma Smida ES250: Lecture 9 Fall 2008-09 B. Smida (ES250) Gambling Fall 2008-09 1 / 23 Today s outline Review of Huffman Code and Arithmetic Coding Horse Race Gambling and Side Information Dependent
Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
Permutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market
Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market Sumiko Asai Otsuma Women s University 2-7-1, Karakida, Tama City, Tokyo, 26-854, Japan [email protected] Abstract:
6 Hedging Using Futures
ECG590I Asset Pricing. Lecture 6: Hedging Using Futures 1 6 Hedging Using Futures 6.1 Types of hedges using futures Two types of hedge: short and long. ECG590I Asset Pricing. Lecture 6: Hedging Using Futures
The Envelope Theorem 1
John Nachbar Washington University April 2, 2015 1 Introduction. The Envelope Theorem 1 The Envelope theorem is a corollary of the Karush-Kuhn-Tucker theorem (KKT) that characterizes changes in the value
Integer roots of quadratic and cubic polynomials with integer coefficients
Integer roots of quadratic and cubic polynomials with integer coefficients Konstantine Zelator Mathematics, Computer Science and Statistics 212 Ben Franklin Hall Bloomsburg University 400 East Second Street
FIN-40008 FINANCIAL INSTRUMENTS SPRING 2008. Options
FIN-40008 FINANCIAL INSTRUMENTS SPRING 2008 Options These notes describe the payoffs to European and American put and call options the so-called plain vanilla options. We consider the payoffs to these
Examples of Tasks from CCSS Edition Course 3, Unit 5
Examples of Tasks from CCSS Edition Course 3, Unit 5 Getting Started The tasks below are selected with the intent of presenting key ideas and skills. Not every answer is complete, so that teachers can
18.06 Problem Set 4 Solution Due Wednesday, 11 March 2009 at 4 pm in 2-106. Total: 175 points.
806 Problem Set 4 Solution Due Wednesday, March 2009 at 4 pm in 2-06 Total: 75 points Problem : A is an m n matrix of rank r Suppose there are right-hand-sides b for which A x = b has no solution (a) What
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Important Probability Distributions OPRE 6301
Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.
HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!
Math 7 Fall 205 HOMEWORK 5 SOLUTIONS Problem. 2008 B2 Let F 0 x = ln x. For n 0 and x > 0, let F n+ x = 0 F ntdt. Evaluate n!f n lim n ln n. By directly computing F n x for small n s, we obtain the following
Introduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
Exact Confidence Intervals
Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter
The relationship between exchange rates, interest rates. In this lecture we will learn how exchange rates accommodate equilibrium in
The relationship between exchange rates, interest rates In this lecture we will learn how exchange rates accommodate equilibrium in financial markets. For this purpose we examine the relationship between
MATH 90 CHAPTER 1 Name:.
MATH 90 CHAPTER 1 Name:. 1.1 Introduction to Algebra Need To Know What are Algebraic Expressions? Translating Expressions Equations What is Algebra? They say the only thing that stays the same is change.
Linear Codes. Chapter 3. 3.1 Basics
Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length
Statistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
Rolle s Theorem. q( x) = 1
Lecture 1 :The Mean Value Theorem We know that constant functions have derivative zero. Is it possible for a more complicated function to have derivative zero? In this section we will answer this question
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
There are three kinds of people in the world those who are good at math and those who are not. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Positive Views The record of a month
2 Primality and Compositeness Tests
Int. J. Contemp. Math. Sciences, Vol. 3, 2008, no. 33, 1635-1642 On Factoring R. A. Mollin Department of Mathematics and Statistics University of Calgary, Calgary, Alberta, Canada, T2N 1N4 http://www.math.ucalgary.ca/
Statistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
RINGS WITH A POLYNOMIAL IDENTITY
RINGS WITH A POLYNOMIAL IDENTITY IRVING KAPLANSKY 1. Introduction. In connection with his investigation of projective planes, M. Hall [2, Theorem 6.2]* proved the following theorem: a division ring D in
