A tutorial on Bayesian model selection and on the BMSL Laplace approximation
Jean-Luc
Institut de la Communication Parlée, CNRS UMR 5009, INPG-Université Stendhal
INPG, 46 Av. Félix Viallet, Grenoble Cedex 1, France
A. The Bayesian framework for model assessment

In most papers comparing models in the field of speech perception, the tool used to compare models is the fit estimated by the root mean square error RMSE, computed by taking the squared distances between observed and predicted probabilities of responses, averaging them over all categories $C_i$ (total number $n_C$) and all experimental conditions $E_j$ (total number $n_E$), and taking the square root of the result:

$$\mathrm{RMSE} = \left[ \frac{\sum_{E_j, C_i} \left( P_{E_j}(C_i) - p_{E_j}(C_i) \right)^2}{n_E \, n_C} \right]^{1/2} \tag{1}$$

The fit may be derived from the logarithm of the maximum likelihood of a model, considering a data set. If $D$ is a set of $k$ data $d_i$, and $M$ a model with parameters $\Theta$, the estimation of the best set of parameter values $\theta$ is provided by (see footnote 1):

$$\theta = \arg\max \, p(\Theta \mid D, M) \tag{2}$$

or, through the Bayes formula and assuming that all values of $\Theta$ are a priori equiprobable:

$$\theta = \arg\max \, p(D \mid \Theta, M) \tag{3}$$

If the model predicts that the $d_i$ values come from Gaussian models $(\delta_i, \sigma_i)$, we have:

$$\log p(D \mid \Theta, M) = \text{constant} - \frac{1}{2} \sum_i \frac{(d_i - \delta_i)^2}{\sigma_i^2} \tag{4}$$

$$\log p(D \mid \Theta, M) = \text{constant} - \frac{k}{2} \, \frac{\mathrm{RMSE}^2}{\sigma^2} \quad \text{if } \sigma_i^2 = \sigma^2 \ \forall i \tag{5}$$
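As a minimal sketch of the link between Eqs. (4) and (5), assuming a common standard deviation for all data points, the snippet below computes the RMSE and the Gaussian log-likelihood for two sets of predictions. The function names, the data and the value of sigma are hypothetical and only serve the illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean square error over k paired values (Eq. 1, flattened to one list)."""
    k = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / k)

def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood under independent Gaussians with a common sigma (Eq. 4)."""
    k = len(observed)
    constant = -k * math.log(sigma * math.sqrt(2.0 * math.pi))
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return constant - 0.5 * squared / sigma ** 2

# Hypothetical observations and two hypothetical sets of model predictions.
observed = [0.90, 0.10, 0.60, 0.40]
model_a = [0.85, 0.15, 0.55, 0.45]
model_b = [0.70, 0.30, 0.50, 0.50]
sigma = 0.1  # assumed common standard deviation

for name, predicted in (("A", model_a), ("B", model_b)):
    print(name, rmse(observed, predicted),
          gaussian_log_likelihood(observed, predicted, sigma))
```

Under the Gaussian assumption, the set of predictions with the smaller RMSE is also the one with the larger log-likelihood, which is the content of Eq. (5).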
Hence the $\theta$ parameters maximizing the likelihood of $M$ are those providing the best fit by minimizing RMSE. Notice that if the $d_i$ values do not come from Gaussian models, maximum likelihood is no longer equivalent to best fit, which is typically the case with models of audiovisual categorization data, involving multinomial laws.

More importantly, in the Bayesian theory, the comparison of two models is more complex than the comparison of their best fits (Jaynes, 1995). Indeed, comparing a model $M_1$ with a model $M_2$ by comparing their best fits means that there is a first step of estimation of these best fits, and it must be acknowledged that the estimation process is not error-free. Therefore, the comparison must account for this error-prone process, which is done by computing the total likelihood of the model knowing the data. This results in integrating the likelihood over all model parameter values:

$$p(D \mid M) = \int p(D, \Theta \mid M) \, d\Theta = \int p(D \mid \Theta, M) \, p(\Theta \mid M) \, d\Theta = \int L(\Theta \mid M) \, p(\Theta \mid M) \, d\Theta \tag{6}$$

where $L(\Theta \mid M)$ is the likelihood of parameter $\Theta$ for the model, considering the data:

$$L(\Theta \mid M) = p(D \mid \Theta, M) \tag{7}$$

This means that the a priori distribution of data $D$ knowing model $M$ integrates the distribution for all values $\Theta$ of the parameters of the model. Taking the opposite of the logarithm of the total likelihood leads to the so-called Bayesian Model Selection (BMS) criterion for model evaluation (MacKay, 1992; Pitt & Myung, 2002):

$$\mathrm{BMS} = -\log \int L(\Theta \mid M) \, p(\Theta \mid M) \, d\Theta \tag{8}$$
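To make the integral in Eqs. (6) and (8) concrete, here is a minimal sketch on a toy problem that is not taken from the paper: a single binomial parameter with a uniform prior on [0, 1], integrated with a simple midpoint rule. The function names and the counts are hypothetical. For this toy case the integral has the closed form 1 / (n + 1), which gives a convenient check.

```python
import math

def binomial_likelihood(theta, k, n):
    """Probability of k successes out of n trials for a success rate theta."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)

def total_likelihood(k, n, steps=10_000):
    """Midpoint-rule estimate of Eq. (6) with a uniform prior p(theta | M) = 1."""
    h = 1.0 / steps
    return h * sum(binomial_likelihood((i + 0.5) * h, k, n) for i in range(steps))

def bms(k, n):
    """BMS criterion of Eq. (8): opposite of the log of the total likelihood."""
    return -math.log(total_likelihood(k, n))

print(bms(7, 10))  # close to log(11), since the exact integral is 1 / (n + 1)
```

With more than a handful of parameters a grid of this kind becomes impractical, which is why Monte Carlo methods or an analytic approximation are normally used instead.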
Let us consider two models $M_1$ and $M_2$ that have to be compared in relation to a data set $D$. The best fit $\theta_1$ for model $M_1$ provides a posterior likelihood $\Lambda_1 = \max p(D \mid \Theta_1, M_1)$ and the best fit $\theta_2$ for model $M_2$ provides a posterior likelihood $\Lambda_2 = \max p(D \mid \Theta_2, M_2)$. From Eq. (6) it follows that, when the two models are a priori equally probable, the model comparison criterion is not provided by $\Lambda_1 / \Lambda_2$ (or by comparing $\mathrm{RMSE}_1$ and $\mathrm{RMSE}_2$, as classically done), but by:

$$\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{\Lambda_1 W_1}{\Lambda_2 W_2} \tag{9}$$

with:

$$W_i = \int \frac{p(D \mid \Theta_i, M_i)}{p(D \mid \theta_i, M_i)} \, p(\Theta_i \mid M_i) \, d\Theta_i \tag{10}$$

The ratio in Eq. (9) is called the Bayes factor (Kass & Raftery, 1995). The term $p(D \mid \Theta_i, M_i) / p(D \mid \theta_i, M_i)$ in Eq. (10) evaluates the likelihood of $\Theta_i$ values relative to the likelihood of the $\theta_i$ set providing the highest likelihood $\Lambda_i$ for model $M_i$. Hence $W_i$ evaluates the volume of $\Theta_i$ values providing an acceptable fit (not too far from the best one) relative to the whole volume of possible $\Theta_i$ values. This relative volume decreases when the total $\Theta_i$ volume increases, for example with the dimension of the $\Theta_i$ space (see footnote 2). But it also decreases if the function $p(D \mid \Theta_i, M_i) / p(D \mid \theta_i, M_i)$ decreases too quickly: this is what happens if the model is too sensitive.
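The role of the factors W in Eqs. (9) and (10) can be sketched on a hypothetical pair of toy models, neither of which comes from the paper: M1 has one free binomial parameter with a uniform prior on [0, 1], and M2 fixes that parameter at 0.5, so that it has no free parameter and W2 = 1. The names and counts below are illustrative assumptions.

```python
import math

def likelihood(theta, k, n):
    """Binomial probability of k successes out of n trials."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)

def best_fit_and_volume(k, n, steps=10_000):
    """Return (Lambda, W) for the one-parameter model, W being Eq. (10)."""
    best = likelihood(k / n, k, n)  # the maximum is reached at theta = k / n
    h = 1.0 / steps
    w = h * sum(likelihood((i + 0.5) * h, k, n) / best for i in range(steps))
    return best, w

def bayes_factor(k, n):
    """Eq. (9) for M1 (free theta) against M2 (theta fixed at 0.5)."""
    lambda_1, w_1 = best_fit_and_volume(k, n)
    lambda_2, w_2 = likelihood(0.5, k, n), 1.0
    return (lambda_1 * w_1) / (lambda_2 * w_2)

print(bayes_factor(5, 10))  # below 1: the fixed model is preferred
print(bayes_factor(9, 10))  # above 1: the free parameter is worth its cost
```

With 5 successes out of 10, both models reach exactly the same best fit, yet the Bayes factor falls below 1 because W1 is smaller than 1: the flexible model spreads its prior over many parameter values that fit poorly.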
B. BMSL, a simple and intuitive approximation of BMS

The computation of BMS through Eq. (8), or of the Bayes factor through Eqs. (9-10), is complex. It involves the estimation of an integral, which generally requires numerical integration techniques, typically Monte Carlo methods (e.g. Gilks et al., 1996). However, Jaynes (1995, ch. 24) proposes an approximation of the total likelihood in Eq. (6), based on an expansion of $\log(L)$ around the maximum likelihood point $\theta$, where the first-order term vanishes:

$$\log L(\Theta) \approx \log L(\theta) + \frac{1}{2} (\Theta - \theta)^{T} \left[ \frac{\partial^2 \log L}{\partial \Theta^2} \right]_{\theta} (\Theta - \theta) \tag{11}$$

where $\left[ \partial^2 \log L / \partial \Theta^2 \right]_{\theta}$ is the Hessian matrix of the function $\log(L)$ computed at the position of the parameter set $\theta$ providing the maximal likelihood $L_{\max}$ of the considered model. Then, near this position, a good approximation of the likelihood is provided by:

$$L(\Theta) \approx L_{\max} \exp\left[ -\frac{1}{2} (\Theta - \theta)^{T} \Sigma^{-1} (\Theta - \theta) \right] \tag{12}$$

that is, a multivariate Gaussian function with the inverse covariance matrix:

$$\Sigma^{-1} = -\left[ \frac{\partial^2 \log L}{\partial \Theta^2} \right]_{\theta} \tag{13}$$

Coming back to Eq. (6), and assuming that there is no a priori assumption on the distribution of parameters $\Theta$, that is, that their distribution is uniform, we obtain:

$$p(D \mid M) = \int L(\Theta \mid M) \, p(\Theta \mid M) \, d\Theta \approx \int L_{\max} \exp\left[ -\frac{1}{2} (\Theta - \theta)^{T} \Sigma^{-1} (\Theta - \theta) \right] p(\Theta \mid M) \, d\Theta \tag{14}$$
Since $p(\Theta \mid M)$ is constant and equal to $1/V$, the integral is now simply the volume of a Gaussian distribution:

$$p(D \mid M) \approx L_{\max} \, (2\pi)^{m/2} \, \sqrt{\det(\Sigma)} \, / \, V \tag{15}$$

where $V$ is the total volume of the space occupied by parameters $\Theta$ and $m$ is its dimension, that is, the number of free parameters in the considered model. This leads to the so-called Laplace approximation of the BMS criterion (Kass & Raftery, 1995):

$$\mathrm{BMSL} = -\log(L_{\max}) - \frac{m}{2} \log(2\pi) + \log(V) - \frac{1}{2} \log(\det(\Sigma)) \tag{16}$$

The preferred model considering the data $D$ should minimize the BMSL criterion. There are in fact three kinds of terms in Eq. (16). Firstly, the term $-\log(L_{\max})$ is directly linked to the maximum likelihood of the model, more or less accurately estimated by RMSE in Eq. (5): the larger the maximum likelihood, the smaller the BMSL criterion. Then, the two following terms are linked to the dimensionality and volume of the considered model. Altogether, they result in the handicapping of models that are too large (that is, models with too many free parameters) by increasing BMSL (see footnote 3). Finally, the fourth term provides exactly what we were looking for: a term favoring models with a large value of $\det(\Sigma)$. Indeed, if $\det(\Sigma)$ is large, the determinant of the Hessian matrix of $\log(L)$ is small, which expresses that the likelihood $L$ does not vary too quickly around its maximum value $L_{\max}$. This is the precise mathematical way the BMSL criterion integrates fit (provided by the first term in Eq. (16)) and stability (provided by the fourth term), the second and third terms just being there to account for possible differences in the global size of the tested models. Notice that if two models with the same number of free parameters and occupying
the same volume are compared on a given data set $D$, BMSL just depends on the first and fourth terms, which is the (fit + stability) compromise we were looking for.

Bayesian Model Selection has already been applied to the comparison of AV speech perception models, including FLMP (see Myung & Pitt, 1997; Massaro et al., 2001; Pitt et al., 2003). However, this involved heavy computations of the integrals in Eq. (10) through Monte Carlo techniques, which would be difficult to apply in all the model comparison works in the domain. BMSL has the twofold advantage of being easy to compute and easy to interpret in terms of a (fit + stability) compromise. Furthermore, if the amount of available data is much higher than the number of parameters involved in the models to compare (that is, the dimension $m$ of the $\Theta$ space), the probability distributions become highly peaked around their maxima, and the central limit theorem shows that the approximation in Eqs. (11-12) becomes quite reasonable (Walker, 1967). Kass & Raftery (1995) suggest that the approximation should work well for a sample size greater than 20 times the parameter size $m$ (see Slate, 1999, for further discussions about assessing non-normality).
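As an illustration of how close Eq. (16) can come to the exact criterion of Eq. (8), the sketch below applies it to a hypothetical one-parameter binomial model (m = 1, V = 1) that does not appear in the paper. For that toy model the exact BMS is log(n + 1), so the two values can be compared directly. The step size eps and the counts are assumptions chosen for the example.

```python
import math

def log_likelihood(theta, k, n):
    """Log of the binomial probability of k successes out of n trials."""
    return (math.log(math.comb(n, k))
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta))

def bmsl_one_parameter(k, n, volume=1.0, eps=1e-4):
    """Eq. (16) for m = 1, with the second derivative taken by central differences."""
    theta_hat = k / n                      # maximum likelihood point (needs 0 < k < n)
    log_l_max = log_likelihood(theta_hat, k, n)
    hessian = (log_likelihood(theta_hat + eps, k, n)
               + log_likelihood(theta_hat - eps, k, n)
               - 2.0 * log_l_max) / eps ** 2
    sigma2 = -1.0 / hessian                # Eq. (13): Sigma is the opposite inverse
    m = 1
    return (-log_l_max - 0.5 * m * math.log(2.0 * math.pi)
            + math.log(volume) - 0.5 * math.log(sigma2))

k, n = 30, 100
print(bmsl_one_parameter(k, n), math.log(n + 1))  # Laplace value next to exact BMS
```

In this toy case the sample size is well above 20 times the number of parameters, and the two printed values should agree closely.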
C. Implementing BMSL for audiovisual speech perception experiments

An audiovisual speech perception experiment typically involves various experimental conditions $E_i$ (e.g. various A, V, AV stimuli, conflicting or not), with categorization data described by observed frequencies $p_{ij}$ for each category $C_j$ in each condition $E_i$ ($\sum_j p_{ij} = 1$ for all values of $i$). A model $M$, depending on $m$ free parameters $\Theta$, predicts probabilities $P_{ij}(\Theta)$ for each category $C_j$ in each condition $E_i$. The distribution of probabilities in each experimental condition follows a multinomial law, hence the logarithm of the likelihood of the $\Theta$ parameter set can be approximated by:

$$\log L(\Theta) = \sum_{ij} n_i \, p_{ij} \log\left( \frac{P_{ij}(\Theta)}{p_{ij}} \right) \tag{17}$$

where $n_i$ is the total number of responses provided by the subjects in condition $E_i$. Therefore, the computation of BMSL can be easily done in four steps:

(i) select the value of the $\Theta$ parameter set maximizing $\log L(\Theta)$, that is, $\theta$ providing $\log L(\theta) = \log(L_{\max})$;
(ii) compute the Hessian matrix of $\log(L)$ around $\theta$, and its opposite inverse $\Sigma$;
(iii) estimate the volume $V$ of the $\Theta$ parameter set;
(iv) compute BMSL according to Eq. (16).

Let us take as an example the Fuzzy-Logical Model of Perception (FLMP) (Massaro, 1987, 1998) simulation of a test case with two categories $C_1$ and $C_2$; one A, one V and one AV condition; and the following pattern of data: $p_A(C_1) = 0.99$, $p_V(C_1) = 0.01$, and $p_{AV}(C_1) = 0.95$, obtained on 10 repetitions of each condition ($n = 10$). The basic FLMP equation is:
$$P_{AV}(C_i) = \frac{P_A(C_i) \, P_V(C_i)}{\sum_j P_A(C_j) \, P_V(C_j)} \tag{18}$$

$C_i$ and $C_j$ being phonetic categories involved in the experiment, and $P_A$, $P_V$ and $P_{AV}$ the model probabilities of responses in the A, V and AV conditions respectively (observed probabilities are in lower case and simulated probabilities in upper case throughout this paper). The FLMP depends on two parameters $\Theta_A$ and $\Theta_V$, each varying between 0 and 1, hence in Eq. (16) we take $m = 2$ and $V = 1$. $\Theta_A$ and $\Theta_V$ respectively predict the audio and video responses:

$$P_A(C_1) = \Theta_A \qquad P_V(C_1) = \Theta_V$$

while the AV response is predicted by Eq. (18):

$$P_{AV}(C_1) = \frac{\Theta_A \Theta_V}{\Theta_A \Theta_V + (1 - \Theta_A)(1 - \Theta_V)}$$

The probabilities of category $C_2$ are of course the complement to 1 of all values for $C_1$:

$$P_A(C_2) = 1 - P_A(C_1) \qquad P_V(C_2) = 1 - P_V(C_1) \qquad P_{AV}(C_2) = 1 - P_{AV}(C_1)$$

In what follows, all observed and predicted probabilities for $C_1$ are respectively called $p$ and $P$, and all observed and predicted probabilities for $C_2$ are respectively called $q$ and $Q$. This makes it possible to compute the model log-likelihood function from Eq. (17):

$$\log L(\Theta) = n \left( p_A \log\frac{P_A}{p_A} + q_A \log\frac{Q_A}{q_A} + p_V \log\frac{P_V}{p_V} + q_V \log\frac{Q_V}{q_V} + p_{AV} \log\frac{P_{AV}}{p_{AV}} + q_{AV} \log\frac{Q_{AV}}{q_{AV}} \right)$$

The next step consists in maximizing $\log L(\Theta)$, or equivalently minimizing $-\log L(\Theta)$, over the range $\Theta_A, \Theta_V \in [0, 1]$. This can be done by any optimization algorithm available in various libraries. In the present case, the optimum should be obtained around:

$\theta_A =$
$\theta_V =$

which provide:

$\log L(\theta_A, \theta_V) = \log(L_{\max}) =$

This is the end of step (i). Step (ii) consists in the computation of the Hessian matrix $H$ of $\log(L)$ around $\theta$. This can be done by classical finite-difference approximations of the derivatives, based on Taylor expansions. The core program, which can be directly implemented by users of the BMSL algorithm, is given below:

    ε = ;
    z = zeros(1, m);
    % diagonal terms: second-order central differences along each parameter
    for i = 1:m
        e = z; e(i) = ε;
        H(i, i) = (log(L(θ+e)) + log(L(θ-e)) - 2*log(L(θ))) / ε^2;
    end
    % off-diagonal terms: difference along the (i, j) diagonal direction
    for i = 1:m
        for j = (i+1):m
            e = z; e(i) = ε; e(j) = ε;
            b = (log(L(θ+e)) + log(L(θ-e)) - 2*log(L(θ))) / ε^2;
            H(i, j) = (b - H(i, i) - H(j, j)) / 2;
            H(j, i) = H(i, j);
        end
    end
    % Σ is the opposite inverse of the Hessian, as in Eq. (13)
    Σ = -inv(H);

Computation of BMSL can then be done from Eq. (16), in which all terms are now computed. This provides a BMSL value of 7.94 in the present example.
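The four steps can be strung together for the FLMP test case in a short Python sketch. Several choices here are assumptions rather than the author's implementation: the maximization uses a simple compass search written for the occasion instead of a library optimizer, the finite-difference step eps = 1e-5 is a value picked for this example, and all function names are hypothetical.

```python
import math

N = 10                                          # responses per condition
OBSERVED = {"A": 0.99, "V": 0.01, "AV": 0.95}   # observed p(C1) in each condition

def predict(theta_a, theta_v):
    """FLMP predictions of P(C1) in the A, V and AV conditions (Eq. 18)."""
    p_av = theta_a * theta_v / (theta_a * theta_v + (1 - theta_a) * (1 - theta_v))
    return {"A": theta_a, "V": theta_v, "AV": p_av}

def log_l(theta):
    """Log-likelihood of Eq. (17) for two categories (p, q observed; P, Q predicted)."""
    predicted = predict(*theta)
    total = 0.0
    for cond, p in OBSERVED.items():
        big_p = predicted[cond]
        total += N * (p * math.log(big_p / p)
                      + (1 - p) * math.log((1 - big_p) / (1 - p)))
    return total

def maximize(f, start, step=0.1, tol=1e-9, lo=1e-9, hi=1 - 1e-9):
    """Step (i), assumed method: compass search, halving the step when stuck."""
    x, fx = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] = min(hi, max(lo, y[i] + delta))  # stay inside (0, 1)
                fy = f(y)
                if fy > fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx

def hessian(f, theta, eps=1e-5):
    """Step (ii): same finite-difference scheme as the core program above."""
    m, f0 = len(theta), f(theta)

    def second_diff(e):
        plus = [t + d for t, d in zip(theta, e)]
        minus = [t - d for t, d in zip(theta, e)]
        return (f(plus) + f(minus) - 2 * f0) / eps ** 2

    h = [[0.0] * m for _ in range(m)]
    for i in range(m):
        e = [0.0] * m
        e[i] = eps
        h[i][i] = second_diff(e)
    for i in range(m):
        for j in range(i + 1, m):
            e = [0.0] * m
            e[i] = e[j] = eps
            h[i][j] = h[j][i] = (second_diff(e) - h[i][i] - h[j][j]) / 2
    return h

theta_hat, log_l_max = maximize(log_l, [0.5, 0.5])
h = hessian(log_l, theta_hat)
det_neg_h = h[0][0] * h[1][1] - h[0][1] * h[1][0]  # det(-H) = det(H) for a 2x2 matrix
m, volume = 2, 1.0                                 # step (iii): unit square
# Step (iv): det(Sigma) = 1 / det(-H), so -1/2 log det(Sigma) = +1/2 log det(-H).
bmsl = (-log_l_max - m / 2 * math.log(2 * math.pi)
        + math.log(volume) + 0.5 * math.log(det_neg_h))
print(theta_hat, log_l_max, bmsl)
```

The printed BMSL can be compared with the value of 7.94 quoted above. Note that the optimum for $\Theta_A$ lies very close to the upper bound of its range, so the finite-difference step has to stay well below the distance to that bound.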
Footnotes

1. In the following, bold symbols deal with vectors or matrices, and all maximizations are computed on the model parameter set $\Theta$.
2. Massaro (1998) proposes to apply a correction factor $k/(k-f)$ to RMSE, with $k$ the number of data and $f$ the number of degrees of freedom of the model (p. 301).
3. The interpretation of the term $\log(V)$ is straightforward, and results in handicapping large models by increasing BMSL. The term $-\frac{m}{2} \log(2\pi)$ comes more indirectly from the analysis, and could seem to favor large models. In fact, it can only decrease the trend to favor small models over large ones.
References

Gilks, W.R., Richardson, S., & Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. New York: Chapman & Hall.
Jaynes, E.T. (1995). Probability theory - The logic of science. Cambridge University Press (in press).
Kass, R.E., & Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90.
MacKay, D.J.C. (1992). Bayesian interpolation. Neural Computation, 4.
Massaro, D.W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. London: Lawrence Erlbaum Associates.
Massaro, D.W. (1998). Perceiving Talking Faces. Cambridge: MIT Press.
Massaro, D.W., Cohen, M.M., Campbell, C.S., & Rodriguez, T. (2001). Bayes factor of model selection validates FLMP. Psychonomic Bulletin & Review, 8.
Myung, I.J., & Pitt, M.A. (1997). Applying Occam's razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4.
Pitt, M.A., & Myung, I.J. (2002). When a good fit can be bad. Trends in Cognitive Sciences, 6.
Pitt, M.A., Kim, W., & Myung, I.J. (2003). Flexibility versus generalizability in model selection. Psychonomic Bulletin & Review, 10.
Slate, E.H. (1999). Assessing multivariate nonnormality using univariate distributions. Biometrika, 86.
Walker, A.M. (1967). On the asymptotic behaviour of posterior distributions. J. R. Stat. Soc. B, 31.
Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level
More informationMatrices 2. Solving Square Systems of Linear Equations; Inverse Matrices
Matrices 2. Solving Square Systems of Linear Equations; Inverse Matrices Solving square systems of linear equations; inverse matrices. Linear algebra is essentially about solving systems of linear equations,
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationModel Uncertainty in Classical Conditioning
Model Uncertainty in Classical Conditioning A. C. Courville* 1,3, N. D. Daw 2,3, G. J. Gordon 4, and D. S. Touretzky 2,3 1 Robotics Institute, 2 Computer Science Department, 3 Center for the Neural Basis
More informationNon Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization Jean- Damien Villiers ESSEC Business School Master of Sciences in Management Grande Ecole September 2013 1 Non Linear
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationMUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
More informationProbability and Random Variables. Generation of random variables (r.v.)
Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More information1 Determinants and the Solvability of Linear Systems
1 Determinants and the Solvability of Linear Systems In the last section we learned how to use Gaussian elimination to solve linear systems of n equations in n unknowns The section completely side-stepped
More informationHow To Test Granger Causality Between Time Series
A general statistical framework for assessing Granger causality The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationEstimation across multiple models with application to Bayesian computing and software development
Estimation across multiple models with application to Bayesian computing and software development Trevor J Sweeting (Department of Statistical Science, University College London) Richard J Stevens (Diabetes
More informationPricing and calibration in local volatility models via fast quantization
Pricing and calibration in local volatility models via fast quantization Parma, 29 th January 2015. Joint work with Giorgia Callegaro and Martino Grasselli Quantization: a brief history Birth: back to
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More information33. STATISTICS. 33. Statistics 1
33. STATISTICS 33. Statistics 1 Revised September 2011 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationBayesian Model Averaging CRM in Phase I Clinical Trials
M.D. Anderson Cancer Center 1 Bayesian Model Averaging CRM in Phase I Clinical Trials Department of Biostatistics U. T. M. D. Anderson Cancer Center Houston, TX Joint work with Guosheng Yin M.D. Anderson
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationTime Series Analysis
Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
More informationTHE USE OF STATISTICAL DISTRIBUTIONS TO MODEL CLAIMS IN MOTOR INSURANCE
THE USE OF STATISTICAL DISTRIBUTIONS TO MODEL CLAIMS IN MOTOR INSURANCE Batsirai Winmore Mazviona 1 Tafadzwa Chiduza 2 ABSTRACT In general insurance, companies need to use data on claims gathered from
More informationDiscussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationApplying MCMC Methods to Multi-level Models submitted by William J Browne for the degree of PhD of the University of Bath 1998 COPYRIGHT Attention is drawn tothefactthatcopyright of this thesis rests with
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationOptimizing Prediction with Hierarchical Models: Bayesian Clustering
1 Technical Report 06/93, (August 30, 1993). Presidencia de la Generalidad. Caballeros 9, 46001 - Valencia, Spain. Tel. (34)(6) 386.6138, Fax (34)(6) 386.3626, e-mail: bernardo@mac.uv.es Optimizing Prediction
More information