Generalized BIC for Singular Models Factoring through Regular Models

Shaowei Lin (http://math.berkeley.edu/~shaowei/)
Department of Mathematics, University of California, Berkeley
PhD student (Advisor: Bernd Sturmfels)

Abstract. The Bayesian Information Criterion (BIC) is an important tool for model selection, but its use is limited to regular models. Recently, Watanabe [5] generalized the BIC to singular models using ideas from algebraic geometry. In this paper, we compute this generalized BIC for singular models factoring through regular models, relating it to the real log canonical threshold of polynomial fiber ideals.

Bayesian Information Criterion

The BIC is a score used for model selection:
$$\mathrm{BIC} = \log L_0 - \frac{d}{2} \log N,$$
where $L_0$ is the maximum likelihood, $d$ the model dimension, and $N$ the sample size. However, it applies only to regular models.
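To make the score concrete, here is a minimal sketch in Python, assuming a univariate Gaussian model with $d = 2$ parameters; the data, function name, and model choice are illustrative and not part of the original poster.

    import numpy as np

    def gaussian_bic(x):
        """Classical BIC (on the log-marginal-likelihood scale) for N(mu, sigma^2)."""
        n = len(x)
        mu, sigma = x.mean(), x.std()  # maximum likelihood estimates (ddof=0)
        # log-likelihood at the MLE: log L_0
        log_l0 = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (x - mu)**2 / (2 * sigma**2))
        d = 2  # free parameters: mu and sigma
        return log_l0 - 0.5 * d * np.log(n)  # log L_0 - (d/2) log N

    x = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=500)
    print(gaussian_bic(x))

Under this sign convention the BIC approximates $\log Z(N)$, so among candidate models the one with the larger score is preferred.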

Regular and Singular Models

A model $M(U)$ is regular if it is identifiable, i.e.
$$p(x \mid u_1, M) = p(x \mid u_2, M) \ \ \forall x \quad \Longrightarrow \quad u_1 = u_2,$$
and the Fisher information matrix $I(u)$ is positive definite for all $u$, where
$$I_{jk}(u) = \int \frac{\partial \log p(x \mid u)}{\partial u_j} \, \frac{\partial \log p(x \mid u)}{\partial u_k} \, p(x \mid u) \, dx.$$
Otherwise, the model is singular. e.g. most exponential families are regular, while most hidden variable models are singular.
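A small worked illustration (added here, not from the poster): consider the Gaussian mean model $N(\omega_1 \omega_2, 1)$ with $\omega \in \mathbb{R}^2$. Every $\omega$ on a coordinate axis gives the same distribution $N(0, 1)$, so the model is not identifiable, and the Fisher information
$$I(\omega) = \begin{pmatrix} \omega_2^2 & \omega_1 \omega_2 \\ \omega_1 \omega_2 & \omega_1^2 \end{pmatrix}$$
has rank at most one, hence is nowhere positive definite: the model is singular.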

Factored Models

$M_2(\Omega)$ factors through $M_1(U)$ if for some function $u(\omega)$,
$$p(x \mid \omega, M_2) = p(x \mid u(\omega), M_1).$$
e.g. parametrized multivariate Gaussian models $N(0, \Sigma(\omega))$.
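In that example (spelled out here as an added remark), $M_1(U)$ can be taken to be the full Gaussian family parametrized by its covariance matrix, with $u(\omega) = \Sigma(\omega)$: the full family is regular even when the polynomial parametrization $\omega \mapsto \Sigma(\omega)$ is far from identifiable.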

Generalized BIC

In 2001, Watanabe showed that for singular models $M(\Omega)$, the BIC generalizes to
$$\log L_0 - \lambda \log N + (\theta - 1) \log\log N,$$
where $\lambda$ is the smallest pole of the zeta function
$$\zeta(z) = \int_\Omega K(\omega)^{-z} \, \varphi(\omega) \, d\omega, \quad z \in \mathbb{C},$$
and $\theta$ its multiplicity. Here, $\varphi(\omega)$ is a prior on $\Omega$, and $K(\omega)$ is the Kullback-Leibler distance to the true distribution. We compute $(\lambda, \theta)$ by monomializing $K(\omega)$ (a.k.a. resolution of singularities).
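For intuition, a small computation (an illustration added here, with the domain and flat prior chosen for convenience): if $K(\omega) = \omega_1^2 \omega_2^2$ on $\Omega = [0,1]^2$ and $\varphi \equiv 1$, then
$$\zeta(z) = \int_0^1 \!\! \int_0^1 \omega_1^{-2z} \omega_2^{-2z} \, d\omega_1 \, d\omega_2 = \frac{1}{(1 - 2z)^2},$$
so the smallest pole is $\lambda = \tfrac{1}{2}$ with multiplicity $\theta = 2$. A regular model of the same dimension would have $\lambda = \tfrac{d}{2} = 1$ and $\theta = 1$, so the singular model incurs a smaller $\log N$ penalty.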

Key Idea: Fiber Ideals

Even if $M(\Omega)$ is parametrized by simple polynomials, the Kullback-Leibler function $K(\omega)$ is non-polynomial and difficult to monomialize. We show that if $M(\Omega)$ factors through a regular model, we can define a polynomial fiber ideal and compute $(\lambda, \theta)$ by monomializing this ideal instead.

Real Log Canonical Thresholds

Given an ideal $I = \langle f_1, \ldots, f_r \rangle$ generated by polynomials, define the real log canonical threshold $(\lambda, \theta)$ to be the smallest pole $\lambda$ and its multiplicity $\theta$ of the zeta function
$$\zeta(z) = \int \left( f_1(\omega)^2 + \cdots + f_r(\omega)^2 \right)^{-z/2} d\omega, \quad z \in \mathbb{C}.$$
The RLCT is independent of the choice of generators $f_1, \ldots, f_r$, and can be computed by monomializing $I$. For monomial ideals, we find RLCTs using a geometric-combinatorial tool involving Newton polyhedra.
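As a worked instance (the polyhedral recipe is paraphrased here from [3] and its precise statement should be treated as an assumption): for a monomial ideal one finds the smallest $t$ with $t \cdot (1, \ldots, 1)$ in the Newton polyhedron $P(I)$, and reads off $\lambda = 1/t$ with $\theta$ the codimension of the face containing that point. For $I = \langle \omega_1, \omega_2 \rangle$ near the origin,
$$\zeta(z) = \int \left( \omega_1^2 + \omega_2^2 \right)^{-z/2} d\omega \;\propto\; \int_0^\varepsilon r^{1-z} \, dr$$
in polar coordinates, giving smallest pole $\lambda = 2$ with multiplicity $\theta = 1$; the recipe agrees, since $t = \tfrac{1}{2}$ places $(\tfrac{1}{2}, \tfrac{1}{2})$ on the codimension-one edge of $P(I)$.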

Main Result

Let $M_2(\Omega)$ factor through a regular model $M_1(U)$ via $u(\omega)$. Given $N$ i.i.d. samples, let their M.L.E. in $M_1$ be $\hat{u} \in U$, and let $Z(N)$ be the marginal likelihood of the data given $M_2$.

Theorem (L.). If $\hat{u} \in u(\Omega)$, then asymptotically,
$$\log Z(N) \approx \log L_0 - \lambda \log N + (\theta - 1) \log\log N,$$
where $(2\lambda, \theta)$ is the RLCT of the fiber ideal
$$I = \langle u_1(\omega) - \hat{u}_1, \ldots, u_d(\omega) - \hat{u}_d \rangle.$$

Example: Classical BIC

Let $M(U)$ be a regular model and $\hat{u}$ the M.L.E. of the data. Then $(2\lambda, \theta)$ is the RLCT of the fiber ideal
$$I = \langle u_1 - \hat{u}_1, \ldots, u_d - \hat{u}_d \rangle,$$
which is just $(d, 1)$. Thus $\lambda = d/2$ and $\theta = 1$, and the theorem recovers the classical BIC.
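A quick verification (added here): translating $\hat{u}$ to the origin and passing to spherical coordinates,
$$\zeta(z) = \int \Big( \sum_{i=1}^d u_i^2 \Big)^{-z/2} du \;\propto\; \int_0^\varepsilon r^{d-1-z} \, dr = \frac{\varepsilon^{d-z}}{d - z},$$
whose smallest pole is $z = d$ with multiplicity $1$. Hence $(2\lambda, \theta) = (d, 1)$, i.e. $\lambda = d/2$, and the formula reduces to Schwarz's BIC [6].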

Example: Discrete Models

Let $M(\Omega)$ be a model on the discrete space $\{1, 2, \ldots, k\}$ with polynomial state probabilities $p_1(\omega), \ldots, p_k(\omega)$, and let $q_1, \ldots, q_k$ be the relative frequencies of the data. If $q \in p(\Omega)$, then $(2\lambda, \theta)$ is the RLCT of the fiber ideal
$$I = \langle p_1(\omega) - q_1, \ldots, p_k(\omega) - q_k \rangle.$$
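A tiny constructed illustration (not from the poster): take $k = 2$, $\Omega = [0,1]^2$, $p_1(\omega) = \omega_1 \omega_2$, $p_2(\omega) = 1 - \omega_1 \omega_2$, and suppose every observation lands in state $2$, so $q = (0, 1) \in p(\Omega)$. The fiber ideal is $I = \langle \omega_1 \omega_2 \rangle$, with
$$\zeta(z) = \int_{[0,1]^2} (\omega_1^2 \omega_2^2)^{-z/2} \, d\omega = \frac{1}{(1 - z)^2},$$
so $(2\lambda, \theta) = (1, 2)$: the penalty $\lambda \log N = \tfrac{1}{2} \log N$ is half the regular-model penalty $\tfrac{d}{2} \log N = \log N$.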

Example: Gaussian Models

Let $N(\mu(\Omega), \Sigma(\Omega))$ be a multivariate Gaussian model with sample mean $\hat{\mu}$ and sample covariance matrix $\hat{\Sigma}$. If $\hat{\mu} \in \mu(\Omega)$ and $\hat{\Sigma} \in \Sigma(\Omega)$, then $(2\lambda, \theta)$ is the RLCT of the fiber ideal
$$I = \langle \mu(\omega) - \hat{\mu}, \; \Sigma(\omega) - \hat{\Sigma} \rangle.$$

Acknowledgements

Many thanks go to Mathias Drton and Sumio Watanabe for enlightening discussions and key ideas.

References

1. V. I. Arnol'd, S. M. Guseĭn-Zade and A. N. Varchenko: Singularities of Differentiable Maps, Vol. II, Birkhäuser, Boston, 1985.
2. H. Hironaka: Resolution of singularities of an algebraic variety over a field of characteristic zero I, II, Ann. of Math. (2) 79 (1964) 109–203.
3. S. Lin: Asymptotic Approximation of Marginal Likelihood Integrals, preprint arXiv:1003.5338 (2010).
4. S. Watanabe: Algebraic analysis for nonidentifiable learning machines, Neural Computation 13 (2001) 899–933.
5. S. Watanabe: Algebraic Geometry and Statistical Learning Theory, Cambridge Monographs on Applied and Computational Mathematics 25, Cambridge University Press, Cambridge, 2009.
6. G. Schwarz: Estimating the Dimension of a Model, Annals of Statistics 6 (1978) 461–464.