Bayesian Information Criterion The BIC Of Algebraic geometry

Size: px

Start display at page:

Download "Bayesian Information Criterion The BIC Of Algebraic geometry"

Morgan Banks
3 years ago
Views:

1 Generalized BIC for Singular Models Factoring through Regular Models Shaowei Lin shaowei/ Department of Mathematics, University of California, Berkeley PhD student (Advisor: Bernd Sturmfels)

edu/ shaowei/ Department of Mathematics, University

2 Abstract The Bayesian Information Criterion (BIC) is an important tool for model selection, but its use is limited only to regular models. Recently, Watanabe [5] generalized the BIC to singular models using ideas from algebraic geometry. In this paper, we compute this generalized BIC for singular models factoring through regular models, relating it to the real log canonical threshold of polynomial fiber ideals.

Recently, Watanabe [5] generalized the BIC to singular models using ideas from algebraic geometry.

3 Bayesian Information Criterion The BIC is a score used for model selection. BIC = log L 0 + d 2 log N L 0 : maximum likelihood, d: dimension, N: sample size. However, it applies only for regular models.

4 Regular and Singular Models A model M(U) is regular if it is identifiable p(x u, M) = p(x u, M) x u = u and the Fisher information matrix I(u) is positive definite u. I jk (u) = log p(x u) u j log p(x u) p(x u)dx u k Otherwise, the model is singular. e.g. most exponential families are regular, while most hidden variable models are singular.

I jk (u) = log p(x u) u j log p(x u) p(x u)dx u k Otherwise, the model is singular. e.

5 Factored Models M 2 (Ω) factors through M 1 (U) if for some function u(ω), p(x ω, M 2 ) = p(x u(ω), M 1 ). e.g. parametrized multivariate Gaussian models N(0, Σ(ω)).

6 Generalized BIC In 2001, Watanabe showed that for singular models M(Ω), the BIC generalizes to log L 0 + λ log N (θ 1) log log N, where λ is the smallest pole of the zeta function ζ(z) = K(ω) z ϕ(ω)dω, z C Ω and θ its multiplicity. Here, ϕ(ω) is a prior on Ω, and K(ω) the Kullback-Leibler distance to the true distribution. We compute (λ, θ) by monomializing K(ω). (a.k.a. resolution of singularities)

7 Key Idea: Fiber Ideals Even if M(Ω) is parametrized by simple polynomials, the Kullback function K(ω) is non-polynomial and difficult to monomialize. We show that if M(Ω) factors through a regular model, we can define a polynomial fiber ideal and compute (λ, θ) by monomializing this ideal.

8 Real Log Canonical Thresholds Given an ideal I = f 1,...,f r generated by polynomials, define the real log canonical threshold (λ,θ) to be the smallest pole λ and multiplicity θ of the zeta function ζ(z) = (f 1 (ω) f r (ω) 2 ) z/2 dω, z C. The RLCT is independent of the choice of generators f 1,...,f r, and can be computed by monomializing I. For monomial ideals, we find RLCTs using a geometriccombinatorial tool involving Newton polyhedra.

multiplicity θ of the zeta function ζ(z) = (f 1 (ω) 2 + + f r (ω) 2 ) z/2 dω, z C.

9 Main Result Let M 2 (Ω) factor through a regular model M 1 (U) via u(ω). Given N i.i.d. samples, let their M.L.E. in M 1 be û U. Let Z(N) be the marginal likelihood of the data given M 2. Theorem (L.) If û u(ω), then asymptotically, log Z(N) log L 0 + λ log N (θ 1) log log N where (2λ,θ) is the RLCT of the fiber ideal I = u 1 (ω) û 1,...,u d (ω) û d.

10 Example: Classical BIC Let M(U) be a regular model and û the M.L.E. of the data. Then, (2λ,θ) is the RLCT of the fiber ideal which is just (d, 1). I = u 1 û 1,...,u d û d

11 Example: Discrete Models Let M(Ω) be a model on discrete space {1, 2,...,k} with polynomial state probabilities p 1 (ω),...,p k (ω). Let q 1,...,q k be relative frequencies of the data. If q p(ω), then (2λ,θ) is the RLCT of the fiber ideal I = p 1 (ω) q 1,...,p k (ω) q k.

Let q 1,...,q k be relative frequencies of the data.

12 Example: Gaussian Models Let N(µ(Ω), Σ(Ω)) be a multivariate Gaussian model with sample mean ˆµ and covariance matrix ˆΣ. If ˆµ µ(ω) and ˆΣ Σ(Ω), then (2λ,θ) is the RLCT of the fiber ideal I = µ(ω) ˆµ, Σ(ω) ˆΣ.

13 Acknowledgements Many thanks go to Mathias Drton and Sumio Watanabe for enlightening discussions and key ideas.

14 References 1. V. I. Arnol d, S. M. Guseĭn-Zade and A. N. Varchenko: Singularities of Differentiable Maps, Vol. II, Birkhäuser, Boston, H. Hironaka: Resolution of singularities of an algebraic variety over a field of characteristic zero I, II, Ann. of Math. (2) 79 (1964) S. Lin: Asymptotic Approximation of Marginal Likelihood Integrals, preprint arxiv: (2010). 4. S. Watanabe: Algebraic analysis for nonidentifiable learning machines, Neural Computation 13 (2001) S. Watanabe: Algebraic Geometry and Statistical Learning Theory, Cambridge Monographs on Applied and Computational Mathematics 25, Cambridge University Press, Cambridge, G. Schwarz: Estimating the Dimension of a Model, Annals of Statistics 6 (1978)

Lin: Asymptotic Approximation of Marginal Likelihood Integrals, preprint arxiv:1003.5338 (2010). 4. S.

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong