Journal of Machine Learning Research 11 (2010) 2169-2173. Submitted 2/10; Revised 8/10; Published 8/10.

libdai: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

Joris M. Mooij
Max Planck Institute for Biological Cybernetics
Department for Empirical Inference
Spemannstraße 38, 72076 Tübingen, Germany
JORIS.MOOIJ@LIBDAI.ORG

Editor: Cheng Soon Ong

Abstract

This paper describes the software package libdai, a free and open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libdai supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libdai is licensed under GPL v2+ and is available at http://www.libdai.org.

Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks

1. Introduction

Probabilistic graphical models (Koller and Friedman, 2009) are nowadays at the core of many applications in machine learning. Because exact inference in graphical models is often infeasible, an extensive literature has evolved in recent years describing various approximation methods for this task. Unfortunately, only a few free software implementations of these methods exist. The software package libdai presented in this paper is the result of an ongoing attempt to improve this situation.

libdai is a free and open source C++ library (licensed under the GNU General Public License version 2, or higher) that provides implementations of various exact and approximate inference methods for graphical models. libdai supports factor graphs with discrete variables. This paper describes the most recent libdai release, version 0.2.7.

The modular design of libdai makes it easy to implement novel inference algorithms, and the large number of inference algorithms already implemented in libdai allows for easy comparison of their accuracy and performance.

2. Inference in Factor Graphs

Although Bayesian networks and Markov random fields are the most commonly used graphical models, libdai uses a slightly more general type of graphical model: factor graphs (Kschischang et al., 2001).

A factor graph is a bipartite graph consisting of variable nodes $i \in V$ and factor nodes $I \in F$, a family of random variables $(X_i)_{i \in V}$, a family of factors $(f_I)_{I \in F}$ (which are nonnegative functions depending on subsets $N_I \subseteq V$ of the variables), and a set of edges $E \subseteq V \times F$, where variable node $i$ and factor node $I$ are connected by an undirected edge if and only if the factor $f_I$ depends on the variable $X_i$, that is, if $i \in N_I$. The probability distribution encoded by the factor graph is given by

    P(X_V) = \frac{1}{Z} \prod_{I \in F} f_I(X_{N_I}), \qquad Z := \sum_{X_V} \prod_{I \in F} f_I(X_{N_I}).

A Bayesian network can easily be represented as a factor graph by introducing one factor for each variable, namely the (conditional) probability table of that variable given its parents in the Bayesian network; for example, the two-node network $A \to B$ with $P(A, B) = P(A)\,P(B \mid A)$ becomes a factor graph with factors $f_1(X_A) = P(A)$ and $f_2(X_A, X_B) = P(B \mid A)$. A Markov random field can be represented as a factor graph by taking the clique potentials as factors. Factor graphs naturally express the factorization structure of probability distributions and therefore form a convenient representation for approximate inference algorithms that exploit this factorization. This is the reason that libdai uses factor graphs as its basic representation for graphical models.

Given a factor graph, the following important inference tasks can be distinguished:

- calculating the partition sum $Z$;
- calculating marginal probability distributions $P(X_A) = \frac{1}{Z} \sum_{X_{V \setminus A}} \prod_{I \in F} f_I(X_{N_I})$ over subsets of variables $X_A$, $A \subseteq V$;
- calculating the Maximum A Posteriori (MAP) state, that is, the joint configuration $\operatorname{argmax}_{X_V} \prod_{I \in F} f_I(X_{N_I})$ that has maximum probability mass.

libdai offers several inference algorithms that solve these tasks either approximately or exactly, for factor graphs with discrete variables.

2.1 Inference Methods Implemented in libdai

Apart from exact inference by brute force enumeration and by the junction tree method, libdai currently offers the following approximate inference methods for calculating partition sums, marginals and MAP states:[1] mean field, (loopy) belief propagation (Kschischang et al., 2001), fractional belief propagation (Wiegerinck and Heskes, 2003), tree-reweighted belief propagation (Wainwright et al., 2003), tree expectation propagation (Minka and Qi, 2004), generalized belief propagation (Yedidia et al., 2005), double-loop generalized belief propagation (Heskes et al., 2003), loop-corrected belief propagation (Mooij and Kappen, 2007; Montanari and Rizzo, 2005), conditioned belief propagation (Eaton and Ghahramani, 2009), a Gibbs sampler, and a decimation method. In addition, libdai supports parameter learning of conditional probability tables by maximum likelihood or, in the case of missing data, by expectation maximization.

[1] Not all inference tasks are implemented by each method.

3. Technical Details

libdai is targeted at researchers who have a good understanding of graphical models. The best way to use libdai is by writing C++ code that directly invokes the library functions and classes. In addition, part of the functionality is accessible through various interfaces: a command line interface, a limited MatLab interface, and experimental Python and Octave interfaces (powered by SWIG, see http://www.swig.org). Because libdai is implemented in C++, it is very fast compared with implementations in pure MatLab or Python; the difference in computation time can be up to a few orders of magnitude.
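To make the C++ usage pattern concrete, here is a minimal sketch that builds a two-variable factor graph, runs loopy belief propagation on it, and compares the result against exact junction tree inference. It follows the structure of the example programs bundled with libdai (the Var, Factor, FactorGraph, BP and JTree classes and the PropertySet options mechanism); the toy factor values are invented for illustration, and option names may differ slightly between library versions.

    #include <iostream>
    #include <dai/alldai.h>   // convenience header that pulls in all libdai algorithms

    using namespace std;
    using namespace dai;

    int main() {
        // Two binary variables, identified by (label, number of states).
        Var x0(0, 2), x1(1, 2);

        // A single pairwise factor f(x0, x1). Table entries are addressed by a
        // linear index in which the variable with the lowest label runs fastest.
        Factor f01(VarSet(x0, x1));
        f01.set(0, 0.3);   // f(x0 = 0, x1 = 0)
        f01.set(1, 0.7);   // f(x0 = 1, x1 = 0)
        f01.set(2, 0.6);   // f(x0 = 0, x1 = 1)
        f01.set(3, 0.4);   // f(x0 = 1, x1 = 1)

        // Assemble the factor graph from the list of factors.
        FactorGraph fg(vector<Factor>(1, f01));

        // Approximate inference: loopy BP with random sequential updates.
        PropertySet bpOpts;
        bpOpts.set("maxiter", (size_t)100);
        bpOpts.set("tol", (Real)1e-9);
        bpOpts.set("updates", string("SEQRND"));
        bpOpts.set("logdomain", false);
        BP bp(fg, bpOpts);
        bp.init();
        bp.run();

        // Exact inference with the junction tree method, for comparison.
        JTree jt(fg, PropertySet()("updates", string("HUGIN")));
        jt.init();
        jt.run();

        // Partition sum and single-variable marginals.
        cout << "BP logZ    = " << bp.logZ() << endl;
        cout << "exact logZ = " << jt.logZ() << endl;
        for (size_t i = 0; i < fg.nrVars(); i++)
            cout << "P(x" << i << ") ~ " << bp.belief(fg.var(i)) << endl;
        return 0;
    }

Swapping in any other algorithm from Section 2.1 only changes the class name and the option set, which is what makes side-by-side comparisons of accuracy and performance straightforward.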

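The command line interface reads factor graphs from plain text files. Assuming the .fg file format documented with libdai, the same toy factor graph would be written roughly as follows: the first line gives the number of factors, and each factor block then lists the number of variables, their labels, their state counts, the number of nonzero table entries, and finally those entries as index-value pairs (using the same linear index convention as in the C++ sketch above).

    1

    2
    0 1
    2 2
    4
    0 0.3
    1 0.7
    2 0.6
    3 0.4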
libdai is designed to be cross-platform and is known to compile on GNU/Linux distributions and on Mac OS X using the GCC compiler suite, as well as under Windows using either Cygwin or MS Visual Studio 2008. The libdai sources and documentation can be downloaded or browsed from the libdai website at http://www.libdai.org. The Google group libdai (see http://groups.google.com/group/libdai) can be used for getting support and discussing development issues. Extensive documentation generated by Doxygen (see http://www.doxygen.org) is available, both as part of the source code and online. Unit tests and various example programs are provided, including the full source code of the solver which won in two categories of the 2010 UAI Approximate Inference Challenge (see also http://www.cs.huji.ac.il/project/uai10/) and which uses several of the algorithms implemented in libdai.

4. Related Work

Other prominent open source software packages supporting both directed and undirected graphical models are the Bayes Net Toolbox (BNT, see http://bnt.googlecode.com), the Probabilistic Networks Library (PNL, see http://sourceforge.net/projects/openpnl), FastInf (Jaimovich et al., 2010), GRMM (http://mallet.cs.umass.edu/grmm), Factorie (http://code.google.com/p/factorie), and JProGraM (http://jprogram.sourceforge.net). A feature comparison of libdai with these other packages is provided in Table 1.

Feature                        libdai  BNT        PNL  FastInf  GRMM     Factorie   JProGraM
Language                       C++     MatLab, C  C++  C++      Java     Scala      Java
License                        GPL2+   LGPL2      BSD  GPL3     CPL 1.0  CC-by 3.0  GPL3
Junction tree                  +       +          +    +        +        -          +
Belief propagation (BP)        +       +          +    +        +        +          -
Fractional BP                  +       -          -    -        -        -          -
Tree-reweighted BP             +       -          -    +        -        -          -
Generalized BP (GBP)           +       -          -    +        +        -          -
Double-loop GBP                +       -          -    -        -        -          -
Loop-corrected BP              +       -          -    -        -        -          -
Conditioned BP                 +       -          -    -        -        -          -
Tree expectation propagation   +       -          -    -        -        -          -
Mean field                     +       -          -    +        -        -          -
Gibbs sampling                 +       +          +    +        +        +          +
Other sampling methods         -       +          +    +        +        +          -
Decimation                     +       -          -    -        -        -          -
Continuous variables           -       +          +    -        -        +          +
Dynamic Bayes nets             -       +          +    -        -        +          -
Conditional random fields      -       -          -    -        +        +          -
Relational models              -       -          -    +        -        +          -
Influence diagrams             -       +          -    -        -        -          -
Parameter learning             +       +          +    +        +        +          +
Structure learning             -       +          +    -        -        -          +

Table 1: Feature comparison of various open source software packages for approximate inference on graphical models.

Acknowledgments

This work was part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant BSIK03024. I would like to express my gratitude to the following people who made major contributions to libdai: Martijn Leisink (for laying down the foundations for the library), Frederik Eaton (for contributing the Gibbs sampler, conditioned BP, fractional BP and various other improvements), Charles Vaske (for contributing the parameter learning algorithms), Giuseppe Passino (for various improvements), Bastian Wemmenhove (for contributing the MR algorithm), and Patrick Pletscher (for contributing the SWIG interface). I am also indebted to the following people for bug fixes and miscellaneous smaller patches: Claudio Lima, Christian Wojek, Sebastian Nowozin, Stefano Pellegrini, Ofer Meshi, Dan Preston, Peter Gober, Jiuxiang Hu, Peter Rockett, Dhruv Batra, Alexander Schwing, Alejandro Lage, and Matt Dunham.

References

F. Eaton and Z. Ghahramani. Choosing a variable to clamp. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), volume 5 of JMLR Workshop and Conference Proceedings, pages 145-152, 2009.

T. Heskes, K. Albers, and B. Kappen. Approximate inference and constrained optimization. In C. Meek and U. Kjærulff, editors, Proceedings of the Nineteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), pages 313-320. Morgan Kaufmann, 2003.

A. Jaimovich, O. Meshi, I. McGraw, and G. Elidan. FastInf: An efficient approximate inference library. Journal of Machine Learning Research, 11:1733-1736, 2010.

D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

F. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498-519, 2001.

T. Minka and Y. Qi. Tree-structured approximations by expectation propagation. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 193-200. MIT Press, 2004.

A. Montanari and T. Rizzo. How to compute loop corrections to the Bethe approximation. Journal of Statistical Mechanics: Theory and Experiment, 2005(10):P10011, 2005.

J. Mooij and B. Kappen. Loop corrections for approximate inference on factor graphs. Journal of Machine Learning Research, 8:1113-1143, 2007.

M. Wainwright, T. Jaakkola, and A. Willsky. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In C. Bishop and B. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003), 2003.

W. Wiegerinck and T. Heskes. Fractional belief propagation. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 438-445. MIT Press, 2003.

J. Yedidia, W. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282-2312, 2005.