# Variational Mean Field for Graphical Models

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group jpl.nasa.gov

2 Approximate Inference Consider general UGs (i.e., not tree-structured) All basic computations are intractable (for large G) - likelihoods & partition function - marginals & conditionals - finding modes

3 Taxonomy of Inference Methods Inference Exact VE JT BP Stochastic Gibbs, M-H (MC) MC, SA Approximate Deterministic Cluster ~MP LBP EP Variational

4 Approximate Inference Stochastic (Sampling) - Metropolis-Hastings, Gibbs, (Markov Chain) Monte Carlo, etc - Computationally expensive, but is exact (in the limit) Deterministic (Optimization) - Mean Field (MF), Loopy Belief Propagation (LBP) - Variational Bayes (VB), Expectation Propagation (EP) - Computationally cheaper, but is not exact (gives bounds)

5 General idea Mean Field : Overview - approximate p(x) by a simpler factored distribution q(x) - minimize distance D(p q) - e.g., Kullback-Liebler original G (Naïve) MF H 0 structured MF H s p ( x) φ ( ) c c x c q ( x) q i ( x i ) i q( x) qa( xa) qb( xb)

6 Mean Field : Overview Naïve MF has roots in Statistical Mechanics (1890s) - physics of spin glasses (Ising), ferromagnetism, etc - why is it called Mean Field? with full factorization : E[x i x j ] = E[x i ] E[x j ] Structured MF is more modern Coupled HMM Structured MF approximation (with tractable chains)

7 KL Projection D(Q P) Infer hidden h given visible v (clamp v nodes with δ s) Variational : optimize KL globally the right density form for Q falls out KL is easier since we re taking E[.] wrt simpler Q Q seeks mode with the largest mass (not height) so it will tend to underestimate the support of P P = 0 forces Q = 0

8 KL Projection D(P Q) Infer hidden h given visible v (clamp v nodes with δ s) Expectation Propagation (EP) : optimize KL locally this KL is harder since we re taking E[.] wrt P no nice global solution for Q falls out must sequentially tweak each q c (match moments) Q covers all modes so it overestimates support P > 0 forces Q > 0

9 α - divergences The 2 basic KL divergences are special cases of D α (p q) is non-negative and 0 iff p = q when α - 1 we get KL(P Q) when α +1 we get KL(Q P) when α = 0 D 0 (P Q) is proportional to Hellinger s distance (metric) So many variational approximations must exist, one for each α!

10 for more on α - divergences Shun-ichi Amari

11 for specific examples of α = ±1 See Chapter 10 Variational Single Gaussian Variational Linear Regression Variational Mixture of Gaussians Variational Logistic Regression Expectation Propagation (α = -1)

12 Hierarchy of Algorithms (based on α and structuring) Power EP exp family D α (p q) Structured MF exp family KL(q p) FBP fully factorized D α (p q) EP exp family KL(p q) MF fully factorized KL(q p) TRW fully factorized D α (p q) α > 1 BP fully factorized KL(p q) by Tom Minka

13 Variational MF p( x) 1 1 ψ ( x) = γ c( xc ) = e = Z c Z ) c ψ ( x log( γ ( )) c x c log Z = log e ψ ( x) dx = log Q( x) ψ ( x) e Q( x) dx Jensen s E Q log[ e ψ ( x) / Q( x)] = sup Q E Q log[ e ψ ( x) / Q( x)] = sup { E [ ψ Q Q ( x)] + H [ Q( x)]}

14 Variational MF log Z ψ sup { E [ ( x)] + Q Q H [ Q( x)]} Equality is obtained for Q(x) = P(x) (all Q admissible) Using any other Q yields a lower bound on log Z The slack in this bound is KL-divergence D(Q P) Goal: restrict Q to a tractable subclass Q optimize with sup Q to tighten this bound note we re (also) maximizing entropy H[Q]

15 Variational MF log Z ψ sup { E [ ( x)] + Q Q H [ Q( x)]} Most common specialized family : log-linear models = T ψ ( x) θ φ ( x ) = θ φ( x) linear in parameters θ (natural parameters of EFs) c c c c clique potentials φ(x) (sufficient statistics of EFs) Fertile ground for plowing Convex Analysis

16 Convex Analysis The Old Testament The New Testament

17 Variational MF for EF log Z ψ sup { E [ ( x)] + Q Q H [ Q( x)]} log Z sup { E [ θ φ( x)] + Q Q T H [ Q( x)]} log Z T sup { θ E [ φ( x)] + Q Q H [ Q( x)]} A( θ ) sup { θ µ A*( µ ) } µ M T EF notation M = set of all moment parameters realizable under subclass Q

18 Variational MF for EF So it looks like we are just optimizing a concave function (linear term + negative-entropy) over a convex set Yet it is hard... Why? 1. graph probability (being a measure) requires a very large number of marginalization constraints for consistency (leads to a typically beastly marginal polytope M in the discrete case) e.g., a complete 7-node graph s polytope has over 10 8 facets! In fact, optimizing just the linear term alone can be hard 2. exact computation of entropy -A*(µ) is highly non-trivial (hence the famed Bethe & Kikuchi approximations)

19 Gibbs Sampling for Ising Binary MRF G = (V,E) with pairwise clique potentials 1. pick a node s at random 2. sample u ~ Uniform(0,1) 3. update node s : 4. goto step 1 a slower stochastic version of ICM

20 Naive MF for Ising use a variational mean parameter at each site 1. pick a node s at random 2. update its parameter : 3. goto step 1 deterministic loopy message-passing how well does it work? depends on θ

21 Graphical Models as EF G(V,E) with nodes sufficient stats : clique potentials likewise for θ st probability log-partition mean parameters

22 Variational Theorem for EF For any mean parameter µ where θ (µ) is the corresponding natural parameter in relative interior of M not in the closure of M the log-partition function has this variational representation this supremum is achieved at the moment-matching value of µ

23 Legendre-Fenchel Duality Main Idea: (convex) functions can be supported (lower-bounded) by a continuum of lines (hyperplanes) whose intercepts create a conjugate dual of the original function (and vice versa) conjugate dual of A conjugate dual of A* Note that A** = A (iff A is convex)

24 Dual Map for EF Two equivalent parameterizations of the EF Bijective mapping between Ω and the interior of M Mapping is defined by the gradients of A and its dual A* Shape & complexity of M depends on X and size and structure of G

25 Marginal Polytope G(V,E) = graph with discrete nodes Then M = convex hull of all φ(x) equivalent to intersecting half-spaces a T µ > b difficult to characterize for large G hence difficult to optimize over interior of M is 1-to-1 with Ω

26 The Simplest Graph x G(V,E) = a single Bernoulli node φ(x) = x density log-partition (of course we knew this) we know A* too, but let s solve for it variationally differentiate stationary point rearrange to, substitute into A* Note: we found both the mean parameter and the lower bound using the variational method

27 The 2 nd Simplest Graph x 1 x 2 G(V,E) = 2 connected Bernoulli nodes moments moment constraints variational problem solve (it s still easy!)

28 The 3 rd Simplest Graph x 1 x 2 x 3 3 nodes 16 constraints # of constraints blows up real fast: 7 nodes 200,000,000+ constraints hard to keep track of valid µ s (i.e., the full shape and extent of M ) no more checking our results against closed-forms expressions that we already knew in advance! unless G remains a tree, entropy A* will not decompose nicely, etc

29 tractable subgraph H = (V,0) fully-factored distribution moment space Variational MF for Ising entropy is additive : - variational problem for A(θ ) using coordinate ascent :

30 Variational MF for Ising M tr is a non-convex inner approximation M tr M what causes this funky curvature? optimizing over M tr must then yield a lower bound

31 Factorization with Trees suppose we have a tree G = (V,T) useful factorization for trees entropy becomes - singleton terms - pairwise terms Mutual Information

32 Variational MF for Loopy Graphs pretend entropy factorizes like a tree (Bethe approximation) define pseudo marginals must impose these normalization and marginalization constraints define local polytope L(G) obeying these constraints M ( G) L( G) note that for any G with equality only for trees : M(G) = L(G)

33 Variational MF for Loopy Graphs L(G) is an outer polyhedral approximation solving this Bethe Variational Problem we get the LBP eqs! so fixed points of LBP are the stationary points of the BVP this not only illuminates what was originally an educated hack (LBP) but suggests new convergence conditions and improved algorithms (TRW)

34 see ICML 2008 Tutorial

35 Summary SMF can also be cast in terms of Free Energy etc Tightening the var bound = min KL divergence Other schemes (e.g, Variational Bayes ) = SMF - with additional conditioning (hidden, visible, parameter) Solving variational problem gives both µ and A(θ) Helps to see problems through lens of Var Analysis

36 Matrix of Inference Methods Exact Deterministic approximation Stochastic approximation Chain (online) Low treewidth High treewidth Discrete BP = forwards Boyen-Koller (ADF), beam search VarElim, Jtree, recursive conditioning Loopy BP, mean field, structured variational, EP, graph-cuts Gibbs Gaussian BP = Kalman filter Jtree = sparse linear Loopy BP algebra Gibbs Other EKF, UKF, moment EP, EM, VB, EP, variational EM, VB, matching (ADF) NBP, Gibbs NBP, Gibbs Particle filter BP = Belief Propagation, EP = Expectation Propagation, ADF = Assumed Density Filtering, EKF = Extended Kalman Filter, UKF = unscented Kalman filter, VarElim = Variable Elimination, Jtree= Junction Tree, EM = Expectation Maximization, VB = Variational Bayes, NBP = Non-parametric BP by Kevin Murphy

37 T H E E N D

### Divergence measures and message passing

Divergence measures and message passing Tom Minka Microsoft Research Cambridge, UK with thanks to the Machine Learning and Perception Group 1 Message-Passing Algorithms Mean-field MF [Peterson,Anderson

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

### Introduction to Deep Learning Variational Inference, Mean Field Theory

Introduction to Deep Learning Variational Inference, Mean Field Theory 1 Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris Galen Group INRIA-Saclay Lecture 3: recap

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

### Tracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking

Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Approximating the Partition Function by Deleting and then Correcting for Model Edges

Approximating the Partition Function by Deleting and then Correcting for Model Edges Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 995

### Square Root Propagation

Square Root Propagation Andrew G. Howard Department of Computer Science Columbia University New York, NY 10027 ahoward@cs.columbia.edu Tony Jebara Department of Computer Science Columbia University New

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Dirichlet Processes A gentle tutorial

Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.

### Deterministic Sampling-based Switching Kalman Filtering for Vehicle Tracking

Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 17-20, 2006 WA4.1 Deterministic Sampling-based Switching Kalman Filtering for Vehicle

### UNIVERSITY OF CALIFORNIA, IRVINE. Variational Message-Passing: Extension to Continuous Variables and Applications in Multi-Target Tracking

UNIVERSITY OF CALIFORNIA, IRVINE Variational Message-Passing: Extension to Continuous Variables and Applications in Multi-Target Tracking DISSERTATION submitted in partial satisfaction of the requirements

### Master s thesis tutorial: part III

for the Autonomous Compliant Research group Tinne De Laet, Wilm Decré, Diederik Verscheure Katholieke Universiteit Leuven, Department of Mechanical Engineering, PMA Division 30 oktober 2006 Outline General

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

### From Maxent to Machine Learning and Back

From Maxent to Machine Learning and Back T. Sears ANU March 2007 T. Sears (ANU) From Maxent to Machine Learning and Back Maxent 2007 1 / 36 50 Years Ago... The principles and mathematical methods of statistical

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths

### Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

### Structured Learning and Prediction in Computer Vision. Contents

Foundations and Trends R in Computer Graphics and Vision Vol. 6, Nos. 3 4 (2010) 185 365 c 2011 S. Nowozin and C. H. Lampert DOI: 10.1561/0600000033 Structured Learning and Prediction in Computer Vision

### 1 Polyhedra and Linear Programming

CS 598CSC: Combinatorial Optimization Lecture date: January 21, 2009 Instructor: Chandra Chekuri Scribe: Sungjin Im 1 Polyhedra and Linear Programming In this lecture, we will cover some basic material

### Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian

### Conditional mean field

Conditional mean field Peter Carbonetto Department of Computer Science University of British Columbia Vancouver, BC, Canada V6T 1Z4 pcarbo@cs.ubc.ca Nando de Freitas Department of Computer Science University

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Graphical Models as Block-Tree Graphs

Graphical Models as Block-Tree Graphs 1 Divyanshu Vats and José M. F. Moura arxiv:1007.0563v2 [stat.ml] 13 Nov 2010 Abstract We introduce block-tree graphs as a framework for deriving efficient algorithms

### Mathematical Background

Appendix A Mathematical Background A.1 Joint, Marginal and Conditional Probability Let the n (discrete or continuous) random variables y 1,..., y n have a joint joint probability probability p(y 1,...,

### Graphical Models, Exponential Families, and Variational Inference

Foundations and Trends R in Machine Learning Vol. 1, Nos. 1 2 (2008) 1 305 c 2008 M. J. Wainwright and M. I. Jordan DOI: 10.1561/2200000001 Graphical Models, Exponential Families, and Variational Inference

### Load Balancing and Switch Scheduling

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

### Stock Option Pricing Using Bayes Filters

Stock Option Pricing Using Bayes Filters Lin Liao liaolin@cs.washington.edu Abstract When using Black-Scholes formula to price options, the key is the estimation of the stochastic return variance. In this

### Introduction and message of the book

1 Introduction and message of the book 1.1 Why polynomial optimization? Consider the global optimization problem: P : for some feasible set f := inf x { f(x) : x K } (1.1) K := { x R n : g j (x) 0, j =

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### COS513 LECTURE 5 JUNCTION TREE ALGORITHM

COS513 LECTURE 5 JUNCTION TREE ALGORITHM D. EIS AND V. KOSTINA The junction tree algorithm is the culmination of the way graph theory and probability combine to form graphical models. After we discuss

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### Discrete Frobenius-Perron Tracking

Discrete Frobenius-Perron Tracing Barend J. van Wy and Michaël A. van Wy French South-African Technical Institute in Electronics at the Tshwane University of Technology Staatsartillerie Road, Pretoria,

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 6 Sequential Monte Carlo methods II February

### Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

### Semi-Supervised Support Vector Machines and Application to Spam Filtering

Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Random graphs with a given degree sequence

Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Distributed Structured Prediction for Big Data

Distributed Structured Prediction for Big Data A. G. Schwing ETH Zurich aschwing@inf.ethz.ch T. Hazan TTI Chicago M. Pollefeys ETH Zurich R. Urtasun TTI Chicago Abstract The biggest limitations of learning

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

### Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Pricing and calibration in local volatility models via fast quantization

Pricing and calibration in local volatility models via fast quantization Parma, 29 th January 2015. Joint work with Giorgia Callegaro and Martino Grasselli Quantization: a brief history Birth: back to

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Section 5. Stan for Big Data. Bob Carpenter. Columbia University

Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model

### Finding the M Most Probable Configurations Using Loopy Belief Propagation

Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel

### Probabilistic Graphical Models

Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTI-C) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### Towards running complex models on big data

Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

### 2.3 Convex Constrained Optimization Problems

42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

### Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

### Why the Normal Distribution?

Why the Normal Distribution? Raul Rojas Freie Universität Berlin Februar 2010 Abstract This short note explains in simple terms why the normal distribution is so ubiquitous in pattern recognition applications.

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### Efficient particle-based smoothing in general hidden Markov models: the PaRIS algorithm

Efficient particle-based smoothing in general hidden Markov models: the PaRIS algorithm Jimmy Olsson Stockholm University 11 February 2015 Outline Background The PaRIS algorithm Numerical results Theoretical

### Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February

### MAP-Inference for Highly-Connected Graphs with DC-Programming

MAP-Inference for Highly-Connected Graphs with DC-Programming Jörg Kappes and Christoph Schnörr Image and Pattern Analysis Group, Heidelberg Collaboratory for Image Processing, University of Heidelberg,

### Unsupervised Learning

Unsupervised Learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK zoubin@gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~zoubin September 16, 2004 Abstract We give

### PARTICLE FILTERING UNDER COMMUNICATIONS CONSTRAINTS

PARTICLE FILTERING UNDER COMMUNICATIONS CONSTRAINTS Alexander T. Ihler Donald Brin School of Info. & Comp. Sci. U.C. Irvine, Irvine, CA 92697-3425 John W. Fisher III, Alan S. Willsky Massachusetts Institute

### Message-passing for graph-structured linear programs: Proximal methods and rounding schemes

Message-passing for graph-structured linear programs: Proximal methods and rounding schemes Pradeep Ravikumar pradeepr@stat.berkeley.edu Martin J. Wainwright, wainwrig@stat.berkeley.edu Alekh Agarwal alekh@eecs.berkeley.edu

### Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.

### Learning Organizational Principles in Human Environments

Learning Organizational Principles in Human Environments Outline Motivation: Object Allocation Problem Organizational Principles in Kitchen Environments Datasets Learning Organizational Principles Features

### 3. Linear Programming and Polyhedral Combinatorics

Massachusetts Institute of Technology Handout 6 18.433: Combinatorial Optimization February 20th, 2009 Michel X. Goemans 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the

CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

### Lecture 6: Logistic Regression

Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking

174 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 2, FEBRUARY 2002 A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking M. Sanjeev Arulampalam, Simon Maskell, Neil

### Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

### Introduction to Monte Carlo. Astro 542 Princeton University Shirley Ho

Introduction to Monte Carlo Astro 542 Princeton University Shirley Ho Agenda Monte Carlo -- definition, examples Sampling Methods (Rejection, Metropolis, Metropolis-Hasting, Exact Sampling) Markov Chains

### Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

### Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlo-based

### Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

### EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

### Bayesian Clustering for Email Campaign Detection

Peter Haider haider@cs.uni-potsdam.de Tobias Scheffer scheffer@cs.uni-potsdam.de University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany Abstract We discuss

### A Stochastic 3MG Algorithm with Application to 2D Filter Identification

A Stochastic 3MG Algorithm with Application to 2D Filter Identification Emilie Chouzenoux 1, Jean-Christophe Pesquet 1, and Anisia Florescu 2 1 Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est,

### , each of which contains a unique key value, say k i , R 2. such that k i equals K (or to determine that no such record exists in the collection).

The Search Problem 1 Suppose we have a collection of records, say R 1, R 2,, R N, each of which contains a unique key value, say k i. Given a particular key value, K, the search problem is to locate the

### Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### RECURSIVE TREES. Michael Drmota. Institue of Discrete Mathematics and Geometry Vienna University of Technology, A 1040 Wien, Austria

RECURSIVE TREES Michael Drmota Institue of Discrete Mathematics and Geometry Vienna University of Technology, A 040 Wien, Austria michael.drmota@tuwien.ac.at www.dmg.tuwien.ac.at/drmota/ 2008 Enrage Topical

### Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

### Parallelization Strategies for Multicore Data Analysis

Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management