Inference and Representation


1 Inference and Representation. David Sontag, New York University. Lecture 11, Nov. 24, 2015.

2 Approximate marginal inference
Given the joint p(x_1, ..., x_n) represented as a graphical model, how do we perform marginal inference, e.g. to compute p(x_1 | e)?
We showed in Lecture 4 that doing this exactly is NP-hard.
Nearly all approximate inference algorithms are either:
1. Monte Carlo methods (e.g., Gibbs sampling, likelihood weighting, MCMC)
2. Variational algorithms (e.g., mean-field, loopy belief propagation)

3 Variational methods
Goal: approximate a difficult distribution p(x | e) with a new distribution q(x) such that:
1. p(x | e) and q(x) are close
2. Computation on q(x) is easy
How should we measure distance between distributions?
The Kullback-Leibler divergence (KL-divergence) between two distributions p and q is defined as
D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
(it measures the expected number of extra bits required to describe samples from p(x) using a code based on q instead of p).
D(p \| q) \ge 0 for all p, q, with equality if and only if p = q.
Notice that the KL-divergence is asymmetric.
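As an illustration (not part of the original slides), here is a minimal Python sketch of the KL-divergence between two discrete distributions; the particular vectors p and q are illustrative assumptions, chosen only to show the asymmetry.

```python
# A minimal sketch: KL divergence between two discrete distributions,
# illustrating its asymmetry, D(p||q) != D(q||p) in general.
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log(p(x)/q(x)), with the convention 0 log 0 = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.4, 0.1])
q = np.array([1 / 3, 1 / 3, 1 / 3])
print(kl_divergence(p, q), kl_divergence(q, p))  # the two values differ
```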

4 KL-divergence (see the corresponding section of Murphy)
D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
Suppose p is the true distribution we wish to do inference with.
What is the difference between the solution to
\arg\min_q D(p \| q)   (called the M-projection of q onto p)
and
\arg\min_q D(q \| p)   (called the I-projection)?
These two will differ only when q is minimized over a restricted set of probability distributions Q = \{q_1, ...\}, and in particular when p \notin Q.

5 KL-divergence M-projection
q^* = \arg\min_{q \in Q} D(p \| q) = \arg\min_{q \in Q} \sum_x p(x) \log \frac{p(x)}{q(x)}
For example, suppose that p(z) is a 2D Gaussian and Q is the set of all Gaussian distributions with diagonal covariance matrices:
[Figure: p shown in green, the M-projection q^* in red, plotted over axes z_1, z_2.]

6 KL-divergence I-projection
q^* = \arg\min_{q \in Q} D(q \| p) = \arg\min_{q \in Q} \sum_x q(x) \log \frac{q(x)}{p(x)}
For example, suppose that p(z) is a 2D Gaussian and Q is the set of all Gaussian distributions with diagonal covariance matrices:
[Figure: p shown in green, the I-projection q^* in red, plotted over axes z_1, z_2.]

7 KL-divergence (single Gaussian)
In this simple example, both the M-projection and the I-projection find an approximate q(x) that has the correct mean (i.e., E_p[z] = E_q[z]):
[Figure: panels (a) and (b) show the two projections over axes z_1, z_2.]
What if p(x) is multi-modal?
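The Gaussian example above has a closed form, which the following sketch (not from the lecture) computes numerically. It assumes the standard results that the M-projection onto diagonal Gaussians matches the marginal variances of p, while the I-projection onto factorized Gaussians matches the diagonal of p's precision matrix; the covariance Sigma_p used here is an illustrative choice.

```python
# A minimal sketch: closed-form M- and I-projections of a correlated 2D
# Gaussian p onto Gaussians with diagonal covariance. Both keep the mean of p;
# they differ in the variances they assign.
import numpy as np

Sigma_p = np.array([[1.0, 0.9],
                    [0.9, 1.0]])          # covariance of p (strongly correlated)

# M-projection arg min_q D(p||q): moment matching, so take the marginal
# variances of p (the diagonal of Sigma_p).
var_m = np.diag(Sigma_p)

# I-projection arg min_q D(q||p) over factorized Gaussians: each q_i's
# precision equals the corresponding diagonal entry of p's precision matrix,
# giving a more compact (under-dispersed) approximation.
Lambda_p = np.linalg.inv(Sigma_p)
var_i = 1.0 / np.diag(Lambda_p)

print("M-projection variances:", var_m)   # [1.0, 1.0]
print("I-projection variances:", var_i)   # roughly [0.19, 0.19]
```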

8 KL-divergence M-projection (mixture of Gaussians)
q^* = \arg\min_{q \in Q} D(p \| q) = \arg\min_{q \in Q} \sum_x p(x) \log \frac{p(x)}{q(x)}
Now suppose that p(x) is a mixture of two 2D Gaussians and Q is the set of all 2D Gaussian distributions (with arbitrary covariance matrices):
[Figure: p shown in blue, q^* in red.]
The M-projection yields a distribution q(x) with the correct mean and covariance.

9 KL-divergence I-projection (mixture of Gaussians)
q^* = \arg\min_{q \in Q} D(q \| p) = \arg\min_{q \in Q} \sum_x q(x) \log \frac{q(x)}{p(x)}
[Figure: p shown in blue, q^* in red; there are two local minima.]
Unlike the M-projection, the I-projection does not always yield the correct moments.
Q: D(q \| p) is convex in q, so why are there local minima?
A: We are using a parametric form for q (i.e., a Gaussian), and the objective is not convex in \mu, \Sigma.

10 M-projection does moment matching
Recall that the M-projection is:
q^* = \arg\min_{q \in Q} D(p \| q) = \arg\min_{q \in Q} \sum_x p(x) \log \frac{p(x)}{q(x)}
Suppose that Q is an exponential family (p(x) can be arbitrary) and that we perform the M-projection, finding q^*.
Theorem: the expected sufficient statistics with respect to q^*(x) are exactly the marginals of p(x):
E_{q^*}[f(x)] = E_p[f(x)]
Thus, solving for the M-projection (exactly) is just as hard as the original inference problem.

11 M-projection does moment matching
Recall that the M-projection is:
q^* = \arg\min_{q(x;\eta) \in Q} D(p \| q) = \arg\min_{q(x;\eta) \in Q} \sum_x p(x) \log \frac{p(x)}{q(x;\eta)}
Theorem: E_{q^*}[f(x)] = E_p[f(x)].
Proof: Look at the first-order optimality conditions. Since \sum_x p(x) \log p(x) does not depend on \eta,
\nabla_{\eta_i} D(p \| q) = -\nabla_{\eta_i} \sum_x p(x) \log q(x;\eta)
  = -\nabla_{\eta_i} \sum_x p(x) \log \big( h(x) \exp\{\eta \cdot f(x) - \ln Z(\eta)\} \big)
  = -\nabla_{\eta_i} \sum_x p(x) \big( \eta \cdot f(x) - \ln Z(\eta) \big)
  = -\sum_x p(x) f_i(x) + \nabla_{\eta_i} \ln Z(\eta)
  = -E_p[f_i(x)] + E_{q(x;\eta)}[f_i(x)] = 0
(since \nabla_{\eta_i} \ln Z(\eta) = E_q[f_i(x)]).
Corollary: even computing the gradients is hard (we can't do gradient descent).
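To make the theorem concrete, here is a small numerical check (not from the lecture). It takes an arbitrary joint over three binary variables and M-projects it onto fully factorized distributions, whose sufficient statistics are the single-variable indicators; by the theorem, the projection simply copies p's single-node marginals. The random joint is an illustrative assumption.

```python
# A minimal sketch: M-projection onto fully factorized distributions matches
# the expected sufficient statistics (here, the single-node marginals) of p.
import itertools
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(2 ** 3)
p /= p.sum()                                   # arbitrary joint over (x1, x2, x3)
states = list(itertools.product([0, 1], repeat=3))

# Marginals of p, i.e. the expected sufficient statistics E_p[f].
marg_p = np.zeros(3)
for prob, x in zip(p, states):
    marg_p += prob * np.array(x)

# M-projection q*(x) = prod_i q_i(x_i) with q_i(x_i = 1) = p(x_i = 1).
q = np.array([np.prod([marg_p[i] if xi == 1 else 1 - marg_p[i]
                       for i, xi in enumerate(x)]) for x in states])

# Verify moment matching: E_{q*}[f] == E_p[f].
marg_q = sum(prob * np.array(x) for prob, x in zip(q, states))
print(np.allclose(marg_p, marg_q))             # True
```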

12 Most variational inference algorithms make use of the I-projection.

13 Variational methods
Suppose that we have an arbitrary graphical model:
p(x; \theta) = \frac{1}{Z(\theta)} \prod_{c \in C} \phi_c(x_c) = \exp\Big( \sum_{c \in C} \theta_c(x_c) - \ln Z(\theta) \Big)
All of the approaches begin as follows:
D(q \| p) = \sum_x q(x) \ln \frac{q(x)}{p(x)}
  = -\sum_x q(x) \ln p(x) - \sum_x q(x) \ln \frac{1}{q(x)}
  = -\sum_x q(x) \Big( \sum_{c \in C} \theta_c(x_c) - \ln Z(\theta) \Big) - H(q(x))
  = -\sum_{c \in C} \sum_x q(x) \theta_c(x_c) + \sum_x q(x) \ln Z(\theta) - H(q(x))
  = -\sum_{c \in C} E_q[\theta_c(x_c)] + \ln Z(\theta) - H(q(x)).

14 Mean field algorithms for variational inference
\max_{q \in Q} \sum_{c \in C} \sum_{x_c} q(x_c) \theta_c(x_c) + H(q(x))
Although this function is concave and thus in theory should be easy to optimize, we need some compact way of representing q(x).
Mean field algorithms assume a factored representation of the joint distribution, e.g.
q(x) = \prod_{i \in V} q_i(x_i)
(called naive mean field).
[Figure: naive mean field, a fully factorized variational distribution; adapted from slides by Eric Xing, CMU.]

15 Naive mean-field
Suppose that Q consists of all fully factored distributions, of the form q(x) = \prod_{i \in V} q_i(x_i).
We can use this to simplify
\max_{q \in Q} \sum_{c \in C} \sum_{x_c} q(x_c) \theta_c(x_c) + H(q)
First, note that q(x_c) = \prod_{i \in c} q_i(x_i).
Next, notice that the joint entropy decomposes as a sum of local entropies:
H(q) = -\sum_x q(x) \ln q(x) = -\sum_x q(x) \ln \prod_{i \in V} q_i(x_i) = -\sum_x q(x) \sum_{i \in V} \ln q_i(x_i)
  = -\sum_{i \in V} \sum_x q(x) \ln q_i(x_i)
  = -\sum_{i \in V} \sum_{x_i} q_i(x_i) \ln q_i(x_i) \sum_{x_{V \setminus i}} q(x_{V \setminus i} \mid x_i)
  = \sum_{i \in V} H(q_i).

16 Naive mean-field
Suppose that Q consists of all fully factored distributions, of the form q(x) = \prod_{i \in V} q_i(x_i).
We can use this to simplify
\max_{q \in Q} \sum_{c \in C} \sum_{x_c} q(x_c) \theta_c(x_c) + H(q)
First, note that q(x_c) = \prod_{i \in c} q_i(x_i).
Next, notice that the joint entropy decomposes as H(q) = \sum_{i \in V} H(q_i).
Putting these together, we obtain the following variational objective:
\max_q \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \prod_{i \in c} q_i(x_i) + \sum_{i \in V} H(q_i)
subject to the constraints
q_i(x_i) \ge 0   for all i \in V, x_i \in Val(X_i)
\sum_{x_i \in Val(X_i)} q_i(x_i) = 1   for all i \in V

17 Naive mean-field for pairwise MRFs
How do we maximize the variational objective?
\max_q \sum_{ij \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j) q_i(x_i) q_j(x_j) - \sum_{i \in V} \sum_{x_i} q_i(x_i) \ln q_i(x_i)   (*)
This is a non-concave optimization problem, with many local maxima!
Nonetheless, we can greedily maximize it using block coordinate ascent:
1. Iterate over each of the variables i \in V. For variable i,
2. Fully maximize (*) with respect to \{q_i(x_i), x_i \in Val(X_i)\}.
3. Repeat until convergence.
Constructing the Lagrangian, taking the derivative, setting it to zero, and solving yields the update (shown on blackboard):
q_i(x_i) = \frac{1}{Z_i} \exp\Big\{ \theta_i(x_i) + \sum_{j \in N(i)} \sum_{x_j} q_j(x_j) \theta_{ij}(x_i, x_j) \Big\}
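The following minimal sketch (not the lecture's code) runs the block coordinate ascent update above on a small 3-node cyclic pairwise MRF; the potentials theta_i and theta_ij are random illustrative choices, not taken from the slides.

```python
# A minimal sketch: naive mean-field coordinate ascent for a small pairwise
# MRF over binary variables.
import numpy as np

n_states = 2
edges = [(0, 1), (1, 2), (0, 2)]                        # a 3-node cycle
rng = np.random.default_rng(0)
theta_i = rng.normal(size=(3, n_states))                # unary log-potentials
theta_ij = {e: rng.normal(size=(n_states, n_states)) for e in edges}

q = np.full((3, n_states), 1.0 / n_states)              # uniform initialization
for _ in range(100):
    for i in range(3):
        # theta_i(x_i) + sum_{j in N(i)} sum_{x_j} q_j(x_j) theta_ij(x_i, x_j)
        logits = theta_i[i].copy()
        for (a, b), th in theta_ij.items():
            if a == i:
                logits += th @ q[b]
            elif b == i:
                logits += th.T @ q[a]
        logits -= logits.max()                          # numerical stability
        q[i] = np.exp(logits) / np.exp(logits).sum()    # normalize (the 1/Z_i)

print(q)   # approximate single-node marginals
```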

18 How accurate will the approximation be?
Consider a distribution which is an XOR of two binary variables A and B:
p(a, b) = 0.5 - \epsilon if a \ne b, and p(a, b) = \epsilon if a = b
[Figure: contour plot of the variational objective as a function of Q(a = 1) and Q(b = 1).]
Even for a single edge, mean field can give very wrong answers!
Interestingly, once \epsilon > 0.1, mean field has a single maximum point at the uniform distribution (thus, exact).
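The behavior described above can be reproduced with a small grid search (not from the lecture); this sketch evaluates the mean-field objective for the near-XOR distribution over (q_a(1), q_b(1)) for a small and a larger epsilon, both chosen for illustration.

```python
# A minimal sketch: locate the maximizer of the mean-field objective for the
# near-XOR distribution p(a,b) on a grid over (q_a(1), q_b(1)).
import numpy as np

def mf_objective(qa1, qb1, eps):
    p = np.array([[eps, 0.5 - eps],          # p(a, b): rows a = 0, 1; cols b = 0, 1
                  [0.5 - eps, eps]])
    qa = np.array([1 - qa1, qa1])
    qb = np.array([1 - qb1, qb1])
    energy = qa @ np.log(p) @ qb                       # E_q[log p(a, b)]
    ent = -(qa @ np.log(qa) + qb @ np.log(qb))         # H(q_a) + H(q_b)
    return energy + ent

grid = np.linspace(0.01, 0.99, 99)
for eps in (0.01, 0.2):
    vals = np.array([[mf_objective(a, b, eps) for b in grid] for a in grid])
    i, j = np.unravel_index(vals.argmax(), vals.shape)
    print(f"eps={eps}: maximizer near q_a(1)={grid[i]:.2f}, q_b(1)={grid[j]:.2f}")
# For small eps the maximizer sits far from (0.5, 0.5), even though the true
# marginals are uniform; for larger eps it is at the uniform distribution.
```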

19 Structured mean-field approximations
Rather than assuming a fully-factored distribution for q, we can use a structured approximation, such as a spanning tree.
For example, for a factorial HMM, a good approximation may be a product of chain-structured models:
[Fig. 5.4 (Wainwright & Jordan): structured mean field approximation for a factorial HMM. (a) The original model consists of a set of hidden Markov models (defined on chains), coupled at each time step.]

20 Recall our starting place for variational methods...
Suppose that we have an arbitrary graphical model:
p(x; \theta) = \frac{1}{Z(\theta)} \prod_{c \in C} \phi_c(x_c) = \exp\Big( \sum_{c \in C} \theta_c(x_c) - \ln Z(\theta) \Big)
All of the approaches begin as follows:
D(q \| p) = \sum_x q(x) \ln \frac{q(x)}{p(x)}
  = -\sum_x q(x) \ln p(x) - \sum_x q(x) \ln \frac{1}{q(x)}
  = -\sum_x q(x) \Big( \sum_{c \in C} \theta_c(x_c) - \ln Z(\theta) \Big) - H(q(x))
  = -\sum_{c \in C} \sum_x q(x) \theta_c(x_c) + \sum_x q(x) \ln Z(\theta) - H(q(x))
  = -\sum_{c \in C} E_q[\theta_c(x_c)] + \ln Z(\theta) - H(q(x)).

21 The log-partition function
Since D(q \| p) \ge 0, we have
-\sum_{c \in C} E_q[\theta_c(x_c)] + \ln Z(\theta) - H(q(x)) \ge 0,
which implies that
\ln Z(\theta) \ge \sum_{c \in C} E_q[\theta_c(x_c)] + H(q(x)).
Thus, any approximating distribution q(x) gives a lower bound on the log-partition function (for a BN, this is the log probability of the observed variables).
Recall that D(q \| p) = 0 if and only if p = q. Thus, if we allow ourselves to optimize over all distributions, we have:
\ln Z(\theta) = \max_q \sum_{c \in C} E_q[\theta_c(x_c)] + H(q(x)).
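On a model small enough to enumerate, the lower bound can be checked directly. This sketch (not from the lecture) uses a 3-node chain with random pairwise potentials, an illustrative assumption, and compares the exact ln Z against the bound evaluated at a random q and at q = p.

```python
# A minimal sketch: sum_c E_q[theta_c] + H(q) lower-bounds ln Z for any q,
# with equality when q = p.
import itertools
import numpy as np

rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2)]
theta = {e: rng.normal(size=(2, 2)) for e in edges}      # pairwise log-potentials

states = list(itertools.product([0, 1], repeat=3))
score = lambda x: sum(theta[(i, j)][x[i], x[j]] for (i, j) in edges)
log_Z = np.log(sum(np.exp(score(x)) for x in states))

def bound(q):
    """q: dict mapping assignment -> probability."""
    e_term = sum(q[x] * score(x) for x in states)
    h_term = -sum(q[x] * np.log(q[x]) for x in states if q[x] > 0)
    return e_term + h_term

q_rand = rng.random(len(states)); q_rand /= q_rand.sum()
q_rand = dict(zip(states, q_rand))
p_exact = {x: np.exp(score(x) - log_Z) for x in states}

print(log_Z, bound(q_rand), bound(p_exact))   # ln Z >= bound(q); tight at q = p
```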

22 Re-writing the objective in terms of moments
\ln Z(\theta) = \max_q \sum_{c \in C} E_q[\theta_c(x_c)] + H(q(x))
  = \max_q \sum_{c \in C} \sum_x q(x) \theta_c(x_c) + H(q(x))
  = \max_q \sum_{c \in C} \sum_{x_c} q(x_c) \theta_c(x_c) + H(q(x)).
Now assume that p(x) is in the exponential family, and let f(x) be its sufficient statistic vector.
Define \mu_q = E_q[f(x)] to be the marginals of q(x).
We can re-write the objective as
\ln Z(\theta) = \max_{\mu \in M} \max_{q : E_q[f(x)] = \mu} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + H(q(x)),
where M, the marginal polytope, consists of all valid marginal vectors.

23 Re-writing the objective in terms of moments
Next, push the max over q inside to obtain:
\ln Z(\theta) = \max_{\mu \in M} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + H(\mu),   where   H(\mu) = \max_{q : E_q[f(x)] = \mu} H(q)
Does this look familiar?
For discrete random variables, the marginal polytope M is given by
M = \Big\{ \mu \in R^d : \mu = \sum_{x \in X^m} p(x) f(x) \text{ for some } p(x) \ge 0, \sum_{x \in X^m} p(x) = 1 \Big\} = conv\{ f(x), x \in X^m \}
(conv denotes the convex hull operation).
For a discrete-variable MRF, the sufficient statistic vector f(x) is simply the concatenation of indicator functions for each clique of variables that appear together in a potential function.
For example, if we have a pairwise MRF on binary variables with m = |V| variables and |E| edges, d = 2m + 4|E|.
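To make the dimension count concrete, this sketch (not from the lecture) enumerates the vertices f(x) of the marginal polytope for a pairwise MRF on three binary variables with all three edges present, matching the setting of the figure on the next slide; the edge set is an illustrative assumption.

```python
# A minimal sketch: vertices of the marginal polytope for a pairwise MRF on 3
# binary variables with edges (1,2), (1,3), (2,3). Each vertex concatenates
# node indicators and edge indicators, so d = 2m + 4|E| = 6 + 12 = 18.
import itertools
import numpy as np

edges = [(0, 1), (0, 2), (1, 2)]

def f(x):
    node = [int(x[i] == s) for i in range(3) for s in (0, 1)]
    edge = [int(x[i] == s and x[j] == t)
            for (i, j) in edges for s in (0, 1) for t in (0, 1)]
    return np.array(node + edge)

vertices = np.array([f(x) for x in itertools.product([0, 1], repeat=3)])
print(vertices.shape)            # (8, 18): 8 global assignments, d = 18
# The marginal polytope is the convex hull of these 8 vectors; any convex
# combination of them is a valid marginal vector mu.
```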

24 Marginal polytope for discrete MRFs
[Figure 2-1 (after Wainwright & Jordan, '03): illustration of the marginal polytope for a Markov random field with three nodes that have states in {0, 1}. The vertices correspond one-to-one with global assignments; each vertex vector stacks the assignment indicators for X_1, X_2, X_3 and the edge assignment indicators for X_1 X_2, X_1 X_3, X_2 X_3. Two such vertices are shown (for X_1 = 1, X_2 = 1, X_3 = 0 and X_1 = 0, X_2 = 1, X_3 = 0), along with their midpoint \mu = \frac{1}{2}\mu' + \frac{1}{2}\mu'', which is also a valid vector of marginal probabilities.]

25 Relaxation
\ln Z(\theta) = \max_{\mu \in M} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + H(\mu)
We still haven't achieved anything, because:
1. The marginal polytope M is complex to describe (in general, exponentially many vertices and facets)
2. H(\mu) is very difficult to compute or optimize over
We now make two approximations:
1. We replace M with a relaxation of the marginal polytope, e.g. the local consistency constraints M_L
2. We replace H(\mu) with a function \tilde{H}(\mu) which approximates H(\mu)

26 Local consistency constraints
Force every cluster of variables to choose a local assignment:
\mu_i(x_i) \ge 0   for all i \in V, x_i
\sum_{x_i} \mu_i(x_i) = 1   for all i \in V
\mu_{ij}(x_i, x_j) \ge 0   for all ij \in E, x_i, x_j
\sum_{x_i, x_j} \mu_{ij}(x_i, x_j) = 1   for all ij \in E
Enforce that these local assignments are globally consistent:
\mu_i(x_i) = \sum_{x_j} \mu_{ij}(x_i, x_j)   for all ij \in E, x_i
\mu_j(x_j) = \sum_{x_i} \mu_{ij}(x_i, x_j)   for all ij \in E, x_j
The local consistency polytope M_L is defined by these constraints.
Theorem: the local consistency constraints exactly define the marginal polytope for a tree-structured MRF.
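As an illustration (not from the lecture), the sketch below checks the constraints above for given pseudomarginals, and feeds in the classic frustrated-triangle example: three anti-correlated edge marginals on a cycle that are locally consistent yet cannot come from any joint distribution. The helper name and the example are assumptions for illustration.

```python
# A minimal sketch: check whether pseudomarginals (mu_i, mu_ij) for a pairwise
# MRF satisfy the local consistency constraints defining M_L.
import numpy as np

def in_local_polytope(mu_i, mu_ij, tol=1e-8):
    """mu_i: dict node -> vector; mu_ij: dict (i, j) -> matrix over (x_i, x_j)."""
    ok = all(np.all(m >= -tol) and abs(m.sum() - 1) < tol for m in mu_i.values())
    for (i, j), m in mu_ij.items():
        ok &= np.all(m >= -tol) and abs(m.sum() - 1) < tol
        ok &= np.allclose(m.sum(axis=1), mu_i[i], atol=tol)   # marginalize out x_j
        ok &= np.allclose(m.sum(axis=0), mu_i[j], atol=tol)   # marginalize out x_i
    return bool(ok)

# Locally consistent, but (on a cycle) not realizable by any joint distribution:
mu_i = {k: np.array([0.5, 0.5]) for k in range(3)}
anti = np.array([[0.0, 0.5], [0.5, 0.0]])                     # "always disagree" edge
mu_ij = {(0, 1): anti, (1, 2): anti, (0, 2): anti}
print(in_local_polytope(mu_i, mu_ij))                         # True
```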

27 Entropy for tree-structured models
Suppose that p is a tree-structured distribution, so that we are optimizing only over marginals \mu_{ij}(x_i, x_j) for ij \in T.
The solution to \arg\max_{q : E_q[f(x)] = \mu} H(q) is a tree-structured MRF (c.f. Lecture 10, maximum entropy estimation).
The entropy of q as a function of its marginals can be shown to be
H(\mu) = \sum_{i \in V} H(\mu_i) - \sum_{ij \in T} I(\mu_{ij})
where
H(\mu_i) = -\sum_{x_i} \mu_i(x_i) \log \mu_i(x_i)
I(\mu_{ij}) = \sum_{x_i, x_j} \mu_{ij}(x_i, x_j) \log \frac{\mu_{ij}(x_i, x_j)}{\mu_i(x_i)\mu_j(x_j)}
Can we use this for non-tree-structured models?
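The tree decomposition above can be verified numerically. This sketch (not from the lecture) builds a small chain MRF with random potentials, an illustrative assumption, computes its exact entropy by enumeration, and compares it to the node-entropy minus edge-mutual-information expression.

```python
# A minimal sketch: on a tree (here a 3-node chain), the exact joint entropy
# equals sum_i H(mu_i) - sum_{ij in T} I(mu_ij).
import itertools
import numpy as np

rng = np.random.default_rng(2)
edges = [(0, 1), (1, 2)]                                    # a 3-node chain (a tree)
theta = {e: rng.normal(size=(2, 2)) for e in edges}

states = list(itertools.product([0, 1], repeat=3))
p = np.array([np.exp(sum(theta[e][x[e[0]], x[e[1]]] for e in edges)) for x in states])
p /= p.sum()
H_joint = -np.sum(p * np.log(p))

# Node and edge marginals of p.
mu_i = {i: np.array([sum(pr for pr, x in zip(p, states) if x[i] == s) for s in (0, 1)])
        for i in range(3)}
mu_ij = {(i, j): np.array([[sum(pr for pr, x in zip(p, states) if x[i] == s and x[j] == t)
                            for t in (0, 1)] for s in (0, 1)]) for (i, j) in edges}

H_nodes = sum(-(m * np.log(m)).sum() for m in mu_i.values())
I_edges = sum((m * np.log(m / np.outer(mu_i[i], mu_i[j]))).sum()
              for (i, j), m in mu_ij.items())
print(np.isclose(H_joint, H_nodes - I_edges))               # True on a tree
```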

28 Bethe free energy approximation
The Bethe entropy approximation is (for any graph)
H_{bethe}(\mu) = \sum_{i \in V} H(\mu_i) - \sum_{ij \in E} I(\mu_{ij})
This gives the following variational approximation:
\max_{\mu \in M_L} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + H_{bethe}(\mu)
For non-tree-structured models this is not concave, and is hard to maximize.
Loopy belief propagation, if it converges, finds a saddle point!

29 Concave relaxation
Let \tilde{H}(\mu) be an upper bound on H(\mu), i.e. \tilde{H}(\mu) \ge H(\mu).
As a result, we obtain the following upper bound on the log-partition function:
\ln Z(\theta) \le \max_{\mu \in M_L} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + \tilde{H}(\mu)
An example of a concave entropy upper bound is the tree-reweighted approximation (Jaakkola, Wainwright & Willsky, '05), given by specifying a distribution over spanning trees of the graph.
[Figure 1: illustration of the spanning tree polytope T(G). The original graph is shown in panel (a); probability 1/3 is assigned to each of the three spanning trees T_i, i = 1, 2, 3, shown in panels (b)-(d). Edge b is a bridge in G, meaning it must appear in every spanning tree (appearance probability 1); edges e and f appear in two and one of the spanning trees respectively, giving edge appearance probabilities 2/3 and 1/3.]
Letting \{\rho_{ij}\} denote the edge appearance probabilities, we have:
H_{TRW}(\mu) = \sum_{i \in V} H(\mu_i) - \sum_{ij \in E} \rho_{ij} I(\mu_{ij})

30 Comparison of LBP and TRW
We showed two approximation methods, both making use of the local consistency constraints M_L on the marginal polytope:
1. Bethe free energy approximation (for pairwise MRFs):
\max_{\mu \in M_L} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j) \theta_{ij}(x_i, x_j) + \sum_{i \in V} H(\mu_i) - \sum_{ij \in E} I(\mu_{ij})
Not concave. Can use the concave-convex procedure to find local optima.
Loopy BP, if it converges, finds a saddle point (often a local maximum).
2. Tree-reweighted approximation (for pairwise MRFs):
\max_{\mu \in M_L} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j) \theta_{ij}(x_i, x_j) + \sum_{i \in V} H(\mu_i) - \sum_{ij \in E} \rho_{ij} I(\mu_{ij})
\{\rho_{ij}\} are edge appearance probabilities (they must be consistent with some set of spanning trees).
This is concave! Find the global maximum using projected gradient ascent.
Provides an upper bound on the log-partition function, i.e. \ln Z(\theta) is at most the optimal value of this objective.

31 Two types of variational algorithms: mean-field and relaxation
\max_{q \in Q} \sum_{c \in C} \sum_{x_c} q(x_c) \theta_c(x_c) + H(q(x))
Although this function is concave and thus in theory should be easy to optimize, we need some compact way of representing q(x).
Relaxation algorithms work directly with pseudomarginals, which may not be consistent with any joint distribution.
Mean-field algorithms assume a factored representation of the joint distribution, e.g.
q(x) = \prod_{i \in V} q_i(x_i)
(called naive mean field).
[Figure: naive mean field, a fully factorized variational distribution; adapted from slides by Eric Xing, CMU.]

32 Naive mean-field
Using the same notation as in the rest of the lecture, naive mean-field is:
\max_{\mu} \sum_{c \in C} \sum_{x_c} \theta_c(x_c) \mu_c(x_c) + \sum_{i \in V} H(\mu_i)
subject to
\mu_i(x_i) \ge 0   for all i \in V, x_i \in Val(X_i)
\sum_{x_i \in Val(X_i)} \mu_i(x_i) = 1   for all i \in V
\mu_c(x_c) = \prod_{i \in c} \mu_i(x_i)
This corresponds to optimizing over an inner bound on the marginal polytope, and we obtain a lower bound on the partition function, i.e. on \ln Z(\theta).
[Fig. 5.3 (Wainwright & Jordan): cartoon illustration of the set M_F(G) of mean parameters that arise from tractable distributions, a nonconvex inner bound on M(G). Illustrated is the case of discrete random variables, where M(G) is a polytope; the circles correspond to mean parameters arising from delta distributions, which belong to both M(G) and M_F(G).]
