Introduction to Graphical Models


Robert Collins, CSE586. Credits: several slides are from Christopher Bishop (MSR) and Andrew Moore (CMU).

Introduction to Graphical Models. Readings in the Prince textbook: Chapters 10 and 11, but mainly only on directed graphs at this time.

Review: Probability Theory. The sum rule (marginal distributions) and the product rule; from these we get Bayes' theorem, with the denominator acting as a normalization factor.

Review: Conditional Probability. Conditional probability (rewriting the product rule): P(A | B) = P(A, B) / P(B). Chain rule: P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C). Conditional independence: P(A, B | C) = P(A | C) P(B | C). Statistical independence: P(A, B) = P(A) P(B).

Overview of Graphical Models. Graphical models model conditional dependence/independence. The graph structure specifies how the joint probability factors. Directed graphs (example: HMMs); undirected graphs (example: MRFs). Inference by message passing: belief propagation, the sum-product algorithm, and max-product (min-sum if using logs). We will focus mainly on directed graphs right now.

The Joint Distribution. Recipe for making a joint distribution of M variables; example: Boolean variables A, B, C.
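To make the review concrete, here is a minimal numeric check of the sum rule, product rule, and Bayes' theorem on a tiny two-variable joint distribution; the joint table and its numbers are invented purely for illustration and are not from the slides.

joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}  # invented P(A, B)

# Sum rule: marginalize out one variable.
p_a = {a: sum(p for (ai, b), p in joint.items() if ai == a) for a in (0, 1)}
p_b = {b: sum(p for (a, bi), p in joint.items() if bi == b) for b in (0, 1)}

# Product rule: P(A, B) = P(B | A) P(A), so P(B | A) = P(A, B) / P(A).
p_b_given_a = {(b, a): joint[(a, b)] / p_a[a] for a in (0, 1) for b in (0, 1)}

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B), where P(B) is the normalizer.
p_a_given_b = {(a, b): p_b_given_a[(b, a)] * p_a[a] / p_b[b]
               for a in (0, 1) for b in (0, 1)}

print(p_a_given_b[(1, 1)], joint[(1, 1)] / p_b[1])  # both print 0.8 (up to floating point)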

The Joint Distribution. Example: Boolean variables A, B, C. Recipe for making a joint distribution of M variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). 2. For each combination of values, say how probable it is. 3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Joint distributions. Good news: once you have a joint distribution, you can answer all sorts of probabilistic questions involving combinations of attributes.

Using the Joint. P(E) = sum of P(row) over all rows matching E. For example, P(Poor and Male) and P(Poor) are each computed by summing the probabilities of their matching rows. (Andrew Moore, CMU)
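A minimal sketch of that recipe in Python: the truth table below has 2^3 = 8 rows for three Boolean variables, and a generic query routine sums P(row) over the rows matching an event. The specific probabilities are made up for illustration; they only need to sum to 1.

from itertools import product

rows = list(product([0, 1], repeat=3))                     # all combinations of (A, B, C)
probs = [0.05, 0.10, 0.05, 0.10, 0.25, 0.10, 0.05, 0.30]   # one probability per row
joint = dict(zip(rows, probs))
assert abs(sum(probs) - 1.0) < 1e-12                       # axioms: the rows sum to 1

def prob(event):
    # P(E) = sum of P(row) over rows matching E
    return sum(p for row, p in joint.items() if event(row))

print(prob(lambda r: r[0] == 1))                 # P(A = 1)
print(prob(lambda r: r[0] == 1 and r[2] == 0))   # P(A = 1 and C = 0)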

Inference with the Joint. Computing conditional probabilities: P(E1 | E2) = P(E1 and E2) / P(E2) = [sum of P(row) over rows matching E1 and E2] / [sum of P(row) over rows matching E2]. Example: P(Male | Poor) is the ratio of those two sums. (Andrew Moore, CMU)

Joint distributions. Good news: once you have a joint distribution, you can answer all sorts of probabilistic questions involving combinations of attributes. Bad news: it is impossible to create a joint distribution for more than about ten attributes, because of how many numbers are needed when you build the thing. For 10 binary variables you need to specify 2^10 - 1 = 1023 numbers. (Question for the class: why the -1?)

How to Use Fewer Numbers. Factor the joint distribution into a product of distributions over subsets of variables. Identify (or just assume) independence between some subsets of variables, and use that independence to simplify some of the distributions. Graphical models provide a principled way of doing this.

Factoring. Consider an arbitrary joint distribution. We can always factor it by application of the chain rule; a graphical model depicts what this factored form looks like.

Directed versus Undirected Graphs. Directed graph examples: Bayes nets, HMMs. Undirected graph examples: MRFs. Note: the word "graphical" denotes the graph structure underlying the model, not the fact that you can draw a pretty picture of it (although that helps). (Christopher Bishop, MSR)
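The parameter-count argument is easy to check numerically. The sketch below compares the 2^n - 1 numbers of a full joint over n binary variables with the cost of a factored model, where a binary node with k binary parents needs 2^k free numbers (one per row of its conditional probability table); the particular parent counts listed are invented for illustration.

n = 10
full_joint = 2**n - 1          # 1023 numbers for ten binary variables

# Hypothetical directed graph: number of parents for each of the ten nodes.
num_parents = {"x1": 0, "x2": 1, "x3": 1, "x4": 2, "x5": 1,
               "x6": 1, "x7": 2, "x8": 1, "x9": 1, "x10": 1}
factored = sum(2**k for k in num_parents.values())   # one free number per CPT row

print(full_joint, factored)    # 1023 versus 23 for this connectivity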

Graphical Model Concepts. [Figure: example Bayes net, with a conditional probability table attached to each node.] Nodes represent random variables. Edges (or lack of edges) represent conditional dependence (or independence). Each node is annotated with a table of conditional probabilities with respect to its parents. Note: the word "graphical" denotes the graph structure underlying the model, not the fact that you can draw a pretty picture of it using graphics.

Directed Acyclic Graphs. "Directed acyclic" means we can't follow arrows around in a cycle. Examples: chains, trees, and also more general branching structures.

Factoring Examples. Joint distribution: p(x1, ..., xK) = product over i of p(x_i | pa(x_i)), where pa(x_i) denotes the parents of x_i. We can read the factored form of the joint distribution immediately from a directed graph: every variable appears conditioned on its parents. Examples for two nodes: p(x, y) = p(x) p(y | x); p(x, y) = p(y) p(x | y); and, with no edge, p(x, y) = p(x) p(y).
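A minimal sketch of reading the factored joint off a directed graph: each node stores its parent list and a conditional probability function, and the joint is the product of one factor per node. The three-node graph, the table values, and the node names are all invented for illustration.

parents = {"x": [], "y": ["x"], "z": ["x"]}     # graph: x -> y and x -> z
cpt = {
    "x": lambda v, pa: {1: 0.3, 0: 0.7}[v],
    "y": lambda v, pa: ({1: 0.9, 0: 0.1} if pa["x"] == 1 else {1: 0.2, 0: 0.8})[v],
    "z": lambda v, pa: ({1: 0.5, 0: 0.5} if pa["x"] == 1 else {1: 0.4, 0: 0.6})[v],
}

def joint(assign):
    # p(x1,...,xK) = prod_i p(x_i | parents(x_i)), read directly from the graph
    p = 1.0
    for node, pars in parents.items():
        p *= cpt[node](assign[node], {q: assign[q] for q in pars})
    return p

print(joint({"x": 1, "y": 1, "z": 0}))   # 0.3 * 0.9 * 0.5 = 0.135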

Factoring Examples. We can read the form of the joint distribution directly from the directed graph: one factor per node, each conditioned on that node's parents, where pa(x_i) denotes the parents of x_i. For example, a three-node chain gives p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2).

Graphical Model Concepts. How about this one? For the example network from before (the Bayes net with conditional probability tables at each node), the joint over all of its variables, including S and T, again factors into one term per node conditioned on that node's parents, read straight off the graph. There are no directed cycles. Note: entries such as P(T | L) and P(T | ~L) come from different rows of a conditional probability table, so they need not sum to one. (Christopher Bishop, MSR)

Factoring Examples. How many probabilities do we have to specify/learn (assuming each x_i is a binary variable)? If the seven-node graph were fully connected, we would need 2^7 - 1 = 127; for this sparser connectivity, we only need the sum of the individual conditional probability table sizes, which is far fewer. Note: if all nodes were independent, we would only need 7.

Important Case: Time Series. Consider modeling a time series of sequential data x1, x2, ..., xN. These could represent: locations of a tracked object over time; observations of the weather each day; spectral coefficients of a speech signal; joint angles during human motion.

Modeling Time Series. The simplest model of a time series is that all observations are independent. This would be appropriate for modeling successive tosses {heads, tails} of an unbiased coin. However, it doesn't really treat the series as a sequence: we could permute the ordering of the observations and not change a thing. In the most general case, we could use the chain rule to state that any node is dependent on all previous nodes: P(x1, x2, x3, x4, ...) = P(x1) P(x2 | x1) P(x3 | x1, x2) P(x4 | x1, x2, x3) ... We look for an intermediate model between these two extremes.

Modeling Time Series. Markov assumption: P(xn | x1, x2, ..., xn-1) = P(xn | xn-1); that is, assume all conditional distributions depend only on the most recent previous observation. The result is a first-order Markov chain: P(x1, x2, x3, x4, ...) = P(x1) P(x2 | x1) P(x3 | x2) P(x4 | x3) ...

Generalization: State-Space Models. You have a Markov chain of latent (unobserved) states x1 -> x2 -> ... -> xn-1 -> xn -> xn+1, and each state xn generates an observation yn. Goal: given a sequence of observations, predict the sequence of unobserved states that maximizes the joint probability.
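As a small illustration of the first-order Markov chain and its state-space generalization, the sketch below samples a hidden state sequence and one observation per state. The initial, transition, and emission tables are chosen for illustration only.

import random

states = [0, 1]
init  = [0.7, 0.3]                    # P(x_1)
trans = [[0.4, 0.6], [0.5, 0.5]]      # trans[i][j] = P(x_n = j | x_{n-1} = i)
emit  = [[0.8, 0.2], [0.2, 0.8]]      # emit[i][k]  = P(y_n = k | x_n = i)

def sample_chain(n):
    # First-order Markov chain over hidden states, each emitting one observation.
    x = [random.choices(states, weights=init)[0]]
    for _ in range(n - 1):
        x.append(random.choices(states, weights=trans[x[-1]])[0])
    y = [random.choices(states, weights=emit[xi])[0] for xi in x]
    return x, y

hidden, observed = sample_chain(10)
print(hidden)     # latent state sequence (unobserved in practice)
print(observed)   # what we actually get to see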

Modeling Time Series. Examples of state-space models: the hidden Markov model and the Kalman filter. For hidden states x1, x2, ... with observations y1, y2, ..., the joint factors as P(x1, x2, x3, x4, ..., y1, y2, y3, y4, ...) = P(x1) P(y1 | x1) P(x2 | x1) P(y2 | x2) P(x3 | x2) P(y3 | x3) P(x4 | x3) P(y4 | x4) ...

Example of a Tree-structured Model. Confusion alert: our textbook uses w to denote a world state variable and x to denote a measurement (we have been using x to denote world state and y as the measurement).

Message Passing: Belief Propagation. Example: a 1D chain; find the marginal for a particular node. Here M is the number of discrete values a variable can take and N is the number of variables. Computed naively, the cost for M-state nodes is exponential in the length of the chain, but we can exploit the graphical structure (conditional independences). Applicable to both directed and undirected graphs.

Key Idea of Message Passing. Multiplication distributes over addition: a * b + a * c = a * (b + c). The left-hand side costs two multiplications and one addition; the right-hand side costs one of each. As a consequence, the marginalization sums can be pushed inside the products, as sketched below.
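A minimal sketch of that key idea on a three-node chain: the naive marginal of x3 enumerates every (x1, x2) configuration, while the factored version first sums out x1 (forming a message to x2) and then sums out x2, giving identical numbers with fewer multiplications. The table values follow the .7/.3 and .4/.6 fragments visible in the later slides but should be read as illustrative.

M = 2
p1   = [0.7, 0.3]                  # P(x1)
p2_1 = [[0.4, 0.6], [0.5, 0.5]]    # p2_1[i][j] = P(x2 = j | x1 = i)
p3_2 = [[0.8, 0.2], [0.2, 0.8]]    # p3_2[j][k] = P(x3 = k | x2 = j)

# Naive: sum over every (x1, x2) configuration for each value of x3.
naive = [sum(p1[i] * p2_1[i][j] * p3_2[j][k] for i in range(M) for j in range(M))
         for k in range(M)]

# Factored: push the sum over x1 inside first (a "message" from x1 to x2).
msg12 = [sum(p1[i] * p2_1[i][j] for i in range(M)) for j in range(M)]
factored = [sum(msg12[j] * p3_2[j][k] for j in range(M)) for k in range(M)]

print(naive, factored)   # identical marginals, e.g. [0.458, 0.542] here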

Example. In the next several slides, we will consider an example of a simple, four-variable Markov chain. Computed by brute force, the marginal takes 48 multiplications + 23 additions; exploiting the distributive law reduces this to 5 multiplications + 6 additions. In message passing, this principle is applied to functions of random variables, rather than to the raw numbers as done here.

Message Passing. Now consider computing the marginal distribution of variable x3. Multiplication distributes over addition, so the sums over the other variables can be pushed inside the product of factors.

Message Passing, aka the Forward-Backward Algorithm. We can view the computation as sending and combining messages. Express the marginal of xi as a product of messages evaluated forward from the ancestors of xi and backward from the descendants of xi: alpha (forward) and beta (backward), with P(xi) proportional to Forward * Backward. The messages are evaluated recursively, and the normalizer Z is found by normalizing. Works in both directed and undirected graphs. (Christopher Bishop, MSR)
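A minimal sketch of the forward-backward (sum-product) recursion on a discrete chain x1 -> x2 -> ... -> xN, assuming the chain is specified by a prior P(x1) and a list of conditional probability tables P(x_{n+1} | x_n); the function name and interface are invented for illustration.

def forward_backward(prior, trans):
    N = len(trans) + 1
    alpha = [prior]                                    # forward messages
    for T in trans:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * T[i][j] for i in range(len(prev)))
                      for j in range(len(T[0]))])
    beta = [[1.0] * len(a) for a in alpha]             # backward messages
    for n in range(N - 2, -1, -1):
        T, nxt = trans[n], beta[n + 1]
        beta[n] = [sum(T[i][j] * nxt[j] for j in range(len(nxt)))
                   for i in range(len(T))]
    marginals = []
    for a, b in zip(alpha, beta):                      # combine and normalize
        unnorm = [ai * bi for ai, bi in zip(a, b)]
        Z = sum(unnorm)
        marginals.append([u / Z for u in unnorm])
    return marginals

On a directed chain whose table rows sum to one, the backward messages all come out as ones, so the combined marginals equal the forward messages, matching the observation made a few slides below.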

Confusion Alert! This standard notation for defining message passing heavily overloads the notion of multiplication: the messages are not scalars, and it is more appropriate to think of them as vectors, matrices, or even tensors, depending on how many variables are involved, with multiplication defined accordingly. Note: the tables here are conditional probability tables, so the values in each row must sum to one. This is not scalar multiplication!

Interpretation. The chain is defined by P(x1), P(x2 | x1), P(x3 | x2), and P(x4 | x3); for example P(x1=1) = .7, P(x1=2) = .3, P(x2=1 | x1=1) = .4, and P(x2=2 | x1=1) = .6. Sample computations: P(x1=1, x2=1, x3=1, x4=1) = (.7)(.4)(.8)(...) and P(x1=2, x2=1, x3=2, x4=1) = (.3)(...)(.2)(.7) ≈ .02.

Joint Probability, represented in a truth table. List all 16 combinations of x1, x2, x3, x4 together with P(x1, x2, x3, x4); then compute the marginal of x3 by summing the matching rows, giving P(x3=1) = .458 and P(x3=2) = .542.
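A minimal sketch of that truth-table computation: enumerate all 2^4 rows of the four-variable chain, multiply the conditional probabilities, and sum out everything except x3. The first three tables follow the .7/.3, .4/.6, and .8/.2 fragments visible in the slides; the last table, P(x4 | x3), is only partly legible, so the values used for it here are assumed.

from itertools import product

prior = [0.7, 0.3]                    # P(x1)
t12 = [[0.4, 0.6], [0.5, 0.5]]        # P(x2 | x1)
t23 = [[0.8, 0.2], [0.2, 0.8]]        # P(x3 | x2)
t34 = [[0.5, 0.5], [0.7, 0.3]]        # P(x4 | x3), partly assumed

def joint(x1, x2, x3, x4):
    return prior[x1] * t12[x1][x2] * t23[x2][x3] * t34[x3][x4]

marg_x3 = [0.0, 0.0]
for x1, x2, x3, x4 in product(range(2), repeat=4):    # all 16 rows of the table
    marg_x3[x3] += joint(x1, x2, x3, x4)

print(marg_x3)    # [0.458, 0.542], matching the message-passing result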

Now compute the same marginal via message passing. P(x2=1) = P(x1=1) P(x2=1 | x1=1) + P(x1=2) P(x2=1 | x1=2) = (.7)(.4) + (.3)(...), and P(x2=2) = P(x1=1) P(x2=2 | x1=1) + P(x1=2) P(x2=2 | x1=2) = (.7)(.6) + (.3)(...).

A simpler way to compute this: the row vector [.7 .3] times the conditional probability table gives [.43 .57]; i.e., a matrix multiply can do the combining and marginalization all at once! The same trick continues along the chain: [.43 .57] times the next table gives the forward message at x3, and the message coming back from x4 is obtained by summing along the rows of P(x4 | x3), which can also be written as a matrix multiply. Note: the tidy results are not a coincidence.

Message Passing. We can view this as sending and combining messages. The forward message carries the belief that x3=1 (and that x3=2) from the front part of the chain; the backward message carries the corresponding beliefs from the back part of the chain. How to combine them? Multiply elementwise and normalize: Forw * Back = [(.458)(1) (.542)(1)] = [.458 .542] (after normalizing, but note that it was already normalized; again, not a coincidence). These are the same values for the marginal P(x3) that we computed from the raw joint probability table. Whew!
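The matrix-multiply view, sketched in a few lines; the helper name vecmat is invented, and the tables again follow the fragments visible in the slides.

def vecmat(v, M):
    # Row vector times matrix: result[j] = sum_i v[i] * M[i][j].
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

msg1 = [0.7, 0.3]                                 # forward message at x1, i.e. P(x1)
msg2 = vecmat(msg1, [[0.4, 0.6], [0.5, 0.5]])     # -> [0.43, 0.57]
msg3 = vecmat(msg2, [[0.8, 0.2], [0.2, 0.8]])     # -> [0.458, 0.542]
print(msg2, msg3)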

If we want to compute all the marginals, we can do it in one shot by cascading, for a big computational savings: one cascaded forward pass, one separate cascaded backward pass, then a combination and normalization at each node.

Forward pass: [.7 .3], [.43 .57], [.458 .542], ... Backward pass: [1 1] at every node. Then combine each pair by elementwise multiplication and normalize; here the combined, normalized values equal the forward messages.

Note: in this example, a directed Markov chain using true conditional probabilities (rows sum to one), only the forward pass is needed. This is because the backward pass sums along rows and always produces [1 1]. We didn't really need both the forward AND backward passes in this example.

Max Marginals. What if we want to know the most probable state (the mode of the distribution)? Since the marginal distributions can tell us which value of each variable yields the highest marginal probability (that is, which value is most likely), we might try simply to take the arg max of each marginal distribution. For the marginals computed by belief propagation above, the arg max picks one value per variable (here x1 = 1, x2 = 2, x3 = 2, ...). Although that is correct in this example, it isn't always the case.

Max Marginals can Fail to Find the MAP. The marginals find the most likely value of each variable treated individually, which may not be the combination of values that jointly maximizes the distribution. In the example shown, the max marginals pick w1 = 4 and w2 = 2, while the actual MAP solution is w1 = 2 and w2 = 4.

Max-Product Algorithm. Goal: find the configuration x that maximizes p(x). Define the max-marginal, then run the message passing algorithm with sum replaced by max. This generalizes to any two operations forming a semiring. (Christopher Bishop, MSR)

Computing the MAP Value. We can solve this using the message passing algorithm with sum replaced by max. In our chain, we start at one end and work our way back to the root (x1) using the max-product algorithm, keeping track of the max value as we go. Stage 1: for each value of x2, take the max over x1 of P(x1) P(x2 | x1), i.e. max{(.7)(.4), (.3)(...)} and max{(.7)(.6), (.3)(...)}, giving [.28 .42]. Stage 2: multiply by the next table, forming (.28)(.8), (.28)(.2), (.42)(.2), (.42)(.8), and again keep the max for each value of x3. Note that this is no longer matrix multiplication, since we are not summing down the columns but taking the max instead. Stage 3: finish by taking the max along the rows of P(x4 | x3).
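A minimal sketch of the max-product value computation on the same chain, assuming the same tables as before (with the last one partly assumed, as noted earlier); the function name is invented for illustration.

def maxprod_value(prior, trans):
    # Same recursion as sum-product, but with every sum replaced by a max.
    msg = prior[:]
    for T in trans:
        msg = [max(msg[i] * T[i][j] for i in range(len(msg)))
               for j in range(len(T[0]))]
    return max(msg)    # value of the joint at its mode (the MAP value)

trans = [[[0.4, 0.6], [0.5, 0.5]],
         [[0.8, 0.2], [0.2, 0.8]],
         [[0.5, 0.5], [0.7, 0.3]]]
print(maxprod_value([0.7, 0.3], trans))   # 0.2352 with these tables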

Joint Probability, represented in a truth table. The largest value of the joint probability is the mode, i.e. the MAP value. Interpretation: combining the stage values, max{(.224)(...), (.336)(.7)} = .2352 with arg max 2, so the mode of the joint distribution is .2352, and the value of variable x3 in the configuration that yields the mode is 2.

Computing the Arg-Max of the MAP Value: Message Passing for the MAP Estimate. Chris Bishop, PRML: "At this point, we might be tempted simply to continue with the message passing algorithm [sending forward-backward messages and combining to compute the arg max for each variable node]. However, because we are now maximizing rather than summing, it is possible that there may be multiple configurations of x all of which give rise to the maximum value for p(x). In such cases, this strategy can fail because it is possible for the individual variable values obtained by maximizing the product of messages at each node to belong to different maximizing configurations, giving an overall configuration that no longer corresponds to a maximum." The problem can be resolved by adopting a rather different kind of message passing. Essentially, the solution is to write a dynamic programming algorithm based on max-product.

DP State Space Trellis. Tracing back through the trellis of the dynamic program, the MAP value is achieved for x1 = 1, x2 = 2, x3 = 2, x4 = 1.
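A minimal sketch of that dynamic-programming fix (Viterbi-style backtracking): during the max-product pass, remember which previous state achieved each max, then trace back from the best final state so the recovered values all belong to one maximizing configuration. The function name and interface are invented; the tables are the same partly assumed ones used above.

def viterbi(prior, trans):
    msg, backptrs = prior[:], []
    for T in trans:
        scores = [[msg[i] * T[i][j] for i in range(len(msg))]
                  for j in range(len(T[0]))]
        backptrs.append([max(range(len(s)), key=s.__getitem__) for s in scores])
        msg = [max(s) for s in scores]
    best = max(range(len(msg)), key=msg.__getitem__)
    path = [best]
    for bp in reversed(backptrs):              # trace back along the chain
        path.append(bp[path[-1]])
    return msg[best], list(reversed(path))     # MAP value, one MAP configuration

value, config = viterbi([0.7, 0.3], [[[0.4, 0.6], [0.5, 0.5]],
                                     [[0.8, 0.2], [0.2, 0.8]],
                                     [[0.5, 0.5], [0.7, 0.3]]])
print(value, config)   # 0.2352 and 0-indexed states [0, 1, 1, 0], i.e. x = 1, 2, 2, 1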

Belief Propagation Summary. The definition can be extended to general tree-structured graphs. It works for both directed AND undirected graphs, and it efficiently computes marginals and MAP configurations. At each node: form the product of incoming messages and local evidence, then marginalize to give the outgoing message; there is one message in each direction across every link. It gives the exact answer in any acyclic graph (no loops).

Loopy Belief Propagation. BP applied to a graph that contains loops: it needs a propagation schedule, needs multiple iterations, and might not converge. Typically it works well, even though it isn't supposed to, and it gives state-of-the-art performance in error-correcting codes. (Christopher Bishop, MSR)
