# Introduction to Graphical Models

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 obert Collins CSE586 Credits: Several slides are from: Introduction to Graphical odels eadings in Prince textbook: Chapters 0 and but mainly only on directed graphs at this time eview: Probability Theory Sum rule (marginal distributions) Product rule From these we have Bayes theorem with normalization factor eview: Conditional Probabilty Conditional Probability (rewriting product rule) A B) P (A, B) / B) Chain ule A,B,C,D) A) A,B) A,B,C) A,B,C,D) A) A,B) A,B,C) Conditional Independence A) B A) C A,B) D A,B,C) A, B C) A C ) B C) statistical independence A, B) A) B) Christopher Bishop, S Overview of Graphical odels Graphical odels model conditional dependence/ independence Graph structure specifies how joint probability factors Directed graphs Example:H The Joint Distribution ecipe for making a joint distribution of variables: Example: Boolean variables A, B, C Undirected graphs Example:F Inference by : belief propagation Sum-product algorithm ax-product (in-sum if using logs) We will focus mainly on directed graphs right now. Andrew oore, CU

2 The Joint Distribution Example: Boolean variables A, B, C The Joint Distribution Example: Boolean variables A, B, C ecipe for making a joint distribution of variables: A B C ecipe for making a joint distribution of variables: A B C Prob ake a truth table listing all combinations of values of your variables (if there are Boolean variables then the table will have 2 rows) ake a truth table listing all combinations of values of your variables (if there are Boolean variables then the table will have 2 rows). 2. For each combination of values, say how probable it is Andrew oore, CU Andrew oore, CU The Joint Distribution Example: Boolean variables A, B, C Joint distributions ecipe for making a joint distribution of variables:. ake a truth table listing all combinations of values of your variables (if there are Boolean variables then the table will have 2 rows). 2. For each combination of values, say how probable it is. 3. If you subscribe to the axioms of probability, those numbers must sum to. A B C Prob A truth table C Good news Once you have a joint distribution, you can answer all sorts of probabilistic questions involving combinations of attributes 0.30 B 0.0 Andrew oore, CU Using the Joint Using the Joint Poor ale) E) row) rows matching E Poor) E) row) rows matching E Andrew oore, CU Andrew oore, CU 2

3 Inference with the Joint Inference with the Joint computing conditional probabilities E E2) E E2) E ) 2 rows matching E and E2 rows matching E2 row) row) E E2) E E2) E ) 2 rows matching E and E2 rows matching E2 row) row) ale Poor) / Andrew oore, CU Andrew oore, CU Joint distributions Good news Once you have a joint distribution, you can answer all sorts of probabilistic questions involving combinations of attributes Bad news Impossible to create JD for more than about ten attributes because there are so many numbers needed when you build the thing. For 0 binary variables you need to specify numbers 023. (question for class: why the -?) How to use Fewer Numbers Factor the joint distribution into a product of distributions over subsets of variables Identify (or just assume) independence between some subsets of variables Use that independence to simplify some of the distributions Graphical models provide a principled way of doing this. Factoring Directed versus Undirected Graphs Consider an arbitrary joint distribution We can always factor it, by application of the chain rule what this factored form looks like as a graphical model Directed Graph Examples: Bayes nets Hs Undirected Graph Examples FS Note: The word graphical denotes the graph structure underlying the model, not the fact that you can draw a pretty picture of it (although that helps). Christopher Bishop, S Christopher Bishop, S 3

4 Graphical odel Concepts Graphical odel Concepts s)0.3 S )0.6 s)0.3 S )0.6 ^S)0.05 ^~S)0. ~^S)0. ~^~S)0.2 T T )0.3 T ~)0.8 )0.3 ~)0.6 ^S)0.05 ^~S)0. ~^S)0. ~^~S)0.2 T T )0.3 T ~)0.8 )0.3 ~)0.6 Nodes represent random variables. Edges (or lack of edges) represent conditional dependence (or independence). Each node is annotated with a table of conditional probabilities wrt parents. Note: The word graphical denotes the graph structure underlying the model, not the fact that you can draw a pretty picture of it using graphics. Directed Acyclic Graphs Directed acyclic means we can t follow arrows around in a cycle. Examples: chains; trees Also, things that look like this: Factoring Examples Joint distribution where denotes the parents of i We can read the factored form of the joint distribution immediately from a directed graph where denotes the parents of i x parents of x) y parents of y) Factoring Examples Joint distribution Factoring Examples We can read the form of the joint distribution directly from the directed graph where denotes the parents of i where denotes the parents of i p(x,y)p(x)p(y x) p(x,y)p(y)p(x y) p(x,y)p(x)p(y) parents of ) parents of ) parents of ) 4

6 Factoring Examples How many probabilities do we have to specify/learn (assuming each x i is a binary variable)? if fully connected, we would need 2^7-27 but, for this connectivity, we need Important Case: Time Series Consider modeling a time series of sequential data x, x2,..., xn These could represent locations of a tracked object over time observations of the weather each day spectral coefficients of a speech signal joint angles during human motion Note: If all nodes were independent, we would only need 7! odeling Time Series odeling Time Series Simplest model of a time series is that all observations are independent. In the most general case, we could use chain rule to state that any node is dependent on all previous nodes... This would be appropriate for modeling successive tosses {heads,tails} of an unbiased coin. However, it doesn t really treat the series as a sequence. That is, we could permute the ordering of the observations and not change a thing. x,x2,x3,x4,...) x)x2 x)x3 x,x2)x4 x,x2,x3)... ook for an intermediate model between these two extremes. odeling Time Series odeling Time Series arkov assumption: xn x,x2,...,xn-) xn xn-) that is, assume all conditional distributions depend only on the most recent previous observation. Generalization: State-Space odels You have a arkov chain of latent (unobserved) states Each state generates an observation x! x 2! x n-! x n! x n+! The result is a first-order arkov Chain y! y 2! y n-! y n! y n+! x,x2,x3,x4,...) x)x2 x)x3 x2)x4 x3)... Goal: Given a sequence of observations, predict the sequence of unobserved states that imizes the joint probability. 6

7 odeling Time Series Examples of State Space models Hidden arkov model Kalman filter x! x 2! x n-! x n! x n+! odeling Time Series x,x2,x3,x4,...,y,y2,y3,y4,...) x)y x)x2 x)y2 x2)x3 x2)y3 x3)x4 x3)y4 x4)... x! x 2! x n-! x n! x n+! y! y 2! y n-! y n! y n+! y! y 2! y n-! y n! y n+! Example of a Tree-structured odel essage Passing Confusion alert: Our textbook uses w to denote a world state variable and x to denote a measurement. (we have been using x to denote world state and y as the measurement). essage Passing : Belief Propagation Example: D chain Find marginal for a particular node Key Idea of essage Passing multiplication distributes over addition a * b + a * c a * (b + c) as a consequence: for -state nodes, cost is exponential in length of chain but, we can exploit the graphical structure (conditional independences) is number of discrete values a variable can take is number of variables Applicable to both directed and undirected graphs. 7

8 Example essage Passing In the next several slides, we will consider an example of a simple, four-variable arkov chain. 48 multiplications + 23 additions 5 multiplications + 6 additions For, this principle is applied to functions of random variables, rather than the variables as done here. essage Passing Now consider computing the marginal distribution of variable x3 essage Passing ultiplication distributes over addition... essage Passing, aka Forward-Backward Algorithm Can view as sending/combining messages... Forward-Backward Algorithm Express marginals as product of messages evaluated forward from ancesters of xi and backwards from decendents of xi α β Forw * Back ecursive evaluation of messages - + Find Z by normalizing Works in both directed and undirected graphs Christopher Bishop, S 8

9 Confusion Alert! This standard notation for defining heavily overloads the notion of multiplication, e.g. the messages are not scalars it is more appropriate to think of them as vectors, matrices, or even tensors depending on how many variables are involved, with multiplication defined accordingly. [] Note: these are conditional probability tables, so values in each row must sum to one Not scalar multiplication! [] Note: these are conditional probability tables, so values in each row must sum to one [] Note: these are conditional probability tables, so values in each row must sum to one Interpretation x) x2) x2) x22) x22 X) 0.6 Sample computations: x, x2, x3, x4) (.7)(.4)(.8)().2 x2, x2, x32, x4) (.3)()(.2)(.7).02 Joint Probability, represented in a truth table x x2 x3 x4 x,x2,x3,x4) Joint Probability x x2 x3 x4 x,x2,x3,x4) Compute marginal of x3: x3) x32) 042 9

10 now compute via now compute via [] [] x)x2 x) x)x22 x).3.3 x2)x2 x2) x2)x22 x2) (.7)(.4) + (.3)() (.7)(.6) + (.3)() x)x2 x) x)x22 x).3.3 x2)x2 x2) x2)x22 x2) (.7)(.4) + (.3)() (.7)(.6) + (.3)() simpler way to compute this... [] [.43 7] i.e. matrix multiply can do the combining and marginalization all at once!!!! now compute via now compute via [] [] [] [.43 7] [ ] [] compute sum along rows of x4 x3) Can also do that with matrix multiply: [.43 7] note: this is not [ ] a coincidence essage Passing Can view as sending/combining messages... essage Passing Can view as sending/combining messages... Forw [ ] Belief that x3, from front part of chain Belief that x32, from front part of chain * How to combine them? Back Belief that x32, from back part of chain Belief that x3, from back part of chain Forw [ ] * Back [ (.458)() (42)() ] [ ] [ ] (after normalizing, but note that it was already normalized. Again, not a coincidence) These are the same values for the marginal x3) that we computed from the raw joint probability table. Whew!!! 0

11 [] [] If we want to compute all marginals, we can do it in one shot by cascading, for a big computational savings. We need one cascaded forward pass, one separate cascaded backward pass, then a combination and normalization at each node. forward pass [] backward pass Then combine each by elementwise multiply and normalize forward pass [] backward pass Then combine each by elementwise multiply and normalize Forward: [] [.43 7] [ ] [ ] Backward: [ ] [ ] [ ] [ ] combined+ normalized [] [.43 7] [ ] [ ] Note: In this example, a directed arkov chain using true conditional probabilities (rows sum to one), only the forward pass is needed. This is true because the backward pass sums along rows, and always produces [ ]. ax arginals What if we want to know the most probably state (mode of the distribution)? Since the marginal distributions can tell us which value of each variable yields the highest marginal probability (that is, which value is most likely), we might try to just take the arg of each marginal distribution. We didn t really need forward AND backward in this example. Forward: [] [.43 7] [ ] [ ] Backward: [ ] [ ] [ ] [ ] combined+ normalized [] [.43 7] [ ] [ ] these are already the marginals for this example Didn t need to do these steps marginals computed by belief propagation marginals [] [.43 7] [ ] [ ] arg x x 2 2 x 3 2 x 4 Although that s correct in this example, it isn t always the case

12 ax arginals can Fail to Find the AP However, the marginals find most likely values of each variable treated individually, which may not be the combination of values that jointly imize the distribution. marginals: w4, w22 actual AP solution: w2, w24 ax-product Algorithm Goal: find define the marginal then essage passing algorithm with sum replaced by Generalizes to any two operations forming a semiring Christopher Bishop, S Computing AP Value product Can solve using algorithm with sum replaced by. [] In our chain, we start at the end and work our way back to the root (x) using the -product algorithm, keeping track of the value as we go. marginal stage.7.4 x)x2 x).3 x2)x2 x2).7.6 x)x22 x).3 x2)x22 x2) stage3 stage2 [(.7)(.4), (.3)()] [(.7)(.6), (.3)()] product product [] [] [.28.42] [ ] (.28)(.8) (.28)(.2) (.42)(.2) (.42)(.8) Note that this is no longer matrix multiplication, since we are not summing down the columns but taking instead... [.28.42] [ ] (.28)(.8) (.28)(.2) (.42)(.2) (.42)(.8) compute along rows of x4 x3).7 2

13 [] product Joint Probability, represented in a truth table x x2 x3 x4 x,x2,x3,x4) argest value of joint prob mode AP interpretation: the mode of the joint distribution is.2352, and the value of variable x3, in the configuration that yields the mode value, is 2. [ ] [(.224)(), (.336)(.7)] arg 2 x3 interpretation: the mode of the joint distribution is.2352, and the value of variable x3, in the configuration that yields the mode value, is [(.224)(), (.336)(.7)].2352 arg 2 x3 Computing Arg-ax of AP Value : AP Estimate Chris Bishop, P: At this point, we might be tempted simply to continue with the algorithm [sending forward-backward messages and combining to compute arg for each variable node]. However, because we are now imizing rather than summing, it is possible that there may be multiple configurations of x all of which give rise to the imum value for p(x). In such cases, this strategy can fail because it is possible for the individual variable values obtained by imizing the product of messages at each node to belong to different imizing configurations, giving an overall configuration that no longer corresponds to a imum. The problem can be resolved by adopting a rather different kind of... Essentially, the solution is to write a dynamic programming algorithm based on -product. DP State Space Trellis : AP Estimate Joint Probability, represented in a truth table x x2 x3 x4 x,x2,x3,x4) DP State Space Trellis argest value of joint prob mode AP achieved for x, x22, x32, x4 3

14 Belief Propagation Summary Definition can be extended to general tree-structured graphs Works for both directed AND undirected graphs Efficiently computes marginals and AP configurations At each node: form product of incoming messages and local evidence marginalize to give outgoing message one message in each direction across every link oopy Belief Propagation BP applied to graph that contains loops needs a propagation schedule needs multiple iterations might not converge Typically works well, even though it isn t supposed to State-of-the-art performance in error-correcting codes Gives exact answer in any acyclic graph (no loops). Christopher Bishop, S Christopher Bishop, S 4

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### CS 188: Artificial Intelligence. Probability recap

CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### 13.3 Inference Using Full Joint Distribution

191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Notes on Matrix Multiplication and the Transitive Closure

ICS 6D Due: Wednesday, February 25, 2015 Instructor: Sandy Irani Notes on Matrix Multiplication and the Transitive Closure An n m matrix over a set S is an array of elements from S with n rows and m columns.

### Lecture 10: Sequential Data Models

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

### SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

### These axioms must hold for all vectors ū, v, and w in V and all scalars c and d.

DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms

### Linear Dependence Tests

Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### Artificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### Algorithmic Crowdsourcing. Denny Zhou Microsoft Research Redmond Dec 9, NIPS13, Lake Tahoe

Algorithmic Crowdsourcing Denny Zhou icrosoft Research Redmond Dec 9, NIPS13, Lake Tahoe ain collaborators John Platt (icrosoft) Xi Chen (UC Berkeley) Chao Gao (Yale) Nihar Shah (UC Berkeley) Qiang Liu

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

### Factor Graphs and the Sum-Product Algorithm

498 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 2, FEBRUARY 2001 Factor Graphs and the Sum-Product Algorithm Frank R. Kschischang, Senior Member, IEEE, Brendan J. Frey, Member, IEEE, and Hans-Andrea

### Bayesian Networks Chapter 14. Mausam (Slides by UW-AI faculty & David Page)

Bayesian Networks Chapter 14 Mausam (Slides by UW-AI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation

### Points Addressed in this Lecture. Standard form of Boolean Expressions. Lecture 5: Logic Simplication & Karnaugh Map

Points Addressed in this Lecture Lecture 5: Logic Simplication & Karnaugh Map Professor Peter Cheung Department of EEE, Imperial College London (Floyd 4.5-4.) (Tocci 4.-4.5) Standard form of Boolean Expressions

### Lecture Notes on Spanning Trees

Lecture Notes on Spanning Trees 15-122: Principles of Imperative Computation Frank Pfenning Lecture 26 April 26, 2011 1 Introduction In this lecture we introduce graphs. Graphs provide a uniform model

### a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

### We seek a factorization of a square matrix A into the product of two matrices which yields an

LU Decompositions We seek a factorization of a square matrix A into the product of two matrices which yields an efficient method for solving the system where A is the coefficient matrix, x is our variable

### Analysis MA131. University of Warwick. Term

Analysis MA131 University of Warwick Term 1 01 13 September 8, 01 Contents 1 Inequalities 5 1.1 What are Inequalities?........................ 5 1. Using Graphs............................. 6 1.3 Case

### CS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization

Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false

### GREEN CHICKEN EXAM - NOVEMBER 2012

GREEN CHICKEN EXAM - NOVEMBER 2012 GREEN CHICKEN AND STEVEN J. MILLER Question 1: The Green Chicken is planning a surprise party for his grandfather and grandmother. The sum of the ages of the grandmother

### COS513 LECTURE 5 JUNCTION TREE ALGORITHM

COS513 LECTURE 5 JUNCTION TREE ALGORITHM D. EIS AND V. KOSTINA The junction tree algorithm is the culmination of the way graph theory and probability combine to form graphical models. After we discuss

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### 2.1: MATRIX OPERATIONS

.: MATRIX OPERATIONS What are diagonal entries and the main diagonal of a matrix? What is a diagonal matrix? When are matrices equal? Scalar Multiplication 45 Matrix Addition Theorem (pg 0) Let A, B, and

### 2. Methods of Proof Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try.

2. METHODS OF PROOF 69 2. Methods of Proof 2.1. Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try. Trivial Proof: If we know q is true then

### Chapter 8. Matrices II: inverses. 8.1 What is an inverse?

Chapter 8 Matrices II: inverses We have learnt how to add subtract and multiply matrices but we have not defined division. The reason is that in general it cannot always be defined. In this chapter, we

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Compression algorithm for Bayesian network modeling of binary systems

Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the

### LS.6 Solution Matrices

LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions

### Simplifying Logic Circuits with Karnaugh Maps

Simplifying Logic Circuits with Karnaugh Maps The circuit at the top right is the logic equivalent of the Boolean expression: f = abc + abc + abc Now, as we have seen, this expression can be simplified

### Magic Hexagon Puzzles

Magic Hexagon Puzzles Arthur Holshouser 3600 Bullard St Charlotte NC USA Harold Reiter Department of Mathematics University of North Carolina Charlotte Charlotte NC 28223 USA hbreiter@unccedu 1 Abstract

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Linear transformations Affine transformations Transformations in 3D. Graphics 2011/2012, 4th quarter. Lecture 5: linear and affine transformations

Lecture 5 Linear and affine transformations Vector transformation: basic idea Definition Examples Finding matrices Compositions of transformations Transposing normal vectors Multiplication of an n n matrix

### The equation for the 3-input XOR gate is derived as follows

The equation for the 3-input XOR gate is derived as follows The last four product terms in the above derivation are the four 1-minterms in the 3-input XOR truth table. For 3 or more inputs, the XOR gate

### Lecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett

Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.

### Lesson 3. Algebraic graph theory. Sergio Barbarossa. Rome - February 2010

Lesson 3 Algebraic graph theory Sergio Barbarossa Basic notions Definition: A directed graph (or digraph) composed by a set of vertices and a set of edges We adopt the convention that the information flows

### Decision Making under Uncertainty

6.825 Techniques in Artificial Intelligence Decision Making under Uncertainty How to make one decision in the face of uncertainty Lecture 19 1 In the next two lectures, we ll look at the question of how

### Math 4310 Handout - Quotient Vector Spaces

Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable

### Graph Theory. Directed and undirected graphs make useful mental models for many situations. These objects are loosely defined as follows:

Graph Theory Directed and undirected graphs make useful mental models for many situations. These objects are loosely defined as follows: Definition An undirected graph is a (finite) set of nodes, some

### Outline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.

Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost

### The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

### Lecture Notes 2: Matrices as Systems of Linear Equations

2: Matrices as Systems of Linear Equations 33A Linear Algebra, Puck Rombach Last updated: April 13, 2016 Systems of Linear Equations Systems of linear equations can represent many things You have probably

### Section 10.4 Vectors

Section 10.4 Vectors A vector is represented by using a ray, or arrow, that starts at an initial point and ends at a terminal point. Your textbook will always use a bold letter to indicate a vector (such

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction

Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach

### Learning. Artificial Intelligence. Learning. Types of Learning. Inductive Learning Method. Inductive Learning. Learning.

Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Artificial Intelligence Learning Chapter 8 Learning is useful as a system construction method, i.e., expose

### MATHEMATICAL THEORY FOR SOCIAL SCIENTISTS THE BINOMIAL THEOREM. (p + q) 0 =1,

THE BINOMIAL THEOREM Pascal s Triangle and the Binomial Expansion Consider the following binomial expansions: (p + q) 0 1, (p+q) 1 p+q, (p + q) p +pq + q, (p + q) 3 p 3 +3p q+3pq + q 3, (p + q) 4 p 4 +4p

### Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

### CSE 20: Discrete Mathematics for Computer Science. Prof. Miles Jones. Today s Topics: Graphs. The Internet graph

Today s Topics: CSE 0: Discrete Mathematics for Computer Science Prof. Miles Jones. Graphs. Some theorems on graphs. Eulerian graphs Graphs! Model relations between pairs of objects The Internet graph!

### The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23

(copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, high-dimensional

### 1 What s done well in CVXPY. Figure 1: Expression tree.

Figure 1: Expression tree. Here are my thoughts on what s done well in CVXPY, what s missing, and what I d like to see in a core CVX library. 1 What s done well in CVXPY The best design decision I made

### DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

### Chapter 5: Sequential Circuits (LATCHES)

Chapter 5: Sequential Circuits (LATCHES) Latches We focuses on sequential circuits, where we add memory to the hardware that we ve already seen Our schedule will be very similar to before: We first show

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### GRAPH THEORY and APPLICATIONS. Trees

GRAPH THEORY and APPLICATIONS Trees Properties Tree: a connected graph with no cycle (acyclic) Forest: a graph with no cycle Paths are trees. Star: A tree consisting of one vertex adjacent to all the others.

### Bayesian Networks. Mausam (Slides by UW-AI faculty)

Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide

### Vectors. Philippe B. Laval. Spring 2012 KSU. Philippe B. Laval (KSU) Vectors Spring /

Vectors Philippe B Laval KSU Spring 2012 Philippe B Laval (KSU) Vectors Spring 2012 1 / 18 Introduction - Definition Many quantities we use in the sciences such as mass, volume, distance, can be expressed

### Bayesian Networks. Read R&N Ch. 14.1-14.2. Next lecture: Read R&N 18.1-18.4

Bayesian Networks Read R&N Ch. 14.1-14.2 Next lecture: Read R&N 18.1-18.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs

### Tracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking

Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local

### COT5405 Analysis of Algorithms Homework 3 Solutions

COT0 Analysis of Algorithms Homework 3 Solutions. Prove or give a counter example: (a) In the textbook, we have two routines for graph traversal - DFS(G) and BFS(G,s) - where G is a graph and s is any

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Solutions for Chapter 8

Solutions for Chapter 8 Solutions for exercises in section 8. 2 8.2.1. The eigenvalues are σ (A ={12, 6} with alg mult A (6 = 2, and it s clear that 12 = ρ(a σ (A. The eigenspace N(A 12I is spanned by

### 9.4. The Scalar Product. Introduction. Prerequisites. Learning Style. Learning Outcomes

The Scalar Product 9.4 Introduction There are two kinds of multiplication involving vectors. The first is known as the scalar product or dot product. This is so-called because when the scalar product of

### Likelihood: Frequentist vs Bayesian Reasoning

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

### Graph Theory for Articulated Bodies

Graph Theory for Articulated Bodies Alba Perez-Gracia Department of Mechanical Engineering, Idaho State University Articulated Bodies A set of rigid bodies (links) joined by joints that allow relative

### Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

### Equilibrium of Concurrent Forces (Force Table)

Equilibrium of Concurrent Forces (Force Table) Objectives: Experimental objective Students will verify the conditions required (zero net force) for a system to be in equilibrium under the influence of

### Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding

### Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques. My T. Thai

Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques My T. Thai 1 Overview An overview of LP relaxation and rounding method is as follows: 1. Formulate an optimization

### Question 2: How do you solve a matrix equation using the matrix inverse?

Question : How do you solve a matrix equation using the matrix inverse? In the previous question, we wrote systems of equations as a matrix equation AX B. In this format, the matrix A contains the coefficients

### Probability, Conditional Independence

Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms

### Conditional Random Fields: An Introduction

Conditional Random Fields: An Introduction Hanna M. Wallach February 24, 2004 1 Labeling Sequential Data The task of assigning label sequences to a set of observation sequences arises in many fields, including

### Algorithms. Theresa Migler-VonDollen CMPS 5P

Algorithms Theresa Migler-VonDollen CMPS 5P 1 / 32 Algorithms Write a Python function that accepts a list of numbers and a number, x. If x is in the list, the function returns the position in the list

### Midterm Exam Solutions CS161 Computer Security, Spring 2008

Midterm Exam Solutions CS161 Computer Security, Spring 2008 1. To encrypt a series of plaintext blocks p 1, p 2,... p n using a block cipher E operating in electronic code book (ECB) mode, each ciphertext

### MINI LESSON. Lesson 5b Solving Quadratic Equations

MINI LESSON Lesson 5b Solving Quadratic Equations Lesson Objectives By the end of this lesson, you should be able to: 1. Determine the number and type of solutions to a QUADRATIC EQUATION by graphing 2.

### Solutions to Exercises 8

Discrete Mathematics Lent 2009 MA210 Solutions to Exercises 8 (1) Suppose that G is a graph in which every vertex has degree at least k, where k 1, and in which every cycle contains at least 4 vertices.

### 1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form

1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and

### Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

### CMSC 451: Graph Properties, DFS, BFS, etc.

CMSC 451: Graph Properties, DFS, BFS, etc. Slides By: Carl Kingsford Department of Computer Science University of Maryland, College Park Based on Chapter 3 of Algorithm Design by Kleinberg & Tardos. Graphs

### 4. MATRICES Matrices

4. MATRICES 170 4. Matrices 4.1. Definitions. Definition 4.1.1. A matrix is a rectangular array of numbers. A matrix with m rows and n columns is said to have dimension m n and may be represented as follows:

### Matrix Algebra 2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES Pearson Education, Inc.

2 Matrix Algebra 2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES Theorem 8: Let A be a square matrix. Then the following statements are equivalent. That is, for a given A, the statements are either all true

### Finding the M Most Probable Configurations Using Loopy Belief Propagation

Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel

### TU e. Advanced Algorithms: experimentation project. The problem: load balancing with bounded look-ahead. Input: integer m 2: number of machines

The problem: load balancing with bounded look-ahead Input: integer m 2: number of machines integer k 0: the look-ahead numbers t 1,..., t n : the job sizes Problem: assign jobs to machines machine to which

### Structured Learning and Prediction in Computer Vision. Contents

Foundations and Trends R in Computer Graphics and Vision Vol. 6, Nos. 3 4 (2010) 185 365 c 2011 S. Nowozin and C. H. Lampert DOI: 10.1561/0600000033 Structured Learning and Prediction in Computer Vision

### A1. Basic Reviews PERMUTATIONS and COMBINATIONS... or HOW TO COUNT

A1. Basic Reviews Appendix / A1. Basic Reviews / Perms & Combos-1 PERMUTATIONS and COMBINATIONS... or HOW TO COUNT Question 1: Suppose we wish to arrange n 5 people {a, b, c, d, e}, standing side by side,

### Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### MATH 304 Linear Algebra Lecture 4: Matrix multiplication. Diagonal matrices. Inverse matrix.

MATH 304 Linear Algebra Lecture 4: Matrix multiplication. Diagonal matrices. Inverse matrix. Matrices Definition. An m-by-n matrix is a rectangular array of numbers that has m rows and n columns: a 11

### Linear Equations and Inverse Matrices

Chapter 4 Linear Equations and Inverse Matrices 4. Two Pictures of Linear Equations The central problem of linear algebra is to solve a system of equations. Those equations are linear, which means that

### 7 Communication Classes

this version: 26 February 2009 7 Communication Classes Perhaps surprisingly, we can learn much about the long-run behavior of a Markov chain merely from the zero pattern of its transition matrix. In the

### An Interesting Way to Combine Numbers

An Interesting Way to Combine Numbers Joshua Zucker and Tom Davis November 28, 2007 Abstract This exercise can be used for middle school students and older. The original problem seems almost impossibly

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS