13.3 Inference Using Full Joint Distribution


 Elwin Thompson
 1 years ago
 Views:
Transcription
1 191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent to the disjunction of all the atomic events in which a holds Call this set of events e(a) Atomic events are mutually exclusive, so the probability of any conjunction of atomic events is zero, by axiom 2 Hence, from axiom 3 P(a) = ei e(a) P(e i ) Given a full joint distribution that specifies the probabilities of all atomic events, this equation provides a simple method for computing the probability of any proposition Inference Using Full Joint Distribution toothache toothache catch catch catch catch cavity cavity E.g., there are six atomic events for cavity toothache: = 0.28 Extracting the distribution over a variable (or some subset of variables), marginal probability, is attained by adding the entries in the corresponding rows or columns E.g., P(cavity) = = 0.2 We can write the following general marginalization (summing out) rule for any sets of variables Y and Z: P(Y) = z Z P(Y, z) 1
2 193 For example: P( Cavity) z { Catch, Toothache} P( Cavity, z) A variant of this rule involves conditional probabilities instead of joint probabilities, using the product rule: P( Y) P( Y z) P( z) z This rule is called conditioning Marginalization and conditioning turn out to be useful rules for all kinds of derivations involving probability expressions toothache toothache catch catch catch catch cavity cavity Computing a conditional probability P(cavity toothache) = P(cavity toothache)/p(toothache) = ( )/( ) = 0.12/0.2 = 0.6 Respectively P( cavity toothache) = ( )/0.2 = 0.4 The two probabilities sum up to one, as they should 2
3 195 1/P(toothache) = 1/0.2 = 5 is a normalization constant ensuring that the distribution P(Cavity toothache) adds up to 1 Let denote the normalization constant P(Cavity toothache) = P(Cavity, toothache) = (P(Cavity, toothache, catch) + P(Cavity, toothache, catch)) = ([0.108, 0.016] + [0.012, 0.064]) = [0.12, 0.08] = [0.6, 0.4] In other words, we can calculate the conditional probability distribution without knowing P(toothache) using normalization 196 More generally: we need to find out the distribution of the query variable X (Cavity), evidence variables E (Toothache) have observed values e, and the remaining unobserved variables are Y (Catch) Evaluation of a query: P(X e) = P(X, e) = y P(X, e, y), where the summation is over all possible ys; i.e., all possible combinations of values of the unobserved variables Y 3
4 197 P(X, e, y) is simply a subset of the joint probability distribution of variables X, E, and Y X, E, and Y together constitute the complete set of variables for the domain Given the full joint distribution to work with, the equation in the previous slide can answer probabilistic queries for discrete variables It does not scale well For a domain described by n Boolean variables, it requires an input table of size O(2 n ) and takes O(2 n ) time to process the table In realistic problems the approach is completely impractical Independence If we expand the previous example with a fourth random variable Weather, which has four possible values, we have to copy the table of joint probabilities four times to have 32 entries together Dental problems have no influence on the weather, hence: P(Weather = cloudy toothache, catch, cavity) = P(Weather = cloudy) By this observation and product rule P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity) 4
5 199 A similar equation holds for the other values of the variable Weather, and hence P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather) The required joint distribution tables have 8 and 4 elements Propositions a and b are independent if P(a b) = P(a) P(b a) = P(b) P(a b) = P(a) P(b) Respectively variables X and Y are independent of each other if P(X Y) = P(X) P(Y X) = P(Y) P(X, Y) = P(X)P(Y) Independent coin flips: P(C 1,, C n ) can be represented as the product of n singlevariable distributions P(C i ) Bayes Rule and Its Use By the product rule P(a b) = P(a b) P(b) and the commutativity of conjunction P(a b) = P(b a) P(a) Equating the two righthand sides and dividing by P(a), we get the Bayes rule P(b a) = P(a b) P(b) / P(a) The more general case of multivalued variables X and Y conditionalized on some background evidence e P(Y X, e) = P(X Y, e) P(Y e) / P(X e) Using normalization Bayes rule can be written as P(Y X) = P(X Y) P(Y) 5
6 201 Half of meningitis patients have a stiff neck P(s m) = 0.5 The prior probability of meningitis is 1 / : P(m) = 1/ Every 20th patient complains about a stiff neck P(s) = 1/20 What is the probability that a patient complaining about a stiff neck has meningitis? P(m s) = P(s m) P(m) / P(s) = 20 / ( ) = Perhaps the doctor knows that a stiff neck implies meningitis in 1 out of cases The doctor, hence, has quantitative information in the diagnostic direction from symptoms to causes, and no need to use Bayes rule Unfortunately, diagnostic knowledge is often more fragile than causal knowledge If there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up The conditional probability P(s m), however, stays the same The doctor who derived diagnostic probability P(m s) directly from statistical observation of patients before the epidemic will have no idea how to update the value The doctor who computes P(m s) from the other three values will see P(m s) go up proportionally with P(m) 6
7 203 All modern probabilistic inference systems are based on the use of Bayes rule On the surface the relatively simple rule does not seem very useful However, as the previous example illustrates, Bayes rule gives a chance to apply existing knowledge We can avoid assessing the probability of the evidence P(s) by instead computing a posterior probability for each value of the query variable m and m and then normalizing the result P(M s) = [ P(s m) P(m), P(s m) P( m) ] Thus, we need to estimate P(s m) instead of P(s) Sometimes easier, sometimes harder 204 When a probabilistic query has more than one piece of evidence the approach based on full joint probability will not scale up P(Cavity toothache catch) Neither will applying Bayes rule scale up in general P(toothache catch Cavity) P(Cavity) We would need variables to be independent, but variable Toothache and Catch obviously are not: if the probe catches in the tooth, it probably has a cavity and that probably causes a toothache Each is directly caused by the cavity, but neither has a direct effect on the other catch and toothache are conditionally independent given Cavity 7
8 205 Conditional independence: P(toothache catch Cavity) = P(toothache Cavity) P(catch Cavity) Plugging this into Bayes rule yields P(Cavity toothache catch) = P(Cavity) P(toothache Cavity) P(catch Cavity) Now we only need three separate distributions The general definition of conditional independence of variables X and Y, given a third variable Z is P(X, Y Z) = P(X Z) P(Y Z) Equivalently, P(X Y, Z) = P(X Z) and P(Y X, Z) = P(Y Z) 206 If all effects are conditionally independent given a single cause, the exponential size of knowledge representation is cut to linear A probability distribution is called a naïve Bayes model if all effects E 1,, E n are conditionally independent, given a single cause C The full joint probability distribution can be written as P(C, E 1,, E n ) = P(C) i P(E i C) It is often used as a simplifying assumption even in cases where the effect variables are not conditionally independent given the cause variable In practice, naïve Bayes systems can work surprisingly well, even when the independence assumption is not true 8
9 PROBABILISTIC REASONING A Bayesian network is a directed graph in which each node is annotated with quantitative probability information 1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous 2. A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, then X is said to be a parent of Y 3. Each node X i has a conditional probability distribution P(X i Parents(X i )) 4. The graph has no directed cycles and hence is a directed, acyclic graph DAG 208 The intuitive meaning of arrow X Y is that X has a direct influence on Y The topology of the network the set of nodes and links specifies the conditional independence relations that hold in the domain Once the topology of the Bayesian network has been laid out, we need only specify a conditional probability distribution for each variable, given its parents The combination of the topology and the conditional distributions specify implicitly the full joint distribution for all the variables Weather Cavity Toothache Catch 9
10 209 Example (from LA) A new burglar alarm (A) has been installed at home It is fairly reliable at detecting a burglary (B), but also responds on occasion to minor earthquakes (E) Neighbors John (J) and Mary (M) have promised to call you at work when they hear the alarm John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too Mary, on the other hand, likes loud music and sometimes misses the alarm altogether Given the evidence of who has or has not called, we would like to estimate the probability of a burglary 210 P(B).001 Burglary Earthquake P(E).002 Alarm B E P(A) T T.95 T F.94 F T.29 F F.001 A P(J) T.90 F.05 John Calls Mary Calls A P(M) T.70 F.01 10
11 211 The topology of the network indicates that Burglary and earthquakes affect the probability of the alarm s going off Whether John and Mary call depends only on the alarm They do not perceive any burglaries directly, they do not notice minor earthquakes, and they do not confer before calling Mary listening to loud music and John confusing phone ringing to the sound of the alarm can be read from the network only implicitly as uncertainty associated to calling at work The probabilities actually summarize a potentially infinite set of circumstances The alarm might fail to go off due to high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc. John and Mary might fail to call and report an alarm because they are out to lunch, on vacation, temporarily deaf, passing helicopter, etc. 212 The conditional probability tables in the network give the probabilities for the values of the random variable depending on the combination of values for the parent nodes Each row must sum to 1, because the entries represent exhaustive set of cases for the variable Above all variables are Boolean, and therefore it is enough to know that the probability of a true value is p, the probability of false must be 1 p In general, a table for a Boolean variable with k parents contains 2 k independently specifiable probabilities A variable with no parents has only one row, representing the prior probabilities of each possible value of the variable 11
12 The Semantics of Bayesian Networks Every entry in the full joint probability distribution can be calculated from the information in a Bayesian network A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable P(X 1 = x 1 X n = x n ), abbreviated as P(x 1,, x n ) The value of this entry is P(x 1,, x n ) = i=1,,n P(x i parents(x i )), where parents(x i ) denotes the specific values of the variables Parents(X i ) P(j m a b e) = P(j a) P(m a) P(a b e) P( b) P( e) = = Constructing Bayesian networks We can rewrite an entry in the joint distribution P(x 1,, x n ), using the product rule, as P(x n x n1,, x 1 ) P(x n1,, x 1 ) Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product: P(xn xn 1,,x 1)P(xn 1 xn 2,,x 1) P(x2 x 1)P(x 1) = P(x x,,x 1) This identity holds true for any set of random variables and is called the chain rule The specification of the joint distribution is thus equivalent to the general assertion that, for every variable X i in the network P(X i X i1,, X 1 ) = P(X i Parents(X i )) provided that Parents(X i ) { X i1,, X 1 } n i= 1 i i1 12
13 215 The last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure Each node is required to be conditionally independent of its predecessors in the node ordering, given its parents We need to choose as parents of a node all those nodes that directly influence the value of the variable For example, MaryCalls is certainly influenced by whether there is a Burglary or an Earthquake, but not directly influenced We believe the following conditional independence statement to hold: P(MaryCalls JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls Alarm) 216 A Bayesian network is a complete and nonredundant representation of the domain In addition, it is a more compact representation than the full joint distribution, due to its locally structured properties Each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components Local structure is usually associated with linear rather than exponential growth in complexity In domain of a Bayesian network, if each of the n random variables is influenced by at most k others, then specifying each conditional probability table will require at most 2 k numbers Altogether n2 k numbers; the full joint distribution has 2 n cells E.g., n = 30 and k = 5: n2 k = 960 and 2 n >
14 217 In a Bayesian network we can exchange accuracy with complexity It may be wiser to leave out very weak dependencies from the network in order to restrict the complexity, but this yields a lower accuracy Choosing the right topology for the network is a hard problem The correct order in which to add nodes is to add the root causes first, then the variables they influence, and so on, until we reach the leaves, which have no direct causal influence on other variables Adding nodes in the false order makes the network unnecessarily complex and unintuitive 218 Mary Calls John Calls Earthquake Burglary Alarm 14
Conditional probability
191 Conditional probability Once the agent has obtained some evidence concerning the previously unknown random variables making up the domain, we have to switch to using conditional (posterior) probabilities
More informationLogic, Probability and Learning
Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern
More informationArtificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)
Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity
More informationOutline. Uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Methods for handling uncertainty
Outline Uncertainty Chapter 13 Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me
More informationArtificial Intelligence
Artificial Intelligence ICS461 Fall 2010 Nancy E. Reed nreed@hawaii.edu 1 Lecture #13 Uncertainty Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty
More informationBayesian networks. Bayesian Networks. Example. Example. Chapter 14 Section 1, 2, 4
Bayesian networks Bayesian Networks Chapter 14 Section 1, 2, 4 A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Syntax:
More information/ Informatica. Intelligent Systems. Agents dealing with uncertainty. Example: driving to the airport. Why logic fails
Intelligent Systems Prof. dr. Paul De Bra Technische Universiteit Eindhoven debra@win.tue.nl Agents dealing with uncertainty Uncertainty Probability Syntax and Semantics Inference Independence and Bayes'
More informationBasics of Probability
Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space
More informationProbability, Conditional Independence
Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms
More informationCS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization
Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false
More informationBayesian Networks. Mausam (Slides by UWAI faculty)
Bayesian Networks Mausam (Slides by UWAI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide
More informationProbability, Conditional Probability & Bayes Rule
Probability, Conditional Probability & Bayes Rule A FAST REVIEW OF DISCRETE PROBABILITY (PART 2) CIS 391 Intro to AI 2 Discrete random variables A random variable can take on one of a set of different
More informationLecture 2: Introduction to belief (Bayesian) networks
Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (Imaps) January 7, 2008 1 COMP526 Lecture 2 Recall from last time: Conditional
More informationBayesian Networks Chapter 14. Mausam (Slides by UWAI faculty & David Page)
Bayesian Networks Chapter 14 Mausam (Slides by UWAI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation
More informationCS 771 Artificial Intelligence. Reasoning under uncertainty
CS 771 Artificial Intelligence Reasoning under uncertainty Today Representing uncertainty is useful in knowledge bases Probability provides a coherent framework for uncertainty Review basic concepts in
More informationUncertainty. Chapter 13. Chapter 13 1
Uncertainty Chapter 13 Chapter 13 1 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Chapter 13 2 Uncertainty Let action A t = leave for airport t minutes before
More informationUncertainty. Chapter 13. Chapter 13 1
Uncertainty Chapter 13 Chapter 13 1 Attribution Modified from Stuart Russell s slides (Berkeley) Chapter 13 2 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule
More informationOutline. Syntax. Semantics. Parameterized distributions
Bayesian networks 1 Outline Syntax Semantics Parameterized distributions 2 Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of
More informationBayesian Networks Chapter 14. Mausam (Slides by UWAI faculty & David Page)
Bayesian Networks Chapter 14 Mausam (Slides by UWAI faculty & David Page) Burglars and Earthquakes You are at a Done with the AI class party. Neighbor John calls to say your home alarm has gone off (but
More informationMarginal Independence and Conditional Independence
Marginal Independence and Conditional Independence Computer Science cpsc322, Lecture 26 (Textbook Chpt 6.12) March, 19, 2010 Recap with Example Marginalization Conditional Probability Chain Rule Lecture
More informationMethods for handling uncertainty. Outline. Uncertainty. Making decisions under uncertainty
Methods for handling uncertainty Uncertainty Chapter 13 Default or nonmonotonic logic: Assume my car does not have a flat tire Assume A 25 works unless contradicted by evidence Issues: What assumptions
More informationCS 188: Artificial Intelligence. Probability recap
CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability
More informationOutline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.
Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost
More informationMonte Carlo Artificial Intelligence Probability I. Random Variables
Monte Carlo Artificial Intelligence Probability I 1 Random Variables The basic element of probability is the random variable Think of the random variable as an event with some degree of uncertainty as
More informationProbability: Review. The state of the world is described using random. Probabilities are defined over events. E.g., g, D 1 1, D 2: rolls of two dice
robability: Review The state of the world is described using random variables robabilities are defined over events Sets of world states characterized by propositions about random variables E.g. g D D 2:
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Graphical Models. Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Graphical Models iels Landwehr Overview: Graphical Models Graphical models: tool for modelling domain with several random variables.
More information22c:145 Artificial Intelligence
22c:145 Artificial Intelligence Fall 2005 Uncertainty Cesare Tinelli The University of Iowa Copyright 200105 Cesare Tinelli and Hantao Zhang. a a These notes are copyrighted material and may not be used
More informationWhere are we in CS 440?
Where are we in CS 440? Now leaving: sequential deterministic reasoning Entering: probabilistic reasoning and machine learning robability: Review of main concepts Chapter 3 Motivation: lanning under uncertainty
More informationMaths Skills (MTCS) G5071
Maths Skills (MCS) G507 Lecture 9 Kingsley Sage Room 5C6, Pevensey III khs20@sussex.ac.uk University of Sussex 2006 Lecture 9 pplied probability and statistics Belief networks Entropy pplied probability
More informationP (X, Y Z) = P (X Z)P (Y Z), P (X =1, Y =1) P (X =1)P (Y =1).
CSE 150. Assignment 2 Summer 2016 Out: Thu Jun 30 Due: Tue Jul 05 beginning of class Supplementary reading: RN, Ch 14.114.3; KN, Ch 2.12.3. 1.1 Creative riting Attach events to the binary random variables
More informationDAGs, IMaps, Factorization, dseparation, Minimal IMaps, Bayesian Networks. Slides by Nir Friedman
DAGs, IMaps, Factorization, dseparation, Minimal IMaps, Bayesian Networks Slides by Nir Friedman. Probability Distributions Let X 1,,X n be random variables Let P be a joint distribution over X 1,,X
More informationBIN504  Lecture XI. Bayesian Inference & Markov Chains
BIN504  Lecture XI Bayesian Inference & References: Lee, Chp. 11 & Sec. 10.11 EwensGrant, Chps. 4, 7, 11 c 2012 Aybar C. Acar Including slides by Tolga Can Rev. 1.2 (Build 20130528191100) BIN 504  Bayes
More informationMachine Learning
Machine Learning 10701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 22, 2011 Today: Clustering Mixture model clustering Learning Bayes Net structure ChowLiu for trees
More informationBayesian networks. 1 Random variables and graphical models. Charles Elkan. October 23, 2012
Bayesian networks Charles Elkan October 23, 2012 Important: These lecture notes are based closely on notes written by Lawrence Saul. Text may be copied directly from his notes, or paraphrased. Also, these
More informationCOS513 LECTURE 2 CONDITIONAL INDEPENDENCE AND FACTORIZATION
COS513 LECTURE 2 CONDITIONAL INDEPENDENCE AND FACTORIATION HAIPENG HENG, JIEQI U Let { 1, 2,, n } be a set of random variables. Now we have a series of questions about them: What are the conditional probabilities
More informationLife of A Knowledge Base (KB)
Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML
More informationBayesian Networks. Read R&N Ch. 14.114.2. Next lecture: Read R&N 18.118.4
Bayesian Networks Read R&N Ch. 14.114.2 Next lecture: Read R&N 18.118.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs
More informationAn Introduction to Probabilistic Graphical Models
An Introduction to Probabilistic Graphical Models Reading: Chapters 17 and 18 in Wasserman. EE 527, Detection and Estimation Theory, An Introduction to Probabilistic Graphical Models 1 Directed Graphs
More informationToday. CS 188: Artificial Intelligence Spring More Bayes Rule. Bayes Rule. Utilities. Expectations. Bayes rule. Expectations and utilities
CS 88: Artificial Intelligence Spring 006 Bayes rule Today Lecture 9: Naïve Bayes //006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Expectations and utilities Naïve Bayes
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationAI Tutorial 2 With Answers Harbin/Adelaide Course July 2016
AI Tutorial 2 With Answers Harbin/Adelaide Course July 2016 Note: you should, wherever possible, attempt these problems TWO ways. The first (and mandatory unless specified otherwise) is to work through
More information1. First, write our query in general form: 2. Next, loop iteratively to calculate main result:
Variable Elimination Class #17: Inference in Bayes Nets, II Artificial Intelligence (CS 452/552): M. Allen, 16 Oct. 15! Want to know the probability of some value of X, given known evidence e, and other
More informationGraphical Models. CS 343: Artificial Intelligence Bayesian Networks. Raymond J. Mooney. Conditional Probability Tables.
Graphical Models CS 343: Artificial Intelligence Bayesian Networks Raymond J. Mooney University of Texas at Austin 1 If no assumption of independence is made, then an exponential number of parameters must
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 18, 2011 Raquel Urtasun and Tamir Hazan (TTIC) Graphical Models April 18, 2011 1 / 23 Inference: conditional probabilities
More informationRelational Dynamic Bayesian Networks: a report. Cristina Manfredotti
Relational Dynamic Bayesian Networks: a report Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.) Università degli Studi MilanoBicocca manfredotti@disco.unimib.it
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationBayesian Networks Representation. Machine Learning CSE446 Carlos Guestrin University of Washington. May 29, Carlos Guestrin
Bayesian Networks Representation Machine Learning CSE446 Carlos Guestrin University of Washington May 29, 2013 1 Causal structure n Suppose we know the following: The flu causes sinus inflammation Allergies
More informationBayes Nets. CS 188: Artificial Intelligence. Inference. Inference by Enumeration. Inference by Enumeration? Example: Enumeration.
CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew
More informationBayesian Networks Tutorial I Basics of Probability Theory and Bayesian Networks
Bayesian Networks 2015 2016 Tutorial I Basics of Probability Theory and Bayesian Networks Peter Lucas LIACS, Leiden University Elementary probability theory We adopt the following notations with respect
More informationIntroduction to Discrete Probability Theory and Bayesian Networks
Introduction to Discrete Probability Theory and Bayesian Networks Dr Michael Ashcroft September 15, 2011 This document remains the property of Inatas. Reproduction in whole or in part without the written
More informationSTAT 535 Lecture 4 Graphical representations of conditional independence Part II Bayesian Networks c Marina Meilă
TT 535 Lecture 4 Graphical representations of conditional independence Part II Bayesian Networks c Marina Meilă mmp@stat.washington.edu 1 limitation of Markov networks Between two variables or subsets
More informationIntroduction to Graphical Models
obert Collins CSE586 Credits: Several slides are from: Introduction to Graphical odels eadings in Prince textbook: Chapters 0 and but mainly only on directed graphs at this time eview: Probability Theory
More informationLecture 13: Knowledge Representation and Reasoning Part 2 Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 13: Knowledge Representation and Reasoning Part 2 Sanjeev Arora Elad Hazan (Borrows from slides of Percy Liang, Stanford U.) Midterm
More informationVariable Elimination 1
Variable Elimination 1 Inference Exact inference Enumeration Variable elimination Junction trees and belief propagation Approximate inference Loopy belief propagation Sampling methods: likelihood weighting,
More informationMultiAgent Influence Diagrams
MultiAgent Influence Diagrams CS 886: MultiAgent Systems MultiAgent influence diagrams for representing and solving games [1] Authored by D. Koller and B. Milch Tyler Nowicki Computer Science University
More informationInference in Bayesian networks
Inference in Bayesian networks hapter 14.4 5 hapter 14.4 5 1 Outline Exact inference by enumeration Exact inference by variable elimination Approximate inference by stochastic simulation Approximate inference
More informationA crash course in probability and Naïve Bayes classification
Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s
More informationThe Resolution Proof System in Propositional Logic
The Resolution Proof System in Propositional Logic While the Hilbert proof system has the property that it reflects the methods by which humans commonly do mathematics, it is ill suited for use in automated
More informationSociology 6Z03 Topic 10: Probability (Part I)
Sociology 6Z03 Topic 10: Probability (Part I) John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Probability I Fall 2016 1 / 29 Outline: Probability (Part I) Introduction Probability
More informationBayesian networks. Data Mining. Abraham Otero. Data Mining. Agenda. Bayes' theorem Naive Bayes Bayesian networks Bayesian networks (example)
Bayesian networks 1/40 Agenda Introduction Bayes' theorem Bayesian networks Quick reference 2/40 1 Introduction Probabilistic methods: They are backed by a strong mathematical formalism. They can capture
More informationConditional Probability
Conditional Probability STA 281 Fall 2011 1 Definition Often we are only interested in particular rows or columns of a probability table. Consider the newspaper example, and the question Of those that
More informationBayesian Tutorial (Sheet Updated 20 March)
Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that
More informationReview of Probability
Review of Probability David M. Blei Columbia University September 5, 2016 Before we discuss data and models, we will learn about graphical models. Graphical models are a way of using graphs to represent
More informationMotivation and Applications: Why Should I Study Probability?
Motivation and Applications: Why Should I Study Probability? As stated by Laplace, Probability is common sense reduced to calculation. You need to first learn the theory required to correctly do these
More informationAnnouncements. CS 188: Artificial Intelligence Fall Bayes Nets. Bayes Net Semantics. Building the (Entire) Joint. Example: Alarm Network
CS 188: Artificial Intelligence Fall 2009 Lecture 15: ayes Nets II Independence 10/15/2009 Announcements Midterm 10/22: see prep page on web One page note sheet, nonprogrammable calculators eview sessions
More informationProbability Models. Weather example: Raining or not Sunny or not Hot or not
Probability Models Venn diagram for all combinations of 3 binary (true/false) events. Weather example: Raining or not Sunny or not Hot or not Sample spaces: When a random event happens, what is the set
More informationStatistical Machine Learning Methods for Bioinformatics VII. Introduction to Bayesian Network Theory and Applications
Statistical Machine Learning Methods for Bioinformatics VII. Introduction to Bayesian Network Theory and Applications Jianlin Cheng, PhD Computer Science Department and Informatics Institute University
More informationMachine Learning
Machine Learning 10701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 8, 2011 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies
More informationLecture 2 Machine Learning, Probability Fundamentals COSI 134
Lecture Machine Learning, Probability Fundamentals COSI 34 Machine Learning Why Machine Learning? Difficult to define some tasks, except by example Hidden relationships in lots of data (data mining) Rapidly
More information20 Probability and the Law of Addition
20 Probability and the Law of Addition Notation We are interested in the probabilities of events. Whatever events are, they are not numbers. Therefore, given events A and B, if I speak of the event A +
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According
More information= = 2 3. variations on this classic theme. PROBABILITY MODELS 15
PROBABILITY MODELS 15 4. Conditional probability Suppose, for a random experiment, you are given some information about its outcome. For example, in a single roll of a fair 6sided die, suppose I tell
More informationDiscrete Mathematics & Mathematical Reasoning Chapter 7: Discrete Probability
Discrete Mathematics & Mathematical Reasoning Chapter 7: Discrete Probability Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 7) 1 / 16 Overview
More informationIf rules "a" and "a => b ^ c" exist in the rule base, show the steps necessary to prove "c" is true
Fill in the following truth table: a b a ^ b a v b a => b a b false false false true true false true true If probability of event a (P(a))=.2, P(b)=.3, P(c)=.5, P(a ^ b)=.1 (a AND b), P(a ^ c)=.15,
More informationThe CertaintyFactor Model
The CertaintyFactor Model David Heckerman Departments of Computer Science and Pathology University of Southern California HMR 204, 2025 Zonal Ave Los Angeles, CA 90033 dheck@sumexaim.stanford.edu 1 Introduction
More informationStatistical learning; Naive Bayes Learning; Classification; Evaluation; Smoothing
Statistical learning; Naive Bayes Learning; Classification; Evaluation; Smoothing Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts
More informationLearning from Data: Naive Bayes
October 13, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional
More informationLecture 11: Further Topics in Bayesian Statistical Modeling: Graphical Modelling and Model Selection with DIC
Lecture 11: Further topics in Bayesian statistical modeling [1] Lecture 11: Further Topics in Bayesian Statistical Modeling: Graphical Modelling and Model Selection with DIC Graphical Models Lecture 11:
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 1, January 26, 2012 David Sontag (NYU) Graphical Models Lecture 1, January 26, 2012 1 / 37 One of the most exciting advances in machine
More informationProbability Theory. Elementary rules of probability Sum rule. Product rule. p. 23
Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction
More informationIntro to Probabilistic Relational Models
Intro to Probabilistic Relational Models James Lenfestey, with Tom Temple and Ethan Howe Intro to Probabilistic Relational Models p.1/24 Outline Motivate problem Define PRMs Extensions and future work
More informationA Tractable Inference Algorithm for Diagnosing Multiple Diseases 1
A Tractable Inference Algorithm for Diagnosing Multiple Diseases 1 David Heckerman Medical Computer Science Group Knowledge Systems Laboratory Departments of Computer Science and Medicine Stanford, California
More informationMultiAgent Influence Diagrams for Representing and Solving Games 1
MultiAgent Influence Diagrams for Representing and Solving Games 1 Daphne Koller Stanford University Email: koller@cs.stanford.edu Brian Milch University of California Berkeley milch@cs.berkeley.edu
More informationThe Representational Power of Discrete Bayesian Networks
Journal of Machine Learning Research 3 (2002) 709721 Submitted 12/01; Published 12/02 The Representational Power of Discrete Bayesian Networks Charles X. Ling Department of Computer Science The University
More informationIntelligent Systems: Reasoning and Recognition. Uncertainty and Plausible Reasoning
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2015/2016 Lesson 12 25 march 2015 Uncertainty and Plausible Reasoning MYCIN (continued)...2 Backward
More informationApplications of Methods of Proof
CHAPTER 4 Applications of Methods of Proof 1. Set Operations 1.1. Set Operations. The settheoretic operations, intersection, union, and complementation, defined in Chapter 1.1 Introduction to Sets are
More informationConcept Learning. Machine Learning 1
Concept Learning Inducing general functions from specific training examples is a main issue of machine learning. Concept Learning: Acquiring the definition of a general category from given sample positive
More informationCOS 424: Interacting with Data. Lecturer: Dave Blei Lecture #10 Scribe: Sunghwan Ihm March 8, 2007
COS 424: Interacting with Data Lecturer: Dave Blei Lecture #10 Scribe: Sunghwan Ihm March 8, 2007 Homework #3 will be released soon. Some questions and answers given by comments: Computational efficiency
More informationTALLUR, GAYATRI, M.S. Uncertain Data Integration with Probabilities. (2013) Directed by Dr. Fereidoon Sadri. 48 pp.
TALLUR, GAYATRI, M.S. Uncertain Data Integration with Probabilities. (2013) Directed by Dr. Fereidoon Sadri. 48 pp. Real world applications that deal with information extraction, such as business intelligence
More informationExact Bayesian Learning of Ancestor Relations in Bayesian Networks
Exact Bayesian Learning of Ancestor Relations in Bayesian Networks Yetian Chen Lingjian Meng Jin Tian Department of Computer Science, Iowa State University, USA AI & Statistics 2015 Chen, Meng, Tian (CS@ISU)
More informationCognitive Modeling. Motivation. Graphical Models. Motivation Graphical Models Bayes Nets and Bayesian Statistics. Lecture 15: Bayes Nets
Cognitive Modeling Lecture 15: Sharon Goldwater School of Informatics University of Edinburgh sgwater@inf.ed.ac.uk March 5, 2010 Sharon Goldwater Cognitive Modeling 1 1 and Bayesian Statistics 2 3 Reading:
More informationIntroduction to Probability (1)
Introduction to Probability (1) John Kelleher & Brian Mac Namee 1 Introduction Summarizing Uncertainty 2 Where Do Probabilities Come From? 3 Basic Probability Notation Basics 4 The Axioms of Probabilities
More informationSection 2.2 Sample Space and Events
Section 2.2 Sample Space and Events We consider an experiment whose outcome is not predictable with certainty. However, we suppose that the set of all possible outcomes is known. DEFINITION: The set of
More informationClassification and Prediction
Classification and Prediction Classification and Prediction What is classification? What is prediction? Issues regarding classification and prediction Classification by decision tree induction Bayesian
More informationComputing the Probability of Formulas Representing Events in Product Spaces
Computing the Probability of Formulas Representing Events in Product Spaces P.A. Monney B. Anrig Seminar of Statistics Institute of Informatics Avenue de Beauregard 11 Rue Faucigny 2 University of Fribourg
More informationQuerying Joint Probability Distributions
Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple
More informationWarmup. Probability in the movies
Warmup Probability in the movies Announcements Upcoming due dates Wednesday 10/14 11:59pm Homework 5 Friday 10/16 5:00 pm Project 2 Probability sessions Continuing for the next couple of weeks, including
More informationInference by Variable Elimination. p.1
Inference by Variable Elimination p.1 A Bayesian Network Winter? (A) Sprinkler? (B) Rain? (C) Wet Grass? (D) Slippery Road? (E) A true.6 false.4 Θ A A B Θ B A true true.2 true false.8 false true.75 false
More informationConditional probability. Math 425 Introduction to Probability Lecture 9. Example. Conjunctions
Review: Conditioning Rule and Conjunctions Conditional probability Math 425 Introduction to Probability Lecture 9 Definition (Conditioning Rule) Let E and F be events. The conditional probability of E
More information