Logic, Probability and Learning


1 Logic, Probability and Learning Luc De Raedt
2 Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning
3 Closely following: Russell and Norvig, AI: A Modern Approach; Freiburg AI course slides (Burgard, Nebel, De Raedt). Probabilistic Learning
4 Probabilistic Learning 1. Probability theory 2. Bayesian Networks 3. Probabilistic Learning 4. Alternative models
5 Probabilistic Learning 1. Probability theory 2. Bayesian Networks 3. Probabilistic Learning 4. Alternative models
6 Daughter in Florence
7 Degree of Belief and Probability Theory We (and other agents) are convinced by facts and rules only up to a certain degree. One possibility for expressing the degree of belief is to use probabilities. The agent is 90% (or 0.9) convinced by its sensor information = in 9 out of 10 cases, the information is correct (the agent believes). Probabilities summarize the uncertainty that stems from a lack of knowledge.
8 Unconditional Probabilities P(A) denotes the unconditional probability or prior probability that A will appear in the absence of any other information, for example: P(Cavity) = 0.1 Cavity is a proposition. We obtain prior probabilities from statistical analysis or general rules.
9 Unconditional Probabilities (2) In general, a random variable can take on true and false values, as well as other values: P(Weather=Sunny) = 0.7 P(Weather=Rain) = 0.2 P(Weather=Cloudy) = 0.08 P(Weather=Snow) = 0.02 P(Headache=TRUE) = 0.1 Propositions can contain equations over random variables. Logical connectors can be used to build propositions, e.g. P(Cavity & Injured) = 0.06.
10 Unconditional Probabilities (3) P(X) is the vector of probabilities over the (ordered) domain of the random variable X: P(Headache) = (0.1, 0.9) and P(Weather) = (0.7, 0.2, 0.08, 0.02) define the probability distributions of the random variables Headache and Weather. P(Headache, Weather) is a 4x2 table of probabilities of all combinations of the values of this set of random variables:

                  Headache = TRUE            Headache = FALSE
Weather = Sunny   P(W = Sunny & Headache)    P(W = Sunny & ¬Headache)
Weather = Rain    P(W = Rain & Headache)     P(W = Rain & ¬Headache)
Weather = Cloudy  P(W = Cloudy & Headache)   P(W = Cloudy & ¬Headache)
Weather = Snow    P(W = Snow & Headache)     P(W = Snow & ¬Headache)
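As an aside (not on the slides), such a joint table can be stored directly as a dictionary. A minimal Python sketch; the eight cell values are an illustrative assumption (chosen here as if Weather and Headache were independent), since the slide names the cells but not their values:

```python
# A joint table P(Weather, Headache) stored as a dict keyed by value pairs.
# The eight numbers are illustrative (Weather and Headache assumed
# independent, using the prior vectors above); they sum to 1.
joint = {
    ("sunny", True): 0.07,   ("sunny", False): 0.63,
    ("rain", True): 0.02,    ("rain", False): 0.18,
    ("cloudy", True): 0.008, ("cloudy", False): 0.072,
    ("snow", True): 0.002,   ("snow", False): 0.018,
}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # any joint distribution sums to 1

# Marginalizing out Weather recovers P(Headache = TRUE) = 0.1:
p_headache = sum(p for (_, h), p in joint.items() if h)
print(p_headache)
```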
11 Conditional Probabilities (1) New information can change the probability. Example: the probability of a cavity increases if we know the patient has a toothache. If additional information is available, we can no longer use the prior probabilities! P(A | B) is the conditional or posterior probability of A given that all we know is B: P(Cavity | Toothache) = 0.8. P(X | Y) is the table of all conditional probabilities over all values of X and Y.
12 Conditional Probabilities (2) P(Weather | Headache) is a 4x2 table of conditional probabilities of all combinations of the values of this set of random variables:

                  Headache = TRUE             Headache = FALSE
Weather = Sunny   P(W = Sunny | Headache)     P(W = Sunny | ¬Headache)
Weather = Rain    P(W = Rain | Headache)      P(W = Rain | ¬Headache)
Weather = Cloudy  P(W = Cloudy | Headache)    P(W = Cloudy | ¬Headache)
Weather = Snow    P(W = Snow | Headache)      P(W = Snow | ¬Headache)

Conditional probabilities result from unconditional probabilities (if P(B) > 0), by definition: P(A | B) = P(A & B) / P(B)
13 Conditional Probabilities (3) P(X, Y) = P(X | Y) P(Y) corresponds to a system of equations:

P(W = Sunny & Headache) = P(W = Sunny | Headache) P(Headache)
P(W = Rain & Headache) = P(W = Rain | Headache) P(Headache)
... = ...
P(W = Snow & ¬Headache) = P(W = Snow | ¬Headache) P(¬Headache)
14 Conditional Probabilities (4) P(A | B) = P(A & B) / P(B). Product rule: P(A & B) = P(A | B) P(B). Analogously: P(A & B) = P(B | A) P(A). A and B are marginally independent if P(A | B) = P(A) (equivalently, P(B | A) = P(B)). Then (and only then) it holds that P(A & B) = P(A) P(B).
15 Joint Probability The agent assigns probabilities to every proposition in the domain. An atomic event is an assignment of values to all random variables X_1, ..., X_n (= a complete specification of a state). Example: let X and Y be boolean variables. Then there are the following 4 atomic events: X & Y, X & ¬Y, ¬X & Y, ¬X & ¬Y. The joint probability distribution P(X_1, ..., X_n) assigns a probability to every atomic event:

            Toothache   ¬Toothache
Cavity        0.04        0.06
¬Cavity       0.01        0.89

Since all atomic events are disjoint, the sum of all fields is 1 (the disjunction of all atomic events is necessarily true). The conjunction of two distinct atomic events is necessarily false.
16 Working with the Joint Probability All relevant probabilities can be computed from the joint probability by expressing them as a disjunction of atomic events. Example: P(Cavity ∨ Toothache) = P(Cavity & Toothache) + P(¬Cavity & Toothache) + P(Cavity & ¬Toothache) = 0.04 + 0.01 + 0.06 = 0.11. We obtain unconditional probabilities by adding across a row or column (marginalization): P(Cavity) = P(Cavity & Toothache) + P(Cavity & ¬Toothache) = 0.04 + 0.06 = 0.10. P(Cavity | Toothache) = P(Cavity & Toothache) / P(Toothache) = 0.04 / 0.05 = 0.80.
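Both steps, marginalization and conditioning, can be checked mechanically. A minimal Python sketch over the 2x2 joint table above (the table values are the ones reconstructed from these slides):

```python
# The 2x2 joint distribution P(Cavity, Toothache) from the slides.
joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

def marginal_cavity(joint, cavity=True):
    """P(Cavity = cavity), summing Toothache out (marginalization)."""
    return sum(p for (c, _), p in joint.items() if c == cavity)

def cavity_given_toothache(joint, toothache=True):
    """P(Cavity = true | Toothache = toothache) via the definition
    P(A | B) = P(A & B) / P(B)."""
    p_both = joint[(True, toothache)]
    p_evidence = sum(p for (_, t), p in joint.items() if t == toothache)
    return p_both / p_evidence

print(marginal_cavity(joint))          # 0.10 = P(Cavity)
print(cavity_given_toothache(joint))   # 0.80 = P(Cavity | Toothache)
```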
17 Problems with the Joint Distribution We can easily obtain all probabilities from the joint probability. The joint probability, however, involves k^n values if there are n random variables with k values each: difficult to represent, difficult to assess. Questions: 1. Is there a more compact way of representing joint probabilities? 2. Is there an efficient method to work with this representation? Not in general, but it can work in many cases. Modern systems work directly with conditional probabilities and make assumptions on the independence of variables in order to simplify calculations.
18 Bayes' Rule We know (product rule): P(A & B) = P(A | B) P(B) and P(A & B) = P(B | A) P(A). Equating the right-hand sides gives P(A | B) P(B) = P(B | A) P(A), hence

P(A | B) = P(B | A) P(A) / P(B)

For multivalued variables (a set of equalities): P(Y | X) = P(X | Y) P(Y) / P(X). Generalization (conditioning on background evidence E): P(Y | X, E) = P(X | Y, E) P(Y | E) / P(X | E)
19 Applying Bayes' Rule P(Toothache | Cavity) = 0.4, P(Cavity) = 0.1, P(Toothache) = 0.05. P(Cavity | Toothache) = (0.4 x 0.1) / 0.05 = 0.8. Why don't we try to assess P(Cavity | Toothache) directly? P(Toothache | Cavity) (causal) is more robust than P(Cavity | Toothache) (diagnostic): P(Toothache | Cavity) is independent of the prior probabilities P(Toothache) and P(Cavity). If there is a cavity epidemic and P(Cavity) increases, P(Toothache | Cavity) does not change, but P(Toothache) and P(Cavity | Toothache) will change proportionally.
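The slide's numbers can be verified, and the epidemic scenario simulated. A Python sketch; P(Toothache | ¬Cavity) = 1/90 is an assumption chosen so that the slide's three numbers are mutually consistent:

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# The slide's numbers:
print(bayes(0.4, 0.1, 0.05))  # P(Cavity | Toothache) = 0.8

# Cavity epidemic: the causal P(Toothache | Cavity) = 0.4 stays fixed,
# but the prior P(Cavity) doubles. P(Toothache) must be recomputed;
# P(Toothache | ¬Cavity) = 1/90 is assumed (consistent with the original
# numbers: 0.4 * 0.1 + (1/90) * 0.9 = 0.05).
p_cavity = 0.2
p_tooth = 0.4 * p_cavity + (1 / 90) * (1 - p_cavity)
print(bayes(0.4, p_cavity, p_tooth))  # diagnostic probability rises to 0.9
```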
20 Multiple Evidence (1) A dentist's probe catches in the aching tooth of a patient. Using Bayes' rule, we can calculate: P(Cavity | Catch) = 0.95. But how does the combined evidence help? Using Bayes' rule, the dentist could establish: P(Cav | Tooth & Catch) = P(Tooth & Catch | Cav) P(Cav) / P(Tooth & Catch) = α P(Tooth & Catch | Cav) P(Cav)
21 Multiple Evidence (2) Problem: the dentist needs P(Tooth & Catch | Cav), i.e. diagnostic knowledge of all combinations of symptoms in the general case. It would be nice if Tooth and Catch were independent, but they are not: if a probe catches in the tooth, the tooth probably has a cavity, which probably causes toothache. They are, however, independent given that we know whether the tooth has a cavity: P(Tooth & Catch | Cav) = P(Tooth | Cav) P(Catch | Cav). Each is directly caused by the cavity, but neither has a direct effect on the other.
22 Conditional Independence The general definition of conditional independence of two variables X and Y given a third variable Z is: P(X, Y | Z) = P(X | Z) P(Y | Z). Thus our diagnostic problem turns into: P(Cav | Tooth & Catch) = α P(Tooth | Cav) P(Catch | Cav) P(Cav)
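This combination rule is easy to run numerically. A Python sketch: the P(Catch | Cavity) values are illustrative assumptions, P(Tooth | Cavity) = 0.4 and P(Cavity) = 0.1 come from the slides, and P(Tooth | ¬Cavity) = 1/90 is the consistency assumption used above:

```python
# Evidence combination under conditional independence:
# P(Cav | tooth, catch) = α P(tooth | Cav) P(catch | Cav) P(Cav),
# with α normalizing over Cav ∈ {true, false}.
p_cav = {True: 0.1, False: 0.9}
p_tooth = {True: 0.4, False: 1 / 90}  # P(Toothache | Cavity = c)
p_catch = {True: 0.9, False: 0.05}    # assumed P(Catch | Cavity = c)

unnormalized = {c: p_tooth[c] * p_catch[c] * p_cav[c] for c in (True, False)}
alpha = 1 / sum(unnormalized.values())  # α makes the posterior sum to 1
posterior = {c: alpha * p for c, p in unnormalized.items()}
print(posterior[True])  # P(Cavity | toothache, catch) ≈ 0.99
```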
23 Probabilistic Learning 1. Probability theory 2. Bayesian Networks 3. Probabilistic Learning 4. Alternative models
24 Different Types of Models Graphical Models: Directed (Bayesian Networks; HMMs as a special case) and Undirected (Markov Networks; Conditional Random Fields).
25 Bayesian Networks (also belief networks, probabilistic networks, causal networks) 1. The random variables are the nodes. 2. Directed edges between nodes represent direct influence. 3. A conditional probability table (CPT) is associated with every node, quantifying the effect of its parent nodes. 4. The graph is acyclic (a DAG). Remark: Burglary and Earthquake are denoted as the parents of Alarm.
26 Bayesian Networks
27 The Meaning of Bayesian Networks Alarm depends on Burglary and Earthquake. MaryCalls only depends on Alarm: P(MaryCalls | Alarm, Burglary) = P(MaryCalls | Alarm). A Bayesian network can be considered as a set of independence assumptions.
28 Conditional Independence Relations in Bayesian Networks (1) A node is conditionally independent of its nondescendants given its parents.
29 Example JohnCalls is cond. independent of Burglary and Earthquake given the value of Alarm.
30 Bayesian Networks and the Joint Probability Bayesian networks can be seen as a complete representation of the joint probability distribution. Let all nodes X_1, ..., X_n be ordered topologically according to the arrows in the network, and let x_1, ..., x_n be values of the variables. Then, by the chain rule,

P(x_1, ..., x_n) = P(x_n | x_{n-1}, ..., x_1) ... P(x_2 | x_1) P(x_1) = ∏_{i=1}^n P(x_i | x_{i-1}, ..., x_1)

From the independence assumptions, this is equivalent to

P(x_1, ..., x_n) = ∏_{i=1}^n P(x_i | parents(X_i))

We can calculate the joint probability from the network topology and the CPTs!
31 Example Only the probabilities for positive events are given; the negative ones follow from P(¬X) = 1 - P(X). P(J & M & A & ¬B & ¬E) = P(J | A) P(M | A) P(A | ¬B, ¬E) P(¬B) P(¬E) = 0.9 x 0.7 x 0.001 x 0.999 x 0.998 ≈ 0.00062
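A Python sketch of this product, assuming the standard textbook CPTs for the burglary network (Russell & Norvig; the slide itself only shows 0.9 and 0.7):

```python
# CPTs for the burglary network (standard textbook values).
p_b, p_e = 0.001, 0.002                      # P(Burglary), P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)
p_j = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

# P(j & m & a & ¬b & ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = p_j[True] * p_m[True] * p_a[(False, False)] * (1 - p_b) * (1 - p_e)
print(p)  # ≈ 0.00062
```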
32 Compactness of Bayesian Networks Explicitly representing the full joint distribution requires a table of size 2^n, where n is the number of (boolean) variables. If every node in a network has at most k parents, we only need n tables of size at most 2^k. Example: for n = 20 and k = 5, that is 2^20 = 1,048,576 versus 20 x 2^5 = 640 explicitly represented probabilities! In the worst case, a Bayesian network can still become exponentially large, for example if every variable is directly influenced by all the others. The size depends on the application domain (local vs. global interaction) and the skill of the designer.
33 Inference in Bayesian Networks Instantiate the evidence variables and send queries to nodes: what is P(Burglary | JohnCalls), or P(Burglary | JohnCalls, MaryCalls)?
34 Conditional Independence Relations in Bayesian Networks (1) A node is conditionally independent of its nondescendants given its parents.
35 Example JohnCalls is independent of Burglary and Earthquake given the value of Alarm.
36 Conditional Independence Relations in Bayesian Networks (2) A node is conditionally independent of all other nodes in the network given its Markov blanket, i.e. its parents, children and children's parents.
37 Example Burglary is independent of JohnCalls and MaryCalls, given the values of Alarm and Earthquake.
38 Various Problems for BNs Find P(state). Find P(X | e) (the typical query). Find the most likely state: arg max_x P(X = x | e). Learning.
39 Exact Inference in Bayesian Networks Compute the posterior probability distribution for a set of query variables X given an observation, i.e. the values of a set of evidence variables E. The complete set of variables is X ∪ E ∪ Y, where Y are called the hidden variables. Typical query: P(X | e), where e are the observed values of E. In the remainder, X is a singleton. Example: P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)
40 Inference by Enumeration P(X | e) = α P(X, e) = α Σ_y P(X, e, y). The network gives a complete representation of the full joint distribution. A query can therefore be answered using a Bayesian network by computing sums of products of conditional probabilities from the network.
41 Example Consider P(Burglary | JohnCalls = true, MaryCalls = true). The hidden variables are Earthquake and Alarm. We have: P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a). If we exploit the independence of variables, we obtain for B = b: P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
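A runnable Python sketch of this enumeration, again assuming the standard textbook CPTs for the burglary network; it reproduces the (0.284, 0.716) answer quoted above:

```python
from itertools import product

# Standard textbook CPTs for the burglary network (Russell & Norvig).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    """Probability that a boolean variable takes `value`, given P(var=true)."""
    return p_true if value else 1 - p_true

def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by summing out E and A."""
    unnormalized = {}
    for b in (True, False):
        total = 0.0
        for e, a in product((True, False), repeat=2):  # hidden variables
            total += (pr(P_E, e) * pr(P_A[(b, e)], a)
                      * pr(P_J[a], j) * pr(P_M[a], m))
        unnormalized[b] = pr(P_B, b) * total
    alpha = 1 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}

print(query_burglary(True, True))  # ≈ {True: 0.284, False: 0.716}
```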
42 Complexity of Exact Inference If the network is singly connected (a polytree), the time and space complexity of exact inference is linear in the size of the network. The burglary example is a typical singly connected network. For multiply connected networks, inference in Bayesian networks is NP-hard. There are approximate inference methods for multiply connected networks, such as sampling techniques or Markov chain Monte Carlo.
43 Probabilistic Learning 1. Probability theory 2. Bayesian Networks 3. Probabilistic Learning 4. Alternative models
44 Probabilistic Learning Two problems: parameter estimation and structure learning.
45 Parameter Estimation Given: a set of examples E; a probabilistic model M = (S, λ) with structure S and parameters λ; a probabilistic coverage relation P(e | M) that computes the probability of observing the example e given the model M; and a scoring function score(E, M) that employs the probabilistic coverage relation P(e | M). Find the parameters λ* that maximize score(E, M), i.e., λ* = arg max_λ score(E, (S, λ))
46 Scoring Functions The maximum a posteriori hypothesis H_MAP: H_MAP = arg max_H P(H | E) = arg max_H P(E | H) P(H) / P(E). Assuming all hypotheses are a priori equally likely, this simplifies to the maximum likelihood hypothesis: H_ML = arg max_H P(E | H). Assuming independently and identically distributed examples: H_ML = arg max_H ∏_{e_i ∈ E} P(e_i | H). It is often easier to maximize the log-likelihood: H_LL = arg max_H log ∏_{e_i ∈ E} P(e_i | H) = arg max_H Σ_{e_i ∈ E} log P(e_i | H)
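A minimal Python sketch of maximum likelihood via the log-likelihood, for a single Bernoulli parameter; the data and the grid search are illustrative (the closed-form answer is just the counting estimate):

```python
import math

# H_ML = arg max_H Σ_i log P(e_i | H) for a coin with θ = P(heads).
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # made-up sample: 7 heads, 3 tails

def log_likelihood(theta, data):
    return sum(math.log(theta if x else 1 - theta) for x in data)

# Grid search over candidate parameters; the maximum sits at the
# counting estimate 7/10, as the analytic derivation predicts.
best = max((i / 100 for i in range(1, 100)),
           key=lambda t: log_likelihood(t, data))
print(best)  # 0.7
```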
47 Toy Example
48 Parameter Estimation A complete data set over the variables A1, ..., A6 (one row per example X1, X2, ..., XM):

A1     A2     A3     A4     A5     A6
true   true   false  true   false  false
false  true   true   true   false  false
true   false  false  false  true   true
...

With a complete data set, parameter estimation is simply counting.
49 Parameter Estimation An incomplete data set: the states of some random variables are hidden/latent ('?' denotes a missing value):

A1     A2     A3     A4     A5     A6
true   true   ?      true   false  false
?      true   ?      ?      false  false
true   false  ?      false  true   ?

Real-world data: the states of some random variables are missing. E.g. medical diagnosis: not all patients are subjected to all tests. Missing values also arise from parameter reduction, e.g. clustering.
50 On the Alarm Net
51 Derivation (fully observable) For the alarm network fragment, let λ0 = P(a), λ1 = P(m | a), λ2 = P(m | ¬a); for instance, P(¬a, ¬m) = (1 - λ0)(1 - λ2). Generalizing over the whole data set, with #(·) denoting counts:

P(E | M) = λ0^#(a) (1 - λ0)^#(¬a) · λ1^#(a,m) (1 - λ1)^#(a,¬m) · λ2^#(¬a,m) (1 - λ2)^#(¬a,¬m)

Use the log-likelihood:

L = log P(E | M) = #(a) log λ0 + #(¬a) log(1 - λ0) + #(a,m) log λ1 + #(a,¬m) log(1 - λ1) + #(¬a,m) log λ2 + #(¬a,¬m) log(1 - λ2)
52 Derivation (fully observable) (2) The derivatives are:

∂L/∂λ0 = #(a)/λ0 - #(¬a)/(1 - λ0)
∂L/∂λ1 = #(a,m)/λ1 - #(a,¬m)/(1 - λ1)
∂L/∂λ2 = #(¬a,m)/λ2 - #(¬a,¬m)/(1 - λ2)

Setting them to zero yields as solutions

λ0 = #(a) / (#(a) + #(¬a))
λ1 = #(a,m) / (#(a,m) + #(a,¬m))
λ2 = #(¬a,m) / (#(¬a,m) + #(¬a,¬m))

So, simply counting!
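The counting solution in code. A Python sketch over a fully observed (A, M) data set; the rows are illustrative:

```python
# Fully observed data over (A, M): maximum likelihood is just counting.
data = [(True, True), (True, False), (True, True), (False, False),
        (False, True), (False, False), (True, True), (False, False)]

n_a = sum(1 for a, _ in data if a)              # #(a)
n_am = sum(1 for a, m in data if a and m)       # #(a, m)
n_nam = sum(1 for a, m in data if not a and m)  # #(¬a, m)

lam0 = n_a / len(data)            # λ0 = #(a) / (#(a) + #(¬a))
lam1 = n_am / n_a                 # λ1 = #(a,m) / (#(a,m) + #(a,¬m))
lam2 = n_nam / (len(data) - n_a)  # λ2 = #(¬a,m) / (#(¬a,m) + #(¬a,¬m))
print(lam0, lam1, lam2)           # 0.5 0.75 0.25
```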
53 Parameter Estimation (partially observable) Idea: Expectation Maximization (EM). Incomplete data? 1. Complete the data (imputation): fill in the most probable value, the average value, ... 2. Count. 3. Iterate.
54 EM Idea: complete the incomplete data with expected counts. Example: suppose the current parameters are P(m | a) = 0.6 and P(m | ¬a) = 0.2. A row with A = true and M missing then contributes an expected count of 0.6 to #(a, m) and 0.4 to #(a, ¬m); e.g., one observed (a, m) row plus one such completed row gives an expected count of 1.6 for (a, m). From the expected counts, re-estimate λ1 = P(m | a) and λ2 = P(m | ¬a) by counting as before, and iterate until convergence.
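A runnable Python sketch of this EM loop; the data set is illustrative (None marks a missing M value), and the starting values 0.6 and 0.2 are taken from the slide:

```python
# EM for λ1 = P(m | a) and λ2 = P(m | ¬a) when some M values are missing.
data = [(True, True), (True, None), (False, True),
        (True, False), (False, None), (False, False)]
lam1, lam2 = 0.6, 0.2  # initial guesses, as on the slide

for _ in range(50):
    # E-step: expected counts; a missing M contributes its expected value
    # under the current parameters.
    c_am = c_a = c_nm = c_n = 0.0
    for a, m in data:
        p_m = (lam1 if a else lam2) if m is None else float(m)
        if a:
            c_a += 1.0; c_am += p_m
        else:
            c_n += 1.0; c_nm += p_m
    # M-step: re-estimate by "counting" with the expected counts.
    lam1, lam2 = c_am / c_a, c_nm / c_n

print(lam1, lam2)  # converged estimates
```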
55 EM: formulae Q(λ), the expected log-likelihood, is a function of λ:

Q(λ) = E[L(λ)] = E[log P(E | M(λ))]
     = Σ_{e_i ∈ E} E[log P(e_i | M(λ))]
     = Σ_{e_i ∈ E} E[log P(x_i, y_i | M(λ))]
     = Σ_{e_i ∈ E} Σ_y P(y | x_i, M(λ_j)) log P(x_i, y | M(λ))

where x_i is the observed and y_i the unobserved part of example e_i, and λ_j are the current parameters.
56 EM
57 Parameter Tying With parameters λ3 = P(j | a) and λ4 = P(j | ¬a) for JohnCalls:

P(E | M) = λ0^#(a) (1 - λ0)^#(¬a) · λ1^#(a,m) (1 - λ1)^#(a,¬m) · λ2^#(¬a,m) (1 - λ2)^#(¬a,¬m) · λ3^#(a,j) (1 - λ3)^#(a,¬j) · λ4^#(¬a,j) (1 - λ4)^#(¬a,¬j)

Assume λ1 = λ3 and λ2 = λ4. The likelihood simplifies to

P(E | M) = λ0^#(a) (1 - λ0)^#(¬a) · λ1^{#(a,m) + #(a,j)} (1 - λ1)^{#(a,¬m) + #(a,¬j)} · λ2^{#(¬a,m) + #(¬a,j)} (1 - λ2)^{#(¬a,¬m) + #(¬a,¬j)}
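In practice, tying λ1 = λ3 just means pooling the MaryCalls and JohnCalls counts before estimating. A tiny Python sketch with illustrative placeholder counts:

```python
# Parameter tying: pool the counts of the tied parameters, then count.
n_am, n_anm = 14, 6   # #(a,m), #(a,¬m)  (illustrative)
n_aj, n_anj = 16, 4   # #(a,j), #(a,¬j)  (illustrative)

lam1_tied = (n_am + n_aj) / (n_am + n_anm + n_aj + n_anj)
print(lam1_tied)  # pooled estimate of P(m | a) = P(j | a): 0.75
```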
58 Structure Learning Given: a set of examples E; a language L_M of possible models of the form M = (S, λ) with structure S and parameters λ; a probabilistic coverage relation P(e | M) that computes the probability of observing the example e given the model M; and a scoring function score(E, M) that employs the probabilistic coverage relation P(e | M). Find the model M* = (S, λ) that maximizes score(E, M), i.e., M* = arg max_M score(E, M)
59 Structure Learning H_MAP = arg max_H P(H | E) = arg max_H P(E | H) P(H) / P(E) = arg max_H P(E | H) P(H) = arg max_H (log P(E | H) + log P(H)). Often realized as a heuristic search through the space of models; for Bayesian nets: add, delete or reverse arcs (a kind of refinement operator).
60 Probabilistic Learning 1. Probability theory 2. Bayesian Networks 3. Probabilistic Learning 4. Alternative models
61 Markov Models Define a probability distribution over sequences. Assume stationarity and the Markov property:

∀ i, j ≥ 1: P(X_i | X_{i-1}) = P(X_j | X_{j-1})
∀ i ≥ 1: P(X_i | X_{i-1}, X_{i-2}, ..., X_0) = P(X_i | X_{i-1})

The Markov model then factorizes as

P(X_0, ..., X_n) = P(X_0) ∏_{i>0} P(X_i | X_{i-1})
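The factorization makes sequence probabilities a simple running product. A Python sketch; the initial and transition numbers are illustrative:

```python
# Probability of a state sequence under a first-order Markov model.
init = {"rain": 0.5, "sun": 0.5}                 # P(X0), illustrative
trans = {"rain": {"rain": 0.7, "sun": 0.3},
         "sun": {"rain": 0.3, "sun": 0.7}}       # P(X_i | X_{i-1})

def sequence_probability(states):
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]  # only the previous state matters
    return p

print(sequence_probability(["sun", "sun", "rain", "rain"]))  # 0.0735
```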
62 Hidden Markov Model Assume also that the observation model is stationary:

∀ i, j ≥ 0: P(Y_i | X_i) = P(Y_j | X_j)

The hidden Markov model then factorizes as

P(Y_0, ..., Y_n, X_0, ..., X_n) = P(X_0) P(Y_0 | X_0) ∏_{i>0} P(Y_i | X_i) P(X_i | X_{i-1})
63 Example Possible states {rain, sun}; observations {umbrella, no umbrella}. Questions: Given a sequence of umbrella / no umbrella observations, what was the weather outside? Is it raining now (given a sequence of observations)? Will there be sun tomorrow, given a sequence of observations? Next week?
64 Typical Problems Decoding: compute the probability P(Y_0 = y_0, ..., Y_n = y_n) of a particular observation sequence. Most likely state sequence: compute the most likely hidden state sequence corresponding to a particular observation sequence, that is, arg max_{x_0, ..., x_n} P(X_0 = x_0, ..., X_n = x_n | Y_0 = y_0, ..., Y_n = y_n). State estimation: compute the most likely state at time t based on a particular observation sequence: arg max_{x_t} P(X_t = x_t | Y_0 = y_0, ..., Y_n = y_n). If t < n, this is called smoothing; for t = n, filtering; and for t > n, prediction. Parameter estimation. For Markov models, all of these are EFFICIENT.
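As one instance, filtering can be done with the forward algorithm in time linear in the sequence length. A Python sketch for the rain/umbrella HMM above; all probabilities are illustrative assumptions:

```python
# Filtering P(X_t | y_0..y_t) with the forward algorithm.
states = ("rain", "sun")
init = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3},
         "sun": {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"umbrella": 0.9, "no umbrella": 0.1},
        "sun": {"umbrella": 0.2, "no umbrella": 0.8}}

def filtering(observations):
    # Forward message: f[x] ∝ P(X_t = x, y_0..y_t).
    f = {x: init[x] * emit[x][observations[0]] for x in states}
    for obs in observations[1:]:
        f = {x: emit[x][obs] * sum(f[p] * trans[p][x] for p in states)
             for x in states}
    z = sum(f.values())  # normalize to obtain the posterior
    return {x: v / z for x, v in f.items()}

print(filtering(["umbrella", "umbrella", "no umbrella"]))
```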
65 Conclusions Principles of Probability Theory & Graphical Models A wide variety of such models exist They can be learned from data