Probability, Conditional Independence


 Antonia Goodman
 1 years ago
 Views:
Transcription
1 Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence
2 Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms of Probability: ω, P(ω) [0, 1] P(Ω) = 1 P(ω 1 ω 2 ) = P(ω 1 ) + P(ω 2 ) P(ω 1 ω 2 ) Note: We are being informal Probability, Conditional Independence
3 Random Variables Random variables are mappings of events (to real numbers) Mapping X : Ω R Any event ω maps to X (ω) Example: Tossing a coin has two possible outcomes Denoted by {H, T } or {0, 1} Fair coin has uniform probabilities P(X = 0) = 1 2 P(X = 1) = 1 2 Random variables (r.v.s) can be Discrete, e.g., Bernoulli Continuous, e.g., Gaussian Probability, Conditional Independence
4 Distribution, Density For a continuous r.v. Distribution function F (x) = P(X x) Corresponding density function f (x)dx = df (x) Note that F (x) = x t= f (t)dt For a discrete r.v. Probability mass function f (x) = P(X = x) = P(x) We will call this the probability of a discrete event Distribution function F (x) = P(X x) Probability, Conditional Independence
5 Joint Distributions, Marginals For two continuous r.v.s X 1, X 2 Joint distribution F (x 1, x 2 ) = P(X 1 x 1, X 2 x 2 ) Joint density function f (x 1, x 2 ) can be defined as before The marginal probability density f (x 1 ) = x 2= f (x 1, x 2 )dx 2 For two discrete r.v.s X 1, X 2 Joint probability f (x 1, x 2 ) = P(X 1 = x 1, X 2 = x 2 ) = P(x 1, x 2 ) The marginal probability P(X 1 = x 1 ) = x 2 P(X 1 = x 1, X 2 = x 2 ) Can be extended to joint distribution over several r.v.s Many hard problems involve computing marginals Probability, Conditional Independence
6 Expectation The expected value of a r.v. X For continuous r.v.s E[X ] = x xp(x)dx For discrete r.v. E[X ] = i x ip i Expectation is a linear operator E[aX + by + c] = ae[x ] + be[y ] + c Probability, Conditional Independence
7 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) Probability, Conditional Independence
8 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) X 1, X 2 are different dice Probability, Conditional Independence
9 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) X 1, X 2 are different dice X 1 denotes if grass is wet, X 2 denotes if sprinkler was on Probability, Conditional Independence
10 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) X 1, X 2 are different dice X 1 denotes if grass is wet, X 2 denotes if sprinkler was on Two r.v.s are independent if P(X 1 = x 1, X 2 = x 2 ) = P(X 1 = x 1 )P(X 2 = x 2 ) Probability, Conditional Independence
11 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) X 1, X 2 are different dice X 1 denotes if grass is wet, X 2 denotes if sprinkler was on Two r.v.s are independent if P(X 1 = x 1, X 2 = x 2 ) = P(X 1 = x 1 )P(X 2 = x 2 ) Two different dice are independent Probability, Conditional Independence
12 Independence Joint probability P(X 1 = x 1, X 2 = x 2 ) X 1, X 2 are different dice X 1 denotes if grass is wet, X 2 denotes if sprinkler was on Two r.v.s are independent if P(X 1 = x 1, X 2 = x 2 ) = P(X 1 = x 1 )P(X 2 = x 2 ) Two different dice are independent If sprinkler was on, then grass will be wet dependent Probability, Conditional Independence
13 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Probability, Conditional Independence
14 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Given grass wet what is P( sprinkler on grass wet ) Probability, Conditional Independence
15 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Given grass wet what is P( sprinkler on grass wet ) Given symptom what is P( disease symptom ) Probability, Conditional Independence
16 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Given grass wet what is P( sprinkler on grass wet ) Given symptom what is P( disease symptom ) For any r.v.s X, Y, the conditional probability P(x y) = P(x, y) P(y) Probability, Conditional Independence
17 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Given grass wet what is P( sprinkler on grass wet ) Given symptom what is P( disease symptom ) For any r.v.s X, Y, the conditional probability P(x y) = P(x, y) P(y) Since P(x, y) = P(y x)p(x), we have P(y x) = P(x y)p(y) P(x) Probability, Conditional Independence
18 Conditional Probability, Bayes Rule Grass Wet Grass Dry Sprinkler On Sprinkler Off Inference problems: Given grass wet what is P( sprinkler on grass wet ) Given symptom what is P( disease symptom ) For any r.v.s X, Y, the conditional probability P(x y) = P(x, y) P(y) Since P(x, y) = P(y x)p(x), we have P(y x) = P(x y)p(y) P(x) Expressing posterior in terms of conditional : Bayes Rule Probability, Conditional Independence
19 Product Rule & Independence Product Rule: For X 1, X 2, P(X 1, X 2 ) = P(X 1 )P(X 2 X 1 ) For X 1, X 2, X 3, P(X 1, X 2, X 3 ) = P(X 1 )P(X 2 X 1 )P(X 3 X 1, X 2 ) In general, the chain rule n P(X 1,, X n ) = P(X i X 1,..., X i 1 ) i=1 Joint distribution of n Boolean variables Specification requires 2 n 1 parameters Recall Independence: For X 1, X 2, P(X 1, X 2 ) = P(X 1 )P(X 2 ) In general n P(X 1,, X n ) = P(X i ) i=1 Independence reduces specification to n parameters Probability, Conditional Independence
20 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Probability, Conditional Independence
21 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Independence implies P(Toothache,Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather) Probability, Conditional Independence
22 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Independence implies P(Toothache,Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather) In terms of the joint distribution: Probability, Conditional Independence
23 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Independence implies P(Toothache,Catch, Cavity, Weather) In terms of the joint distribution: 32 parameters reduced to 12 = P(Toothache, Catch, Cavity)P(Weather) Probability, Conditional Independence
24 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Independence implies P(Toothache,Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather) In terms of the joint distribution: 32 parameters reduced to 12 For boolean variables 2 n 1 reduces to n Probability, Conditional Independence
25 Independence Cavity Toothache Catch Weather decomposes into Cavity Toothache Catch Weather Consider 4 variables: Toothache,Catch,Cavity,Weather Independence implies P(Toothache,Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather) In terms of the joint distribution: 32 parameters reduced to 12 For boolean variables 2 n 1 reduces to n Absolute independence powerful but rare Probability, Conditional Independence
26 Conditional Independence X and Y are conditionally independent given Z P(X, Y Z) = P(X Z)P(Y Z) Probability, Conditional Independence
27 Conditional Independence X and Y are conditionally independent given Z P(X, Y Z) = P(X Z)P(Y Z) Example: P(Toothache,Catch Cavity) = P(Toothache Cavity)P(Catch Cavity) Probability, Conditional Independence
28 Conditional Independence X and Y are conditionally independent given Z P(X, Y Z) = P(X Z)P(Y Z) Example: P(Toothache,Catch Cavity) = P(Toothache Cavity)P(Catch Cavity) Conditional Independence simplifies joint distributions Probability, Conditional Independence
29 Conditional Independence X and Y are conditionally independent given Z P(X, Y Z) = P(X Z)P(Y Z) Example: P(Toothache,Catch Cavity) = P(Toothache Cavity)P(Catch Cavity) Conditional Independence simplifies joint distributions Often reduces from exponential to linear in n P(X, Y, Z) = P(Z)P(X Z)P(Y Z) Probability, Conditional Independence
30 Naive Bayes Model If X 1,..., X n are independent given Y n P(Y, X 1,..., X n ) = P(Y ) P(X i Y ) i=1 Probability, Conditional Independence
31 Naive Bayes Model If X 1,..., X n are independent given Y P(Y, X 1,..., X n ) = P(Y ) Example: P(Cavity,Toothache, Catch) n P(X i Y ) i=1 = P(Cavity)P(Toothache Cavity)P(Catch Cavity) Probability, Conditional Independence
32 Naive Bayes Model If X 1,..., X n are independent given Y P(Y, X 1,..., X n ) = P(Y ) Example: P(Cavity,Toothache, Catch) More generally n P(X i Y ) i=1 = P(Cavity)P(Toothache Cavity)P(Catch Cavity) P(Cause, Effect 1,..., Effect n ) = P(Cause) n P(Effect i Cause) i=1 Probability, Conditional Independence
33 Naive Bayes Model If X 1,..., X n are independent given Y P(Y, X 1,..., X n ) = P(Y ) Example: P(Cavity,Toothache, Catch) More generally n P(X i Y ) i=1 = P(Cavity)P(Toothache Cavity)P(Catch Cavity) P(Cause, Effect 1,..., Effect n ) = P(Cause) n P(Effect i Cause) i=1 Cavity Cause Toothache Catch Effect 1 Effect n Probability, Conditional Independence
34 Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
35 Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Syntax A set of nodes, one per variable A directed, acyclic graph (link implies direct influence) A conditional distribution for each node given its parents
36 Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Syntax A set of nodes, one per variable A directed, acyclic graph (link implies direct influence) A conditional distribution for each node given its parents Conditional distributions For each X i, P(X i Parents(X i )) In the form of a conditional probability table (CPT) Distribution of X i for each combination of parent values
37 Example Topology of network encodes conditional independence assertions Weather Cavity Toothache Catch Weather is independent of the other variables Toothache, Catch are conditionally independent given Cavity
38 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
39 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects causal knowledge
40 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects causal knowledge A burglar can set the alarm off
41 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects causal knowledge A burglar can set the alarm off An earthquake can set the alarm off
42 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects causal knowledge A burglar can set the alarm off An earthquake can set the alarm off The alarm can cause Mary to call
43 Example I m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn t call. Sometimes it s set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects causal knowledge A burglar can set the alarm off An earthquake can set the alarm off The alarm can cause Mary to call The alarm can cause John to call
44 Example (Contd.) Burglary P(B).001 Earthquake P(E).002 B T T F F E T F T F P(A B,E) Alarm JohnCalls A T F P(J A) MaryCalls A T F P(M A).70.01
45 Compactness B E A J M A CPT for Boolean X i with k Boolean parents
46 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values
47 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number
48 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents
49 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers
50 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers Grows linearly with n
51 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers Grows linearly with n Full joint distribution requires O(2 n )
52 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers Grows linearly with n Full joint distribution requires O(2 n ) Example: Burglary network
53 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers Grows linearly with n Full joint distribution requires O(2 n ) Example: Burglary network Full joint distribution requires = 31 numbers
54 Compactness B E A J M A CPT for Boolean X i with k Boolean parents 2 k rows for the combinations of parent values Each row requires one number Each variable has no more than k parents The complete network requires O(n 2 k ) numbers Grows linearly with n Full joint distribution requires O(2 n ) Example: Burglary network Full joint distribution requires = 31 numbers Bayes net requires 10 numbers
55 Global semantics Full joint distribution
56 Global semantics Full joint distribution Can be written as product of local conditionals
57 Global semantics Full joint distribution Can be written as product of local conditionals Example: P(j, m, a, b, e) = P( b)p( e)p(a b, e)p(j a)p(m a)
58 Global semantics Full joint distribution Can be written as product of local conditionals Example: P(j, m, a, b, e) = P( b)p( e)p(a b, e)p(j a)p(m a) Example: P(j, m, a, b, e) = P(b)P( e)p(a b, e)p(j a)p( m a)
59 Global semantics Full joint distribution Can be written as product of local conditionals Example: P(j, m, a, b, e) = P( b)p( e)p(a b, e)p(j a)p(m a) Example: P(j, m, a, b, e) = P(b)P( e)p(a b, e)p(j a)p( m a) Can we compute P(b j, m)?
60 Local semantics Each node is conditionally independent of its nondescendants given its parents U 1... U m Z 1j X Z nj Y 1... Y n
61 Markov blanket Each node is conditionally independent of all others given its Markov blanket, i.e., parents + children + children s parents U 1... U m Z 1j X Z nj Y 1... Y n
62 Conditional Independence in BNs Which BNs support x 1 x 2 x 3 For (a), x 1, x 2 are dependent, x 3 is a collider
63 Conditional Independence in BNs Which BNs support x 1 x 2 x 3 For (a), x 1, x 2 are dependent, x 3 is a collider For (b)(d), x 1 x 2 x 3
64 Conditional Independence (Contd.) Which BNs support x y z For (a)(b), z is not a collider, so x y z
65 Conditional Independence (Contd.) Which BNs support x y z For (a)(b), z is not a collider, so x y z For (c), z is a collider, so x and y are conditionally dependent
66 Conditional Independence (Contd.) Which BNs support x y z For (a)(b), z is not a collider, so x y z For (c), z is a collider, so x and y are conditionally dependent For (d), w is a collider, and z is a descendent of w, so x and y are conditionally dependent
67 dconnection, dseparation Definition (dconnection): X, Y, Z be disjoint sets of vertices in a directed graph G. X, Y is dconnected by Z iff an undirected path U between some x X, y Y such that for every collider C on U
68 dconnection, dseparation Definition (dconnection): X, Y, Z be disjoint sets of vertices in a directed graph G. X, Y is dconnected by Z iff an undirected path U between some x X, y Y such that for every collider C on U Either C or a descendent of C is in Z
69 dconnection, dseparation Definition (dconnection): X, Y, Z be disjoint sets of vertices in a directed graph G. X, Y is dconnected by Z iff an undirected path U between some x X, y Y such that for every collider C on U Either C or a descendent of C is in Z No noncollider on U is in Z
70 dconnection, dseparation Definition (dconnection): X, Y, Z be disjoint sets of vertices in a directed graph G. X, Y is dconnected by Z iff an undirected path U between some x X, y Y such that for every collider C on U Either C or a descendent of C is in Z No noncollider on U is in Z Otherwise X and Y are dseparated by Z
71 dconnection, dseparation Definition (dconnection): X, Y, Z be disjoint sets of vertices in a directed graph G. X, Y is dconnected by Z iff an undirected path U between some x X, y Y such that for every collider C on U Either C or a descendent of C is in Z No noncollider on U is in Z Otherwise X and Y are dseparated by Z If Z dseparates X and Y, then X Y Z for all distributions represented by the graph
72 Conditional Independence (Contd.) Examples For (a), a e b; but a, e are dependent given {b, d}
73 Conditional Independence (Contd.) Examples For (a), a e b; but a, e are dependent given {b, d} For (b) a and e are dependent given b; c and e are unconditionally dependent
74 Conditional Independence: More Examples (a) (b) For (a), Is a c e? Is a e b? Is a e c? For (b), Is a e d? Is a e c? Is a c b
75 Example: Car diagnosis Initial evidence: car won t start
76 Example: Car diagnosis Initial evidence: car won t start Testable variables (green), broken, so fix it variables (orange) Hidden variables (gray) ensure sparse structure, reduce parameters battery age alternator broken fanbelt broken battery dead no charging battery meter battery flat no oil no gas fuel line blocked starter broken lights oil light gas gauge car won t start dipstick
77 Example: Car insurance Age GoodStudent RiskAversion SeniorTrain SocioEcon Mileage VehicleYear ExtraCar DrivingSkill MakeModel DrivingHist Antilock DrivQuality Airbag CarValue HomeBase AntiTheft Ruggedness Accident OwnDamage Theft Cushioning OtherCost OwnCost MedicalCost LiabilityCost PropertyCost
78 Inference Burglary P(B).001 Earthquake P(E).002 B T T F F E T F T F P(A B,E) Alarm JohnCalls A T F P(J A) MaryCalls A T F P(M A) How can we compute P(b j, m)?
Bayesian Networks Chapter 14. Mausam (Slides by UWAI faculty & David Page)
Bayesian Networks Chapter 14 Mausam (Slides by UWAI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation
More information13.3 Inference Using Full Joint Distribution
191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent
More informationBayesian Networks. Mausam (Slides by UWAI faculty)
Bayesian Networks Mausam (Slides by UWAI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide
More informationArtificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)
Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity
More informationLogic, Probability and Learning
Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern
More informationArtificial Intelligence
Artificial Intelligence ICS461 Fall 2010 Nancy E. Reed nreed@hawaii.edu 1 Lecture #13 Uncertainty Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty
More informationOutline. Uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Methods for handling uncertainty
Outline Uncertainty Chapter 13 Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me
More informationBasics of Probability
Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space
More informationCS 188: Artificial Intelligence. Probability recap
CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability
More information/ Informatica. Intelligent Systems. Agents dealing with uncertainty. Example: driving to the airport. Why logic fails
Intelligent Systems Prof. dr. Paul De Bra Technische Universiteit Eindhoven debra@win.tue.nl Agents dealing with uncertainty Uncertainty Probability Syntax and Semantics Inference Independence and Bayes'
More informationLecture 2: Introduction to belief (Bayesian) networks
Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (Imaps) January 7, 2008 1 COMP526 Lecture 2 Recall from last time: Conditional
More informationUncertainty. Chapter 13. Chapter 13 1
Uncertainty Chapter 13 Chapter 13 1 Attribution Modified from Stuart Russell s slides (Berkeley) Chapter 13 2 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule
More informationCS 771 Artificial Intelligence. Reasoning under uncertainty
CS 771 Artificial Intelligence Reasoning under uncertainty Today Representing uncertainty is useful in knowledge bases Probability provides a coherent framework for uncertainty Review basic concepts in
More informationVariable Elimination 1
Variable Elimination 1 Inference Exact inference Enumeration Variable elimination Junction trees and belief propagation Approximate inference Loopy belief propagation Sampling methods: likelihood weighting,
More informationBayesian Networks. Read R&N Ch. 14.114.2. Next lecture: Read R&N 18.118.4
Bayesian Networks Read R&N Ch. 14.114.2 Next lecture: Read R&N 18.118.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More information5 Directed acyclic graphs
5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate
More informationMarginal Independence and Conditional Independence
Marginal Independence and Conditional Independence Computer Science cpsc322, Lecture 26 (Textbook Chpt 6.12) March, 19, 2010 Recap with Example Marginalization Conditional Probability Chain Rule Lecture
More informationLife of A Knowledge Base (KB)
Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationA crash course in probability and Naïve Bayes classification
Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationRelational Dynamic Bayesian Networks: a report. Cristina Manfredotti
Relational Dynamic Bayesian Networks: a report Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.) Università degli Studi MilanoBicocca manfredotti@disco.unimib.it
More informationProbability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0
Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall
More information3. The Junction Tree Algorithms
A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )
More informationRandom Variables, Expectation, Distributions
Random Variables, Expectation, Distributions CS 5960/6960: Nonparametric Methods Tom Fletcher January 21, 2009 Review Random Variables Definition A random variable is a function defined on a probability
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTIC) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences
More informationOutline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.
Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost
More informationBayesian vs. Markov Networks
Bayesian vs. Markov Networks Le Song Machine Learning II: dvanced Topics CSE 8803ML, Spring 2012 Conditional Independence ssumptions Local Markov ssumption Global Markov ssumption 𝑋 𝑁𝑜𝑛𝑑𝑒𝑠𝑐𝑒𝑛𝑑𝑎𝑛𝑡𝑋 𝑃𝑎𝑋
More information5. Continuous Random Variables
5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be
More informationProbabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams
Probabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams Uffe B. Kjærulff Department of Computer Science Aalborg University Anders L. Madsen HUGIN Expert A/S 10 May 2005 2 Contents
More informationJoint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage
5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single
More informationExpectation Discrete RV  weighted average Continuous RV  use integral to take the weighted average
PHP 2510 Expectation, variance, covariance, correlation Expectation Discrete RV  weighted average Continuous RV  use integral to take the weighted average Variance Variance is the average of (X µ) 2
More information22c:145 Artificial Intelligence
22c:145 Artificial Intelligence Fall 2005 Uncertainty Cesare Tinelli The University of Iowa Copyright 200105 Cesare Tinelli and Hantao Zhang. a a These notes are copyrighted material and may not be used
More informationThe Joint Probability Distribution (JPD) of a set of n binary variables involve a huge number of parameters
DEFINING PROILISTI MODELS The Joint Probability Distribution (JPD) of a set of n binary variables involve a huge number of parameters 2 n (larger than 10 25 for only 100 variables). x y z p(x, y, z) 0
More informationBayesian probability theory
Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations
More informationSection 6.1 Joint Distribution Functions
Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function
More informationST 371 (IV): Discrete Random Variables
ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible
More informationFor a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )
Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (19031987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll
More informationIntelligent Systems: Reasoning and Recognition. Uncertainty and Plausible Reasoning
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2015/2016 Lesson 12 25 march 2015 Uncertainty and Plausible Reasoning MYCIN (continued)...2 Backward
More informationRandom variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.
Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()
More informationThe intuitions in examples from last time give us Dseparation and inference in belief networks
CSC384: Lecture 11 Variable Elimination Last time The intuitions in examples from last time give us Dseparation and inference in belief networks a simple inference algorithm for networks without Today
More informationQuerying Joint Probability Distributions
Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple
More informationMarkov random fields and Gibbs measures
Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).
More informationChapter 4 Lecture Notes
Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a realvalued function defined on the sample space of some experiment. For instance,
More informationWelcome to Stochastic Processes 1. Welcome to Aalborg University No. 1 of 31
Welcome to Stochastic Processes 1 Welcome to Aalborg University No. 1 of 31 Welcome to Aalborg University No. 2 of 31 Course Plan Part 1: Probability concepts, random variables and random processes Lecturer:
More informationContents. TTM4155: Teletraffic Theory (Teletrafikkteori) Probability Theory Basics. Yuming Jiang. Basic Concepts Random Variables
TTM4155: Teletraffic Theory (Teletrafikkteori) Probability Theory Basics Yuming Jiang 1 Some figures taken from the web. Contents Basic Concepts Random Variables Discrete Random Variables Continuous Random
More informationIntroduction to Probability
Introduction to Probability EE 179, Lecture 15, Handout #24 Probability theory gives a mathematical characterization for experiments with random outcomes. coin toss life of lightbulb binary data sequence
More informationP (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationWhat is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference
0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures
More informationContinuous Random Variables
Probability 2  Notes 7 Continuous Random Variables Definition. A random variable X is said to be a continuous random variable if there is a function f X (x) (the probability density function or p.d.f.)
More informationProbability distributions
Probability distributions (Notes are heavily adapted from Harnett, Ch. 3; Hayes, sections 2.142.19; see also Hayes, Appendix B.) I. Random variables (in general) A. So far we have focused on single events,
More informationCS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization
Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false
More informationp if x = 1 1 p if x = 0
Probability distributions Bernoulli distribution Two possible values (outcomes): 1 (success), 0 (failure). Parameters: p probability of success. Probability mass function: P (x; p) = { p if x = 1 1 p if
More informationData Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationProbability for Estimation (review)
Probability for Estimation (review) In general, we want to develop an estimator for systems of the form: x = f x, u + η(t); y = h x + ω(t); ggggg y, ffff x We will primarily focus on discrete time linear
More informationCOS513 LECTURE 5 JUNCTION TREE ALGORITHM
COS513 LECTURE 5 JUNCTION TREE ALGORITHM D. EIS AND V. KOSTINA The junction tree algorithm is the culmination of the way graph theory and probability combine to form graphical models. After we discuss
More informationMATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables
MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,
More informationSTAT 430/510 Probability Lecture 14: Joint Probability Distribution, Continuous Case
STAT 430/510 Probability Lecture 14: Joint Probability Distribution, Continuous Case Pengyuan (Penelope) Wang June 20, 2011 Joint density function of continuous Random Variable When X and Y are two continuous
More informationModelbased Synthesis. Tony O Hagan
Modelbased Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More information4. Joint Distributions
Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 4. Joint Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an underlying sample space. Suppose
More informationMAS108 Probability I
1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper
More informationContinuous Random Variables
Continuous Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Continuous Random Variables 2 11 Introduction 2 12 Probability Density Functions 3 13 Transformations 5 2 Mean, Variance and Quantiles
More informationContinuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4  and Cengage
4 Continuous Random Variables and Probability Distributions Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4  and Cengage Continuous r.v. A random variable X is continuous if possible values
More information3 Multiple Discrete Random Variables
3 Multiple Discrete Random Variables 3.1 Joint densities Suppose we have a probability space (Ω, F,P) and now we have two discrete random variables X and Y on it. They have probability mass functions f
More informationSTA 256: Statistics and Probability I
Al Nosedal. University of Toronto. Fall 2014 1 2 3 4 5 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Experiment, outcome, sample space, and
More informationSet theory, and set operations
1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we
More informationProbability Theory. Elementary rules of probability Sum rule. Product rule. p. 23
Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction
More informationLecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
More informationBayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According
More informationFinite and discrete probability distributions
8 Finite and discrete probability distributions To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory
More informationLecture 6: Discrete & Continuous Probability and Random Variables
Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015 D. Alex Hughes (Math Camp) Lecture 6: Discrete & Continuous Probability and Random September
More informationSummary of Probability
Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible
More informationQ1. [19 pts] Cheating at cs188blackjack
CS 188 Spring 2012 Introduction to Artificial Intelligence Practice Midterm II Solutions Q1. [19 pts] Cheating at cs188blackjack Cheating dealers have become a serious problem at the cs188blackjack tables.
More informationUniversity of California, Los Angeles Department of Statistics. Random variables
University of California, Los Angeles Department of Statistics Statistics Instructor: Nicolas Christou Random variables Discrete random variables. Continuous random variables. Discrete random variables.
More informationThe sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].
Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real
More informationExamples and Proofs of Inference in Junction Trees
Examples and Proofs of Inference in Junction Trees Peter Lucas LIAC, Leiden University February 4, 2016 1 Representation and notation Let P(V) be a joint probability distribution, where V stands for a
More informationJoint Probability Distributions and Random Samples (Devore Chapter Five)
Joint Probability Distributions and Random Samples (Devore Chapter Five) 101634501 Probability and Statistics for Engineers Winter 20102011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete
More informationJointly Distributed Random Variables
Jointly Distributed Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Jointly Distributed Random Variables 1 1.1 Definition......................................... 1 1.2 Joint cdfs..........................................
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationStatistical Inference. Prof. Kate Calder. If the coin is fair (chance of heads = chance of tails) then
Probability Statistical Inference Question: How often would this method give the correct answer if I used it many times? Answer: Use laws of probability. 1 Example: Tossing a coin If the coin is fair (chance
More informationUncertainty in AI. Uncertainty I: Probability as Degree of Belief. The (predominantly logicbased) methods covered so far have assorted
We now examine: Uncertainty I: Probability as Degree of Belief how probability theory might be used to represent and reason with knowledge when we are uncertain about the world; how inference in the presence
More informationBayesian Tutorial (Sheet Updated 20 March)
Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that
More informationDefinition: Suppose that two random variables, either continuous or discrete, X and Y have joint density
HW MATH 461/561 Lecture Notes 15 1 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density and marginal densities f(x, y), (x, y) Λ X,Y f X (x), x Λ X,
More informationST 371 (VIII): Theory of Joint Distributions
ST 371 (VIII): Theory of Joint Distributions So far we have focused on probability distributions for single random variables. However, we are often interested in probability statements concerning two or
More informationBusiness Statistics 41000: Probability 1
Business Statistics 41000: Probability 1 Drew D. Creal University of Chicago, Booth School of Business Week 3: January 24 and 25, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationImportant Probability Distributions OPRE 6301
Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in reallife applications that they have been given their own names.
More informationChapter 10: Introducing Probability
Chapter 10: Introducing Probability Randomness and Probability So far, in the first half of the course, we have learned how to examine and obtain data. Now we turn to another very important aspect of Statistics
More informationP(a X b) = f X (x)dx. A p.d.f. must integrate to one: f X (x)dx = 1. Z b
Continuous Random Variables The probability that a continuous random variable, X, has a value between a and b is computed by integrating its probability density function (p.d.f.) over the interval [a,b]:
More informationProbabilities and Random Variables
Probabilities and Random Variables This is an elementary overview of the basic concepts of probability theory. 1 The Probability Space The purpose of probability theory is to model random experiments so
More informationMATH 201. Final ANSWERS August 12, 2016
MATH 01 Final ANSWERS August 1, 016 Part A 1. 17 points) A bag contains three different types of dice: four 6sided dice, five 8sided dice, and six 0sided dice. A die is drawn from the bag and then rolled.
More informationExercises with solutions (1)
Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More information4. Joint Distributions of Two Random Variables
4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint
More information