# Probability, Conditional Independence


Lecture slides: Probability, Conditional Independence (June 19, 2012)

## Probability

- Sample space Ω of events
- Each event ω ∈ Ω has an associated measure, the probability of the event P(ω)
- Axioms of probability:
  - For every ω, P(ω) ∈ [0, 1]
  - P(Ω) = 1
  - P(ω₁ ∪ ω₂) = P(ω₁) + P(ω₂) − P(ω₁ ∩ ω₂)
- Note: we are being informal
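As a quick sanity check (a sketch, not part of the slides), the three axioms can be verified numerically for a fair six-sided die, with hypothetical event sets A and B:

```python
# Sample space of a fair six-sided die; each atomic event has probability 1/6.
P = {w: 1/6 for w in range(1, 7)}

# Axiom 1: probabilities lie in [0, 1]
assert all(0 <= p <= 1 for p in P.values())

# Axiom 2: P(Omega) = 1
assert abs(sum(P.values()) - 1) < 1e-12

# Axiom 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B), for A = evens, B = {1, 2}
A, B = {2, 4, 6}, {1, 2}
prob = lambda E: sum(P[w] for w in E)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12
```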

## Random Variables

- Random variables are mappings of events to real numbers: X : Ω → ℝ
- Any event ω maps to X(ω)
- Example: tossing a coin has two possible outcomes, denoted {H, T} or {0, 1}
  - A fair coin has uniform probabilities: P(X = 0) = 1/2, P(X = 1) = 1/2
- Random variables (r.v.s) can be
  - Discrete, e.g., Bernoulli
  - Continuous, e.g., Gaussian

## Distribution, Density

For a continuous r.v.:
- Distribution function F(x) = P(X ≤ x)
- Corresponding density function f(x) = dF(x)/dx
- Note that F(x) = ∫₋∞ˣ f(t) dt

For a discrete r.v.:
- Probability mass function f(x) = P(X = x) = P(x); we will call this the probability of a discrete event
- Distribution function F(x) = P(X ≤ x)
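For a concrete example not on the slides, take the Exp(1) density f(t) = e⁻ᵗ (a hypothetical choice for illustration): a Riemann sum of the density recovers the distribution function, matching F(x) = ∫ f(t) dt.

```python
import math

f = lambda t: math.exp(-t)      # density of an Exp(1) random variable
F = lambda x: 1 - math.exp(-x)  # its distribution function F(x) = P(X <= x)

# F(x) equals the integral of f from 0 to x; approximate it by a left Riemann sum
x, n = 2.0, 100_000
dt = x / n
approx = sum(f(i * dt) * dt for i in range(n))
assert abs(approx - F(x)) < 1e-3
```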

## Joint Distributions, Marginals

For two continuous r.v.s X₁, X₂:
- Joint distribution F(x₁, x₂) = P(X₁ ≤ x₁, X₂ ≤ x₂)
- The joint density function f(x₁, x₂) can be defined as before
- The marginal probability density f(x₁) = ∫₋∞^∞ f(x₁, x₂) dx₂

For two discrete r.v.s X₁, X₂:
- Joint probability f(x₁, x₂) = P(X₁ = x₁, X₂ = x₂) = P(x₁, x₂)
- The marginal probability P(X₁ = x₁) = Σ_{x₂} P(X₁ = x₁, X₂ = x₂)

- Can be extended to joint distributions over several r.v.s
- Many hard problems involve computing marginals
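The discrete marginalization formula can be sketched directly in code; the joint table below is hypothetical, chosen only so the sums are easy to check:

```python
# A hypothetical joint pmf over two discrete r.v.s X1 in {0,1}, X2 in {0,1,2}
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20,
}

# Marginal: P(X1 = x1) = sum over x2 of P(X1 = x1, X2 = x2)
marginal_x1 = {}
for (x1, x2), p in joint.items():
    marginal_x1[x1] = marginal_x1.get(x1, 0.0) + p

assert abs(marginal_x1[0] - 0.40) < 1e-12
assert abs(marginal_x1[1] - 0.60) < 1e-12
```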

## Expectation

- The expected value of a r.v. X:
  - For continuous r.v.s: E[X] = ∫ x p(x) dx
  - For discrete r.v.s: E[X] = Σᵢ xᵢ pᵢ
- Expectation is a linear operator: E[aX + bY + c] = aE[X] + bE[Y] + c
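Linearity holds even when the variables are dependent; a minimal sketch with a hypothetical joint distribution:

```python
# Two hypothetical discrete r.v.s with an arbitrary (dependent) joint distribution
joint = {(1, 10): 0.3, (1, 20): 0.2, (2, 10): 0.1, (2, 20): 0.4}

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
a, b, c = 3.0, -2.0, 5.0
E_lin = sum((a * x + b * y + c) * p for (x, y), p in joint.items())

# Linearity: E[aX + bY + c] = aE[X] + bE[Y] + c
assert abs(E_lin - (a * E_X + b * E_Y + c)) < 1e-12
```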

## Independence

- Joint probability P(X₁ = x₁, X₂ = x₂)
  - Example: X₁, X₂ are different dice
  - Example: X₁ denotes if the grass is wet, X₂ denotes if the sprinkler was on
- Two r.v.s are independent if P(X₁ = x₁, X₂ = x₂) = P(X₁ = x₁) P(X₂ = x₂)
- Two different dice are independent
- If the sprinkler was on, then the grass will be wet: dependent
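Both examples can be checked against the definition. The dice joint factorizes exactly; the sprinkler/grass joint below uses hypothetical numbers chosen to be dependent:

```python
from itertools import product

# Two fair dice: the joint factorizes, so they are independent.
dice_joint = {(i, j): 1/36 for i, j in product(range(1, 7), repeat=2)}
p1 = lambda i: sum(p for (a, b), p in dice_joint.items() if a == i)
p2 = lambda j: sum(p for (a, b), p in dice_joint.items() if b == j)
assert all(abs(dice_joint[i, j] - p1(i) * p2(j)) < 1e-12 for i, j in dice_joint)

# Hypothetical (wet, sprinkler) joint: P(wet, on) != P(wet) P(on) -> dependent
joint = {(1, 1): 0.40, (1, 0): 0.10, (0, 1): 0.05, (0, 0): 0.45}
p_wet = joint[1, 1] + joint[1, 0]   # 0.5
p_on = joint[1, 1] + joint[0, 1]    # 0.45
assert abs(joint[1, 1] - p_wet * p_on) > 0.1   # 0.40 vs 0.225
```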

## Conditional Probability, Bayes Rule

- (Figure: a 2×2 contingency table over Sprinkler On/Off × Grass Wet/Dry; its entries did not survive extraction.)
- Inference problems:
  - Given "grass wet", what is P(sprinkler on | grass wet)?
  - Given a symptom, what is P(disease | symptom)?
- For any r.v.s X, Y, the conditional probability is P(x | y) = P(x, y) / P(y)
- Since P(x, y) = P(y | x) P(x), we have P(y | x) = P(x | y) P(y) / P(x)
- Expressing the posterior in terms of the conditional: Bayes rule
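Since the slide's table values were lost, here is a sketch with hypothetical numbers for the sprinkler/grass joint, showing both the direct conditional and the same answer obtained via Bayes rule:

```python
# Hypothetical joint over (sprinkler, grass): values are illustrative only.
joint = {('on', 'wet'): 0.36, ('on', 'dry'): 0.04,
         ('off', 'wet'): 0.12, ('off', 'dry'): 0.48}

# Conditional probability: P(on | wet) = P(on, wet) / P(wet)
p_wet = joint['on', 'wet'] + joint['off', 'wet']   # 0.48
p_on_given_wet = joint['on', 'wet'] / p_wet        # 0.75

# Bayes rule gives the same answer from the other conditional:
p_on = joint['on', 'wet'] + joint['on', 'dry']     # 0.40
p_wet_given_on = joint['on', 'wet'] / p_on
assert abs(p_on_given_wet - p_wet_given_on * p_on / p_wet) < 1e-9
assert abs(p_on_given_wet - 0.75) < 1e-9
```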

## Product Rule & Independence

- Product rule:
  - For X₁, X₂: P(X₁, X₂) = P(X₁) P(X₂ | X₁)
  - For X₁, X₂, X₃: P(X₁, X₂, X₃) = P(X₁) P(X₂ | X₁) P(X₃ | X₁, X₂)
  - In general, the chain rule: P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | X₁, …, Xᵢ₋₁), i = 1, …, n
- Joint distribution of n Boolean variables: specification requires 2ⁿ − 1 parameters
- Recall independence: for X₁, X₂, P(X₁, X₂) = P(X₁) P(X₂); in general P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ)
- Independence reduces the specification to n parameters
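A sketch of the chain rule in the other direction: any choice of conditional tables (the numbers below are hypothetical) multiplies out to a valid joint, and for three Boolean variables the tables use exactly the 2³ − 1 = 7 numbers the slide counts:

```python
from itertools import product

# Hypothetical conditionals: P(X1), P(X2=1 | X1), P(X3=1 | X1, X2)
p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.2, 1: 0.6}
p_x3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}

def bern(p, x):  # P(X = x) for a Bernoulli with success probability p
    return p if x == 1 else 1 - p

# Chain rule: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)
joint = {(x1, x2, x3): p_x1[x1] * bern(p_x2[x1], x2) * bern(p_x3[x1, x2], x3)
         for x1, x2, x3 in product([0, 1], repeat=2 + 1)}

# The product is a valid joint distribution: entries sum to 1
assert abs(sum(joint.values()) - 1) < 1e-12

# Parameter count: 2^3 - 1 = 7 = 1 (for X1) + 2 (for X2) + 4 (for X3)
assert 2**3 - 1 == 1 + len(p_x2) + len(p_x3)
```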

## Independence (Contd.)

- Consider 4 variables: Toothache, Catch, Cavity, Weather
- Independence implies P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
- I.e., the joint over {Cavity, Toothache, Catch, Weather} decomposes into {Cavity, Toothache, Catch} and {Weather}
- In terms of the joint distribution: 32 parameters reduced to 12
- For Boolean variables, 2ⁿ − 1 reduces to n
- Absolute independence is powerful but rare

## Conditional Independence

- X and Y are conditionally independent given Z if P(X, Y | Z) = P(X | Z) P(Y | Z)
- Example: P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
- Conditional independence simplifies joint distributions: P(X, Y, Z) = P(Z) P(X | Z) P(Y | Z)
- Often reduces from exponential to linear in n
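A sketch of the factorization with hypothetical numbers: building P(X, Y, Z) = P(Z) P(X | Z) P(Y | Z) guarantees X ⊥ Y | Z, even though X and Y are dependent marginally:

```python
from itertools import product

# Construct P(X, Y, Z) = P(Z) P(X | Z) P(Y | Z) with hypothetical numbers
p_z = {0: 0.6, 1: 0.4}     # P(Z)
p_x = {0: 0.9, 1: 0.3}     # P(X = 1 | Z = z)
p_y = {0: 0.2, 1: 0.7}     # P(Y = 1 | Z = z)
bern = lambda p, v: p if v == 1 else 1 - p

joint = {(x, y, z): p_z[z] * bern(p_x[z], x) * bern(p_y[z], y)
         for x, y, z in product([0, 1], repeat=3)}

# Check X ⊥ Y | Z: P(X, Y | Z) = P(X | Z) P(Y | Z) for every assignment
for x, y, z in joint:
    p_xy_given_z = joint[x, y, z] / p_z[z]
    assert abs(p_xy_given_z - bern(p_x[z], x) * bern(p_y[z], y)) < 1e-12

# But X and Y are NOT marginally independent here
p_x1 = sum(p for (x, y, z), p in joint.items() if x == 1)      # 0.66
p_y1 = sum(p for (x, y, z), p in joint.items() if y == 1)      # 0.40
p_x1y1 = sum(p for (x, y, z), p in joint.items() if x == 1 and y == 1)  # 0.192
assert abs(p_x1y1 - p_x1 * p_y1) > 0.01
```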

## Naive Bayes Model

- If X₁, …, Xₙ are independent given Y: P(Y, X₁, …, Xₙ) = P(Y) ∏ᵢ P(Xᵢ | Y)
- Example: P(Cavity, Toothache, Catch) = P(Cavity) P(Toothache | Cavity) P(Catch | Cavity)
- More generally: P(Cause, Effect₁, …, Effectₙ) = P(Cause) ∏ᵢ P(Effectᵢ | Cause)
- (Figure: Cause node with arrows to Effect₁, …, Effectₙ; here Cavity → Toothache, Catch.)
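A minimal sketch of the model, using hypothetical CPT numbers (the slides give none): the naive Bayes factorization yields the joint, and Bayes rule then gives a diagnosis from observed effects.

```python
# Naive Bayes with hypothetical numbers: Cavity causes Toothache and Catch
p_cavity = 0.2                   # P(Cavity = 1), assumed for illustration
p_tooth = {1: 0.6, 0: 0.1}       # P(Toothache = 1 | Cavity)
p_catch = {1: 0.9, 0: 0.2}       # P(Catch = 1 | Cavity)

def joint(cav, tooth, catch):
    """P(Cavity, Toothache, Catch) = P(Cavity) P(Toothache|Cavity) P(Catch|Cavity)."""
    b = lambda p, v: p if v == 1 else 1 - p
    return b(p_cavity, cav) * b(p_tooth[cav], tooth) * b(p_catch[cav], catch)

# Diagnosis by Bayes rule: P(Cavity = 1 | toothache, catch)
num = joint(1, 1, 1)                       # 0.2 * 0.6 * 0.9
post = num / (num + joint(0, 1, 1))
assert 0.8 < post < 0.9   # both effects observed: evidence strongly favors a cavity
```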

## Bayesian Networks

- A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions
- Syntax:
  - A set of nodes, one per variable
  - A directed, acyclic graph (a link implies direct influence)
  - A conditional distribution for each node given its parents
- Conditional distributions:
  - For each Xᵢ, P(Xᵢ | Parents(Xᵢ))
  - In the form of a conditional probability table (CPT): the distribution of Xᵢ for each combination of parent values

## Example

- Topology of the network encodes conditional independence assertions
- (Figure: Weather as a lone node; Cavity with arrows to Toothache and Catch.)
- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity

## Example

- "I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?"
- Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
- Network topology reflects causal knowledge:
  - A burglar can set the alarm off
  - An earthquake can set the alarm off
  - The alarm can cause Mary to call
  - The alarm can cause John to call

## Example (Contd.)

(Figure: the burglary network with its CPTs. Recovered entries: P(B) = .001 and P(E) = .002; Alarm's CPT lists P(A | B, E) for the four combinations of B and E; JohnCalls has a CPT P(J | A); MaryCalls has P(M | A) = .70 and P(M | ¬A) = .01. The remaining CPT entries did not survive extraction.)

## Compactness

- A CPT for a Boolean Xᵢ with k Boolean parents has 2ᵏ rows, one per combination of parent values
- Each row requires one number
- If each variable has no more than k parents, the complete network requires O(n · 2ᵏ) numbers
  - Grows linearly with n
- The full joint distribution requires O(2ⁿ)
- Example: burglary network
  - Full joint distribution requires 2⁵ − 1 = 31 numbers
  - The Bayes net requires 1 + 1 + 4 + 2 + 2 = 10 numbers
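The two counts can be sketched directly from the network's parent structure:

```python
# Burglary network: numbers needed per Boolean node = 2^(number of parents)
num_parents = {'B': 0, 'E': 0, 'A': 2, 'J': 1, 'M': 1}

bn_numbers = sum(2**k for k in num_parents.values())   # 1 + 1 + 4 + 2 + 2
full_joint = 2**len(num_parents) - 1                   # 2^5 - 1

assert bn_numbers == 10
assert full_joint == 31
```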

## Global Semantics

- The full joint distribution can be written as a product of local conditionals: P(x₁, …, xₙ) = ∏ᵢ P(xᵢ | parents(Xᵢ))
- Example: P(j, m, a, ¬b, ¬e) = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
- Example: P(j, ¬m, a, b, ¬e) = P(b) P(¬e) P(a | b, ¬e) P(j | a) P(¬m | a)
- Can we compute P(b | j, m)?
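A sketch of the first example as a product of local conditionals. The slide's CPT table was largely garbled, so the numbers below are an assumption: the standard values from the well-known Russell & Norvig burglary example.

```python
# CPTs (assumed: standard Russell & Norvig values; the slide's table was garbled)
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(a | B, E)
P_j = {True: 0.90, False: 0.05}                       # P(j | A)
P_m = {True: 0.70, False: 0.01}                       # P(m | A)

# Global semantics: P(j, m, a, ¬b, ¬e) is just a product of local conditionals
p = (1 - P_b) * (1 - P_e) * P_a[False, False] * P_j[True] * P_m[True]
assert abs(p - 0.000628) < 1e-5
```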

## Local Semantics

- Each node is conditionally independent of its non-descendants given its parents
- (Figure: node X with parents U₁, …, Uₘ, children Y₁, …, Yₙ, and the children's other parents Z₁ⱼ, …, Zₙⱼ.)

## Markov Blanket

- Each node is conditionally independent of all others given its Markov blanket: its parents, children, and children's parents
- (Figure: the same network of U, X, Y, Z nodes as above.)

## Conditional Independence in BNs

- Which BNs support x₁ ⊥ x₂ | x₃?
- (Figure: four candidate networks (a)-(d) over x₁, x₂, x₃.)
- For (a), x₁, x₂ are dependent: x₃ is a collider
- For (b)-(d), x₁ ⊥ x₂ | x₃

## Conditional Independence (Contd.)

- Which BNs support x ⊥ y | z?
- (Figure: four candidate networks (a)-(d).)
- For (a)-(b), z is not a collider, so x ⊥ y | z
- For (c), z is a collider, so x and y are conditionally dependent
- For (d), w is a collider, and z is a descendant of w, so x and y are conditionally dependent

## d-connection, d-separation

- Definition (d-connection): let X, Y, Z be disjoint sets of vertices in a directed graph G. X and Y are d-connected by Z iff there exists an undirected path U between some x ∈ X and y ∈ Y such that:
  - for every collider C on U, either C or a descendant of C is in Z, and
  - no non-collider on U is in Z
- Otherwise, X and Y are d-separated by Z
- If Z d-separates X and Y, then X ⊥ Y | Z for all distributions represented by the graph
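The definition can be turned into code. The following is a minimal sketch (not from the slides) that enumerates undirected paths in a small DAG and applies the collider/non-collider blocking rules; the function and dictionary names are hypothetical, and the naive path enumeration only suits small graphs.

```python
from itertools import chain

def descendants(dag, node):
    """All strict descendants of node; dag maps each parent to its children."""
    seen, stack = set(), [node]
    while stack:
        for c in dag.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(dag, x, y, z):
    """True iff nodes x and y are d-separated by the set z (naive enumeration)."""
    parents = {}
    nodes = set(dag) | set(chain.from_iterable(dag.values()))
    adj = {n: set() for n in nodes}
    for p, cs in dag.items():
        for c in cs:
            parents.setdefault(c, set()).add(p)
            adj[p].add(c)
            adj[c].add(p)
    z = set(z)

    def paths(cur, visited):           # all simple undirected paths cur -> y
        if cur == y:
            yield visited
            return
        for nxt in adj[cur]:
            if nxt not in visited:
                yield from paths(nxt, visited + [nxt])

    for path in paths(x, [x]):
        blocked = False
        for i in range(1, len(path) - 1):
            a, c, b = path[i - 1], path[i], path[i + 1]
            if a in parents.get(c, set()) and b in parents.get(c, set()):
                # c is a collider: active only if c or a descendant is in z
                if c not in z and not (descendants(dag, c) & z):
                    blocked = True
                    break
            elif c in z:               # non-collider in z blocks the path
                blocked = True
                break
        if not blocked:
            return False               # found an active path: d-connected
    return True

chain_dag = {'x': ['z'], 'z': ['y']}   # x -> z -> y
assert d_separated(chain_dag, 'x', 'y', {'z'})       # blocked by observing z
assert not d_separated(chain_dag, 'x', 'y', set())   # unconditionally dependent

collider = {'x': ['z'], 'y': ['z']}    # x -> z <- y
assert d_separated(collider, 'x', 'y', set())        # marginally independent
assert not d_separated(collider, 'x', 'y', {'z'})    # observing z opens the path
```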

## Conditional Independence (Contd.)

- Examples (Figure: two networks (a) and (b)):
  - For (a), a ⊥ e | b; but a and e are dependent given {b, d}
  - For (b), a and e are dependent given b; c and e are unconditionally dependent

## Conditional Independence: More Examples

- (Figure: two networks (a) and (b).)
- For (a): Is a ⊥ c | e? Is a ⊥ e | b? Is a ⊥ e | c?
- For (b): Is a ⊥ e | d? Is a ⊥ e | c? Is a ⊥ c | b?

## Example: Car Diagnosis

- Initial evidence: car won't start
- Testable variables (green); "broken, so fix it" variables (orange)
- Hidden variables (gray) ensure sparse structure and reduce parameters
- (Figure: diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery meter, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, car won't start, dipstick.)

## Example: Car Insurance

- (Figure: insurance network with nodes Age, GoodStudent, RiskAversion, SeniorTrain, SocioEcon, Mileage, VehicleYear, ExtraCar, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, OwnDamage, Theft, Cushioning, OtherCost, OwnCost, MedicalCost, LiabilityCost, PropertyCost.)

## Inference

- (Figure: the burglary network and CPTs from earlier.)
- How can we compute P(b | j, m)?
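One answer, as a sketch: inference by enumeration, summing the full-joint product over the hidden variables E and A. The CPT numbers are an assumption (the standard Russell & Norvig values, since the slide's table was garbled).

```python
from itertools import product

# CPTs (assumed: standard Russell & Norvig values; the slide's were garbled)
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(a | B, E)
P_j = {True: 0.90, False: 0.05}                       # P(j | A)
P_m = {True: 0.70, False: 0.01}                       # P(m | A)
bern = lambda p, v: p if v else 1 - p

def joint(b, e, a, j, m):
    """Global semantics: the joint is the product of the local conditionals."""
    return (bern(P_b, b) * bern(P_e, e) * bern(P_a[b, e], a)
            * bern(P_j[a], j) * bern(P_m[a], m))

# Inference by enumeration: sum out the hidden variables E and A
num = sum(joint(True, e, a, True, True)
          for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True)
          for b, e, a in product([True, False], repeat=3))
posterior = num / den
assert abs(posterior - 0.284) < 0.001   # burglary is fairly unlikely even so
```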


### CS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization

Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false

### p if x = 1 1 p if x = 0

Probability distributions Bernoulli distribution Two possible values (outcomes): 1 (success), 0 (failure). Parameters: p probability of success. Probability mass function: P (x; p) = { p if x = 1 1 p if

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Probability for Estimation (review)

Probability for Estimation (review) In general, we want to develop an estimator for systems of the form: x = f x, u + η(t); y = h x + ω(t); ggggg y, ffff x We will primarily focus on discrete time linear

### COS513 LECTURE 5 JUNCTION TREE ALGORITHM

COS513 LECTURE 5 JUNCTION TREE ALGORITHM D. EIS AND V. KOSTINA The junction tree algorithm is the culmination of the way graph theory and probability combine to form graphical models. After we discuss

### MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,

### STAT 430/510 Probability Lecture 14: Joint Probability Distribution, Continuous Case

STAT 430/510 Probability Lecture 14: Joint Probability Distribution, Continuous Case Pengyuan (Penelope) Wang June 20, 2011 Joint density function of continuous Random Variable When X and Y are two continuous

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### 4. Joint Distributions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 4. Joint Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an underlying sample space. Suppose

### MAS108 Probability I

1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

### Continuous Random Variables

Continuous Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Continuous Random Variables 2 11 Introduction 2 12 Probability Density Functions 3 13 Transformations 5 2 Mean, Variance and Quantiles

### Continuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage

4 Continuous Random Variables and Probability Distributions Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage Continuous r.v. A random variable X is continuous if possible values

### 3 Multiple Discrete Random Variables

3 Multiple Discrete Random Variables 3.1 Joint densities Suppose we have a probability space (Ω, F,P) and now we have two discrete random variables X and Y on it. They have probability mass functions f

### STA 256: Statistics and Probability I

Al Nosedal. University of Toronto. Fall 2014 1 2 3 4 5 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Experiment, outcome, sample space, and

### Set theory, and set operations

1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we

### Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

### Lecture 4: BK inequality 27th August and 6th September, 2007

CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### Finite and discrete probability distributions

8 Finite and discrete probability distributions To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory

### Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015 D. Alex Hughes (Math Camp) Lecture 6: Discrete & Continuous Probability and Random September

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Q1. [19 pts] Cheating at cs188-blackjack

CS 188 Spring 2012 Introduction to Artificial Intelligence Practice Midterm II Solutions Q1. [19 pts] Cheating at cs188-blackjack Cheating dealers have become a serious problem at the cs188-blackjack tables.

### University of California, Los Angeles Department of Statistics. Random variables

University of California, Los Angeles Department of Statistics Statistics Instructor: Nicolas Christou Random variables Discrete random variables. Continuous random variables. Discrete random variables.

### The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

### Examples and Proofs of Inference in Junction Trees

Examples and Proofs of Inference in Junction Trees Peter Lucas LIAC, Leiden University February 4, 2016 1 Representation and notation Let P(V) be a joint probability distribution, where V stands for a

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Jointly Distributed Random Variables

Jointly Distributed Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Jointly Distributed Random Variables 1 1.1 Definition......................................... 1 1.2 Joint cdfs..........................................

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Statistical Inference. Prof. Kate Calder. If the coin is fair (chance of heads = chance of tails) then

Probability Statistical Inference Question: How often would this method give the correct answer if I used it many times? Answer: Use laws of probability. 1 Example: Tossing a coin If the coin is fair (chance

### Uncertainty in AI. Uncertainty I: Probability as Degree of Belief. The (predominantly logic-based) methods covered so far have assorted

We now examine: Uncertainty I: Probability as Degree of Belief how probability theory might be used to represent and reason with knowledge when we are uncertain about the world; how inference in the presence

### Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

### Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

HW MATH 461/561 Lecture Notes 15 1 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density and marginal densities f(x, y), (x, y) Λ X,Y f X (x), x Λ X,

### ST 371 (VIII): Theory of Joint Distributions

ST 371 (VIII): Theory of Joint Distributions So far we have focused on probability distributions for single random variables. However, we are often interested in probability statements concerning two or

### Business Statistics 41000: Probability 1

Business Statistics 41000: Probability 1 Drew D. Creal University of Chicago, Booth School of Business Week 3: January 24 and 25, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

### Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

### Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

### Chapter 10: Introducing Probability

Chapter 10: Introducing Probability Randomness and Probability So far, in the first half of the course, we have learned how to examine and obtain data. Now we turn to another very important aspect of Statistics

### P(a X b) = f X (x)dx. A p.d.f. must integrate to one: f X (x)dx = 1. Z b

Continuous Random Variables The probability that a continuous random variable, X, has a value between a and b is computed by integrating its probability density function (p.d.f.) over the interval [a,b]:

### Probabilities and Random Variables

Probabilities and Random Variables This is an elementary overview of the basic concepts of probability theory. 1 The Probability Space The purpose of probability theory is to model random experiments so

### MATH 201. Final ANSWERS August 12, 2016

MATH 01 Final ANSWERS August 1, 016 Part A 1. 17 points) A bag contains three different types of dice: four 6-sided dice, five 8-sided dice, and six 0-sided dice. A die is drawn from the bag and then rolled.

### Exercises with solutions (1)

Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete