Save this PDF as:

Size: px
Start display at page:

## Transcription

2 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs represent (informally) direct influences. Conditional probability tables, P( Xi Parents(Xi) ). Given a Bayesian network: Write down the full joint distribution it represents. Given a full joint distribution in factored form: Draw the Bayesian network that represents it. Given a variable ordering and some background assertions of conditional independence among the variables: Write down the factored form of the full joint distribution, as simplified by the conditional independence assertions.

3 Faith, and it s an uncertain world entirely. --- Errol Flynn, Captain Blood (1935) = We need probability theory for our agents!! The world is chaos!! Could you design a logical agent that does the right thing below??

4 Extended example of 3-way Bayesian Networks Conditionally independent effects: p(a,b,c) = p(b A)p(C A)p(A) A B and C are conditionally independent Given A B C E.g., A is a disease, and we model B and C as conditionally independent symptoms given A E.g., A is Fire, B is Heat, C is Smoke. Where there s Smoke, there s Fire. If we see Smoke, we can infer Fire. If we see Smoke, observing Heat tells us very little additional information.

5 Extended example of 3-way Bayesian Networks Suppose I build a fire in my fireplace about once every 10 days B= Smoke Fire P(Smoke) t f P(fire) 0.1 A= Fire C= Heat Fire P(Heat) t f Conditionally independent effects: P(A,B,C) = P(B A)P(C A)P(A) Smoke and Heat are conditionally independent given Fire. If we see B=Smoke, observing C=Heat tells us very little additional information.

6 Extended example of 3-way Bayesian Networks P(Fire) 0. 1 What is P(Fire=t Smoke=t)? P(Fire=t Smoke=t) =P(Fire=t & Smoke=t) / P(Smoke=t) A= Fire B= Smoke Fire P(Smoke) t f C= Heat Fire P(Heat) t f

7 Extended example of 3-way Bayesian Networks P(Fire) 0. 1 A= Fire What is P(Fire=t & Smoke=t)? P(Fire=t & Smoke=t) =Σ_heat P(Fire=t&Smoke=t&heat) =Σ_heat P(Smoke=t&heat Fire=t)P(Fire=t) =Σ_heat P(Smoke=t Fire=t) P(heat Fire=t)P(Fire=t) =P(Smoke=t Fire=t) P(heat=t Fire=t)P(Fire=t) +P(Smoke=t Fire=t)P(heat=f Fire=t)P(Fire=t) = (.90x.99x.1)+(.90x.01x.1) = 0.09 B= Smoke Fire P(Smoke) t f C= Heat Fire P(Heat) t f

8 Extended example of 3-way Bayesian Networks B= Smoke Fire P(Smoke) t f P(Fire) 0. 1 A= Fire C= Heat Fire P(Heat) t f What is P(Smoke=t)? P(Smoke=t) =Σ_fire Σ_heat P(Smoke=t&fire&heat) =Σ_fire Σ_heat P(Smoke=t&heat fire)p(fire) =Σ_fire Σ_heat P(Smoke=t fire) P(heat fire)p(fire) =P(Smoke=t fire=t) P(heat=t fire=t)p(fire=t) +P(Smoke=t fire=t)p(heat=f fire=t)p(fire=t) +P(Smoke=t fire=f) P(heat=t fire=f)p(fire=f) +P(Smoke=t fire=f)p(heat=f fire=f)p(fire=f) = (.90x.99x.1)+(.90x.01x.1) +(.001x.0001x.9)+(.001x.9999x.9)

9 Extended example of 3-way Bayesian Networks P(Fire) 0. 1 A= Fire What is P(Fire=t Smoke=t)? P(Fire=t Smoke=t) =P(Fire=t & Smoke=t) / P(Smoke=t) 0.09 / So we ve just proven that Where there s smoke, there s (probably) fire. B= Smoke Fire P(Smoke) t f C= Heat Fire P(Heat) t f

10 Bayesian Network A Bayesian network specifies a joint distribution in a structured form: A B p(a,b,c) = p(c A,B)p(A)p(B) C Dependence/independence represented via a directed graph: Node Directed Edge Absence of Edge = random variable = conditional dependence = conditional independence Allows concise view of joint distribution relationships: Graph nodes and edges show conditional relationships between variables. Tables provide probability data.

11 Bayesian Networks Structure of the graph Conditional independence relations In general, p(x 1, X 2,...X N ) = Π p(x i parents(x i ) ) The full joint distribution The graph-structured approximation Requires that graph is acyclic (no directed cycles) 2 components to a Bayesian network The graph structure (conditional independence assumptions) The numerical probabilities (for each variable given its parents) Also known as belief networks, graphical models, causal networks

12 Examples of 3-way Bayesian Networks A B C Marginal Independence: p(a,b,c) = p(a) p(b) p(c)

13 Examples of 3-way Bayesian Networks A B Independent Causes: p(a,b,c) = p(c A,B)p(A)p(B) C Explaining away effect: Given C, observing A makes B less likely e.g., earthquake/burglary/alarm example A and B are (marginally) independent but become dependent once C is known

14 Examples of 3-way Bayesian Networks A B C Markov dependence: p(a,b,c) = p(c B) p(b A)p(A)

15 Burglar Alarm Example Consider the following 5 binary variables: B = a burglary occurs at your house E = an earthquake occurs at your house A = the alarm goes off J = John calls to report the alarm M = Mary calls to report the alarm What is P(B M, J)? (for example) We can use the full joint distribution to answer this question Requires 2 5 = 32 probabilities Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?

16 The Desired Bayesian Network Only requires 10 probabilities!

17 Constructing a Bayesian Network: Step 1 Order the variables in terms of influence (may be a partial order) e.g., {E, B} -> {A} -> {J, M} P(J, M, A, E, B) = P(J, M A, E, B) P(A E, B) P(E, B) P(J, M A) P(A E, B) P(E) P(B) P(J A) P(M A) P(A E, B) P(E) P(B) These conditional independence assumptions are reflected in the graph structure of the Bayesian network

18 Constructing this Bayesian Network: Step 2 P(J, M, A, E, B) = P(J A) P(M A) P(A E, B) P(E) P(B) There are 3 conditional probability tables (CPDs) to be determined: P(J A), P(M A), P(A E, B) Requiring = 8 probabilities And 2 marginal probabilities P(E), P(B) -> 2 more probabilities Where do these probabilities come from? Expert knowledge From data (relative frequency estimates) Or a combination of both - see discussion in Section 20.1 and 20.2 (optional)

19 The Resulting Bayesian Network

20 Example of Answering a Probability Query So, what is P(B M, J)? E.g., say, P(b m, j), i.e., P(B=true M=true J=false) P(b m, j) = P(b, m, j) / P(m, j) ;by definition P(b, m, j) = ΣA {a, a}σe {e, e} P( j, m, A, E, b) ;marginal P(J, M, A, E, B) P(J A) P(M A) P(A E, B) P(E) P(B) ; conditional indep. P( j, m, A, E, b) P( j A) P(m A) P(A E, b) P(E) P(b) Say, work the case A=a E= e P( j, m, a, e, b) P( j a) P(m a) P(a e, b) P( e) P(b) 0.10 x 0.70 x 0.94 x x Similar for the cases of a e, a e, a e. Similar for P(m, j). Then just divide to get P(b m, j).

21 Inference in Bayesian Networks X = { X1, X2,, Xk } = query variables of interest E = { E1,, El } = evidence variables that are observed (e, an event) Y = { Y1,, Ym } = hidden variables (nonevidence, nonquery) What is the posterior distribution of X, given E? P( X e ) = α Σ y P( X, y, e ) What is the most likely assignment of values to X, given E? argmax x P( x e ) = argmax x Σ y P( x, y, e )

22 P(A).05 Disease1 Inference in Bayesian Networks A A B P(C A,B) t t.95 t f.90 f t.90 f f.005 TempReg P(B).02 Disease2 C D Simple Example B C P(D C) t.95 f.002 Fever } } } Query Variables A, B Hidden Variable C Evidence Variable D Note: Not an anatomically correct model of how diseases cause fever! Suppose that two different diseases influence some imaginary internal body temperature regulator, which in turn influences whether fever is present.

23 P(A).05 Disease1 Inference in Bayesian Networks A A B P(C A,B) t t.95 t f.90 f t.90 f f.005 TempReg P(B).02 Disease2 C D Simple Example B C P(D C) t.95 f.002 Fever What is the posterior conditional distribution of our query variables, given that fever was observed? P(A,B d) = α Σ c P(A,B,c,d) = α Σ c P(A)P(B)P(c A,B)P(d c) = α P(A)P(B) Σ c P(c A,B)P(d c) P(a,b d) = α P(a)P(b) Σ c P(c a,b)p(d c) = α P(a)P(b){ P(c a,b)p(d c)+p( c a,b)p(d c) } = α.05x.02x{.95x x.002} α P( a,b d) = α P( a)p(b) Σ c P(c a,b)p(d c) = α P( a)p(b){ P(c a,b)p(d c)+p( c a,b)p(d c) } = α.95x.02x{.90x x.002} α P(a, b d) = α P(a)P( b) Σ c P(c a, b)p(d c) = α P(a)P( b){ P(c a, b)p(d c)+p( c a, b)p(d c) } = α.05x.98x{.90x x.002} α P( a, b d) = α P( a)p( b) Σ c P(c a, b)p(d c) = α P( a)p( b){ P(c a, b)p(d c)+p( c a, b)p(d c) } = α.95x.98x{.005x x.002} α α 1 / ( ) 1 /

24 P(A).05 Disease1 Inference in Bayesian Networks A A B P(C A,B) t t.95 t f.90 f t.90 f f.005 TempReg P(B).02 Disease2 C D Simple Example P(a,b d) = α P(a)P(b) Σ c P(c a,b)p(d c) = α P(a)P(b){ P(c a,b)p(d c)+p( c a,b)p(d c) } = α.05x.02x{.95x x.002} α P( a,b d) = α P( a)p(b) Σ c P(c a,b)p(d c) = α P( a)p(b){ P(c a,b)p(d c)+p( c a,b)p(d c) } = α.95x.02x{.90x x.002} α P(a, b d) = α P(a)P( b) Σ c P(c a, b)p(d c) = α P(a)P( b){ P(c a, b)p(d c)+p( c a, b)p(d c) } = α.05x.98x{.90x x.002} α P( a, b d) = α P( a)p( b) Σ c P(c a, b)p(d c) = α P( a)p( b){ P(c a, b)p(d c)+p( c a, b)p(d c) } = α.95x.98x{.005x x.002} α α 1 / ( ) 1 / B C P(D C) t.95 f.002 Fever What is the most likely posterior conditional assignment of values to our query variables, given that fever was observed? argmax {a,b} P( a, b d ) = argmax {a,b} Σ c P( a,b,c,d ) = { a, b }

25 P(A).05 Disease1 Inference in Bayesian Networks A A B P(C A,B) t t.95 t f.90 f t.90 f f.005 TempReg P(B).02 Disease2 C D Simple Example B C P(D C) t.95 f.002 Fever What is the posterior conditional distribution of A, given that fever was observed? (I.e., temporarily make B into a hidden variable.) We can use P(A,B d) from above. P(A d) = α Σ b P(A,b d) P(a d) = Σ b P(a,b d) = P(a,b d)+p(a, b d) = ( ).656 P( a d) = Σ b P( a,b d) = P( a,b d)+p( a, b d) = ( ).344 A B P(A,B d) from above t t.014 f t.248 t f.642 f f.096 This is a marginalization, so we expect from theory that α = 1; but check for round-off error.

26 P(A).05 Disease1 Inference in Bayesian Networks A A B P(C A,B) t t.95 t f.90 f t.90 f f.005 TempReg P(B).02 Disease2 C D Simple Example B C P(D C) t.95 f.002 Fever What is the posterior conditional distribution of A, given that fever was observed, and that further lab tests definitely rule out Disease2? (I.e., temporarily make B into an evidence variable with B = false.) P(A b,d) = α P(A, b d) P(a b,d) = α P(a, b d) α P( a b,d) = α P( a, b d) α A B P(A,B d) from above t t.014 f t.248 t f.642 f f.096 α 1 / ( ) 1 /

27 General Strategy for inference Want to compute P(q e) Step 1: P(q e) = P(q,e)/P(e) = α P(q,e), since P(e) is constant wrt Q Step 2: P(q,e) = Σ a..z P(q, e, a, b,. z), by the law of total probability Step 3: Σ a..z P(q, e, a, b,. z) = Σ a..z Π i P(variable i parents i) (using Bayesian network factoring) Step 4: Distribute summations across product terms for efficient computation Section 14.4 discusses exact inference in Bayesian Networks. The complexity depends strongly on the network structure. The general case is intractable, but there are things you can do. Section 14.5 discusses approximation by sampling.

28 Number of Probabilities in Bayesian Networks Consider n binary variables Unconstrained joint distribution requires O(2 n ) probabilities If we have a Bayesian network, with a maximum of k parents for any node, then we need O(n 2 k ) probabilities Example Full unconstrained joint distribution n = 30, k = 4: need 10 9 probabilities for full joint distribution Bayesian network n = 30, k = 4: need 480 probabilities

29 The Bayesian Network from a different Variable Ordering

30 The Bayesian Network from a different Variable Ordering

31 Given a graph, can we read off conditional independencies? The Markov Blanket of X (the gray area in the figure) X is conditionally independent of everything else, GIVEN the values of: * X s parents * X s children * X s children s parents X is conditionally independent of its non-descendants, GIVEN the values of its parents.

32 Naïve Bayes Model X 1 X 2 X 3 X n C P(C X 1,,X n ) = α Π P(X i C) P (C) Features X are conditionally independent given the class variable C Widely used in machine learning e.g., spam classification: X s = counts of words in s Probabilities P(C) and P(Xi C) can easily be estimated from labeled data

33 Naïve Bayes Model (2) P(C X 1, X n ) = α Π P(X i C) P (C) Probabilities P(C) and P(Xi C) can easily be estimated from labeled data P(C = cj) #(Examples with class label cj) / #(Examples) P(Xi = xik C = cj) #(Examples with Xi value xik and class label cj) / #(Examples with class label cj) Usually easiest to work with logs log [ P(C X 1, X n ) ] = log α + Σ [ log P(X i C) + log P (C) ] DANGER: Suppose ZERO examples with Xi value xik and class label cj? An unseen example with Xi value xik will NEVER predict class label cj! Practical solutions: Pseudocounts, e.g., add 1 to every #(), etc. Theoretical solutions: Bayesian inference, beta distribution, etc.

34 Hidden Markov Model (HMM) Y 1 Y 2 Y 3 Y n Observed S 1 S 2 S 3 S n Hidden Two key assumptions: 1. hidden state sequence is Markov 2. observation Y t is CI of all other variables given S t Widely used in speech recognition, protein sequence models Since this is a Bayesian network polytree, inference is linear in n

35 Summary Bayesian networks represent a joint distribution using a graph The graph encodes a set of conditional independence assumptions Answering queries (or inference or reasoning) in a Bayesian network amounts to efficient computation of appropriate conditional probabilities Probabilistic inference is intractable in the general case But can be carried out in linear time for certain classes of Bayesian networks

### Bayesian Networks. Mausam (Slides by UW-AI faculty)

Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide

### Bayesian Networks Chapter 14. Mausam (Slides by UW-AI faculty & David Page)

Bayesian Networks Chapter 14 Mausam (Slides by UW-AI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Lecture 2: Introduction to belief (Bayesian) networks

Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (I-maps) January 7, 2008 1 COMP-526 Lecture 2 Recall from last time: Conditional

### 13.3 Inference Using Full Joint Distribution

191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent

### Artificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### Probability, Conditional Independence

Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms

### Querying Joint Probability Distributions

Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple

### Life of A Knowledge Base (KB)

Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML

### CS 188: Artificial Intelligence. Probability recap

CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability

### Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Relational Dynamic Bayesian Networks: a report. Cristina Manfredotti

Relational Dynamic Bayesian Networks: a report Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.) Università degli Studi Milano-Bicocca manfredotti@disco.unimib.it

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Artificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)

Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### Chapter 28. Bayesian Networks

Chapter 28. Bayesian Networks The Quest for Artificial Intelligence, Nilsson, N. J., 2009. Lecture Notes on Artificial Intelligence, Spring 2012 Summarized by Kim, Byoung-Hee and Lim, Byoung-Kwon Biointelligence

### p if x = 1 1 p if x = 0

Probability distributions Bernoulli distribution Two possible values (outcomes): 1 (success), 0 (failure). Parameters: p probability of success. Probability mass function: P (x; p) = { p if x = 1 1 p if

### Decision Trees and Networks

Lecture 21: Uncertainty 6 Today s Lecture Victor R. Lesser CMPSCI 683 Fall 2010 Decision Trees and Networks Decision Trees A decision tree is an explicit representation of all the possible scenarios from

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Probabilistic Graphical Models

Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTI-C) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

### Learning from Data: Naive Bayes

Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Unit 19: Probability Models

Unit 19: Probability Models Summary of Video Probability is the language of uncertainty. Using statistics, we can better predict the outcomes of random phenomena over the long term from the very complex,

### Big Data, Machine Learning, Causal Models

Big Data, Machine Learning, Causal Models Sargur N. Srihari University at Buffalo, The State University of New York USA Int. Conf. on Signal and Image Processing, Bangalore January 2014 1 Plan of Discussion

### Bayes and Naïve Bayes. cs534-machine Learning

Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

### Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

### 5 Directed acyclic graphs

5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate

### Large-Sample Learning of Bayesian Networks is NP-Hard

Journal of Machine Learning Research 5 (2004) 1287 1330 Submitted 3/04; Published 10/04 Large-Sample Learning of Bayesian Networks is NP-Hard David Maxwell Chickering David Heckerman Christopher Meek Microsoft

### Outline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.

Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost

### Intelligent Systems: Reasoning and Recognition. Uncertainty and Plausible Reasoning

Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2015/2016 Lesson 12 25 march 2015 Uncertainty and Plausible Reasoning MYCIN (continued)...2 Backward

### Classification Problems

Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths

### Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and

### Statistics in Geophysics: Introduction and Probability Theory

Statistics in Geophysics: Introduction and Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/32 What is Statistics? Introduction Statistics is the

### Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks

CS 188: Artificial Intelligence Naïve Bayes Machine Learning Up until now: how use a model to make optimal decisions Machine learning: how to acquire a model from data / experience Learning parameters

### Bayesian Analysis for the Social Sciences

Bayesian Analysis for the Social Sciences Simon Jackman Stanford University http://jackman.stanford.edu/bass November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November

### 15.0 More Hypothesis Testing

15.0 More Hypothesis Testing 1 Answer Questions Type I and Type II Error Power Calculation Bayesian Hypothesis Testing 15.1 Type I and Type II Error In the philosophy of hypothesis testing, the null hypothesis

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### CS570 Introduction to Data Mining. Classification and Prediction. Partial slide credits: Han and Kamber Tan,Steinbach, Kumar

CS570 Introduction to Data Mining Classification and Prediction Partial slide credits: Han and Kamber Tan,Steinbach, Kumar 1 Classification and Prediction Overview Classification algorithms and methods

### MAS108 Probability I

1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

### Experimental Uncertainty and Probability

02/04/07 PHY310: Statistical Data Analysis 1 PHY310: Lecture 03 Experimental Uncertainty and Probability Road Map The meaning of experimental uncertainty The fundamental concepts of probability 02/04/07

### Bayesian probability theory

Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

### Probability for AI students who have seen some probability before but didn t understand any of it

Probability for AI students who have seen some probability before but didn t understand any of it I am inflicting these proofs on you for two reasons: 1. These kind of manipulations will need to be second

LOCUS Definition: The set of all points (and only those points) which satisfy the given geometrical condition(s) (or properties) is called a locus. Eg. The set of points in a plane which are at a constant

### CONTINGENCY (CROSS- TABULATION) TABLES

CONTINGENCY (CROSS- TABULATION) TABLES Presents counts of two or more variables A 1 A 2 Total B 1 a b a+b B 2 c d c+d Total a+c b+d n = a+b+c+d 1 Joint, Marginal, and Conditional Probability We study methods

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### Compression algorithm for Bayesian network modeling of binary systems

Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the

### SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Probabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams

Probabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams Uffe B. Kjærulff Department of Computer Science Aalborg University Anders L. Madsen HUGIN Expert A/S 10 May 2005 2 Contents

### Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

### CS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization

Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false

### Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation

### Linear Models for Classification

Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

### Natural Language Processing. Today. Word Prediction. Lecture 6 9/10/2015. Jim Martin. Language modeling with N-grams. Guess the next word...

Natural Language Processing Lecture 6 9/10/2015 Jim Martin Today Language modeling with N-grams Basic counting Probabilistic model Independence assumptions 9/8/15 Speech and Language Processing - Jurafsky

### Projektgruppe. Categorization of text documents via classification

Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

### Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### Informatics 2D Reasoning and Agents Semester 2, 2015-16

Informatics 2D Reasoning and Agents Semester 2, 2015-16 Alex Lascarides alex@inf.ed.ac.uk Lecture 29 Decision Making Under Uncertainty 24th March 2016 Informatics UoE Informatics 2D 1 Where are we? Last

### The Certainty-Factor Model

The Certainty-Factor Model David Heckerman Departments of Computer Science and Pathology University of Southern California HMR 204, 2025 Zonal Ave Los Angeles, CA 90033 dheck@sumex-aim.stanford.edu 1 Introduction

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Study Manual. Probabilistic Reasoning. Silja Renooij. August 2015

Study Manual Probabilistic Reasoning 2015 2016 Silja Renooij August 2015 General information This study manual was designed to help guide your self studies. As such, it does not include material that is

### A Few Basics of Probability

A Few Basics of Probability Philosophy 57 Spring, 2004 1 Introduction This handout distinguishes between inductive and deductive logic, and then introduces probability, a concept essential to the study

### Bayesian Network Development

Bayesian Network Development Kenneth BACLAWSKI College of Computer Science, Northeastern University Boston, Massachusetts 02115 USA Ken@Baclawski.com Abstract Bayesian networks are a popular mechanism

### 15-381: Artificial Intelligence. Probabilistic Reasoning and Inference

5-38: Artificial Intelligence robabilistic Reasoning and Inference Advantages of probabilistic reasoning Appropriate for complex, uncertain, environments - Will it rain tomorrow? Applies naturally to many

### Agenda. Interface Agents. Interface Agents

Agenda Marcelo G. Armentano Problem Overview Interface Agents Probabilistic approach Monitoring user actions Model of the application Model of user intentions Example Summary ISISTAN Research Institute

### A. The answer as per this document is No, it cannot exist keeping all distances rational.

Rational Distance Conor.williams@gmail.com www.unsolvedproblems.org: Q. Given a unit square, can you find any point in the same plane, either inside or outside the square, that is a rational distance from

### Reasoning Under Uncertainty: Variable Elimination Example

Reasoning Under Uncertainty: Variable Elimination Example CPSC 322 Lecture 29 March 26, 2007 Textbook 9.5 Reasoning Under Uncertainty: Variable Elimination Example CPSC 322 Lecture 29, Slide 1 Lecture

### Knowledge-Based Probabilistic Reasoning from Expert Systems to Graphical Models

Knowledge-Based Probabilistic Reasoning from Expert Systems to Graphical Models By George F. Luger & Chayan Chakrabarti {luger cc} @cs.unm.edu Department of Computer Science University of New Mexico Albuquerque

### CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

### DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS

Hacettepe Journal of Mathematics and Statistics Volume 33 (2004), 69 76 DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS Hülya Olmuş and S. Oral Erbaş Received 22 : 07 : 2003 : Accepted 04

### Finding soon-to-fail disks in a haystack

Finding soon-to-fail disks in a haystack Moises Goldszmidt Microsoft Research Abstract This paper presents a detector of soon-to-fail disks based on a combination of statistical models. During operation

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Basics of Probability

Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space

### Welcome to the HVAC Sales Academy Learning Management System (LMS) Walkthrough:

Welcome to the HVAC Sales Academy Learning Management System (LMS) Walkthrough: In this walkthrough we will show you the basics of navigating around inside our new LMS system. This should help make it

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

### Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

### Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

### A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases

A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases Gerardo Canfora Bice Cavallo University of Sannio, Benevento, Italy, {gerardo.canfora,bice.cavallo}@unisannio.it ABSTRACT In

### CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### On Attacking Statistical Spam Filters

On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Paper review by Deepak Chinavle