# Bayesian Networks Tutorial I Basics of Probability Theory and Bayesian Networks

Save this PDF as:

Size: px
Start display at page:

Download "Bayesian Networks Tutorial I Basics of Probability Theory and Bayesian Networks"

## Transcription

1 Bayesian Networks Tutorial I Basics of Probability Theory and Bayesian Networks Peter Lucas LIACS, Leiden University Elementary probability theory We adopt the following notations with respect to probability distributions and Boolean variables. Let X denote a variable; if X is a binary variable, e.g. taking either the value true or false, X = true is also denoted by simply x; similarly, X = false is also referred to by x. Furthermore, when it is not really important to know what value a variable takes, we just refer to the variable itself. For example, P(X) either stands for P(x) or P( x). Of course, a statement like P(X) = 0.7 would be silly, as in this case it does matter where X really stands for: if it would be x, then P(x) = 0.7 and P( x) = 0.3, and if it would be x, then P( x) = 0.7 and P(x) = 0.3. Finally, the expression X P(X) means summing over all possible values of the variable X; if X is binary, then P(X) = P(x)+P( x) X This notation can be generalised for joint probability distributions, e.g. P(X,Y,a) = P(x,y,a)+P( x,y,a)+p(x, y,a)+p( x, y,a) X,Y Of course, the same notation can also be used for conditional probability distributions, as those conform to the axioms of probability theory as well: P(X Y) = P(x Y)+P( x Y) X Exercise 1 Let P be a probability distribution defined on the set of Boolean expressions B, i.e. the following axioms hold for P: P( ) = 1; P( ) = 0; P(x y) = P(x)+P(y), if x y =, i.e. x and y are disjoint, x,y B. 1

2 a. Draw a Venn diagram with x, y and x y, for the case that x y. b. Now prove that the following holds in general: P(x y) = P(x)+P(y) P(x y) Exercise 2 Let P be a joint probability distribution, defined as follows (note that P(A,B) is a shorthand for P(A B)). P(a,b) = 0.3 P( a,b) = 0.2 P(a, b) = 0.4 P( a, b) = 0.1 a. Compute P(a) and P(b). b. Compute P(a b). c. Using these last results, use Bayes rule to compute P(b a). Exercise 3 a. Prove that Bayes rule holds, based on the definition of conditional probabilities: P(X Y) = P(Y X)P(X) P(Y) b. Why is this rule sometimes called the arc reversal rule? c. Let P(B A 1,...,A n ) be a family of conditional probability distributions. Explain why computing P(B A 1,...,A n ) may be computationally hard when the variables are discrete (possibly binary), and discuss possible solutions. Exercise 4 Consider joint probability distribution P(X1,X 2,X 3,X 4 ): P(x 1,x 2,x 3,x 4 ) = 0.1 P(x 1, x 2,x 3,x 4 ) = 0.04 P(x 1,x 2, x 3,x 4 ) = 0.03 P(x 1,x 2,x 3, x 4 ) = 0.1 P( x 1,x 2,x 3,x 4 ) = 0.0 P( x 1, x 2,x 3,x 4 ) = 0.2 P( x 1,x 2, x 3,x 4 ) = 0.08 P( x 1,x 2,x 3, x 4 ) = 0.1 P(x 1, x 2, x 3,x 4 ) = P(x 1, x 2,x 3, x 4 ) = 0.1 P(x 1,x 2, x 3, x 4 ) =

3 P( x 1, x 2, x 3,x 4 ) = P( x 1, x 2,x 3, x 4 ) = 0.01 P( x 1,x 2, x 3, x 4 ) = 0.01 P(x 1, x 2, x 3, x 4 ) = P( x 1, x 2, x 3, x 4 ) = 0.2 a. Compute P(x 2 x 3 x 1 x 4 ). b. Split up P(X 1,X 2,X 3,X 4 ) into three factors, and compute the values of these factors for X j =, j = 1,...,4. Exercise 5 Let X, Y and Z three different stochastic variables. Assume that X is conditionally independent of Y given Z. a. Try to define a probability distribution P(X Y,Z) (i.e. supply the numbers for that distribution) for which the conditional independence property holds. b. Prove that from P(X Y,Z) = P(X Z) it follows that P(Y X,Z) = P(Y Z). Exercise 6 a. Prove that if the sets of variables X and Y are conditionally independent given the set of variables Z, i.e. P(X Y,Z) = P(X Z) it follows that P(X,Y Z) = P(X Z)P(Y Z) b. Explain why the use of marginalisation is often combined with conditioning. Exercise 7 The probability distributions of Bayesian networks are often based on the special probability distributions one finds described in textbooks of probability theory and statistics. Examples are: the Bernoulli distribution: f(x;p) = p x (1 p) 1 x, where p is the parameter of the distribution (the probability of success of an experiment) and x is the outcome (0, i.e. failure, or 1, i.e. success). So, f(0) = 1 p and f(1) = p. the binomial distribution: f(x;p,n) = ( n x) p x (1 p) n x, where p is again the probability of success; n represents the number of experiments and x the number of successes out of n (thus, there are n x failures). Note the relationship of this distribution with the Bernoulli distribution. 3

4 The notation f( ; ) says that the elements after the semicolon are parameters. We also have the notation f(,..., ) for joint probability distributions, where the parameters are uncertain. Now, represent each of these distributions as two Bayesian networks, where for BN1: the parameters p and n are assumed to be fixed; BN2: the parameters p and n are assumed to be uncertain, assuming that f(x,p) = P(x p)g(p) for Bernoulli and f(x,p,n) = P(x p,n)g(p)h(n) for the binomial distribution (here g and h are probability density the continuous case or mass functions the discrete case). Bayesian networks and naive probabilistic reasoning Recall that a Bayesian network B is defined as a pair B = (G,P), where G = (V(G),A(G)) is an acyclic directed graph with set of vertices (or nodes) V(G) = {X 1,...,X n } and arcs A(G) V(G) V(G), and a joint probability distribution P defined on the variables corresponding to the vertices V(G), as follows: P(X 1,...,X n ) = n P(X i π(x i )) i=1 where π(x i ) stands for the set of parents (direct ancestors) of X i. Another, more precise version of the definition of the joint probability distribution of a Bayesian network, is to say that X V(G) is the set of random variables corresponding to the vertices of G, where V(G) = {v 1,...,v n } are the associated vertices. The joint probability distribution P is then defined as follows: P(X V (G) ) = P(X v X π(v) ) v V (G) where X π(v) are the random variables that correspond to the parents of vertex v V(G). However, although mathematically more beautiful, the latter definition is also slightly more difficult to understand. It is often used in mathematically oriented book, but less often in computer-science books. Exercise 8 Consider the Naive Bayes network given in Figure 1. Given the evidence: E = {temp > 37.5}, compute the probability of flu using Bayes rule, i. e., P(flu temp > 37.5). Exercise 9 Consider Figure 2, which displays a Bayesian network in which the three vertices A, B and C interact to cause effect D through a noisy-and. The intermediate variables I A, I B and I C are note indicated in the figure, but it is assumed that for computational purposes these intermediate variables are implicitly present. The following probabilities have been specified by the designer of the Bayesian network: 4

5 Myalgia y/n P(myalgia flu) = 0.96 P(myalgia pneu) = 0.28 Disease pneu/flu Fever y/n P(fever flu) = 0.88 P(fever pneu) = 0.82 P(flu) = 0.67 P(pneu) = 0.33 TEMP 37.5/ > 37.5 P(TEMP 37.5 flu) = 0.20 P(TEMP 37.5 pneu) = 0.26 Figure 1: Naive Bayesian network. P(i A a) = 0.7 P(i A a) = 0.9 P(i B b) = 0.4 P(i B b) = 0.8 P(i C c) = 0.3 P(i C c) = 0.3 P(a) = 0.4 P(b) = 0.7 P(c) = 0.8 P(e d) = 0.2 P(e d) = 0.6 a. Compute P (e) = P(e a,b,c), i.e. the marginal probability of e given that A = B = C = true. b. Compute P (e) = P(e a,b). A B C D E Figure 2: Bayesian network: noisy-and. Exercise 10 Note: this exercise is similar to the exercise at the end of Lecture 2. Consider Figure 3. a. Define a probability distribution P for this Bayesian network B. 5

6 V 1 V 2 V 3 V 4 Figure 3: Bayesian network. b. Compute the marginal probability distribution P(V 4 ). c. Next, assume that that we know that v 2 (V 2 = true. What is the value of P (v 2 ) and P ( v 2 )? Compute P (V 4 ), i.e. P (v 4 ) and P ( v 4 ), and also P (V 1 ). d. Finally, assume that V 4 = true (hence, V 2 is again unknown). Compute the marginal probability distribution P (V 2 ). e. Based on the probability distribution P defined in 3a, for the network B = (G,P) shown in Figure 3, compute the joint probability distribution of V 1,V 2,V 3 and V 4 for a fixed set of values; for example P(v 1, v 2,v 3,v 4 ). Exercise 11 V 1 V 5 V 2 V 4 V 3 Figure 4: Bayesian network of Exercise 11. Consider the Bayesian network B = (G,P) shown in Figure 4, where G = (V(G),A(G)) is the directed acyclic graph shown in the figure, and P is a probability distribution defined on the variables corresponding to the vertices V(G) = {V 1,V 2,V 3,V 4,V 5 }. The following (local) probability distributions are defined for P: P(v 1 ) = 0.2 P(v 5 ) = 0.8 P(v 2 v 1 ) = 0.5 P(v 2 v 1 ) = 0.4 P(v 3 v 2,v 4 ) = 0.4 P(v 3 v 2,v 4 ) = 0.7 P(v 3 v 2, v 4 ) = 0.3 P(v 3 v 2, v 4 ) = 0.6 P(v 4 v 1,v 5 ) = 0.6 P(v 4 v 1,v 5 ) = 0.2 P(v 4 v 1, v 5 ) = 0 P(v 4 v 1, v 5 ) = 1 a. Compute the probability P( v 5 v 3 ), i.e. for V 5 equal to false given that V 3 is equal to true, if it is known that P(v 3 )

7 b. What are the consequences of the fact that this Bayesian network contains a cycle in the underlying (undirected) graph? Exercise 12 Consider the Bayesian network B = (G,P) shown in Figure 3. a. Enumerate all conditional independence assumptions U V W for this Bayesian network. b. Enumerate all the unconditional independence assumptions. c. Enumerate the dependencies U V W. 7

### Examples and Proofs of Inference in Junction Trees

Examples and Proofs of Inference in Junction Trees Peter Lucas LIAC, Leiden University February 4, 2016 1 Representation and notation Let P(V) be a joint probability distribution, where V stands for a

### Lecture 2: Introduction to belief (Bayesian) networks

Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (I-maps) January 7, 2008 1 COMP-526 Lecture 2 Recall from last time: Conditional

### Probability, Conditional Independence

Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms

### CS 188: Artificial Intelligence. Probability recap

CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability

### Artificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### 13.3 Inference Using Full Joint Distribution

191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent

### Intelligent Systems: Reasoning and Recognition. Uncertainty and Plausible Reasoning

Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2015/2016 Lesson 12 25 march 2015 Uncertainty and Plausible Reasoning MYCIN (continued)...2 Backward

### Querying Joint Probability Distributions

Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple

### Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

### 5 Directed acyclic graphs

5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

### The Joint Probability Distribution (JPD) of a set of n binary variables involve a huge number of parameters

DEFINING PROILISTI MODELS The Joint Probability Distribution (JPD) of a set of n binary variables involve a huge number of parameters 2 n (larger than 10 25 for only 100 variables). x y z p(x, y, z) 0

### Sets and set operations

CS 441 Discrete Mathematics for CS Lecture 7 Sets and set operations Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square asic discrete structures Discrete math = study of the discrete structures used

### We have discussed the notion of probabilistic dependence above and indicated that dependence is

1 CHAPTER 7 Online Supplement Covariance and Correlation for Measuring Dependence We have discussed the notion of probabilistic dependence above and indicated that dependence is defined in terms of conditional

### Warm-up Tangent circles Angles inside circles Power of a point. Geometry. Circles. Misha Lavrov. ARML Practice 12/08/2013

Circles ARML Practice 12/08/2013 Solutions Warm-up problems 1 A circular arc with radius 1 inch is rocking back and forth on a flat table. Describe the path traced out by the tip. 2 A circle of radius

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University 1 Chapter 1 Probability 1.1 Basic Concepts In the study of statistics, we consider experiments

### Bayesian Networks. Read R&N Ch. 14.1-14.2. Next lecture: Read R&N 18.1-18.4

Bayesian Networks Read R&N Ch. 14.1-14.2 Next lecture: Read R&N 18.1-18.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs

### Jointly Distributed Random Variables

Jointly Distributed Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Jointly Distributed Random Variables 1 1.1 Definition......................................... 1 1.2 Joint cdfs..........................................

### Probability for Estimation (review)

Probability for Estimation (review) In general, we want to develop an estimator for systems of the form: x = f x, u + η(t); y = h x + ω(t); ggggg y, ffff x We will primarily focus on discrete time linear

### 2.1 Sets, power sets. Cartesian Products.

Lecture 8 2.1 Sets, power sets. Cartesian Products. Set is an unordered collection of objects. - used to group objects together, - often the objects with similar properties This description of a set (without

### Welcome to Stochastic Processes 1. Welcome to Aalborg University No. 1 of 31

Welcome to Stochastic Processes 1 Welcome to Aalborg University No. 1 of 31 Welcome to Aalborg University No. 2 of 31 Course Plan Part 1: Probability concepts, random variables and random processes Lecturer:

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### 4. Joint Distributions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 4. Joint Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an underlying sample space. Suppose

### Basic Probability Concepts

page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes

### Probabilistic Graphical Models

Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTI-C) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### Lecture 16 : Relations and Functions DRAFT

CS/Math 240: Introduction to Discrete Mathematics 3/29/2011 Lecture 16 : Relations and Functions Instructor: Dieter van Melkebeek Scribe: Dalibor Zelený DRAFT In Lecture 3, we described a correspondence

### The equation for the 3-input XOR gate is derived as follows

The equation for the 3-input XOR gate is derived as follows The last four product terms in the above derivation are the four 1-minterms in the 3-input XOR truth table. For 3 or more inputs, the XOR gate

### + Section 6.2 and 6.3

Section 6.2 and 6.3 Learning Objectives After this section, you should be able to DEFINE and APPLY basic rules of probability CONSTRUCT Venn diagrams and DETERMINE probabilities DETERMINE probabilities

### Elements of probability theory

The role of probability theory in statistics We collect data so as to provide evidentiary support for answers we give to our many questions about the world (and in our particular case, about the business

### m (t) = e nt m Y ( t) = e nt (pe t + q) n = (pe t e t + qe t ) n = (qe t + p) n

1. For a discrete random variable Y, prove that E[aY + b] = ae[y] + b and V(aY + b) = a 2 V(Y). Solution: E[aY + b] = E[aY] + E[b] = ae[y] + b where each step follows from a theorem on expected value from

### Lecture 4: BK inequality 27th August and 6th September, 2007

CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

### Artificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)

Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity

### Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing

### The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

### princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of

### STAT 360 Probability and Statistics. Fall 2012

STAT 360 Probability and Statistics Fall 2012 1) General information: Crosslisted course offered as STAT 360, MATH 360 Semester: Fall 2012, Aug 20--Dec 07 Course name: Probability and Statistics Number

### Finite and discrete probability distributions

8 Finite and discrete probability distributions To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory

### Sections 2.1, 2.2 and 2.4

SETS Sections 2.1, 2.2 and 2.4 Chapter Summary Sets The Language of Sets Set Operations Set Identities Introduction Sets are one of the basic building blocks for the types of objects considered in discrete

### Probability & Probability Distributions

Probability & Probability Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Probability & Probability Distributions p. 1/61 Probability & Probability Distributions Elementary Probability Theory Definitions

### . 0 1 10 2 100 11 1000 3 20 1 2 3 4 5 6 7 8 9

Introduction The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive integer We say d is a

### Probability Distributions

CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution

### 6.3 Conditional Probability and Independence

222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted

### Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation

Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #4-9/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location

### Problems on Discrete Mathematics 1

Problems on Discrete Mathematics 1 Chung-Chih Li 2 Kishan Mehrotra 3 Syracuse University, New York L A TEX at January 11, 2007 (Part I) 1 No part of this book can be reproduced without permission from

### Math 115 Spring 2011 Written Homework 5 Solutions

. Evaluate each series. a) 4 7 0... 55 Math 5 Spring 0 Written Homework 5 Solutions Solution: We note that the associated sequence, 4, 7, 0,..., 55 appears to be an arithmetic sequence. If the sequence

### Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

### Statistical Inference. Prof. Kate Calder. If the coin is fair (chance of heads = chance of tails) then

Probability Statistical Inference Question: How often would this method give the correct answer if I used it many times? Answer: Use laws of probability. 1 Example: Tossing a coin If the coin is fair (chance

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Formal Modeling in Cognitive Science

Formal Modeling in Cognitive Science Joint, Marginal, and Miles Osborne (originally: Frank Keller) School of Informatics University of Edinburgh miles@inf.ed.ac.uk February 4, 2008 Miles Osborne (originally:

### Class One: Degree Sequences

Class One: Degree Sequences For our purposes a graph is a just a bunch of points, called vertices, together with lines or curves, called edges, joining certain pairs of vertices. Three small examples of

### The Foundations: Logic and Proofs. Chapter 1, Part III: Proofs

The Foundations: Logic and Proofs Chapter 1, Part III: Proofs Rules of Inference Section 1.6 Section Summary Valid Arguments Inference Rules for Propositional Logic Using Rules of Inference to Build Arguments

### Life of A Knowledge Base (KB)

Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML

### Reliability Applications (Independence and Bayes Rule)

Reliability Applications (Independence and Bayes Rule ECE 313 Probability with Engineering Applications Lecture 5 Professor Ravi K. Iyer University of Illinois Today s Topics Review of Physical vs. Stochastic

### Outline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.

Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost

### Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

### Lecture 4: Probability Distributions and Probability Densities - 2

Lecture 4: Probability Distributions and Probability Densities - 2 Assist. Prof. Dr. Emel YAVUZ DUMAN MCB1007 Introduction to Probability and Statistics İstanbul Kültür University Outline 1 Multivariate

### Mining Social Network Graphs

Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

### Student Outcomes. Lesson Notes. Classwork. Discussion (10 minutes)

NYS COMMON CORE MATHEMATICS CURRICULUM Lesson 5 8 Student Outcomes Students know the definition of a number raised to a negative exponent. Students simplify and write equivalent expressions that contain

### Chapter 3: The basic concepts of probability

Chapter 3: The basic concepts of probability Experiment: a measurement process that produces quantifiable results (e.g. throwing two dice, dealing cards, at poker, measuring heights of people, recording

### Unit 4 The Bernoulli and Binomial Distributions

PubHlth 540 4. Bernoulli and Binomial Page 1 of 19 Unit 4 The Bernoulli and Binomial Distributions Topic 1. Review What is a Discrete Probability Distribution... 2. Statistical Expectation.. 3. The Population

### Just the Factors, Ma am

1 Introduction Just the Factors, Ma am The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive

### Probability distributions

Probability distributions (Notes are heavily adapted from Harnett, Ch. 3; Hayes, sections 2.14-2.19; see also Hayes, Appendix B.) I. Random variables (in general) A. So far we have focused on single events,

### Study Manual. Probabilistic Reasoning. Silja Renooij. August 2015

Study Manual Probabilistic Reasoning 2015 2016 Silja Renooij August 2015 General information This study manual was designed to help guide your self studies. As such, it does not include material that is

### Causal Independence for Probability Assessment and Inference Using Bayesian Networks

Causal Independence for Probability Assessment and Inference Using Bayesian Networks David Heckerman and John S. Breese Copyright c 1993 IEEE. Reprinted from D. Heckerman, J. Breese. Causal Independence

### Bayesian probability theory

Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

### Experimental Uncertainty and Probability

02/04/07 PHY310: Statistical Data Analysis 1 PHY310: Lecture 03 Experimental Uncertainty and Probability Road Map The meaning of experimental uncertainty The fundamental concepts of probability 02/04/07

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### 15-381: Artificial Intelligence. Probabilistic Reasoning and Inference

5-38: Artificial Intelligence robabilistic Reasoning and Inference Advantages of probabilistic reasoning Appropriate for complex, uncertain, environments - Will it rain tomorrow? Applies naturally to many

### 4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Elementary Number Theory We begin with a bit of elementary number theory, which is concerned

CONSTRUCTION OF THE FINITE FIELDS Z p S. R. DOTY Elementary Number Theory We begin with a bit of elementary number theory, which is concerned solely with questions about the set of integers Z = {0, ±1,

### Bayes Theorem. Bayes Theorem- Example. Evaluation of Medical Screening Procedure. Evaluation of Medical Screening Procedure

Bayes Theorem P(C A) P(A) P(A C) = P(C A) P(A) + P(C B) P(B) P(E B) P(B) P(B E) = P(E B) P(B) + P(E A) P(A) P(D A) P(A) P(A D) = P(D A) P(A) + P(D B) P(B) Cost of procedure is \$1,000,000 Data regarding

### Normal distribution. ) 2 /2σ. 2π σ

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Introduction to Statistical Machine Learning

CHAPTER Introduction to Statistical Machine Learning We start with a gentle introduction to statistical machine learning. Readers familiar with machine learning may wish to skip directly to Section 2,

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 Proofs Intuitively, the concept of proof should already be familiar We all like to assert things, and few of us

### Discrete Mathematics: Solutions to Homework (12%) For each of the following sets, determine whether {2} is an element of that set.

Discrete Mathematics: Solutions to Homework 2 1. (12%) For each of the following sets, determine whether {2} is an element of that set. (a) {x R x is an integer greater than 1} (b) {x R x is the square

### THE MULTINOMIAL DISTRIBUTION. Throwing Dice and the Multinomial Distribution

THE MULTINOMIAL DISTRIBUTION Discrete distribution -- The Outcomes Are Discrete. A generalization of the binomial distribution from only 2 outcomes to k outcomes. Typical Multinomial Outcomes: red A area1

### Sampling Distributions

Sampling Distributions You have seen probability distributions of various types. The normal distribution is an example of a continuous distribution that is often used for quantitative measures such as

### Probability OPRE 6301

Probability OPRE 6301 Random Experiment... Recall that our eventual goal in this course is to go from the random sample to the population. The theory that allows for this transition is the theory of probability.

### Markov random fields and Gibbs measures

Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

### Announcements. CompSci 230 Discrete Math for Computer Science Sets. Introduction to Sets. Sets

CompSci 230 Discrete Math for Computer Science Sets September 12, 2013 Prof. Rodger Slides modified from Rosen 1 nnouncements Read for next time Chap. 2.3-2.6 Homework 2 due Tuesday Recitation 3 on Friday

### Worked examples Basic Concepts of Probability Theory

Worked examples Basic Concepts of Probability Theory Example 1 A regular tetrahedron is a body that has four faces and, if is tossed, the probability that it lands on any face is 1/4. Suppose that one

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

### Chapter 4. Probability and Probability Distributions

Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### A Few Basics of Probability

A Few Basics of Probability Philosophy 57 Spring, 2004 1 Introduction This handout distinguishes between inductive and deductive logic, and then introduces probability, a concept essential to the study

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6 Comparing the Means of Two Normal

### Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Samarjit Chakraborty Computer Engineering and Networks Laboratory Swiss Federal Institute of Technology (ETH) Zürich March

### Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010. Chapter 7: Digraphs

MCS-236: Graph Theory Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010 Chapter 7: Digraphs Strong Digraphs Definitions. A digraph is an ordered pair (V, E), where V is the set

### Bayesian Networks Chapter 14. Mausam (Slides by UW-AI faculty & David Page)

Bayesian Networks Chapter 14 Mausam (Slides by UW-AI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation