# CS 771 Artificial Intelligence: Reasoning under uncertainty


1 CS 771 Artificial Intelligence Reasoning under uncertainty

2 Today Representing uncertainty is useful in knowledge bases. Probability provides a coherent framework for uncertainty. We review basic concepts in probability, with emphasis on conditional probability and conditional independence. Full joint distributions are difficult to work with; conditional independence assumptions allow us to model real-world phenomena with much simpler models. Bayesian networks are a systematic way to build compact, structured distributions. Reading: Chapter 13; Chapter

3 History of Probability in AI Early AI (1950s and 1960s): attempts to solve AI problems using probability met with mixed success. Logical AI (1970s, 80s): recognized that working with full probability models is intractable; abandoned probabilistic approaches and focused on logic-based representations. Probabilistic AI (1990s-present): Judea Pearl invents Bayesian networks in 1988; realization that working with approximate probability models is tractable and useful; development of machine learning techniques to learn such models from data. Probabilistic techniques are now widely used in vision, speech recognition, robotics, language modeling, game-playing, etc.

4 Acting under uncertainty Let action A_t = leave for airport t minutes before flight. Will A_t get me there on time? Problems: 1. partial observability (road state, other drivers' plans, etc.) 2. noisy sensors (traffic reports) 3. uncertainty in action outcomes (flat tire, etc.) 4. immense complexity of modeling and predicting traffic. Hence a purely logical approach either 1. risks falsehood: "A_25 will get me there on time", or 2. leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc., etc." (A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport.)

5 Summarizing uncertainty Consider an example of uncertain reasoning: diagnosing a dental patient's toothache. Let's try to write rules for dental diagnosis using propositional logic. Consider the following simple rule: Toothache ⇒ Cavity. The rule is wrong: not all patients with toothache have cavities. Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ … In order to make this rule true we would have to add an almost infinite list of possible problems. We could try turning the rule into a causal rule: Cavity ⇒ Toothache. But not all cavities cause pain; the connection between toothache and cavities is not a logical consequence in either direction. This is true in the medical domain as well as most other judgmental domains: law, business, design, automobile repair, gardening, dating, and so on. An agent's knowledge can at best provide only a degree of belief in the relevant sentences. The main tool for doing so is probability theory. E.g., there is an 80% chance that a patient who has a toothache has a cavity.

6 Probability Probabilistic assertions summarize the effects of laziness (failure to enumerate exceptions, qualifications, etc.) and ignorance (lack of relevant facts, initial conditions, etc.). Subjective probability: probabilities relate propositions to the agent's own state of knowledge, e.g., P(A_25 | no reported accidents) = 0.06. These are not assertions about the world. Probabilities of propositions change with new evidence: e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15.

7 Making decisions under uncertainty Suppose I believe the following: P(A_25 gets me there on time) = 0.04, P(A_90 gets me there on time) = 0.70, P(A_120 gets me there on time) = 0.95, P(A_1440 gets me there on time) = Which action to choose? That depends on my preferences for missing the flight vs. time spent waiting, etc. Utility theory is used to represent and infer preferences. Decision theory = probability theory + utility theory.

8 Syntax Basic element: random variable. Similar to propositional logic: possible worlds are defined by an assignment of values to random variables. Boolean random variables, e.g., Cavity (do I have a cavity?). Discrete random variables, e.g., Dice takes a value in <1,2,3,4,5,6>. Domain values must be exhaustive and mutually exclusive. An elementary proposition is constructed by assigning a value to a random variable: e.g., Weather = sunny, Cavity = false (abbreviated as ¬cavity). Complex propositions are formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∧ Cavity = false.

9 Syntax Atomic event: a complete specification of the state of the world about which the agent is uncertain. E.g., imagine flipping two coins. The set of all possible worlds is S = {(H,H), (H,T), (T,H), (T,T)}, meaning there are 4 distinct atomic events in this world. Atomic events are mutually exclusive and exhaustive. Two events are mutually exclusive if they cannot occur at the same time; an example is tossing a coin once, which can result in either heads or tails, but not both. In probability theory and logic, a set of events is jointly or collectively exhaustive if at least one of the events must occur. For example, when rolling a six-sided die, the outcomes 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes.

10 Axioms of probability Given a set of possible worlds S: P(A) ≥ 0 for all atomic events A; P(S) = 1; if A and B are mutually exclusive, then P(A ∨ B) = P(A) + P(B). We refer to P(A) as the probability of event A, e.g., if the coins are fair, P({(H,H)}) = ¼.

11 Probability and Logic Probability can be viewed as a generalization of propositional logic P(a): a is any sentence in propositional logic Belief of agent in a is no longer restricted to true, false, unknown P(a) can range from 0 to 1 P(a) = 0, and P(a) = 1 are special cases So logic can be viewed as a special case of probability

12 Basic Probability Theory General case for A, B: P(A ∨ B) = P(A) + P(B) − P(A ∧ B). E.g., imagine I flip two coins. The events {(H,H), (H,T), (T,H), (T,T)} are all equally likely. Consider the event E that the 1st coin is heads, E = {(H,H), (H,T)}, and the event F that the 2nd coin is heads, F = {(H,H), (T,H)}. P(E ∨ F) = P(E) + P(F) − P(E ∧ F) = ½ + ½ − ¼ = ¾.
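The two-coin calculation above can be checked by brute-force enumeration; a minimal sketch (the event definitions follow the slide):

```python
from fractions import Fraction

# Sample space for two fair coin flips; each outcome has probability 1/4.
outcomes = [(c1, c2) for c1 in "HT" for c2 in "HT"]
p = Fraction(1, len(outcomes))

E = {o for o in outcomes if o[0] == "H"}   # first coin is heads
F = {o for o in outcomes if o[1] == "H"}   # second coin is heads

# Inclusion-exclusion: P(E ∨ F) = P(E) + P(F) - P(E ∧ F)
p_union = p * len(E | F)
p_sum = p * len(E) + p * len(F) - p * len(E & F)

assert p_union == p_sum == Fraction(3, 4)  # ½ + ½ - ¼ = ¾
```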

13 Conditional Probability The 2 dice problem: suppose I roll two fair dice and the 1st die is a 4. What is the probability that the sum of the two dice is 6? There are 6 possible events given that the 1st die is 4: (4,1), (4,2), (4,3), (4,4), (4,5), (4,6). Since all events (originally) had the same probability, these 6 events should have equal probability too. Only one of them, (4,2), gives a sum of 6, so the probability is 1/6.

14 Conditional Probability Let A denote the event that the sum of the dice is 6, and let B denote the event that the 1st die is 4. The conditional probability, denoted P(A | B), is the probability of event A given event B. The general formula is P(A | B) = P(A ∧ B) / P(B): the probability of A ∧ B relative to the probability of B. What is P(sum of dice = 3 | 1st die is 4)? Let C denote the event that the sum of the dice is 3. P(B) is the same, but P(C ∧ B) = 0, so the conditional probability is 0.
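The definition P(A | B) = P(A ∧ B) / P(B) can be verified directly on the dice example by enumerating all 36 outcomes; a quick sketch:

```python
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
rolls = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of an event under the uniform distribution on rolls."""
    return Fraction(len([r for r in rolls if event(r)]), len(rolls))

# P(A | B) = P(A ∧ B) / P(B), with A = "sum is 6", B = "first die is 4".
p_b = prob(lambda r: r[0] == 4)
p_a_and_b = prob(lambda r: r[0] == 4 and sum(r) == 6)
p_a_given_b = p_a_and_b / p_b          # 1/6, matching the slide

# P(sum = 3 | first die is 4) is 0: the intersection is empty.
p_c_given_b = prob(lambda r: r[0] == 4 and sum(r) == 3) / p_b
```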

15 Random Variables Often we are interested in some function of events, rather than the actual event: we care that the sum of two dice is 4, not whether the event was (1,3), (2,2), or (3,1). A random variable is a real-valued function on the space of all possible worlds. E.g., let Y = number of heads in 2 coin flips. P(Y = 0) = P({(T,T)}) = ¼; P(Y = 1) = P({(H,T), (T,H)}) = ½.

16 Prior (Unconditional) Probability A probability distribution gives values for all possible assignments; a joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(A, B) is shorthand for P(A ∧ B). The slide shows P(Weather) as a table over {sunny, rainy, cloudy, snowy} and P(Weather, Cavity) as a 4 × 2 table (the numerical entries did not survive transcription). Joint distributions are normalized: Σ_a Σ_b P(A = a, B = b) = 1.

17 Computing Probabilities Say we are given the following joint distribution (shown as a table on the slide). A joint distribution for k binary variables has 2^k probabilities!

18 Computing Probabilities Say we are given the following joint distribution: what is P(cavity)? Use the Law of Total Probability (aka marginalization): P(a) = Σ_b P(a, b) = Σ_b P(a | b) P(b).
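Marginalization can be sketched in code. The slide's joint table did not survive transcription, so the numbers below are an assumption taken from the standard toothache example in Russell & Norvig Ch. 13:

```python
# Joint distribution P(Toothache, Catch, Cavity); entries sum to 1.
# These values are the textbook's example numbers, not the slide's
# (assumed here for illustration).
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

# Marginalization: P(cavity) = sum over Toothache and Catch of joint entries.
p_cavity = sum(p for (t, c, cav), p in joint.items() if cav)
# 0.108 + 0.012 + 0.072 + 0.008 = 0.2
```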

19 Computing Probabilities What is P(cavity | toothache)? We can get any conditional probability from the joint distribution.

20 Computing Probabilities: Normalization What is P(Cavity | Toothache = toothache)? This is a distribution over the 2 states {cavity, ¬cavity}. Distributions will be denoted with capital letters; probabilities will be denoted with lowercase letters. P(Cavity | toothache) = α P(Cavity, toothache), where P(cavity, toothache) = 0.12 and P(¬cavity, toothache) = 0.08; normalizing by α = 1/(0.12 + 0.08) gives ⟨0.6, 0.4⟩.

21 Computing Probabilities: The Chain Rule We can always write P(a, b, c, …, z) = P(a | b, c, …, z) P(b, c, …, z) (by the definition of conditional probability). Repeatedly applying this idea, we can write P(a, b, c, …, z) = P(a | b, c, …, z) P(b | c, …, z) P(c | …, z) … P(z). Different variable orderings give different, but semantically equivalent, factorizations: P(a, b, c, …, z) = P(z | y, x, …, a) P(y | x, …, a) P(x | …, a) … P(a).

22 Independence A and B are independent iff P(A | B) = P(A), or equivalently, P(B | A) = P(B), or equivalently, P(A, B) = P(A) P(B). Whether B happens does not affect how often A happens. E.g., for n independent biased coins, the representation drops from O(2^n) to O(n) parameters. Absolute independence is powerful but rare: e.g., consider the field of dentistry. There are many variables, none of which are independent. What should we do?

23 Conditional independence P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries. If I have a cavity, the probability that the probe catches doesn't depend on whether I have a toothache: (1) P(Catch | Toothache, cavity) = P(Catch | cavity). The same independence holds if I haven't got a cavity: (2) P(Catch | Toothache, ¬cavity) = P(Catch | ¬cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity), and P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity).

24 Conditional independence... Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity). The slide gives tables for P(Toothache | Cavity), P(Catch | Cavity), and P(Cavity), with P(cavity) = 0.55 and P(¬cavity) = 0.45 (the conditional tables' entries did not survive transcription). Then P(toothache, catch, cavity) = P(toothache | cavity) P(catch | cavity) P(cavity) = 0.09.

25 Conditional independence... Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity). This requires only 2 + 2 + 1 = 5 parameters! Use of conditional independence can reduce the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments.

26 Conditional Independence vs Independence Conditional independence does not imply independence. Example: A = height, B = reading ability, C = age. P(reading ability | age, height) = P(reading ability | age); P(height | reading ability, age) = P(height | age). Note: height and reading ability are dependent (not independent), but are conditionally independent given age.

27 Bayes Rule Two jug problem: Jug 1 contains 2 white balls & 7 black balls; Jug 2 contains 5 white balls & 6 black balls. Flip a fair coin and draw a ball from Jug 1 if heads, Jug 2 if tails. What is the probability that the coin was heads, given that a white ball was selected? We want to compute P(H | W). We have P(H) = P(T) = ½, P(W | H) = 2/9, and P(W | T) = 5/11.
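The two-jug answer follows from Bayes' rule plus the law of total probability; a short sketch using exact fractions:

```python
from fractions import Fraction

# Two-jug problem from the slide: a fair coin chooses the jug, then draw a ball.
p_h = Fraction(1, 2)                 # P(heads) -> draw from Jug 1
p_w_given_h = Fraction(2, 9)         # Jug 1: 2 white of 9 balls
p_w_given_t = Fraction(5, 11)        # Jug 2: 5 white of 11 balls

# Total probability: P(W) = P(W|H)P(H) + P(W|T)P(T).
p_w = p_w_given_h * p_h + p_w_given_t * (1 - p_h)

# Bayes' rule: P(H | W) = P(W | H) P(H) / P(W).
p_h_given_w = p_w_given_h * p_h / p_w    # = (1/9) / (67/198) = 22/67 ≈ 0.328
```

White balls are relatively more likely from Jug 2, so observing white pulls the posterior for heads below the ½ prior.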

28 Bayes' Rule Derived from the product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a), so P(a | b) = P(b | a) P(a) / P(b), or in distribution form, P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y) = α P(X, Y). Useful for assessing diagnostic probability from causal probability: e.g., let M be meningitis and S be stiff neck; then P(m | s) = P(s | m) P(m) / P(s) = … / 0.1 = … Note: the posterior probability of meningitis is still very small!

29 Bayes' Rule P(a | b, c) = P(b, c | a) P(a) / P(b, c). P(a, b | c, d) = P(c, d | a, b) P(a, b) / P(c, d). Both are examples of the basic pattern p(x | y) = p(y | x) p(x) / p(y) (it helps to group variables together, e.g., x = (a, b), y = (c, d)).

30 Decision Theory: why probabilities are useful Consider 2 possible actions that can be recommended by a medical decision-making system: a = operate, b = don't operate. There are 2 possible states of the world: c = patient has cancer, ¬c = patient doesn't have cancer. The agent's degree of belief in c is P(c), so P(¬c) = 1 − P(c). Utility (to the agent) associated with the various outcomes: take action a and patient has cancer: utility = $30k; take action a and patient has no cancer: utility = −$50k; take action b and patient has cancer: utility = −$100k; take action b and patient has no cancer: utility = 0.

31 Maximizing expected utility What action should the agent take? A rational agent should maximize expected utility. Expected utility of the actions: E[utility(a)] = 30 P(c) − 50 [1 − P(c)]; E[utility(b)] = −100 P(c). Break-even point: 30 P(c) − 50 [1 − P(c)] = −100 P(c), i.e., 100 P(c) + 30 P(c) + 50 P(c) = 50, so P(c) = 50/180 ≈ 0.28. If P(c) > 0.28, the optimal decision is to operate. The original theory comes from economics and cognitive science (1950s), but it is widely used in modern AI, e.g., in robotics, vision, and game-playing. We can only make optimal decisions if we know the probabilities.
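The expected-utility comparison above is a two-line computation; a minimal sketch using the slide's utilities (in $k):

```python
# Expected utilities from the slide: operate (a) vs. don't operate (b).
def eu_operate(p_cancer):
    # 30 P(c) - 50 [1 - P(c)]
    return 30 * p_cancer - 50 * (1 - p_cancer)

def eu_no_operate(p_cancer):
    # -100 P(c)
    return -100 * p_cancer

# Break-even: 30p - 50(1 - p) = -100p  =>  180p = 50  =>  p = 50/180.
break_even = 50 / 180

def decision(p_cancer):
    """Rational agent picks the action with higher expected utility."""
    return "operate" if eu_operate(p_cancer) > eu_no_operate(p_cancer) else "don't operate"
```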

32 What does all this have to do with AI? Logic-based knowledge representation: a set of sentences in a KB; the agent's belief in any sentence is true, false, or unknown. In real-world problems there is uncertainty: P(snow in New York on January 1) is not 0 or 1 or unknown; P(pit in square 2,2 | evidence so far). Ignoring this uncertainty can lead to brittle systems and inefficient use of information. Uncertainty is due to: things we did not measure (which is always the case), e.g., in economic forecasting; imperfect knowledge, e.g., P(symptom | disease) → we are not 100% sure; noisy measurements, e.g., P(speed > 50 | sensor reading > 50) is not 1.

33 Agents, Probabilities & Degrees of Belief What we were taught in school (the frequentist view): P(a) represents the frequency that event a will happen in repeated trials. Degree of belief: P(a) represents an agent's degree of belief that event a is true. This is a more general view of probability: an agent's probability is based on what information it has, e.g., based on data or based on a theory. Examples: a = life exists on another planet. What is P(a)? We will all assign different probabilities. a = there will be a tornado tomorrow. What is P(a)? Probabilities can vary from agent to agent depending on their models of the world and how much data they have.

34 More on Degrees of Belief Our interpretation of P(a | e) is that it is an agent's degree of belief in the proposition a, given evidence e. Note that the proposition a is true or false in the real world; P(a | e) reflects the agent's uncertainty or ignorance. The degree-of-belief interpretation does not mean that we need new or different rules for working with probabilities. The same rules (Bayes' rule, the law of total probability, probabilities sum to 1) still apply; only our interpretation is different.

35 Constructing a Propositional Probabilistic Knowledge Base Define all variables of interest: A, B, C, …, Z. Define a joint probability table for P(A, B, C, …, Z). Given this table, we have seen how to compute the answer to a query P(query | evidence), where query and evidence can be any propositional sentences. There are 2 major problems. Computation time: P(a | b) requires summing out the other variables in the model, e.g., O(m^(K−1)) with K variables of arity m. Model specification: the joint table has O(m^K) entries; where do all the numbers come from? These 2 problems effectively halted the use of probability in AI research from the 1960s up until about 1990.

36 Bayesian Networks

37 A Whodunit You return home from a long day to find that your house guest has been murdered. There are two possible culprits: 1) the Butler; and 2) the Cook. There are three possible weapons: 1) a knife; 2) a pistol; and 3) a candlestick. Let's use probabilistic reasoning to find out whodunit!

38 Representing the problem There are 2 uncertain quantities: Culprit = {Butler, Cook} and Weapon = {Knife, Pistol, Candlestick}. What distributions should we use? The Butler is an upstanding guy; the Cook has a checkered past; the Butler keeps a pistol from his army days; the Cook has access to many kitchen knives; the Butler is much older than the Cook.

39 Representing the problem What distributions should we use? The Butler is an upstanding guy; the Cook has a checkered past; the Butler keeps a pistol from his army days; the Cook has access to many kitchen knives; the Butler is much older than the Cook. The slide encodes these intuitions as tables for P(Culprit) over {Butler, Cook}, and for P(Weapon | Culprit = Butler) and P(Weapon | Culprit = Cook) over {Pistol, Knife, Candlestick} (the numerical entries did not survive transcription).

40 Solving the Crime If we observe that the murder weapon was a pistol, who is the most likely culprit? The Butler!

41 Your 1st Bayesian Network Culprit → Weapon. Each node represents a random variable. Arrows indicate cause-effect relationships. Shaded nodes represent observed variables. The whodunit model in words: the culprit chooses a weapon; you observe the weapon and infer the culprit.

42 Bayesian Networks Represent dependence/independence via a directed graph: nodes = random variables, edges = direct dependence. The structure of the graph encodes conditional independence relations. Recall the chain rule of repeated conditioning: the full joint distribution is P(x_1, …, x_n) = Π_i P(x_i | x_1, …, x_{i−1}); the graph-structured approximation is P(x_1, …, x_n) = Π_i P(x_i | parents(x_i)). This requires that the graph is acyclic (no directed cycles). There are 2 components to a Bayesian network: the graph structure (the conditional independence assumptions) and the numerical probabilities (for each variable given its parents).

43 Example of a simple Bayesian network A → C ← B: p(a, b, c) = p(c | a, b) p(a | b) p(b) = p(c | a, b) p(a) p(b), since A and B are independent. The probability model has a simple factored form. Directed edges => direct dependence; absence of an edge => conditional independence. Also known as belief networks, graphical models, causal networks. There are other formulations, e.g., undirected graphical models.

44 Examples of 3-way Bayesian Networks Nodes A, B, C with no edges. Marginal independence: p(a, b, c) = p(a) p(b) p(c).

45 Examples of 3-way Bayesian Networks B ← A → C. Conditionally independent effects: p(a, b, c) = p(b | a) p(c | a) p(a). B and C are conditionally independent given A. E.g., A is a disease, and we model B and C as conditionally independent symptoms given A. E.g., A is the culprit, B is the murder weapon, and C is fingerprints on the door to the guest's room.

46 Examples of 3-way Bayesian Networks A → C ← B. Independent causes: p(a, b, c) = p(c | a, b) p(a) p(b). C depends on both A and B.

47 Examples of 3-way Bayesian Networks A → B → C. Markov chain dependence: p(a, b, c) = p(c | b) p(b | a) p(a). E.g., if Prof. Watkins goes to the party, then I might go to the party; if I go to the party, then my wife might go to the party.

48 Example (the slide's five-variable network figure did not survive transcription) What is the probability that all five boolean variables are simultaneously true? What is the probability that all five boolean variables are simultaneously false? What is the probability that A is false given that the four other variables are known to be true?

49 Bigger Example Consider the following 5 binary variables: B = a burglary occurs at your house; E = an earthquake occurs at your house; A = the alarm goes off; J = John calls to report the alarm; M = Mary calls to report the alarm. Sample query: what is P(B | M, J)? Using the full joint distribution to answer this question requires 2^5 − 1 = 31 parameters. Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?

50 Constructing a Bayesian Network (1) Order the variables in terms of causality (may be a partial order), e.g., {E, B} → {A} → {J, M}. Then: P(J, M, A, E, B) = P(J, M | A, E, B) P(A | E, B) P(E, B) ≈ P(J, M | A) P(A | E, B) P(E) P(B) ≈ P(J | A) P(M | A) P(A | E, B) P(E) P(B). These conditional independence assumptions are reflected in the graph structure of the Bayesian network.

51 The Resulting Bayesian Network

52 Constructing this Bayesian Network (2) P(J, M, A, E, B) = P(J | A) P(M | A) P(A | E, B) P(E) P(B). There are 3 conditional probability tables to be determined: P(J | A), P(M | A), and P(A | E, B), requiring 2 + 2 + 4 = 8 probabilities, plus 2 marginal probabilities, P(E) and P(B): 10 parameters in the Bayesian network vs. 31 parameters in the joint distribution. Where do these probabilities come from? Expert knowledge, or from data (relative frequency estimates).
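The factored form above is enough to answer queries by enumeration. The slides elide the actual CPT values, so the numbers below are an assumption: the standard burglary-network values from Russell & Norvig Ch. 14. The sketch computes P(B | j, m) by summing out E and A and normalizing:

```python
from itertools import product

# CPTs for the burglary network (textbook values, assumed for illustration).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(j | A)
P_M = {True: 0.70, False: 0.01}                      # P(m | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(B | j, m) by enumeration: sum out E and A, then normalize (the α step).
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
p_b_given_jm = unnorm[True] / (unnorm[True] + unnorm[False])   # ≈ 0.284
```

Note that even with both John and Mary calling, the posterior probability of burglary stays well below 1, because burglaries are so rare a priori.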

53 Number of Probabilities in Bayes Nets Consider n binary variables. An unconstrained joint distribution requires O(2^n) probabilities. If we have a Bayesian network with a maximum of k parents for any node, then we need only O(n 2^k) probabilities. Example: a full unconstrained joint distribution with n = 30 needs about 10^9 probabilities; a Bayesian network with n = 30 and k = 4 needs only 30 × 2^4 = 480 probabilities.

54 The Bayesian Network from a different Variable Ordering {M} → {J} → {A} → {B} → {E}: P(J, M, A, E, B) = P(M) P(J | M) P(A | M, J) P(B | A) P(E | A, B).

55 Inference (Reasoning) in Bayes Nets Consider answering a query in a Bayesian network: Q = the set of query variables, e = the evidence (a set of instantiated variable-value pairs). Inference = computation of the conditional distribution P(Q | e). Examples: P(burglary | alarm), P(earthquake | JohnCalls, MaryCalls). Can we use the structure of the Bayesian network to answer queries efficiently? Answer: yes. Generally speaking, the sparser the graph, the lower the complexity of inference.

56 Inference by Variable Elimination Say the query is P(B | j, m). P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m). Apply the evidence to the expression for the joint distribution: P(j, m, A, E, B) = P(j | A) P(m | A) P(A | E, B) P(E) P(B). Marginalize out A and E: P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a). This gives a distribution over the variable B, i.e., over the states {b, ¬b}; the inner sum is over the states of variable A, i.e., {a, ¬a}.

57 Complexity of Bayes Net Inference Assume the network is a polytree: there is only a single undirected path between any 2 nodes. Complexity then scales as O(n m^(K+1)), where n = number of variables, m = arity of the variables, and K = maximum number of parents for any node. Compare to O(m^(n−1)) for the brute-force method. What if the network is not a polytree? We can cluster variables to render a new graph that is a tree; complexity is then O(n m^(W+1)), where W = the number of variables in the largest cluster.

58 Naïve Bayes Model C → X_1, X_2, X_3, …, X_n. P(C | X_1, …, X_n) = α P(C) Π_i P(X_i | C). The features X_i are conditionally independent given the class variable C. Widely used in machine learning, e.g., spam classification: the X_i are counts of words in emails. The probabilities P(C) and P(X_i | C) can easily be estimated from labeled data.
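The naïve Bayes classification rule, argmax_C P(C) Π_i P(X_i | C), can be sketched in a few lines. The priors and word probabilities below are illustrative made-up values, not estimates from real data; in practice they would be relative frequencies from a labeled corpus:

```python
import math

# Minimal naïve Bayes spam sketch. Class priors P(C) and per-class word
# probabilities P(word | C); all numbers here are invented for illustration.
prior = {"spam": 0.4, "ham": 0.6}
p_word = {
    "spam": {"free": 0.30, "meeting": 0.05, "offer": 0.25},
    "ham":  {"free": 0.05, "meeting": 0.30, "offer": 0.05},
}

def classify(words):
    """argmax_C P(C) * prod_i P(word_i | C), computed in log space
    to avoid underflow on long documents."""
    scores = {c: math.log(prior[c]) +
                 sum(math.log(p_word[c][w]) for w in words)
              for c in prior}
    return max(scores, key=scores.get)
```

Working in log space is the standard trick: products of many small probabilities underflow to 0 in floating point, while their log-sums stay well-scaled.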

59 Hidden Markov Model (HMM) Observed: Y_1, Y_2, Y_3, …, Y_n; Hidden: S_1, S_2, S_3, …, S_n. Two key assumptions: 1. the hidden state sequence is Markov; 2. observation Y_t is conditionally independent of all other variables given S_t. Widely used in speech recognition and protein sequence models.

60 Summary Bayesian networks represent joint distributions using a graph. The graph encodes a set of conditional independence assumptions. Answering queries (i.e., inference) in a Bayesian network amounts to efficient computation of appropriate conditional probabilities. Probabilistic inference is intractable in the general case, but can be done in linear time for certain classes of Bayesian networks.


### Combinatorics: The Fine Art of Counting

Combinatorics: The Fine Art of Counting Week 7 Lecture Notes Discrete Probability Continued Note Binomial coefficients are written horizontally. The symbol ~ is used to mean approximately equal. The Bernoulli

### Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

AMS 5 PROBABILITY Basic Probability Probability: The part of Mathematics devoted to quantify uncertainty Frequency Theory Bayesian Theory Game: Playing Backgammon. The chance of getting (6,6) is 1/36.

### Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,

### The Calculus of Probability

The Calculus of Probability Let A and B be events in a sample space S. Partition rule: P(A) = P(A B) + P(A B ) Example: Roll a pair of fair dice P(Total of 10) = P(Total of 10 and double) + P(Total of

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and

### Decision Trees and Networks

Lecture 21: Uncertainty 6 Today s Lecture Victor R. Lesser CMPSCI 683 Fall 2010 Decision Trees and Networks Decision Trees A decision tree is an explicit representation of all the possible scenarios from

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Querying Joint Probability Distributions

Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple

### Lecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett

Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.

### Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

Math/Stats 425 Introduction to Probability 1. Uncertainty and the axioms of probability Processes in the real world are random if outcomes cannot be predicted with certainty. Example: coin tossing, stock

### Probabilistic Graphical Models

Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTI-C) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### Learning from Data: Naive Bayes

Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.

### ST 371 (IV): Discrete Random Variables

ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible

### Basic Probability Theory (I)

Basic Probability Theory (I) Intro to Bayesian Data Analysis & Cognitive Modeling Adrian Brasoveanu [partly based on slides by Sharon Goldwater & Frank Keller and John K. Kruschke] Fall 2012 UCSC Linguistics

### The Foundations: Logic and Proofs. Chapter 1, Part III: Proofs

The Foundations: Logic and Proofs Chapter 1, Part III: Proofs Rules of Inference Section 1.6 Section Summary Valid Arguments Inference Rules for Propositional Logic Using Rules of Inference to Build Arguments

### The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

### Section 6-5 Sample Spaces and Probability

492 6 SEQUENCES, SERIES, AND PROBABILITY 52. How many committees of 4 people are possible from a group of 9 people if (A) There are no restrictions? (B) Both Juan and Mary must be on the committee? (C)

### Practice Problems for Homework #8. Markov Chains. Read Sections 7.1-7.3. Solve the practice problems below.

Practice Problems for Homework #8. Markov Chains. Read Sections 7.1-7.3 Solve the practice problems below. Open Homework Assignment #8 and solve the problems. 1. (10 marks) A computer system can operate

### Probability & Probability Distributions

Probability & Probability Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Probability & Probability Distributions p. 1/61 Probability & Probability Distributions Elementary Probability Theory Definitions

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### The Story. Probability - I. Plan. Example. A Probability Tree. Draw a probability tree.

Great Theoretical Ideas In Computer Science Victor Adamchi CS 5-5 Carnegie Mellon University Probability - I The Story The theory of probability was originally developed by a French mathematician Blaise

### What Is Probability?

1 What Is Probability? The idea: Uncertainty can often be "quantified" i.e., we can talk about degrees of certainty or uncertainty. This is the idea of probability: a higher probability expresses a higher

### MAT 1000. Mathematics in Today's World

MAT 1000 Mathematics in Today's World We talked about Cryptography Last Time We will talk about probability. Today There are four rules that govern probabilities. One good way to analyze simple probabilities

### + Section 6.2 and 6.3

Section 6.2 and 6.3 Learning Objectives After this section, you should be able to DEFINE and APPLY basic rules of probability CONSTRUCT Venn diagrams and DETERMINE probabilities DETERMINE probabilities

### Business Statistics 41000: Probability 1

Business Statistics 41000: Probability 1 Drew D. Creal University of Chicago, Booth School of Business Week 3: January 24 and 25, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

### Chapter 3: The basic concepts of probability

Chapter 3: The basic concepts of probability Experiment: a measurement process that produces quantifiable results (e.g. throwing two dice, dealing cards, at poker, measuring heights of people, recording

### Chapter 13 & 14 - Probability PART

Chapter 13 & 14 - Probability PART IV : PROBABILITY Dr. Joseph Brennan Math 148, BU Dr. Joseph Brennan (Math 148, BU) Chapter 13 & 14 - Probability 1 / 91 Why Should We Learn Probability Theory? Dr. Joseph

### Decision Making under Uncertainty

6.825 Techniques in Artificial Intelligence Decision Making under Uncertainty How to make one decision in the face of uncertainty Lecture 19 1 In the next two lectures, we ll look at the question of how

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### P (A) = lim P (A) = N(A)/N,

1.1 Probability, Relative Frequency and Classical Definition. Probability is the study of random or non-deterministic experiments. Suppose an experiment can be repeated any number of times, so that we

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Chapter 4 Probability

The Big Picture of Statistics Chapter 4 Probability Section 4-2: Fundamentals Section 4-3: Addition Rule Sections 4-4, 4-5: Multiplication Rule Section 4-7: Counting (next time) 2 What is probability?

### Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

### Probability. A random sample is selected in such a way that every different sample of size n has an equal chance of selection.

1 3.1 Sample Spaces and Tree Diagrams Probability This section introduces terminology and some techniques which will eventually lead us to the basic concept of the probability of an event. The Rare Event

### MATH 140 Lab 4: Probability and the Standard Normal Distribution

MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes

### Statistics in Geophysics: Introduction and Probability Theory

Statistics in Geophysics: Introduction and Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/32 What is Statistics? Introduction Statistics is the

### PROBABILITY 14.3. section. The Probability of an Event

4.3 Probability (4-3) 727 4.3 PROBABILITY In this section In the two preceding sections we were concerned with counting the number of different outcomes to an experiment. We now use those counting techniques

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### CONTINGENCY (CROSS- TABULATION) TABLES

CONTINGENCY (CROSS- TABULATION) TABLES Presents counts of two or more variables A 1 A 2 Total B 1 a b a+b B 2 c d c+d Total a+c b+d n = a+b+c+d 1 Joint, Marginal, and Conditional Probability We study methods

### Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks

CS 188: Artificial Intelligence Naïve Bayes Machine Learning Up until now: how use a model to make optimal decisions Machine learning: how to acquire a model from data / experience Learning parameters

### Extending the Role of Causality in Probabilistic Modeling

Extending the Role of Causality in Probabilistic Modeling Joost Vennekens, Marc Denecker, Maurice Bruynooghe {joost, marcd, maurice}@cs.kuleuven.be K.U. Leuven, Belgium Causality Causality is central concept

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University 1 Chapter 1 Probability 1.1 Basic Concepts In the study of statistics, we consider experiments

### A Probabilistic Causal Model for Diagnosis of Liver Disorders

Intelligent Information Systems VII Proceedings of the Workshop held in Malbork, Poland, June 15-19, 1998 A Probabilistic Causal Model for Diagnosis of Liver Disorders Agnieszka Oniśko 1, Marek J. Druzdzel

### Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

### Finite and discrete probability distributions

8 Finite and discrete probability distributions To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory

### Chapter 4. Probability and Probability Distributions

Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

### Probabilities. Probability of a event. From Random Variables to Events. From Random Variables to Events. Probability Theory I

Victor Adamchi Danny Sleator Great Theoretical Ideas In Computer Science Probability Theory I CS 5-25 Spring 200 Lecture Feb. 6, 200 Carnegie Mellon University We will consider chance experiments with

### 5 Directed acyclic graphs

5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate

### MAS108 Probability I

1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

### CHAPTER 3: PROBABILITY TOPICS

CHAPTER 3: PROBABILITY TOPICS Exercise 1. In a particular college class, there are male and female students. Some students have long hair and some students have short hair. Write the symbols for the probabilities