Discrete Structures for Computer Science


Discrete Structures for Computer Science
Adam J. Lee, adamlee@cs.pitt.edu, 6111 Sennott Square
Lecture #20: Bayes' Theorem, November 5, 2013

How can we incorporate prior knowledge? Sometimes we want to know the probability of some event given that another event has occurred.

Example: A fair coin is flipped three times. The first flip comes up tails. Given this information, what is the probability that an odd number of tails appears in the three flips?

Solution:
- Let F = the first of the three flips comes up tails
- Let E = tails comes up an odd number of times in three flips
- Since F has happened, the sample space S is reduced to {THH, THT, TTH, TTT}
- By Laplace's definition of probability, we know:
  p(E) = |E| / |S| = |{THH, TTT}| / |{THH, THT, TTH, TTT}| = 2/4 = 1/2

Conditional Probability

Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F, denoted p(E | F), is defined as:

p(E | F) = p(E ∩ F) / p(F)

Intuition:
- Think of the event F as reducing the sample space that can be considered
- The numerator looks at the likelihood of the outcomes in E that overlap those in F
- The denominator accounts for the reduction in sample size implied by our prior knowledge that F has occurred
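To make the definition concrete, here is a minimal Python sketch (mine, not from the slides) that checks the coin-flip example by brute-force enumeration of the sample space:

```python
from itertools import product

# A minimal sketch (not from the slides): check the coin-flip example and the
# definition p(E | F) = p(E ∩ F) / p(F) by enumerating the sample space.
S = {''.join(flips) for flips in product('HT', repeat=3)}  # 8 equally likely outcomes

F = {s for s in S if s[0] == 'T'}             # first flip comes up tails
E = {s for s in S if s.count('T') % 2 == 1}   # odd number of tails

def p(A):
    """Laplace's definition: favorable outcomes over total outcomes."""
    return len(A) / len(S)

p_E_given_F = p(E & F) / p(F)
print(p_E_given_F)  # 0.5, matching the answer above
```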

Bayes' Theorem

Bayes' theorem allows us to relate the conditional and marginal probabilities of two random events:

p(F | E) = p(E | F) p(F) / p(E)

In English: Bayes' theorem will help us assess the probability that an event occurred given only partial evidence.

Doesn't our formula for conditional probability do this already? We can't always use this formula directly...

A Motivating Example

Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user?

Question: Can we use our simple conditional probability formula to compute p(X is a user | X tested positive)?

The 1,000 foot view

In situations like the one on the last slide, Bayes' theorem can help! Essentially, Bayes' theorem will allow us to calculate p(E | F), the probability that X is an opium user given a positive test, assuming that we know (or can derive):
- p(E), the probability that X is a user
- p(F | E), the test success rate
- p(F | E^C), the test false positive rate

Returning to our earlier example:
- Let E = Person X is an opium user
- Let F = Person X tested positive for opium

It looks like Bayes' theorem could help in this case...

New Notation

Today, we will use the notation E^C to denote the complementary event of E. That is, E^C is the set of all outcomes in the sample space that are not in E, so p(E^C) = 1 − p(E).

A Simple Example

We have two boxes. The first contains two green balls and seven red balls. The second contains four green balls and three red balls. Bob selects a ball by first choosing a box at random. He then selects one of the balls from that box at random. If Bob has selected a red ball, what is the probability that he took it from the first box?

Picking the problem apart

First, let's define a few events relevant to this problem:
- Let E = Bob has chosen a red ball
- By definition, E^C = Bob has chosen a green ball
- Let F = Bob chose his ball from the first box
- Therefore, F^C = Bob chose his ball from the second box

We want to find the probability that Bob chose from the first box, given that he picked a red ball. That is, we want p(F | E).

Goal: Given that p(F | E) = p(F ∩ E)/p(E), use what we know to derive p(F ∩ E) and p(E).

What do we know?

We have two boxes. The first contains two green balls and seven red balls. The second contains four green balls and three red balls. Bob selects a ball by first choosing a box at random. He then selects one of the balls from that box at random. If Bob has selected a red ball, what is the probability that he took it from the first box?

Statement: Bob selects a ball by first choosing a box at random
- Bob is equally likely to choose the first box or the second box
- p(F) = p(F^C) = 1/2

Statement: The first contains two green balls and seven red balls
- The first box has nine balls, seven of which are red
- p(E | F) = 7/9

Statement: The second contains four green balls and three red balls
- The second box contains seven balls, three of which are red
- p(E | F^C) = 3/7

Now, for a little algebra...

The end goal: Compute p(F | E) = p(F ∩ E)/p(E)

Recall: p(F) = p(F^C) = 1/2, p(E | F) = 7/9, p(E | F^C) = 3/7

Note that p(E | F) = p(E ∩ F)/p(F)
- If we multiply by p(F), we get p(E ∩ F) = p(E | F) p(F)
- Further, we know that p(E | F) = 7/9 and p(F) = 1/2
- So p(E ∩ F) = 7/9 × 1/2 = 7/18

Similarly, p(E ∩ F^C) = p(E | F^C) p(F^C) = 3/7 × 1/2 = 3/14

Observation: E = (E ∩ F) ∪ (E ∩ F^C)
- This means that p(E) = p(E ∩ F) + p(E ∩ F^C)
  = 7/18 + 3/14
  = 49/126 + 27/126
  = 76/126
  = 38/63

Denouement

The end goal: Compute p(F | E) = p(F ∩ E)/p(E)

Recall: p(F) = p(F^C) = 1/2, p(E | F) = 7/9, p(E | F^C) = 3/7, p(E ∩ F) = 7/18, p(E) = 38/63

So, p(F | E) = (7/18) / (38/63) ≈ 0.645

How did we get here?
1. Extract what we could from the problem definition itself
2. Rearrange terms to derive p(E ∩ F) and p(E)
3. Use our trusty definition of conditional probability to do the rest!
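As a quick check (not part of the lecture), the same computation can be done in exact rational arithmetic so no rounding sneaks in:

```python
from fractions import Fraction

# A quick check of the box example using exact rational arithmetic.
p_F    = Fraction(1, 2)  # Bob picks box 1 with probability 1/2
p_E_F  = Fraction(7, 9)  # p(E | F): red, given box 1
p_E_Fc = Fraction(3, 7)  # p(E | F^C): red, given box 2

p_E_and_F   = p_E_F * p_F                     # 7/18
p_E         = p_E_and_F + p_E_Fc * (1 - p_F)  # 38/63
p_F_given_E = p_E_and_F / p_E                 # 49/76

print(p_F_given_E, float(p_F_given_E))  # 49/76 ≈ 0.645
```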

The reasoning that we used in the last problem essentially derives Bayes' theorem for us!

Bayes' Theorem: Suppose that E and F are events from some sample space S such that p(E) ≠ 0 and p(F) ≠ 0. Then:

p(F | E) = p(E | F) p(F) / [p(E | F) p(F) + p(E | F^C) p(F^C)]

Proof:
- The definition of conditional probability says that
  p(F | E) = p(F ∩ E)/p(E)
  p(E | F) = p(E ∩ F)/p(F)
- This means that
  p(F ∩ E) = p(F | E) p(E)
  p(E ∩ F) = p(E | F) p(F)
- Since F ∩ E = E ∩ F, we have p(F | E) p(E) = p(E | F) p(F)
- Therefore, p(F | E) = p(E | F) p(F) / p(E)

Proof (continued)

Note: To finish, we must prove that p(E) = p(E | F) p(F) + p(E | F^C) p(F^C)
- Observe that E = E ∩ S
  = E ∩ (F ∪ F^C)
  = (E ∩ F) ∪ (E ∩ F^C)
- Note also that (E ∩ F) and (E ∩ F^C) are disjoint (i.e., no outcome can be in both F and F^C)
- This means that p(E) = p(E ∩ F) + p(E ∩ F^C)
- We have already shown that p(E ∩ F) = p(E | F) p(F)
- Further, since p(E | F^C) = p(E ∩ F^C)/p(F^C), we have that p(E ∩ F^C) = p(E | F^C) p(F^C)
- So p(E) = p(E ∩ F) + p(E ∩ F^C) = p(E | F) p(F) + p(E | F^C) p(F^C)

Putting everything together, we get:

p(F | E) = p(E | F) p(F) / [p(E | F) p(F) + p(E | F^C) p(F^C)]

And why is this useful? In a nutshell, Bayes' theorem is useful if you want to find p(F | E), but you don't know p(E ∩ F) or p(E).

Here's a general solution tactic

Step 1: Identify the events being investigated. For example:
- F = Bob chooses the first box, F^C = Bob chooses the second box
- E = Bob chooses a red ball, E^C = Bob chooses a green ball

Step 2: Record the probabilities identified in the problem statement. For example:
- p(F) = p(F^C) = 1/2
- p(E | F) = 7/9
- p(E | F^C) = 3/7

Step 3: Plug into Bayes' formula and solve, as in the sketch below.
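Here is a minimal Python sketch of this tactic (mine, not the lecture's; the name `bayes` and its parameter names are invented for illustration):

```python
def bayes(p_f: float, p_e_given_f: float, p_e_given_fc: float) -> float:
    """Two-event form of Bayes' theorem: returns p(F | E).

    p_f          -- p(F), the prior probability of F
    p_e_given_f  -- p(E | F)
    p_e_given_fc -- p(E | F^C)
    """
    p_e = p_e_given_f * p_f + p_e_given_fc * (1 - p_f)  # total probability
    return p_e_given_f * p_f / p_e

# Sanity check against the box example: prints ~0.645
print(bayes(1/2, 7/9, 3/7))
```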

Example: Pants and Skirts

Suppose there is a co-ed school whose students are 60% boys and 40% girls. The girls wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all they can see is that this student is wearing trousers. What is the probability this student is a girl?

Step 1: Set up events
- E = X is wearing pants
- E^C = X is wearing a skirt
- F = X is a girl
- F^C = X is a boy

Step 2: Extract probabilities from the problem definition
- p(F) = 0.4
- p(F^C) = 0.6
- p(E | F) = p(E^C | F) = 0.5
- p(E | F^C) = 1

Pants and Skirts (continued)

Recall: p(F) = 0.4, p(F^C) = 0.6, p(E | F) = p(E^C | F) = 0.5, p(E | F^C) = 1

Step 3: Plug in to Bayes' theorem
- p(F | E) = (0.5 × 0.4)/(0.5 × 0.4 + 1 × 0.6)
- = 0.2/0.8
- = 1/4

Conclusion: There is a 25% chance that the person seen was a girl, given that they were wearing pants.

Drug screening, revisited

Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user?

Step 1: Set up events
- F = X is an opium user
- F^C = X is not an opium user
- E = X tests positive for opiates
- E^C = X tests negative for opiates

Step 2: Extract probabilities from the problem definition
- p(F) = 0.005
- p(F^C) = 0.995
- p(E | F) = 0.99
- p(E | F^C) = 0.01

Drug screening (continued)

Recall: p(F) = 0.005, p(F^C) = 0.995, p(E | F) = 0.99, p(E | F^C) = 0.01

Step 3: Plug in to Bayes' theorem
- p(F | E) = (0.99 × 0.005)/(0.99 × 0.005 + 0.01 × 0.995)
- ≈ 0.3322

Conclusion: If an employee tests positive for opiate use, there is only a 33% chance that they are actually an opium user!
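Both worked examples drop straight into the hypothetical `bayes` helper sketched after the solution tactic above:

```python
# Reusing the hypothetical bayes() helper sketched earlier to reproduce
# both worked examples.
print(bayes(0.4, 0.5, 1.0))      # pants and skirts: 0.25
print(bayes(0.005, 0.99, 0.01))  # drug screening: ~0.3322
```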

Group Work! Suppose that 1 person in 100,000 has a particular rare disease. A diagnostic test is correct 99% of the time when given to someone with the disease, and is correct 99.5% of the time when given to someone without the disease. Problem 1: Calculate the probability that someone who tests positive for the disease actually has it. Problem 2: Calculate the probability that someone who tests negative for the disease does not have the disease.

Application: Spam filtering

Definition: Spam is unsolicited bulk email
- Unsolicited: I didn't ask for it, and I probably don't want it
- Bulk: sent to lots of people

In recent years, spam has become increasingly problematic. For example, in 2006:
- Spam accounted for > 40% of all email sent
- This is over 12.4 billion spam messages!

To combat this problem, people have developed spam filters based on Bayes' theorem!

How does a Bayesian spam filter work?

Essentially, these filters determine the probability that a message is spam, given that it contains certain keywords. With F = the message is spam and E = the message contains a questionable keyword:

p(F | E) = p(E | F) p(F) / [p(E | F) p(F) + p(E | F^C) p(F^C)]

In the above equation:
- p(E | F) = probability that our keyword occurs in spam messages
- p(E | F^C) = probability that our keyword occurs in legitimate messages
- p(F) = probability that an arbitrary message is spam
- p(F^C) = probability that an arbitrary message is legitimate

Question: How do we derive these parameters?

We can learn these parameters by examining historical email traces

Imagine that we have a corpus of email messages. We can ask a few intelligent questions to learn the parameters of our Bayesian filter:
- How many of these messages do we consider spam? → p(F)
- In the spam messages, how often does our keyword appear? → p(E | F)
- In the good messages, how often does our keyword appear? → p(E | F^C)

Aside: This is what happens every time you click the "mark as spam" button in your email client!

Given this information, we can apply Bayes' theorem!

Filtering spam using a single keyword

Suppose that the keyword "Rolex" occurs in 250 of 2000 known spam messages, and in 5 of 1000 known good messages. Estimate the probability that an incoming message containing the word "Rolex" is spam, assuming that it is equally likely that an incoming message is spam or not spam. If our threshold for classifying a message as spam is 0.9, will we reject this message?

Step 1: Define events
- F = message is spam
- F^C = message is good
- E = message contains the keyword "Rolex"
- E^C = message does not contain the keyword "Rolex"

Step 2: Gather probabilities from the problem statement
- p(F) = p(F^C) = 0.5
- p(E | F) = 250/2000 = 0.125
- p(E | F^C) = 5/1000 = 0.005

Spam Rolexes (continued)

Recall: p(F) = p(F^C) = 0.5, p(E | F) = 0.125, p(E | F^C) = 0.005

Step 3: Plug in to Bayes' theorem
- p(F | E) = (0.125 × 0.5)/(0.125 × 0.5 + 0.005 × 0.5)
- = 0.125/(0.125 + 0.005)
- ≈ 0.962

Conclusion: Since the probability that our message is spam given that it contains the string "Rolex" is approximately 0.962 > 0.9, we will discard the message.
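A sketch of the training-plus-classification step for this example (under the slide's equal-prior assumption; reuses the hypothetical `bayes` helper from earlier):

```python
# Training + classification for the single-keyword filter.
spam_total, good_total = 2000, 1000
spam_with_kw, good_with_kw = 250, 5   # messages containing "Rolex"

p_e_given_f  = spam_with_kw / spam_total  # p(E | F)   = 0.125
p_e_given_fc = good_with_kw / good_total  # p(E | F^C) = 0.005

p_spam = bayes(0.5, p_e_given_f, p_e_given_fc)
print(p_spam, p_spam > 0.9)  # ~0.962 True, so the message is discarded
```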

Problems with this simple filter

How would you choose a single keyword/phrase to use?
- "All natural"? "Nigeria"? "Click here"?
- Users get upset if false positives occur, i.e., if legitimate messages are incorrectly classified as spam
- When was the last time you checked your spam folder?

How can we fix this?
- Choose keywords such that p(spam | keyword) is very high or very low
- Filter based on multiple keywords

Specifically, we want to develop a Bayesian filter that tells us p(F | E1 ∩ E2)

First, some assumptions:
1. The events E1 and E2 are independent
2. The events E1 and E2 are conditionally independent given F (and given F^C)
3. p(F) = p(F^C) = 1/2

By Bayes' theorem, p(F | E1 ∩ E2) = p(E1 ∩ E2 | F) p(F) / p(E1 ∩ E2). Applying assumption 3 (the equal priors cancel) and assumptions 1 and 2 (the joint probabilities factor), we can derive the formula:

p(F | E1 ∩ E2) = p(E1 | F) p(E2 | F) / [p(E1 | F) p(E2 | F) + p(E1 | F^C) p(E2 | F^C)]

A sketch of this formula follows.
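A minimal Python sketch of the two-keyword formula (the function name is invented for illustration; under assumption 3 the prior cancels and never appears):

```python
def bayes_two_keywords(p_e1_f: float, p_e1_fc: float,
                       p_e2_f: float, p_e2_fc: float) -> float:
    """p(F | E1 ∩ E2) under the slide's assumptions: E1 and E2 independent
    and conditionally independent, with p(F) = p(F^C) = 1/2 (the equal
    priors cancel, so they do not appear below)."""
    num = p_e1_f * p_e2_f
    return num / (num + p_e1_fc * p_e2_fc)
```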

Spam filtering on two keywords

Suppose that we train a Bayesian spam filter on a set of 2000 spam messages and 1000 messages that are not spam. The word "stock" appears in 400 spam messages and 60 good messages, and the word "undervalued" appears in 200 spam messages and 25 good messages. Estimate the probability that a message containing the words "stock" and "undervalued" is spam. Will we reject this message if our spam threshold is set at 0.9?

Step 1: Set up events
- F = message is spam, F^C = message is good
- E1 = message contains the word "stock"
- E2 = message contains the word "undervalued"

Step 2: Identify probabilities
- p(E1 | F) = 400/2000 = 0.2
- p(E1 | F^C) = 60/1000 = 0.06
- p(E2 | F) = 200/2000 = 0.1
- p(E2 | F^C) = 25/1000 = 0.025

Two keywords (continued)

Recall: p(E1 | F) = 0.2, p(E1 | F^C) = 0.06, p(E2 | F) = 0.1, p(E2 | F^C) = 0.025

p(F | E1 ∩ E2) = p(E1 | F) p(E2 | F) / [p(E1 | F) p(E2 | F) + p(E1 | F^C) p(E2 | F^C)]

Step 3: Plug in to Bayes' theorem
- p(F | E1 ∩ E2) = (0.2 × 0.1)/(0.2 × 0.1 + 0.06 × 0.025)
- = 0.02/(0.02 + 0.0015)
- ≈ 0.9302

Conclusion: Since the probability that our message is spam given that it contains the strings "stock" and "undervalued" is 0.9302 > 0.9, we will reject this message.
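And the same result via the two-keyword sketch above:

```python
# Checking the stock/undervalued example with the two-keyword sketch above.
print(bayes_two_keywords(0.2, 0.06, 0.1, 0.025))  # ~0.9302, so reject
```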

Final Thoughts

- Conditional probability is very useful
- Bayes' theorem:
  - Helps us assess conditional probabilities
  - Has a range of important applications
- Next time: Expected values and variance (Section 7.4)