Learning from Data: Naive Bayes

October 13, 2005. http://www.anc.ed.ac.uk/~amos/lfd/

Naive Bayes Typical example: Bayesian spam filter. Naive means naive: Bayesian methods can be much more sophisticated. Basic assumption: conditional independence. Given the class (e.g. Spam or Ham), whether one data item (e.g. a word) appears is independent of whether another appears. Invariably wrong! But useful anyway.

Why? Easy to program. Simple and transparent. Fast to train. Fast to use. Can deal with uncertainty. Probabilistic.

Data types The Naive Bayes assumption can be used with both continuous and discrete data, though it is generally understood in terms of discrete data. Binary and discrete attributes are very common. Do not use a 1-of-M encoding! E.g. the bag-of-words assumption for text classification. Can even mix different types of data.

Bag of Words Each document is represented by a large vector. Each element of the vector represents the presence (1) or absence (0) of a particular word in the document. Certain words are more common in one document type than another. Can build another form of class conditional model using the conditional probability of seeing each word, given the document class (e.g. ham/spam).
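As an illustration, here is a minimal bag-of-words encoder in Python; the vocabulary and document are invented for the example.

```python
# Toy bag-of-words encoder: each document becomes a binary vector
# indicating the presence (1) or absence (0) of each vocabulary word.

vocabulary = ["cash", "prize", "meeting", "project", "winner"]  # assumed toy vocabulary

def bag_of_words(document, vocabulary):
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

doc = "Congratulations you are a winner of a cash prize"
print(bag_of_words(doc, vocabulary))  # [1, 1, 0, 0, 1]
```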

Conditional Independence P(X, Y) = P(X) P(Y|X). P(X, Y|C) = P(X|C) P(Y|X, C). Think of C as a class label. The above is always true. However we can make an assumption: P(Y|X, C) = P(Y|C). Knowing the value of X makes no difference to the value Y takes, so long as we know the class C. We say that X and Y are conditionally independent given C.

Example Whether a person hits Jim (J) and whether that person hits Michael (M) are most likely not independent. But they might be independent given that the person in question is (or is not) a known member of the class of bullies (B). P(J, M) ≠ P(J) P(M), but P(J, M|B) = P(J|B) P(M|B). B explains all of the dependence between J and M.
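A small numerical check of this idea, with invented probabilities for P(B), P(J|B) and P(M|B): marginally J and M are dependent, while given B the joint factorises by construction.

```python
# Toy check that J and M can be dependent marginally yet
# conditionally independent given B. All numbers are invented.

p_B = {True: 0.1, False: 0.9}           # P(B): known bully or not
p_J_given_B = {True: 0.8, False: 0.05}  # P(J = 1 | B)
p_M_given_B = {True: 0.7, False: 0.05}  # P(M = 1 | B)

# Joint P(J = 1, M = 1), built using conditional independence given B.
p_JM = sum(p_B[b] * p_J_given_B[b] * p_M_given_B[b] for b in (True, False))

# Marginals P(J = 1) and P(M = 1).
p_J = sum(p_B[b] * p_J_given_B[b] for b in (True, False))
p_M = sum(p_B[b] * p_M_given_B[b] for b in (True, False))

print(p_JM)       # 0.05825
print(p_J * p_M)  # 0.014375  != P(J = 1, M = 1), so J and M are dependent
# Given B, P(J, M | B) = P(J | B) P(M | B) holds by construction.
```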

Generally Naive Bayes x_1, x_2, ..., x_n are said to be conditionally independent given c iff P(x|c) = ∏_{i=1}^{n} P(x_i|c) for x = (x_1, x_2, ..., x_n). For example, we could have not just Jim and Michael, but Bob, Richard and Tim too.

Naive Bayes The equation on the previous slide is in fact the Naive Bayes model: P(x|c) = ∏_{i=1}^{n} P(x_i|c) for x = (x_1, x_2, ..., x_n). The x is our attribute vector, and the c is our class label. We want to learn P(c) and P(x_i|c) from the data. We then want to find the best choice of c for a new datum (inference). The form of P(x_i|c) is usually given, but we do need to learn its parameters.
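A minimal sketch of the learning step, assuming binary attributes and maximum likelihood estimation by counting; the tiny training set is invented for illustration.

```python
# Maximum likelihood estimates for a Naive Bayes model with binary attributes:
# P(c) from class frequencies, P(x_i = 1 | c) from per-class word frequencies.

# Toy training data: rows are bag-of-words vectors, labels are 'spam'/'ham'.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = ['spam', 'spam', 'ham', 'ham']

classes = sorted(set(y))
n_attrs = len(X[0])

# P(c): fraction of training documents with each label.
prior = {c: sum(1 for label in y if label == c) / len(y) for c in classes}

# P(x_i = 1 | c): fraction of class-c documents containing word i.
cond = {}
for c in classes:
    rows = [x for x, label in zip(X, y) if label == c]
    cond[c] = [sum(r[i] for r in rows) / len(rows) for i in range(n_attrs)]

print(prior)  # {'ham': 0.5, 'spam': 0.5}
print(cond)   # {'ham': [0.0, 0.5, 1.0], 'spam': [1.0, 0.5, 0.0]}
```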

Working Example See sheet section 3. Have a set of attributes. Inference first: Bayes rule. Learning the model: P(E), P(S), P(x|S), P(x|E). Naive Bayes assumption.
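The sheet itself is not reproduced here; as a stand-in, the following sketch shows the inference step via Bayes' rule under the Naive Bayes assumption, reusing the toy spam/ham parameters from the learning sketch above (all numbers are illustrative, not from the sheet).

```python
# Inference with Bayes' rule: P(c | x) is proportional to P(c) * prod_i P(x_i | c).
# The parameters below are the toy estimates; a real model would learn them from data.

prior = {'spam': 0.5, 'ham': 0.5}
cond = {'spam': [1.0, 0.5, 0.0], 'ham': [0.0, 0.5, 1.0]}

def posterior(x, prior, cond):
    scores = {}
    for c in prior:
        p = prior[c]
        for i, xi in enumerate(x):
            # cond gives P(x_i = 1 | c); use 1 - that value when the word is absent.
            p *= cond[c][i] if xi == 1 else 1 - cond[c][i]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior([1, 1, 0], prior, cond))  # {'spam': 1.0, 'ham': 0.0} with these toy parameters
```

Note that with raw maximum likelihood estimates a single zero probability forces the posterior for a class to zero; in practice the estimates are usually smoothed.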

Problems with Naive Bayes 1-of-M encoding. Failed conditional independence assumptions. Worst case: a repeated attribute is double counted, triple counted, etc. Conditionally dependent attributes can have too much influence.

Spam Example Bag of words. Probability of ham containing each word. Probability of spam containing each word. Prior probability of ham/spam. New document: check the presence/absence of each word. Calculate the spam probability given the vector of word occurrences. How best to fool Naive Bayes? Introduce lots of hammy words into the document. Each hammy word is viewed independently, and so they repeatedly count towards the ham probability.
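To see why this attack works, consider the log-odds form of the Naive Bayes score: each word present contributes its own term independently, so enough hammy words cancel the spammy ones. A sketch with invented word probabilities follows.

```python
# Sketch of why padding a spam message with "hammy" words fools Naive Bayes:
# each word contributes its log-odds independently, so enough hammy words
# outweigh the spammy ones. All word probabilities below are invented.

import math

# P(word appears | class) for a few words; toy numbers only.
p_word = {
    'lottery': {'spam': 0.60, 'ham': 0.01},
    'prize':   {'spam': 0.40, 'ham': 0.02},
    'meeting': {'spam': 0.02, 'ham': 0.30},
    'minutes': {'spam': 0.03, 'ham': 0.25},
    'agenda':  {'spam': 0.02, 'ham': 0.20},
}

def log_odds_spam(words, prior_spam=0.5):
    # log P(spam | words) - log P(ham | words) under the Naive Bayes assumption,
    # counting only the contributions of the words that are present.
    score = math.log(prior_spam) - math.log(1 - prior_spam)
    for w in words:
        score += math.log(p_word[w]['spam']) - math.log(p_word[w]['ham'])
    return score

print(log_odds_spam(['lottery', 'prize']))                                   # > 0: spam
print(log_odds_spam(['lottery', 'prize', 'meeting', 'minutes', 'agenda']))   # < 0: now looks like ham
```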

Summary Conditional Independence Bag of Words Naive Bayes Learning Parameters Bayes Rule Working Examples