Learning from Data: Naive Bayes




Semester 1 http://www.anc.ed.ac.uk/~amos/lfd/

Naive Bayes Typical example: Bayesian spam filter. "Naive" means naive: Bayesian methods can be much more sophisticated. Basic assumption: conditional independence. Given the class (e.g. Spam, Ham), whether one data item (e.g. a word) appears is independent of whether another appears. Invariably wrong! But useful anyway.

Why? Easy to program. Simple and transparent. Fast to train. Fast to use. Probabilistic, so it can deal with uncertainty.

Data types The Naive Bayes assumption can use both continuous and discrete data, though it is generally understood in terms of discrete data. Binary and discrete attributes are very common. Do not use a 1-of-M encoding! E.g. the bag-of-words representation for text classification. We can even mix different types of data.

Bag of Words Each document is represented by a large vector. Each element of the vector represents the presence (1) or absence (0) of a particular word in the document. Certain words are more common in one document type than another, so we can build a class-conditional model using the conditional probability of seeing each word given the document class (e.g. ham/spam).
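As a minimal sketch of this representation (the vocabulary and example document below are invented for illustration, not taken from the lecture):

```python
import re

# Bag-of-words sketch: each document becomes a binary presence/absence
# vector over a fixed, hypothetical vocabulary.
vocabulary = ["free", "meeting", "offer", "project", "winner"]

def to_binary_vector(document, vocabulary):
    """Map a document to a 0/1 vector: 1 iff the word appears in it."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if word in words else 0 for word in vocabulary]

print(to_binary_vector("Free offer: you are a WINNER", vocabulary))
# [1, 0, 1, 0, 1]
```

Note that word order and repetition are discarded; only presence or absence survives.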

Conditional Independence P(X, Y) = P(X)P(Y | X). P(X, Y | C) = P(X | C)P(Y | X, C). Think of C as a class label. The above is always true. However, we can make an assumption: P(Y | X, C) = P(Y | C). Knowing the value of X makes no difference to the value Y takes, so long as we know the class C. We say that X and Y are conditionally independent given C.

Example The probability of you getting heatstroke (S) and the probability of you going to the beach (B) are most likely not independent. But they might be independent given that you know it is hot weather (H): P(S, B) ≠ P(S)P(B), but P(S, B | H) = P(S | H)P(B | H). H explains all of the dependence between S and B.
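The beach/heatstroke example can be checked numerically. The probabilities below are made up purely to illustrate the point: S and B are independent given H, yet dependent marginally.

```python
# Hypothetical conditional probabilities, chosen so that S (heatstroke)
# and B (beach) are independent given H (hot) but not marginally.
p_h = 0.3                               # P(H = hot)
p_s_given = {True: 0.4, False: 0.05}    # P(S | H)
p_b_given = {True: 0.7, False: 0.1}     # P(B | H)

# Joint P(S, B) by summing over H; conditional independence holds inside.
p_sb = sum((p_h if h else 1 - p_h) * p_s_given[h] * p_b_given[h]
           for h in (True, False))

# Marginals P(S) and P(B).
p_s = sum((p_h if h else 1 - p_h) * p_s_given[h] for h in (True, False))
p_b = sum((p_h if h else 1 - p_h) * p_b_given[h] for h in (True, False))

print(round(p_sb, 4), round(p_s * p_b, 4))  # 0.0875 vs 0.0434: not equal
```

So P(S, B) ≠ P(S)P(B) even though, within each weather condition, the two events factorise.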

Generally Naive Bayes x_1, x_2, ..., x_n are said to be conditionally independent given c iff P(x | c) = ∏_{i=1}^n P(x_i | c) for x = (x_1, x_2, ..., x_n). For example, we could also consider other attributes (e.g. I, your ice cream melting) that are also conditionally independent.

Naive Bayes The equation on the previous slide is in fact the Naive Bayes model: P(x | c) = ∏_{i=1}^n P(x_i | c) for x = (x_1, x_2, ..., x_n). Here x is our attribute vector and c is our class label. We want to learn P(c) and P(x_i | c) from the data. We then want to find the best choice of c corresponding to a new datum (inference). The form of P(x_i | c) is usually given, but we do need to learn its parameters.
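A minimal sketch of this learning-then-inference loop, using a made-up binary dataset (real filters would also smooth the counts to avoid zero probabilities):

```python
# Toy Naive Bayes over binary attribute vectors; data is invented.
data = [
    ([1, 1, 0], "spam"),
    ([1, 0, 0], "spam"),
    ([0, 1, 1], "spam"),
    ([0, 1, 1], "ham"),
    ([1, 0, 1], "ham"),
]
classes = ["spam", "ham"]
n_attrs = 3

# Learning: maximum likelihood estimates of P(c) and P(x_i = 1 | c)
# are just counts over the training data.
prior, p_xi = {}, {}
for c in classes:
    rows = [x for x, label in data if label == c]
    prior[c] = len(rows) / len(data)
    p_xi[c] = [sum(r[i] for r in rows) / len(rows) for i in range(n_attrs)]

def posterior(x):
    """Inference: P(c | x) via Bayes rule, P(c) * prod_i P(x_i | c), normalised."""
    score = {}
    for c in classes:
        s = prior[c]
        for i in range(n_attrs):
            s *= p_xi[c][i] if x[i] == 1 else 1 - p_xi[c][i]
        score[c] = s
    z = sum(score.values())
    return {c: score[c] / z for c in classes}

print(posterior([1, 0, 1]))  # spam gets posterior 4/13, about 0.31
```

Note how both stages match the slide: learning fills in P(c) and P(x_i | c), inference applies Bayes rule with the conditional independence factorisation.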

Working Example See sheet section 3. We have a set of attributes. Inference first: Bayes rule. Learning the model: P(E), P(S), P(x | S), P(x | E), using the Naive Bayes assumption.

Problems with Naive Bayes 1-of-M encoding. Failed conditional independence assumptions. Worst case: a repeated attribute, which is double counted, triple counted, etc. Conditionally dependent attributes can have too much influence.

Spam Example Bag of words: the probability of ham containing each word, the probability of spam containing each word, and the prior probability of ham/spam. For a new document, check the presence/absence of each word and calculate the spam probability given the vector of word occurrences. How best to fool Naive Bayes? Introduce lots of hammy words into the document. Each hammy word is viewed independently, so they repeatedly count towards the ham probability.
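A rough sketch of why this attack works, with invented per-word probabilities: Naive Bayes sums log-likelihood ratios, so each extra hammy word shifts the log-odds by the same fixed amount.

```python
import math

# Hypothetical likelihood of one very "hammy" word being present.
p_word_spam = 0.01   # P(word present | spam)
p_word_ham = 0.40    # P(word present | ham)

def spam_log_odds(prior_odds, n_hammy_words):
    """log [P(spam | x) / P(ham | x)] after n hammy words are observed.
    Under the independence assumption, each word adds the same term."""
    per_word = math.log(p_word_spam / p_word_ham)
    return math.log(prior_odds) + n_hammy_words * per_word

for n in (0, 1, 3):
    print(n, round(spam_log_odds(2.0, n), 2))
```

Even with prior odds of 2:1 in favour of spam, a handful of hammy words drives the log-odds far negative, because nothing in the model notices that the words were stuffed in together.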

Summary Conditional Independence Bag of Words Naive Bayes Learning Parameters Bayes Rule Working Examples