BUILDING A SPAM FILTER USING NAÏVE BAYES. CIS 391 - Intro to AI



Spam or not Spam: that is the question.

From: ""
Subject: real estate is the only way... gem oalvgkay
Anyone can buy real estate with no money down
Stop paying rent TODAY!
There is no need to spend hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.
Change your life NOW!
=================================================
Click Below to order:
=================================================

Categorization/Classification Problems

Given:
- A description of an instance $x \in X$, where $X$ is the instance language or instance space. (Issue: how do we represent text documents?)
- A fixed set of categories: $C = \{c_1, c_2, \dots, c_n\}$

Determine:
- The category of $x$: $c(x) \in C$, where $c(x)$ is a categorization function whose domain is $X$ and whose range is $C$.

We want to automatically build categorization functions ("classifiers").

EXAMPLES OF TEXT CATEGORIZATION
- Categories = SPAM? "spam" / "not spam"
- Categories = TOPICS: "finance" / "sports" / "asia"
- Categories = OPINION: "like" / "hate" / "neutral"
- Categories = AUTHOR: "Shakespeare" / "Marlowe" / "Ben Jonson"
  - The Federalist papers

Bayesian Methods for Classification
- Uses Bayes' theorem to build a generative model that approximates how data is produced.
- First step:
$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$
  where C: categories, X: instance to be classified.
- Uses the prior probability of each category given no information about an item.
- Categorization produces a posterior probability distribution over the possible categories given a description of each instance.
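A minimal Python sketch of the first step, assuming two hypothetical categories ("spam", "ham") and made-up values for the prior P(C) and the likelihood P(X | C) of one observed instance; none of these numbers come from the slides. It computes P(X) by summing over categories and then divides, which is exactly Bayes' theorem.

```python
# Hypothetical numbers for illustration only; they do not come from the slides.
priors = {"spam": 0.4, "ham": 0.6}         # P(C)
likelihoods = {"spam": 0.05, "ham": 0.01}  # P(X | C) for one observed instance X

def posterior(priors, likelihoods):
    """Return P(C | X) = P(X | C) P(C) / P(X) for every category."""
    # P(X) = sum_c P(X | c) P(c), by the law of total probability.
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

post = posterior(priors, likelihoods)
print(post)  # spam gets about 0.77, ham about 0.23
```

The result is a full posterior distribution over the categories, so its values sum to 1.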

Maximum a posteriori (MAP) Hypothesis
Let $c_{MAP}$ be the most probable category. Then goodbye to that nasty normalization!
$$c_{MAP} = \arg\max_{c \in C} P(c \mid X) = \arg\max_{c \in C} \frac{P(X \mid c)\,P(c)}{P(X)} = \arg\max_{c \in C} P(X \mid c)\,P(c)$$
as $P(X)$ is constant. No need to compute $P(X)$!
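To see why dropping P(X) is safe, here is a small sketch (hypothetical priors and likelihoods, not from the slides) that picks the argmax both with and without the normalizer. Dividing every score by the same positive constant cannot change which score is largest.

```python
# Hypothetical priors P(c) and likelihoods P(X | c) for one observed instance X.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": 0.05, "ham": 0.01}

def map_category(priors, likelihoods):
    """c_MAP = argmax_c P(X | c) P(c); P(X) is dropped because it is constant in c."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

def map_category_normalized(priors, likelihoods):
    """Same argmax, but over the fully normalized posterior P(c | X)."""
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return max(priors, key=lambda c: likelihoods[c] * priors[c] / evidence)

assert map_category(priors, likelihoods) == map_category_normalized(priors, likelihoods)
print(map_category(priors, likelihoods))  # spam
```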

Maximum Likelihood Hypothesis
If all hypotheses are a priori equally likely, then to find the maximally likely category $c_{ML}$ we only need to consider the $P(X \mid c)$ term:
$$c_{ML} = \arg\max_{c \in C} P(X \mid c)$$
Maximum Likelihood Estimate (MLE)
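A short sketch contrasting the two rules, with hypothetical numbers: under a uniform prior the MAP and ML categories coincide, but a skewed prior can flip the MAP decision while the ML decision stays put, because ML ignores P(c).

```python
def ml_category(likelihoods):
    """c_ML = argmax_c P(X | c): ignores the prior entirely."""
    return max(likelihoods, key=likelihoods.get)

def map_category(priors, likelihoods):
    """c_MAP = argmax_c P(X | c) P(c)."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

likelihoods = {"spam": 0.05, "ham": 0.01}  # hypothetical P(X | c)
uniform = {"spam": 0.5, "ham": 0.5}
skewed = {"spam": 0.1, "ham": 0.9}         # hypothetical non-uniform prior

print(ml_category(likelihoods))            # spam
print(map_category(uniform, likelihoods))  # spam: uniform prior, so MAP equals ML
print(map_category(skewed, likelihoods))   # ham: 0.1 * 0.05 < 0.9 * 0.01
```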

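Naïve Bayes Classifiers: Step 1
Assume that instance $X$ is described by an n-dimensional vector of attributes $X = \langle x_1, x_2, \dots, x_n \rangle$. Then
$$c_{MAP} = \arg\max_{c \in C} P(c \mid x_1, x_2, \dots, x_n) = \arg\max_{c \in C} \frac{P(x_1, x_2, \dots, x_n \mid c)\,P(c)}{P(x_1, x_2, \dots, x_n)} = \arg\max_{c \in C} P(x_1, x_2, \dots, x_n \mid c)\,P(c)$$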

Naïve Bayes Classifier: Step 2
To estimate:
- $P(c_j)$: can be estimated from the frequency of classes in the training examples.
- $P(x_1, x_2, \dots, x_n \mid c_j)$: Problem!! $O(|X|^n \cdot |C|)$ parameters are required to estimate the full joint probability distribution.

Solution: the Naïve Bayes Conditional Independence Assumption:
$$P(x_1, x_2, \dots, x_n \mid c_j) = \prod_i P(x_i \mid c_j)$$
so that
$$c_{MAP} = \arg\max_{c_j \in C} P(c_j) \prod_i P(x_i \mid c_j)$$
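The parameter blow-up can be made concrete by counting free parameters (each conditional distribution sums to 1, so one of its entries is determined by the others). This sketch, with hypothetical helper names, compares one full joint table per class, which grows like $|X|^n |C|$, against the naïve factorization, which grows only like $n |X| |C|$.

```python
def full_joint_params(num_values, n, num_classes):
    """Free parameters for a full table P(x_1..x_n | c) per class: |X|^n - 1 each."""
    return num_classes * (num_values ** n - 1)

def naive_bayes_params(num_values, n, num_classes):
    """Free parameters for prod_i P(x_i | c): |X| - 1 per attribute per class."""
    return num_classes * n * (num_values - 1)

# Five binary symptoms and two classes (flu / no flu), as in the flu example.
print(full_joint_params(2, 5, 2))   # 62
print(naive_bayes_params(2, 5, 2))  # 10
# With 20 binary attributes the gap is already enormous.
print(full_joint_params(2, 20, 2), naive_bayes_params(2, 20, 2))  # 2097150 40
```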

Naïve Bayes Classifier for Binary Variables
Example: class Flu, with binary features X1 = runnynose, X2 = sinus, X3 = cough, X4 = fever, X5 = muscle-ache.
Conditional Independence Assumption: features are independent of each other given the class:
$$P(X_1, \dots, X_5 \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_5 \mid C)$$
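A sketch of the factorized likelihood for the flu example. The conditional probabilities below are invented for illustration; for each binary symptom the factor is P(X_i = true | C) if the symptom is present and 1 - P(X_i = true | C) if it is absent.

```python
import math

# Hypothetical conditional probabilities P(X_i = true | C) for the five symptoms.
p_true_given = {
    "flu":    {"runnynose": 0.8, "sinus": 0.7, "cough": 0.9, "fever": 0.9, "muscle-ache": 0.6},
    "no_flu": {"runnynose": 0.3, "sinus": 0.2, "cough": 0.2, "fever": 0.05, "muscle-ache": 0.1},
}

def likelihood(symptoms, c):
    """P(X_1..X_5 | c) under conditional independence: a product of per-symptom terms."""
    probs = p_true_given[c]
    return math.prod(probs[s] if present else 1.0 - probs[s]
                     for s, present in symptoms.items())

patient = {"runnynose": True, "sinus": False, "cough": True, "fever": True, "muscle-ache": False}
print(likelihood(patient, "flu"), likelihood(patient, "no_flu"))
```

Only five numbers per class are needed, instead of a table over all 2^5 symptom combinations.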

Learning the Model
(Figure: class node C with children X1, X2, X3, X4, X5, X6.)
First attempt: maximum likelihood estimates
- Given training data for N individuals, where count(X = x) is the number of those individuals for which X = x, e.g. Flu = true.
- For each category c and each value x for a variable X:
$$\hat{P}(c) = \frac{\mathrm{count}(C = c)}{N} \qquad \hat{P}(x \mid c) = \frac{\mathrm{count}(X = x, C = c)}{\mathrm{count}(C = c)}$$
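The two maximum likelihood estimates are just ratios of counts. A minimal sketch, assuming a tiny hypothetical training set of (symptom value, class) pairs for a single binary variable X:

```python
from collections import Counter

# Hypothetical training data: (value of X, class C) for N individuals.
data = [(True, "flu"), (True, "flu"), (False, "flu"),
        (False, "no_flu"), (False, "no_flu"), (True, "no_flu"), (False, "no_flu")]

N = len(data)
class_counts = Counter(c for _, c in data)  # count(C = c)
joint_counts = Counter(data)                # count(X = x, C = c)

def p_hat_class(c):
    """MLE of the prior: count(C = c) / N."""
    return class_counts[c] / N

def p_hat_x_given_c(x, c):
    """MLE of the conditional: count(X = x, C = c) / count(C = c)."""
    return joint_counts[(x, c)] / class_counts[c]

print(p_hat_class("flu"), p_hat_x_given_c(True, "flu"))  # 3/7 and 2/3
```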

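Problem with Max Likelihood for Naïve Bayes
(Figure: Flu with children X1 = runnynose, X2 = sinus, X3 = cough, X4 = fever, X5 = muscle-ache.)
$$P(X_1, \dots, X_5 \mid \text{Flu}) = P(X_1 \mid \text{Flu}) \cdot P(X_2 \mid \text{Flu}) \cdots P(X_5 \mid \text{Flu})$$
What if there are no training cases where the patient had muscle aches but no flu?
$$\hat{P}(X_5 = t \mid \neg\text{flu}) = \frac{\mathrm{count}(X_5 = t, \neg\text{flu})}{\mathrm{count}(\neg\text{flu})} = 0$$
So if $X_5 = t$, then $P(X_1, \dots, X_5 \mid \neg\text{flu}) = 0$.
Zero probabilities overwhelm any other evidence!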
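A sketch of the failure mode, with hypothetical estimates: four symptoms point strongly to "no flu", but because the MLE of P(muscle-ache = true | no flu) is exactly 0, a single present muscle ache forces the whole product to 0.

```python
import math

# Hypothetical MLE estimates of P(X_i = true | no flu); muscle-ache was never
# observed without flu in training, so its MLE is exactly 0.
p_true_given_no_flu = {"runnynose": 0.3, "sinus": 0.2, "cough": 0.2,
                       "fever": 0.05, "muscle-ache": 0.0}

def likelihood(symptoms, probs):
    """Product of per-symptom factors under conditional independence."""
    return math.prod(probs[s] if present else 1.0 - probs[s]
                     for s, present in symptoms.items())

# Four absent symptoms favour "no flu", but muscle-ache is present.
patient = {"runnynose": False, "sinus": False, "cough": False, "fever": False,
           "muscle-ache": True}
print(likelihood(patient, p_true_given_no_flu))  # 0.0
```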

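Add-1 Laplace Smoothing to Avoid Overfitting
$$\hat{P}(x \mid c) = \frac{\mathrm{count}(X = x, C = c) + 1}{\mathrm{count}(C = c) + |X|}$$
where |X| is the number of values of X (here 2).
Slightly better version:
$$\hat{P}(x \mid c) = \frac{\mathrm{count}(X = x, C = c) + \alpha}{\mathrm{count}(C = c) + \alpha\,|X|}$$
where α is the extent of smoothing.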
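A one-function sketch of the smoothed estimate (the function name is hypothetical). With α = 1 it is add-1 Laplace smoothing; the counts below assume, for illustration, 4 no-flu patients, none with muscle aches.

```python
def smoothed_estimate(count_xc, count_c, num_values, alpha=1.0):
    """(count(X=x, C=c) + alpha) / (count(C=c) + alpha * |X|); alpha = 1 is add-1 Laplace."""
    return (count_xc + alpha) / (count_c + alpha * num_values)

# Muscle-ache never seen among 4 hypothetical no-flu patients; X is binary (|X| = 2).
print(smoothed_estimate(0, 4, 2))             # 1/6, no longer zero
print(smoothed_estimate(0, 4, 2, alpha=0.5))  # 0.5 / 5 = 0.1
```

Because α is added once per value in the numerator and α|X| in the denominator, the smoothed estimates over all values of X still sum to 1.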

Using Naive Bayes Classifiers to Classify Text: Basic Method
As a generative model:
1. Randomly pick a category c according to P(c).
2. For a document of length N, for each word_i: generate word_i according to P(w | c).
$$P(c, D = \langle w_1, w_2, \dots, w_N \rangle) = P(c) \prod_{i=1}^{N} P(w_i \mid c)$$
This is a Naïve Bayes classifier for multinomial variables.
Note that word order really doesn't matter here:
- It uses the same parameters for each position.
- The result is a bag-of-words model: it views a document not as an ordered list of words but as a multiset.
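The generative story can be sketched directly with Python's `random` module. The model parameters and the three-word vocabulary are hypothetical. The second function evaluates the joint probability above; since every position uses the same P(w | c), reordering the words leaves it unchanged, which is the bag-of-words property.

```python
import random

# Hypothetical model parameters: P(c) and P(w | c) over a tiny vocabulary.
prior = {"spam": 0.5, "ham": 0.5}
word_probs = {
    "spam": {"money": 0.5, "free": 0.4, "meeting": 0.1},
    "ham":  {"money": 0.1, "free": 0.1, "meeting": 0.8},
}

def generate_document(length, rng):
    """Step 1: pick c ~ P(c). Step 2: draw each word_i ~ P(w | c)."""
    c = rng.choices(list(prior), weights=list(prior.values()))[0]
    vocab = list(word_probs[c])
    # Every position uses the same distribution P(w | c).
    words = rng.choices(vocab, weights=[word_probs[c][w] for w in vocab], k=length)
    return c, words

def joint_probability(c, words):
    """P(c, D = <w_1..w_N>) = P(c) * prod_i P(w_i | c)."""
    p = prior[c]
    for w in words:
        p *= word_probs[c][w]
    return p

rng = random.Random(0)
c, doc = generate_document(5, rng)
print(c, doc, joint_probability(c, doc))
```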

Naïve Bayes: Learning (First attempt)
From the training corpus, extract the Vocabulary.
Calculate the required estimates of the P(c) and P(w | c) terms:
- For each $c_j$ in C:
$$P(c_j) = \frac{\mathrm{count}_{\mathrm{docs}}(C = c_j)}{|\mathrm{docs}|}$$
  where count_docs(x) is the number of documents for which x is true.
- For each word $w_i \in$ Vocabulary and $c \in C$:
$$P(w_i \mid c) = \frac{\mathrm{count}_{\mathrm{doctokens}}(W = w_i, C = c)}{\mathrm{count}_{\mathrm{doctokens}}(C = c)}$$
  where count_doctokens(x) is the number of tokens over all documents for which x is true of that document and that token.
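A sketch of this first (unsmoothed) training pass on a hypothetical three-document corpus that is already tokenized. Document counts give P(c); token counts pooled over each category's documents give P(w | c).

```python
from collections import Counter, defaultdict

# Hypothetical labelled training corpus: (tokens, category).
training = [
    (["free", "money", "now"], "spam"),
    (["free", "ebook", "money", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
]

def train_mle(docs):
    vocabulary = {w for tokens, _ in docs for w in tokens}
    doc_counts = Counter(c for _, c in docs)       # count_docs(C = c)
    token_counts = defaultdict(Counter)            # count_doctokens(W = w, C = c)
    for tokens, c in docs:
        token_counts[c].update(tokens)
    prior = {c: n / len(docs) for c, n in doc_counts.items()}
    cond = {}
    for c in doc_counts:
        total = sum(token_counts[c].values())      # count_doctokens(C = c)
        cond[c] = {w: token_counts[c][w] / total for w in vocabulary}
    return vocabulary, prior, cond

vocabulary, prior, cond = train_mle(training)
print(cond["spam"]["money"], cond["spam"]["meeting"])  # 3/7 and 0.0
```

Note that "meeting" gets probability 0 under "spam": the same zero-count problem as in the flu example, which the second attempt addresses.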

Naïve Bayes: Learning (Second attempt)
- Laplace smoothing must be done over the vocabulary items.
  - We can assume we have at least one instance of each category, so we don't need to smooth these.
- Assume a single new word UNK that occurs nowhere within the training document set.
- Map all unknown words in documents to be classified (test documents) to UNK.
- For $0 < \alpha \le 1$:
$$P(w_i \mid c) = \frac{\mathrm{count}_{\mathrm{doctokens}}(W = w_i, C = c) + \alpha}{\mathrm{count}_{\mathrm{doctokens}}(C = c) + \alpha\,(|V| + 1)}$$
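A sketch of the smoothed second attempt, reusing a hypothetical toy corpus. The token name `<UNK>` and the helper `map_unknown` are assumptions for illustration. The vocabulary is extended by one UNK entry, which is why the denominator uses |V| + 1; category priors are left unsmoothed, as the slide says.

```python
from collections import Counter, defaultdict

UNK = "<UNK>"  # hypothetical name for the single unknown-word token

training = [
    (["free", "money", "now"], "spam"),
    (["free", "ebook", "money", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
]

def train_smoothed(docs, alpha=1.0):
    vocabulary = {w for tokens, _ in docs for w in tokens}
    doc_counts = Counter(c for _, c in docs)
    token_counts = defaultdict(Counter)
    for tokens, c in docs:
        token_counts[c].update(tokens)
    prior = {c: n / len(docs) for c, n in doc_counts.items()}  # categories not smoothed
    cond = {}
    for c in doc_counts:
        denom = sum(token_counts[c].values()) + alpha * (len(vocabulary) + 1)
        # UNK never occurs in training, so its count is 0 and it gets alpha / denom.
        cond[c] = {w: (token_counts[c][w] + alpha) / denom for w in vocabulary | {UNK}}
    return vocabulary, prior, cond

def map_unknown(tokens, vocabulary):
    """Replace test-document words not seen in training by UNK."""
    return [w if w in vocabulary else UNK for w in tokens]

vocabulary, prior, cond = train_smoothed(training)
print(map_unknown(["free", "lottery"], vocabulary))  # ['free', '<UNK>']
```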

Naïve Bayes: Classifying
Compute $c_{NB}$ using either
$$c_{NB} = \arg\max_{c} P(c) \prod_{i=1}^{N} P(w_i \mid c)$$
or
$$c_{NB} = \arg\max_{c} P(c) \prod_{w \in V} P(w \mid c)^{\mathrm{count}(w)}$$
where count(w) is the number of times word w occurs in the document. (The two are equivalent.)
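The equivalence of the two forms is easy to check numerically. In this sketch (hypothetical parameters over a three-word vocabulary), the first score has one factor per token position and the second has one factor per distinct word raised to its count; words with count 0 contribute a factor of 1.

```python
import math
from collections import Counter

# Hypothetical trained parameters over a tiny vocabulary V.
prior = {"spam": 0.4, "ham": 0.6}
cond = {
    "spam": {"free": 0.5, "money": 0.4, "meeting": 0.1},
    "ham":  {"free": 0.1, "money": 0.2, "meeting": 0.7},
}

def score_by_position(doc, c):
    """P(c) * prod_{i=1..N} P(w_i | c): one factor per token position."""
    return prior[c] * math.prod(cond[c][w] for w in doc)

def score_by_type(doc, c):
    """P(c) * prod_{w in V} P(w | c) ** count(w): one factor per vocabulary word."""
    counts = Counter(doc)
    return prior[c] * math.prod(cond[c][w] ** counts[w] for w in cond[c])

def classify(doc, score):
    return max(prior, key=lambda c: score(doc, c))

doc = ["free", "money", "free", "meeting"]
print(classify(doc, score_by_position), classify(doc, score_by_type))  # spam spam
```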

PANTEL AND LIN: SPAMCOP
- Uses a Naïve Bayes classifier.
- M is spam if P(Spam | M) > P(NonSpam | M).
- Method:
  - Tokenize the message using the Porter Stemmer.
  - Estimate P(x_k | C) using the m-estimate (a form of smoothing).
  - Remove words that do not satisfy certain conditions.
  - Train: 160 spams, 466 non-spams.
  - Test: 277 spams, 346 non-spams.
- Results: error rate of 4.33%.
  - Worse results using trigrams.

Naive Bayes is (was) Not So Naive
- Naïve Bayes: first and second place in the KDD-CUP 97 competition, among 16 (then) state-of-the-art algorithms.
  - Goal: a direct-mail response prediction model for the financial services industry: predict whether the recipient of mail will actually respond to the advertisement (… records).
- A good, dependable baseline for text classification, but not the best by itself!
- Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem.
- Very fast: learning takes one pass over the data; testing is linear in the number of attributes and the document collection size.
- Low storage requirements.

Engineering: Underflow Prevention
Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow.
Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
The class with the highest final un-normalized log probability score is still the most probable:
$$c_{NB} = \arg\max_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in \text{positions}} \log P(w_i \mid c_j) \Big]$$
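A sketch of the underflow and its fix, using hypothetical parameters and a 1000-token document. The direct product drops below the smallest positive double and becomes exactly 0.0 for both classes, so it can no longer rank them; the sum of logs stays finite and, because log is monotonic, yields the same argmax the exact product would.

```python
import math

# Hypothetical per-word probabilities for two classes; a long document of 1000 tokens.
prior = {"spam": 0.4, "ham": 0.6}
cond = {"spam": {"free": 0.3, "hello": 0.2}, "ham": {"free": 0.1, "hello": 0.4}}
doc = ["free", "hello"] * 500

def product_score(doc, c):
    """P(c) * prod_i P(w_i | c), computed directly (underflows for long documents)."""
    p = prior[c]
    for w in doc:
        p *= cond[c][w]
    return p

def log_score(doc, c):
    """log P(c) + sum_i log P(w_i | c)."""
    return math.log(prior[c]) + sum(math.log(cond[c][w]) for w in doc)

print(product_score(doc, "spam"), product_score(doc, "ham"))  # 0.0 0.0
print(max(prior, key=lambda c: log_score(doc, c)))            # spam
```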

REFERENCES
- Mosteller, F. & Wallace, D. L. (1984). Applied Bayesian and Classical Inference: The Case of the Federalist Papers (2nd ed.). New York: Springer-Verlag.
- Pantel, P. & Lin, D. (1998). SpamCop: A Spam Classification and Organization Program. In Proc. of the 1998 Workshop on Learning for Text Categorization, AAAI.
- Sebastiani, F. (2002). Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1).


More information

On Attacking Statistical Spam Filters

On Attacking Statistical Spam Filters On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Paper review by Deepak Chinavle

More information

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk Classial Eletromagneti Doppler Effet Redefined Copyright 04 Joseph A. Rybzyk Abstrat The lassial Doppler Effet formula for eletromagneti waves is redefined to agree with the fundamental sientifi priniples

More information

A Context-Aware Preference Database System

A Context-Aware Preference Database System J. PERVASIVE COMPUT. & COMM. (), MARCH 005. TROUBADOR PUBLISHING LTD) A Context-Aware Preferene Database System Kostas Stefanidis Department of Computer Siene, University of Ioannina,, kstef@s.uoi.gr Evaggelia

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Findings and Recommendations

Findings and Recommendations Contrating Methods and Administration Findings and Reommendations Finding 9-1 ESD did not utilize a formal written pre-qualifiations proess for seleting experiened design onsultants. ESD hose onsultants

More information

Open and Extensible Business Process Simulator

Open and Extensible Business Process Simulator UNIVERSITY OF TARTU FACULTY OF MATHEMATICS AND COMPUTER SCIENCE Institute of Computer Siene Karl Blum Open and Extensible Business Proess Simulator Master Thesis (30 EAP) Supervisors: Luiano Garía-Bañuelos,

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

Bayesian Spam Detection

Bayesian Spam Detection Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 1 Article 2 2015 Bayesian Spam Detection Jeremy J. Eberhardt University or Minnesota, Morris Follow this and additional

More information

Behavior Analysis-Based Learning Framework for Host Level Intrusion Detection

Behavior Analysis-Based Learning Framework for Host Level Intrusion Detection Behavior Analysis-Based Learning Framework for Host Level Intrusion Detetion Haiyan Qiao, Jianfeng Peng, Chuan Feng, Jerzy W. Rozenblit Eletrial and Computer Engineering Department University of Arizona

More information

Relativistic Kinematics -a project in Analytical mechanics Karlstad University

Relativistic Kinematics -a project in Analytical mechanics Karlstad University Relativisti Kinematis -a projet in Analytial mehanis Karlstad University Carl Stigner 1th January 6 Abstrat The following text is a desription of some of the ontent in hapter 7 in the textbook Classial

More information

Learning from Data: Naive Bayes

Learning from Data: Naive Bayes Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.

More information

Interpretable Fuzzy Modeling using Multi-Objective Immune- Inspired Optimization Algorithms

Interpretable Fuzzy Modeling using Multi-Objective Immune- Inspired Optimization Algorithms Interpretable Fuzzy Modeling using Multi-Objetive Immune- Inspired Optimization Algorithms Jun Chen, Mahdi Mahfouf Abstrat In this paper, an immune inspired multi-objetive fuzzy modeling (IMOFM) mehanism

More information

Revista Brasileira de Ensino de Fsica, vol. 21, no. 4, Dezembro, 1999 469. Surface Charges and Electric Field in a Two-Wire

Revista Brasileira de Ensino de Fsica, vol. 21, no. 4, Dezembro, 1999 469. Surface Charges and Electric Field in a Two-Wire Revista Brasileira de Ensino de Fsia, vol., no. 4, Dezembro, 999 469 Surfae Charges and Eletri Field in a Two-Wire Resistive Transmission Line A. K. T.Assis and A. J. Mania Instituto de Fsia Gleb Wataghin'

More information

Granular Problem Solving and Software Engineering

Granular Problem Solving and Software Engineering Granular Problem Solving and Software Engineering Haibin Zhu, Senior Member, IEEE Department of Computer Siene and Mathematis, Nipissing University, 100 College Drive, North Bay, Ontario, P1B 8L7, Canada

More information

Information Security 201

Information Security 201 FAS Information Seurity 201 Desktop Referene Guide Introdution Harvard University is ommitted to proteting information resoures that are ritial to its aademi and researh mission. Harvard is equally ommitted

More information

protection p1ann1ng report

protection p1ann1ng report ( f1re protetion p1ann1ng report I BUILDING CONSTRUCTION INFORMATION FROM THE CONCRETE AND MASONRY INDUSTRIES NO. 15 OF A SERIES A Comparison of Insurane and Constrution Costs for Low-Rise Multifamily

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

i e AT 11 of 2006 INSURANCE COMPANIES (AMALGAMATIONS) ACT 2006

i e AT 11 of 2006 INSURANCE COMPANIES (AMALGAMATIONS) ACT 2006 i e AT 11 of 2006 INSURANCE COMPANIES (AMALGAMATIONS) ACT 2006 Insurane Companies (Amalgamations) At 2006 Index i e INSURANCE COMPANIES (AMALGAMATIONS) ACT 2006 Index Setion Page 1 Orders in respet of

More information

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15

More information

Outline. Planning. Search vs. Planning. Search vs. Planning Cont d. Search vs. planning. STRIPS operators Partial-order planning.

Outline. Planning. Search vs. Planning. Search vs. Planning Cont d. Search vs. planning. STRIPS operators Partial-order planning. Outline Searh vs. planning Planning STRIPS operators Partial-order planning Chapter 11 Artifiial Intelligene, lp4 2005/06, Reiner Hähnle, partly based on AIMA Slides Stuart Russell and Peter Norvig, 1998

More information

Disability Discrimination (Services and Premises) Regulations 2016 Index DISABILITY DISCRIMINATION (SERVICES AND PREMISES) REGULATIONS 2016

Disability Discrimination (Services and Premises) Regulations 2016 Index DISABILITY DISCRIMINATION (SERVICES AND PREMISES) REGULATIONS 2016 Disability Disrimination (Servies and Premises) Regulations 2016 Index DISABILITY DISCRIMINATION (SERVICES AND PREMISES) REGULATIONS 2016 Index Regulation Page 1 Title... 3 2 Commenement... 3 3 Interpretation...

More information

Exempt Organization Business Income Tax Return

Exempt Organization Business Income Tax Return Form For alendar year 2013 or other tax year beginning, and ending. 34 Unrelated business taxable. Subtrat line 33 from line 32. If line 33 is greater than line 32, enter the smaller of zero or line 32

More information