Tightening the Net: A Review of Current and Next Generation Spam Filtering Tools




Spam Track, Wednesday 1 March 2006
APRICOT, Perth, Australia
James Carpinter & Ray Hunt
Dept. of Computer Science and Software Engineering, University of Canterbury, New Zealand

Outline
- Where are we at?
- Current state of the art
- Spam classification and filtering engines
- Filtering technologies: machine learning and non-machine learning
- Corpus performance comparison
- Case study: heuristic, Bayesian and combined filters
- Conclusions

Where are we at?
- Overall, the industry has been unsuccessful in solving the spam problem
- Current tools are limited or ineffective
- Network providers cannot or do not want to address this issue
- Legislation has been a waste of time
- The problem of spam is getting worse
- We need new systems and solutions, as current products are largely ineffective

Where are we at?
- Spam ranges from a minor irritant to a major threat to productivity
- Stanford University study [36]: the average Internet user loses ten working days a year dealing with spam
- Also, 15% of emails contain viruses
- Estimated worldwide cost of spam in 2005, in terms of lost productivity and IT infrastructure investment: > US$10 billion [29, 52]

Where are we at?
- The effectiveness of spam filters in improving user productivity is ultimately limited by the extent to which users must manually:
  - review filtered messages for false positives
  - review incoming email for false negatives
- A 99% accuracy rate with 1% false negatives (and no false positives) is preferable to the same level of accuracy with 1% false positives (and no false negatives)

Where are we at?
- The business model of spammers is too attractive
- Commissions to spammers of 25-50% on products sold are not unusual [30]
- On a collection of 200 million email addresses, a response rate of 0.001% would yield a spammer a return of $25,000, given a $50 product
- Any solution to this problem must reduce the profitability of the underlying business model by:
  - substantially reducing the number of emails reaching valid recipients, or
  - increasing the expenses faced by the spammer
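
The $25,000 figure can be reproduced with simple arithmetic if the commission is taken at the low end of the quoted range (25%). That pairing is an assumption made here for illustration; the slide does not say which commission rate was used. The function name and parameters below are hypothetical.

```python
from fractions import Fraction


def spammer_return(addresses, response_rate_pct, product_price, commission_pct):
    """Estimate a spammer's commission income from one mailing.

    Percentages are passed as strings or integers and handled as exact
    fractions, so the result is not affected by floating-point rounding.
    """
    response_rate = Fraction(response_rate_pct) / 100
    commission = Fraction(commission_pct) / 100
    sales = addresses * response_rate      # number of products sold
    revenue = sales * product_price        # gross sales value
    return revenue * commission            # the spammer's share


# 200 million addresses, 0.001% response rate, $50 product.
# Assumption: 25% commission (low end of the 25-50% range).
low = spammer_return(200_000_000, "0.001", 50, 25)
high = spammer_return(200_000_000, "0.001", 50, 50)
print(f"Return at 25% commission: ${int(low):,}")
print(f"Return at 50% commission: ${int(high):,}")
```

Under the same assumptions, cutting the number of delivered messages or raising the cost per message reduces this return in direct proportion, which is the lever the last bullet points to.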

Current state of the art
- Interactive filters (challenge-response systems) intercept incoming emails from suspected spammers
- The email is held by the recipient's email server, which issues a challenge to the sender to establish that the email came from a human sender rather than a bulk mailer
- The belief is that spammers will be uninterested in completing the challenge
- If a fake email address is used by the sender, they will not receive the challenge
- Selective challenge-response systems issue a challenge only when a (non-interactive) spam filter is unable to determine the class of a message

Current state of the art
- The current prime focus is automated, non-interactive filters
- Some are found in current commercial systems; others are confined to current research
- Two key current approaches:
  - Machine learning-based filters
  - Non-machine learning-based filters
- The current range of commercial systems is dominated by:
  - Heuristic filtering
  - Bayesian filtering
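
The selective challenge-response idea can be sketched as a small decision routine: a non-interactive filter supplies a spam score, and a challenge is issued only when that score falls in an undecided middle band. This is a minimal sketch, not a description of any particular product; the class name, the thresholds and the score scale (0 to 1) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SelectiveChallengeResponse:
    # Assumed thresholds on a hypothetical 0..1 spam score.
    ham_threshold: float = 0.1
    spam_threshold: float = 0.9
    pending: dict = field(default_factory=dict)       # message_id -> sender
    verified_senders: set = field(default_factory=set)

    def handle(self, sender, message_id, spam_score):
        """Decide what to do with an incoming message."""
        if sender in self.verified_senders or spam_score <= self.ham_threshold:
            return "deliver"
        if spam_score >= self.spam_threshold:
            return "reject"
        # The filter cannot determine the class: hold the message
        # on the server and challenge the sender.
        self.pending[message_id] = sender
        return "challenge"

    def challenge_answered(self, message_id):
        """Release a held message once its sender completes the challenge."""
        sender = self.pending.pop(message_id, None)
        if sender is None:
            return "unknown"
        self.verified_senders.add(sender)
        return "deliver"


cr = SelectiveChallengeResponse()
print(cr.handle("alice@example.org", "m1", 0.05))  # clearly legitimate
print(cr.handle("bulk@example.net", "m2", 0.97))   # clearly spam
print(cr.handle("bob@example.com", "m3", 0.50))    # undecided
print(cr.challenge_answered("m3"))
```

A sender using a forged address never sees the challenge, so the held message simply stays in `pending`; a real system would also need an expiry policy for such entries.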

Spam Classification & Filtering Engines
- Non-machine learning: heuristics, signatures, blacklisting, hash-based, traffic analysis, etc.
- Machine learning techniques: Bayesian, sparse binary polynomial hashing, support vector machines, Markov models, pattern discovery, etc.
- Key developments in this area

Spam Classification & Filtering Engines
- Machine learning filtering techniques can be further categorised into:
  - Complementary solutions
  - Complete solutions
- Complementary solutions are designed to work as a component of a larger filtering system, offering support to the primary filter (ML or non-ML based)
- Complete solutions aim to construct a comprehensive knowledge base that allows them to classify all incoming messages independently
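
As an illustration of the Bayesian technique named in the machine learning list, the following is a minimal naive Bayes sketch with Laplace smoothing. It is a textbook formulation under simplifying assumptions (whitespace tokenisation, words unseen in training are ignored), not the algorithm of any filter discussed in the talk; the class name and training messages are invented for the example.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy word-level naive Bayes classifier (requires training data for both classes)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: fraction of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in vocab:
                    continue  # ignore words never seen in training
                # Laplace (add-one) smoothed word likelihood.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalise the two log scores into a probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


nb = NaiveBayesSpamFilter()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday", "ham")
print(round(nb.spam_probability("buy cheap pills"), 3))
print(round(nb.spam_probability("monday meeting"), 3))
```

Working in log space avoids numeric underflow when a message contains many words, which is why the scores are only exponentiated at the final normalisation step.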

Spam Classification & Filtering Engines
- Complete solutions come in a variety of flavours:
  - some aim to build a unified model
  - some compare incoming email to previous examples (previous likeness)
  - others use a collaborative approach, combining multiple classifiers to evaluate email (ensemble)

Spam Classification and Filtering Engines
- (diagram slide; the figure is not preserved in this transcription)
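
The ensemble flavour, in which several classifiers are combined to evaluate an email, can be illustrated with a simple majority vote. Majority voting is only one of several possible combination rules and is chosen here as an assumption; the three toy classifiers are invented for the example.

```python
from typing import Callable, Sequence

# A classifier returns True when it judges the message to be spam.
Classifier = Callable[[str], bool]


def majority_vote(classifiers: Sequence[Classifier], message: str) -> bool:
    """Flag the message as spam when more than half of the classifiers agree."""
    spam_votes = sum(1 for classify in classifiers if classify(message))
    return spam_votes * 2 > len(classifiers)


# Hypothetical component classifiers, deliberately simplistic.
def mentions_pills(message: str) -> bool:
    return "pills" in message.lower()


def shouts(message: str) -> bool:
    return message.isupper()


def has_many_exclamations(message: str) -> bool:
    return message.count("!") >= 3


ensemble = [mentions_pills, shouts, has_many_exclamations]
print(majority_vote(ensemble, "CHEAP PILLS!!!"))
print(majority_vote(ensemble, "Pills are in the cabinet."))
```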

Filtering Technologies
- Non-machine learning
  - Heuristics (rule-based analysis)
  - Signatures
  - Blacklisting
  - Traffic analysis
- Machine learning
  - Unified model filters (Bayesian filtering et al.)
  - Previous likeness based filters
  - Ensemble filters
  - Complementary filters

Corpus Performance Comparison
- Many of the techniques described are in various stages of research and development
- They are difficult to compare, as there is no single email benchmark database
- SpamAssassin (spamassassin.apache.org) maintains a collection of legitimate and spam emails
- Ling-Spam corpus [1]
- Enron bankruptcy: 400 MB of realistic workplace email [11]
- Techniques used by spammers are continually evolving [27]
- Any static spam corpus would, over time, no longer resemble the makeup of current spam email
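
Heuristic (rule-based) analysis, the first non-machine-learning technology listed, is commonly implemented as a set of weighted pattern rules whose scores are summed and compared with a threshold. The sketch below assumes that design; the rules, weights and threshold are invented for illustration and are not taken from any real rule set.

```python
import re

# Hypothetical rules: (compiled pattern, weight added when the pattern matches).
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), 2.5),
    (re.compile(r"\bfree\b", re.IGNORECASE), 1.0),
    (re.compile(r"!{3,}"), 1.5),
]

SPAM_THRESHOLD = 3.0  # assumed cut-off


def heuristic_score(message: str) -> float:
    """Sum the weights of every rule whose pattern occurs in the message."""
    return sum(weight for pattern, weight in RULES if pattern.search(message))


def is_spam(message: str) -> bool:
    return heuristic_score(message) >= SPAM_THRESHOLD


for text in ("FREE viagra!!!", "Agenda for the free lunch seminar"):
    print(text, heuristic_score(text), is_spam(text))
```

Because the rules are static, they need regular maintenance as spammers change their wording, which is the same drift problem noted for static corpora.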

Case Study
- Two-stage email filtering: a DNS blacklisting system (eliminates 50,000 of 110,000 emails per day), followed by Process Software's Precise-Mail Anti-Spam System (PMAS), which discards another 42% and quarantines 35% for review
- PMAS is based on a comprehensive heuristic rule collection, combining both server-level and user-level block and allow lists
- The Bayesian filtering option works in conjunction with the heuristic filter and was not active before the evaluation

Case Study
- Two benchmark databases used:
  - SpamAssassin corpus (public)
  - SpamArchive corpus (internal)
- Training of the PMAS Bayesian filter took place over 2 weeks
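
To make the two-stage volumes concrete, the short calculation below works through the daily figures. It assumes the 42% and 35% apply to the mail that survives the DNS blacklist stage; the slide does not state the base for these percentages, so the derived counts are illustrative only. The function name is hypothetical.

```python
def two_stage_volumes(daily_total, blacklisted, discard_pct, quarantine_pct):
    """Split a day's mail across the two filtering stages.

    Assumption: the percentages are taken of the mail remaining
    after the blacklist stage, not of the original daily total.
    """
    remaining = daily_total - blacklisted
    discarded = remaining * discard_pct // 100
    quarantined = remaining * quarantine_pct // 100
    delivered = remaining - discarded - quarantined
    return {
        "blacklisted": blacklisted,
        "discarded": discarded,
        "quarantined": quarantined,
        "delivered": delivered,
    }


volumes = two_stage_volumes(110_000, 50_000, discard_pct=42, quarantine_pct=35)
for stage, count in volumes.items():
    print(f"{stage:>12}: {count:,}")
```

Under this reading, the quarantined share is the part that still costs users time, since it has to be reviewed manually for false positives.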

Case Study
- Overall results are consistent with those published by NetworkWorldFusion [51]
- They recorded 0.75% false positives and 96% accuracy, while we recorded 0.75% false positives (with the partial SpamAssassin corpus) and 97.67% accuracy
- Under both corpora, the combined filtering option surpasses the alternatives in the two key areas:
  - lower level of false positives
  - higher level of spam caught

Case Study
- Results indicate that filtering is best placed at the user level rather than the server level
- This is consistent with Garcia et al. [19]
- We conclude two things from these experiments:
  - Use of a Bayesian filtering component improves overall filter performance; however, it is not a substitute for traditional heuristic filters
  - Time affects the validity of the corpora: older spam is more readily identified, suggesting changing techniques
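
The two key measures, false positives and spam caught, can be computed from a filter's raw counts as sketched below. The slides do not define "accuracy", so the definitions here (false positive rate as a share of legitimate mail, accuracy as the share of all messages classified correctly) are assumptions, and the counts in the example are made up rather than taken from the case study.

```python
def filter_metrics(ham_total, false_positives, spam_total, false_negatives):
    """Return false positive rate, spam catch rate and overall accuracy.

    Assumed definitions:
      false_positive_rate = legitimate messages flagged as spam / all legitimate messages
      spam_caught         = spam messages flagged as spam / all spam messages
      accuracy            = correctly classified messages / all messages
    """
    true_negatives = ham_total - false_positives
    true_positives = spam_total - false_negatives
    return {
        "false_positive_rate": false_positives / ham_total,
        "spam_caught": true_positives / spam_total,
        "accuracy": (true_negatives + true_positives) / (ham_total + spam_total),
    }


# Hypothetical counts for illustration only.
metrics = filter_metrics(ham_total=200, false_positives=2,
                         spam_total=800, false_negatives=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```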

Spam Filtering Engines
- Performance of heuristic, Bayesian & combined filters (chart slide; the figure is not preserved in this transcription)

Conclusions
- Spam is a very serious problem for the internet community
- It threatens both the integrity of networks and productivity
- Anti-spam vendors offer a wide array of products
- These can be implemented in various ways (software, hardware, service) and at various levels (server and user)
- The introduction of new technologies, such as Bayesian filtering, is improving filter accuracy
- The implementation of machine learning algorithms is likely to represent the next step in this ongoing fight