The Open Source Stack: One approach to spam filtering

The Open Source Stack: One approach to spam filtering
Chris St. Pierre, Unix Systems Administrator, Nebraska Wesleyan University

Breaks Administrivia

Administrivia Can turn your cell phone off.

Terminology
- Spam isn't an abbreviation or acronym.
- UCE (Unsolicited Commercial Email) and UBE (...Bulk...)
- Spam is more than spam: phishing, 419 scams, lottery scams, pump and dump, viruses, etc.
- Things to avoid:
  - False positives (FPs): legitimate email marked as spam
  - False negatives (FNs): spam marked as legitimate

Goals
- Make your users happy
- Users with control are happier than users without control
- An FP is always worse than an FN

The Stack Approach
- There's no magic bullet that will kill all spam
- Zeno's Paradox
  - Every tool we use will get rid of a little more spam
- Cost-benefit analysis

Other Approaches
- Pay someone a lot of money
- Pure whitelisting
- C & R (challenge/response)
- Pray

Disclaimer
- This is just one approach to spam filtering. There are many other approaches that may be just as effective.
- Your anti-spam solution must be tailored to fit your environment, not mine.
- If something I recommend doesn't work for you, ditch it!

The Stack
1. Honeypots
2. RBLs
3. Greylisting
4. HELO (and other) restrictions
5. Tarpitting
6. ClamAV
7. SpamAssassin
8. End-user tools
9. Statistics

Order is important
- If you can discard or reject messages before accepting them, you save valuable resources
- Never accept a message you don't have to

Basics
- NEVER bounce spam or viruses
  - Don't be a jerk and cause backscatter!
  - Reject with a 5xx error code
  - Discarding is also bad, but sometimes we do it anyway
- NEVER forward to off-site addresses before filtering
  - You will get blacklisted for spamming

1. Honeypots
- Create a fake address and publicize it; ban anyone who sends to it
- Remarkably ineffective
- Better approach: honeypot MX

Aside: secondary MXes
- Just Say No.

2. RBLs
- Realtime Black List (or DNSBL: DNS Black List)
- Someone else has done all the work for you. Yay!
- Run a caching nameserver
- When blocking based on RBL, you must avoid FPs
- http://www.usenix.org/publications/login/2006-12/pdfs/josephsen.pdf
- The big question: what RBLs to use?
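
For anyone who hasn't looked under the hood, here is a rough sketch of what a DNSBL lookup amounts to: reverse the client's IPv4 octets, append the list's zone, and ask DNS for an A record. The function names are made up for this illustration, and the lookup simply uses whatever resolver the host is configured with (ideally the caching nameserver mentioned above).

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)   # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN or a failed lookup: this sketch treats both as "not listed"
        return False
```

A real check should probably also look at which 127.0.0.x value comes back, since some lists use particular return codes for sub-lists or errors rather than plain listings.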

Live RBL Revue!
- Only a few are worth considering:
  - zen.spamhaus.org is excellent. Includes SBL, XBL, and PBL. Costs some cash for nonpersonal use; cbl.abuseat.org is free, and is one of their sources
  - SpamCop got a bad reputation early on, but they're doing a great job now (bl.spamcop.net)
  - The Passive Spam Block List (psbl.surriel.com) works much better than you might suspect
- Nothing else I've found or heard of is worth using

3. Greylisting
- Overview: Greylisting identifies each message with a unique triplet: sender, recipient, originating server.
- The first time it sees a given triplet, it gives a 4xx (tempfail) code
- Legitimate servers will retry, at which point the triplet will be recognized and accepted
- Spammers don't waste resources on retries
- Can block a lot of spam

3. Greylisting, continued
- Greylist on the /24 netblock of the originating server
- Retry time doesn't matter, because spammers don't retry. (5 minutes is sort of the standard.)
- Auto-whitelist and auto-blacklist
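
The triplet logic is simple enough to sketch in a few lines. This is a toy in-memory version, assuming IPv4 clients and a 5-minute delay; the class and method names are invented for illustration, and a real greylisting server would add persistence, expiry, and the auto-whitelisting mentioned above.

```python
import ipaddress
import time


class Greylist:
    """Toy in-memory greylist keyed on (sender, recipient, /24 netblock)."""

    def __init__(self, delay_seconds: int = 300):
        self.delay = delay_seconds
        self.first_seen = {}   # triplet -> timestamp of the first attempt

    @staticmethod
    def _netblock(ip: str) -> str:
        # Collapse the client address to its /24 so a retry from a
        # neighboring server in the same farm still matches the triplet.
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))

    def check(self, sender: str, recipient: str, client_ip: str, now=None) -> int:
        """Return an SMTP-style code: 450 to tempfail, 250 to accept."""
        now = time.time() if now is None else now
        triplet = (sender.lower(), recipient.lower(), self._netblock(client_ip))
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            return 450   # new or too-early triplet: ask the client to retry
        return 250       # triplet came back after the delay: let it through
```

Passing `now` explicitly is only there to make the behavior easy to try out without waiting five minutes.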

3. Greylisting, continued
- Find a greylisting server with a sizable preconfigured whitelist
- If you have >1 MX, look for a greylisting server that supports a shared database
- Policyd is wonderful, but is Postfix-only
- SQLGrey is quite nice and works with both Postfix and Exim
- RelayDelay is the closest I've found for Sendmail

4. HELO (and other) restrictions
- Lots of fun stuff!
- Site-specific whitelists/blacklists
- Reject non-FQDN HELOs and HELOs with bad syntax
- Reject mail to unknown recipients!
- Reject HELOs that resolve to bogons
  - http://www.cymru.com/documents/bogon-bnagg.txt
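
To make the syntax checks concrete, here is a minimal sketch of the "non-FQDN or bad syntax" test on a HELO argument. The function name and the rejection strings are invented for this example, and accepting bracketed address literals is a policy assumption; your MTA's built-in restrictions are the right place to do this in production.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def helo_verdict(helo: str) -> str:
    """Return 'ok' or a short rejection reason for a HELO/EHLO argument."""
    if helo.startswith("[") and helo.endswith("]"):
        return "ok"                       # address literal, e.g. [192.0.2.1]
    labels = helo.rstrip(".").split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        return "bad syntax"
    if len(labels) < 2:
        return "not fully qualified"
    if labels[-1].isdigit():
        return "bare IP address"          # 192.0.2.1 without brackets
    return "ok"
```

Checks that need DNS (bogon resolution, missing MX or A records) are left out here on purpose, since they depend on your resolver setup.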

4. HELO restrictions, continued
- HELO Randomization Protection (HRP)
- Reject mail when the HELO name has no MX or A record?
- Well-configured HELO restrictions can drop about 25% of your spam

5 (or 0). Tarpitting
- Make a connection very slow (or just pause)
- Spammers are impatient
- Claims of 80% block rates
- Two ways to implement:
  - Pre-MTA wrapper
  - Within the MTA (e.g., milter)
- Most connections are dropped after about a minute

5 (or 0). Tarpitting, continued
- Two years ago, this presentation had this line: "Tarpitting is fairly new, so software is rare as of this writing"
- Tarpitting never really caught on, so it's still fairly rare.
- Implementations:
  - GreetPause (sendmail)
  - OpenBSD SpamD
  - Several commercial products

Changeup!
- Up to here, we've been talking about discarding messages
- After this, we'll assume you've already accepted the message
- This is filtering, and it's expensive

Aside: What about filtering integrators?
- Amavis, MailScanner, etc.
- Generally, not worth it
- Not a lot of supplementary functionality of consequence, but that's changing
- They remove you one step from your component configuration, and whether or not they make the integration any easier is up for debate

Aside: What about filtering integrators?
- Cost: additional complexity of setup and maintenance; one more thing to break
- Benefit: Some (often minor) features
- Conclusion: Getting more useful every year

6. ClamAV
- ClamSMTPD is a great integrator
- Not just antivirus; anti-phishing par excellence
- In addition to the standard rules, use http://www.sanesecurity.com/clamav
  - Exclude the SpamDomain rulesets
- Keep it updated and ClamAV will Just Work
- Drop viruses on the floor

7. SpamAssassin
- This could be a class of its own. We'll cover:
  a) basics
  b) bayes
  c) checksumming systems (Razor2, DCC, Pyzor)
  d) uribl
  e) sare rulesets
  f) plugins
  g) miscellaneous score adjustments
  h) alternatives?

a. Basics
- SpamAssassin does not filter spam
- SA scores mail with a bunch of tests. Each test can add or subtract a few points to the score. If the mail has over a certain number of points, it gets marked as spam, not filtered.
- The default required_hits value is 5, which tends to work well
- Keep your rules up to date! SA 3.1+ includes sa-update
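
The scoring model boils down to "add up the points and compare to the threshold". Here is a minimal sketch of that idea; the rule names and scores are hypothetical, this is not SpamAssassin's actual code, and the sketch assumes that a score equal to the threshold counts as spam.

```python
REQUIRED_SCORE = 5.0   # the default threshold mentioned above

# Hypothetical rule names and scores, for illustration only.
RULE_SCORES = {
    "LOOKS_LIKE_PHISH": 3.2,
    "URL_ON_BLOCKLIST": 2.5,
    "SENDER_IN_WHITELIST": -6.0,
}


def classify(rules_hit):
    """Sum the scores of the rules that fired and compare to the threshold.

    Returns (score rounded to one decimal, True if the mail is marked spam).
    """
    score = sum(RULE_SCORES.get(rule, 0.0) for rule in rules_hit)
    return round(score, 1), score >= REQUIRED_SCORE
```

Note that the result is only a mark: what happens to a marked message (tag, folder, reject) is decided elsewhere in the stack.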

b. Bayes
- You can keep your Bayesian database either in flat files or in a real DB
- Use a real database if you have >1 MX
- Let your users report FPs and FNs, and train Bayes on them
- Use bayes_auto_learn to ensure a constant feed

b. Bayes, continued
- Train train train!
- DO NOT train Bayes on public corpora
- DO NOT train Bayes on your outgoing mail
- The SA Bayes engine isn't the greatest
- One solution (?): crm114 plugin
  - http://mschuette.name/wp/crm114-spamassassin-plugin/

c. Checksumming systems
- Razor2, DCC, Pyzor
- They're all free now
- Razor2 rawks hard
- DCC gives lots of FPs, because it just measures bulkiness, not spamminess
- Both Razor2 and Pyzor have very low FP rates

d. URIBL
- Checks the URLs in an email against a blacklist
- This is wonderful
- Crank these scores
- If none of your top ten rules are URIBL_*, something is wrong
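
A URIBL check works much like an RBL check, except the thing being looked up is a host from a link in the message body. The sketch below only does the extraction step and builds the names one would query; the function name and the zone `uribl.example` are placeholders, and real implementations reduce each host to its registered domain first, which this sketch skips.

```python
import re
from urllib.parse import urlsplit

_URL = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uribl_query_names(body: str, zone: str) -> list:
    """Return sorted, de-duplicated DNS names to look up for hosts linked in body."""
    hosts = set()
    for url in _URL.findall(body):
        try:
            host = urlsplit(url).hostname   # lowercased, no port or userinfo
        except ValueError:
            continue                        # malformed URL: skip it
        if host:
            hosts.add(host)
    return sorted(f"{host}.{zone}" for host in hosts)
```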

e. Third-Party Rulesets
- Additional rules that block lots of stock scams, image spam, etc.
- SpamAssassin Rule Emporium (SARE)
  - Howto: http://daryl.dostech.ca/sa-update/sare/sare-sa-update-howto.txt
  - http://www.rulesemporium.com/rules.htm
  - Most rulesets have 2-4 options, increasing in aggressiveness
- KAM
  - http://www.peregrinehw.com/downloads/spamassassin/contrib/kam.cf

e. Third-Party Rulesets, cont'd
- Extra rules from SpamAssassin
  - http://wiki.apache.org/spamassassin/customrulesets
  - See especially the Sought ruleset
  - Sets for other languages

f. Plugins
- There are lots out there, but four major ones you need to know:
- Botnet: tries to identify mail from botnets
  - Lots of FPs, not a lot of real positives
  - http://people.ucsc.edu/~jrudd/spamassassin/
- PDFInfo: ImageInfo for PDF attachment spam
  - http://www.rulesemporium.com/plugins.htm

f. Plugins, continued
- ImageInfo: looks for broken or suspicious image attachments
  - Together with the SARE rules, is very good at stopping image spam
  - Doesn't use OCR or other processor-intensive tests
  - Consider it a necessity
  - Included in SA 3.2+
  - http://www.rulesemporium.com/plugins.htm

f. Plugins, continued
- Custom plugins are beyond the scope of this tutorial
- Try to write rules instead of plugins
- Check out http://wiki.apache.org/spamassassin/dumptextplugin for a good sample plugin and a nice place to start

g. Miscellaneous score adjustments
- Tweak and frob scores to suit your environment
- Track:
  - Which rules are hitting frequently and what they're hitting on (ham or spam)
  - Which rules give you frequent FPs and FNs
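
Tracking what each rule hits on can be as simple as a pair of counters. This is a rough sketch, assuming you have already pulled a list of (is_spam, rule names) pairs out of your logs or headers; the function name and input shape are made up for the example.

```python
from collections import Counter


def rule_hit_table(messages):
    """messages: iterable of (is_spam, [rule names]) pairs.

    Returns rows of (rule, spam_hits, ham_hits), most-hit rules first,
    with ties broken alphabetically by rule name.
    """
    spam_hits, ham_hits = Counter(), Counter()
    for is_spam, rules in messages:
        # set() so a rule listed twice on one message counts once
        (spam_hits if is_spam else ham_hits).update(set(rules))
    all_rules = set(spam_hits) | set(ham_hits)
    return sorted(
        ((rule, spam_hits[rule], ham_hits[rule]) for rule in all_rules),
        key=lambda row: (-(row[1] + row[2]), row[0]),
    )
```

A rule with a high ham count is a candidate for a lower score; one that hits spam often and ham never is a candidate for a higher one.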

g. Miscellaneous score adjustments
- Many rules are disabled (score = 0). Enable all tests initially, with a token score of 0.1, to see if any of the disabled rules hit reliably:

    egrep 'score.*\s0$' \
        /usr/share/spamassassin/50_scores.cf \
        | awk '{print $1, $2, "0.1"}' > all-rules.cf

h. Alternatives?
- Dspam, Bogofilter, others
- Dspam and Bogofilter violate the stack model; they only use Bayes
- SA uses Bayes, plus other plugins and rule-based tests

8. End-user tools
- Clients must, at a minimum, be able to report FPs and FNs
  - Learn (with Bayes) and automatically whitelist/blacklist per-user based on what they report
- Let your clients configure their own filtering levels
- Forget quarantining
- Policies

8. End-user tools
- Let your clients configure their own whitelists and blacklists
- Ideally, whitelisting a sender should get them past RBLs, tarpitting, greylisting, etc., for the recipient(s) who whitelisted them
  - Really really difficult
- Also ideally, generate whitelists from address books
- Whitelisting can be dangerous, since it relies on addresses, not Received: headers

9. Statistics
You need statistics for four reasons:
1. Everyone likes pretty pictures
2. Track the effectiveness of your filters
3. Plan for and justify growth
4. Spot anomalies

9. What kind of statistics?
- Both graphs/charts and hard numbers
- General mail statistics are a prerequisite
- What is your ratio of ham to spam?
- How much spam are you delivering to mailboxes?
- How many viruses are you getting?
- How much is filtered out by tarpitting/greylisting/RBLs/etc.?

9. What kind of statistics?
- What are your spam scores? (Min/max/avg) Are there any trends?
- How long does it take to scan a message?
- What is your average time-to-delivery?
- What SA rules are hitting the most? (On ham? On spam?)
- Which are the best or most reliable rules?
- What viruses is ClamAV finding?
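
The hard numbers are cheap to compute once you have the scores in a list. Here is a minimal sketch, assuming a non-empty list of per-message spam scores and the default threshold of 5; the function names are invented for the example.

```python
from statistics import mean


def score_summary(scores):
    """Return min, max and average of a non-empty list of spam scores."""
    return {
        "min": min(scores),
        "max": max(scores),
        "avg": round(mean(scores), 2),
    }


def spam_ratio(scores, threshold=5.0):
    """Fraction of messages scoring at or above the spam threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)
```

Run it per day or per hour and the trend question mostly answers itself.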