Fighting Spam with open source software
Charly Kühnast
Municipal Datacenter for the Lower Rhine Area, Internet Infrastructure
charly.kuehnast@krzn.de
Introduction: KRZN and spam filtering
- ~11,000 users
- 2 e-mails per user per day = ~22,000 e-mails per day
That is, ~22,000 e-mails that we actually want. But we get quite a lot more.
Today: 6,000,000 spam mails per day
[Chart: spam volume in millions of mails per day, H1/2007 to H2/2008]
One month of spam
Averages
On average,
- 99.65% of incoming SMTP traffic is unwanted
- 5,300 spam mails come in per minute
However, peaks have exceeded 25,000 spam mails per minute.
An average day
So, where does all this spam come from?
Spam Origins
Botnets
- A trojan is written to infiltrate as many PCs (and even servers) as possible.
- The trojan's author then has full control over each infected machine. It is now a remote-controlled bot (or drone).
- Once a sizeable number of PCs have been infected, they are collectively called a botnet.
Botnets are weapons. They can...
- saturate network connections (DDoS)
- infect other systems to expand the botnet
- be used for data and identity theft
- send spam.
Botnets can grow very large
Several botnets with more than 1,000,000 drones exist. They are powerful enough to cut whole countries off the internet (which happened to Estonia in 2007).
For a fistful of dollars
- Anyone can rent (a part of) a botnet and make it send spam.
- It's not even expensive (1 US$ per bot per day; Chinese botnets are cheaper).
- Botnets generate a lot of collateral damage, but the ROI is great.
Conversion rate
Scientists at UCSD gained control over 80,000 bots (1.5%) of the Storm botnet and tracked its actions for 30 days. For every mail that led to a purchase of pharmacy products, 12,500,000 mails were sent.
Can botnets be destroyed?
It happens, but not very often. In Nov '08, a spammer-friendly hosting provider (McColo) was shut down.
Part II
Now you know what the problem is. Let's look at a possible solution.
DNSBL → header checks → address verification → content filter → image-spam filter → anti-virus
Spam filters are step-by-step systems. Each step eliminates more spam. The KRZN filter uses six steps, and open source software is used for each of them. An e-mail that survives all filtering steps is considered clean and may proceed to its final destination.
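The step-by-step design can be sketched as a chain of checks in which the first step that rejects a mail stops further processing, and a mail that passes every step counts as clean. This is a minimal Python sketch; the `Mail` fields, the hypothetical `blacklisted_ips` set and the toy all-caps subject test are illustrative assumptions, not the KRZN configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Mail:
    client_ip: str
    subject: str
    body: str

# A stage returns a rejection reason, or None if the mail passes.
Stage = Callable[[Mail], Optional[str]]

def run_pipeline(mail: Mail, stages: list[tuple[str, Stage]]) -> tuple[bool, str]:
    """Run stages in order; stop at the first one that rejects the mail."""
    for name, check in stages:
        reason = check(mail)
        if reason is not None:
            return False, f"rejected by {name}: {reason}"
    # Survived every step: considered clean, may be delivered.
    return True, "clean"

# Hypothetical toy stages, for illustration only.
blacklisted_ips = {"192.0.2.66"}

def dnsbl_stage(mail: Mail) -> Optional[str]:
    return "client IP listed" if mail.client_ip in blacklisted_ips else None

def content_stage(mail: Mail) -> Optional[str]:
    return "subject is all capitals" if mail.subject.isupper() else None

STAGES = [("DNSBL", dnsbl_stage), ("content filter", content_stage)]
```

Because processing stops at the first rejection, cheap checks placed early spare the more expensive later steps from seeing most of the traffic.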
Filter step: software used
- DNSBL: Postfix / PolicyD-weight
- Header checks: Postfix / PolicyD-weight
- Address verification: Postfix (built-in feature)
- Content filter: SpamAssassin + ext. rulesets
- Image-spam filter: FuzzyOCR
- Anti-virus: ClamAV + ext. pattern sources
DNSBL: Postfix / PolicyD-weight
DNSBL?
[Diagram: the sending server mail-out.sender.net connects to my.spamfilter.net; the spam filter asks the DNSBL list host "Spammer?" and receives the answer "No".]
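A DNSBL query is an ordinary DNS lookup: the client IP's octets are reversed and prepended to the list's zone. Any A record in the answer (conventionally in 127.0.0.0/8) means "listed", while NXDOMAIN means "not listed". A minimal Python sketch using only the system resolver; the zone is one of the lists used in this talk, and for simplicity any resolution failure is treated as "not listed".

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets + list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with an A record for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN (or any other resolver failure): treat as "not listed".
        return False
    # Listing answers are conventionally in 127.0.0.0/8.
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, checking 83.102.180.3 against cbl.abuseat.org means resolving `3.180.102.83.cbl.abuseat.org`.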
DNSBL
DNSBLs are very, very, very effective tools. However, they must be used with care:
- Is the DNSBL provider trustworthy?
- What happens when a DNSBL ceases to exist?
Why not build your own DNSBL?
Build your own DNSBL
- Set up a few e-mail accounts without any filtering.
- Spread these e-mail addresses.
- Poll the accounts once per minute and extract the sending server's IP address.
- Add the IP to your blacklist and have it removed after 48 hours if no further spam from this IP has come in.
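The expiry rule (remove an IP 48 hours after its last spam) can be sketched as a small in-memory table. This is a hypothetical Python illustration: the class name, the use of Unix timestamps and the in-memory dict are assumptions, and polling the trap accounts and publishing the list via DNS are left out.

```python
from typing import Dict

EXPIRY_SECONDS = 48 * 3600  # delist 48 hours after the last spam seen

class TrapBlacklist:
    """Hypothetical blacklist fed by spam-trap accounts."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, float] = {}

    def record_spam(self, ip: str, now: float) -> None:
        # Each new spam from this IP restarts its 48-hour timer.
        self._last_seen[ip] = now

    def is_listed(self, ip: str, now: float) -> bool:
        last = self._last_seen.get(ip)
        return last is not None and now - last < EXPIRY_SECONDS

    def purge(self, now: float) -> None:
        # Drop entries that have sent no further spam for 48 hours.
        self._last_seen = {
            ip: t for ip, t in self._last_seen.items() if now - t < EXPIRY_SECONDS
        }
```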
Ask more than one DNSBL
You might want to reject mails only when they are listed in more than one DNSBL.
    ## DNSBL settings
    @dnsbl_score = (
    #   HOST,              BAD SCORE, GOOD SCORE, LOG NAME
        'list.dsbl.org',   3.5,       0,          'DSBL_ORG',
        'cbl.abuseat.org', 3.5,       0,          'ABUSEAT',
        'sbl.hsnr.de',     3.5,       0,          'HSNR_DE',
    );
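Giving every list the same score has a simple effect: if the reject threshold lies above one list's score but at or below two lists' combined score, a single listing is not enough to reject a mail. This Python sketch mirrors the scores from the configuration above; the threshold of 5.0 is a hypothetical value chosen for illustration, not a PolicyD-weight default, and the other checks PolicyD-weight adds to the total are omitted.

```python
# Scores per DNSBL, mirroring the @dnsbl_score table (BAD SCORE on a hit).
DNSBL_SCORES = {
    "list.dsbl.org": 3.5,
    "cbl.abuseat.org": 3.5,
    "sbl.hsnr.de": 3.5,
}

# Hypothetical threshold: above one hit (3.5), below two hits (7.0).
REJECT_THRESHOLD = 5.0

def dnsbl_score(listed_on: set[str]) -> float:
    """Sum the scores of every DNSBL that lists the client."""
    return sum(score for zone, score in DNSBL_SCORES.items() if zone in listed_on)

def should_reject(listed_on: set[str]) -> bool:
    return dnsbl_score(listed_on) >= REJECT_THRESHOLD
```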
Header checks: Postfix / PolicyD-weight
Header Checks
With access to the mail headers, a policy daemon can
- throttle connections if too many mails
  - come in from the same sender
  - come in to the same recipient
- make use of
  - greylisting
  - SPF/DKIM checks
  - HELO checks
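Greylisting can be sketched in a few lines: the first delivery attempt for an unknown (client IP, sender, recipient) triplet is temporarily rejected, and a retry after a minimum delay is accepted. Real mail servers retry; many spam bots don't. A minimal Python sketch; the 300-second delay and the in-memory store are illustrative assumptions, not the behaviour of any particular policy daemon. The return values are Postfix access(5) actions.

```python
from typing import Dict, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, recipient)

MIN_RETRY_DELAY = 300  # seconds; hypothetical value

class Greylist:
    def __init__(self) -> None:
        self._first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return a Postfix-style policy action for this delivery attempt."""
        triplet = (ip, sender, recipient)
        first = self._first_seen.setdefault(triplet, now)
        if now - first >= MIN_RETRY_DELAY:
            return "DUNNO"  # retried after the delay: let other checks decide
        return "DEFER_IF_PERMIT Greylisted, please try again later"
```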
Header Checks
Incidentally, I'm not making these names up...
HELO randomization (same server, different HELO):
    Apr 24 12:41:11 connect from rectal.post.ru[83.102.180.3]
    Apr 24 12:41:32 connect from triplex.post.ru[83.102.180.3]
    Apr 24 12:42:04 connect from hole.post.ru[83.102.180.3]
Occasionally, a spammer will even use your own server's name as a HELO string...
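Both HELO symptoms can be checked mechanically. This hypothetical Python sketch flags a HELO that claims to be our own server, and a client IP that has used more than a given number of distinct HELO names. The hostname, the limit of 2 and the unbounded in-memory bookkeeping are assumptions for illustration.

```python
from collections import defaultdict

OWN_HOSTNAMES = {"my.spamfilter.net"}  # hypothetical: names of our own server
MAX_HELOS_PER_IP = 2                   # hypothetical limit

class HeloChecker:
    def __init__(self) -> None:
        self._helos_by_ip: defaultdict[str, set[str]] = defaultdict(set)

    def check(self, ip: str, helo: str) -> list[str]:
        """Return the reasons why this HELO looks suspicious (empty if none)."""
        reasons = []
        name = helo.lower().rstrip(".")
        if name in OWN_HOSTNAMES:
            reasons.append("HELO uses our own hostname")
        seen = self._helos_by_ip[ip]
        seen.add(name)
        if len(seen) > MAX_HELOS_PER_IP:
            reasons.append(f"{len(seen)} different HELO names from {ip}")
        return reasons
```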
Address Verification: Postfix (built-in feature)
Address verification
- Recipient address verification: Mails to non-existent addresses should be rejected as early as possible.
- Sender address verification: Sending mails from non-existent addresses is considered bad form. However, this doesn't stop people from doing it (newsletters, order confirmations...).
Address verification
Recipient address verification is easy if you have a list of all valid addresses. Needless to say, usually you don't, because there are lots of different mail servers in your organization. The solution is to have your spam filter make dummy connections to the destination mail server and ask whether the recipient exists.
Address verification
[Diagram: a mail "To: charly@entropy.de" reaches the spam filter, which asks the destination mail server "Does charly@entropy.de exist?"]
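In Postfix this is done by the `reject_unverified_recipient` restriction, which probes the destination server and caches the result. To illustrate only the underlying SMTP dialogue, here is a hypothetical Python sketch using the standard `smtplib`: it sends MAIL FROM with the null sender and RCPT TO, then interprets the reply code (2xx = exists, 5xx = does not exist, anything else = unknown). The HELO name is an assumption, and no DATA is ever sent.

```python
import smtplib

def classify_rcpt_reply(code: int) -> str:
    """Map an SMTP RCPT TO reply code to a verification result."""
    if 200 <= code < 300:
        return "exists"
    if 500 <= code < 600:
        return "does not exist"
    return "unknown"  # e.g. 4xx: temporary failure, try again later

def probe_recipient(mail_server: str, address: str, port: int = 25) -> str:
    """Ask the destination server whether `address` exists, without sending mail."""
    with smtplib.SMTP(mail_server, port, timeout=10) as smtp:
        smtp.ehlo("my.spamfilter.net")  # hypothetical HELO name
        smtp.mail("")                   # null sender: MAIL FROM:<>
        code, _ = smtp.rcpt(address)
        smtp.rset()                     # abort the transaction before DATA
    return classify_rcpt_reply(code)
```

A production filter would also cache the answers, as Postfix does, so that the destination server is not probed for every incoming mail.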
Content Filter: SpamAssassin + ext. rulesets
Content Filter
The content filter is depicted here as a single step, which is wrong: it actually consists of many individual checks.
Content Filter
SpamAssassin applies hundreds of individual checks to the content and structure of the e-mail. For every check that hits, points are added to the mail's total spam score.
Content Filter
A spam mail, 18-Nov-08:
    From: "Dickson" <support@leadsandmails.com>
    Subject: INVESTIGATION ON BEHALF OF OUR BANK
    Date: Tue, 18 Nov 2008 11:28:20 -0000
    To: undisclosed-recipients:;

    Dear Sir/Madam,
    I am conducting a standard process investigation on behalf of our Bank an international banking conglomerate. This investigation involves a client and also the circumstances surrounding investments made by this client with our Bank. Our client died intestate and nominated no successor in title over the investments made with our bank. The essence of this communication with you is to request you provide us information/comment on this issue so that I can use my position in the bank to establish your eligibility to assume status of successor in title to the deceased.
    Best regards,
Content Filter
...and what the content filter made of it:
    X-Spam-Score: 16.376
    X-Spam-Report:
    * 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
    *     [Blocked - see <http://www.spamcop.net/bl.shtml?217.171.129.66>]
    * 0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
    * 2.1 SUBJ_ALL_CAPS Subject is all capitals
    * 1.6 DEAR_SOMETHING BODY: Contains 'Dear (something)'
    * 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
    *     [score: 0.5368]
    * 0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
    * 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
    *     above 50%
    *     [cf: 100]
    * 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
    *     [cf: 100]
    * 3.7 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
    * 0.0 DIGEST_MULTIPLE Message hits more than one network digest check
    * 0.8 MSOE_MID_WRONG_CASE MSOE_MID_WRONG_CASE
    * 3.1 FORGED_MUA_OUTLOOK Forged mail pretending to be from MS Outlook
Content Filter
- If the total score exceeds a warning threshold, the mail's subject line is modified: [*Spam?*] original subject line
- If the score exceeds a kill threshold, the mail is quarantined.
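The scoring logic can be sketched as follows: sum the points of all rules that hit, then compare the total with the two thresholds. This Python sketch uses the rule scores from the report above. The thresholds 5.0 (warning) and 10.0 (kill) are hypothetical values for illustration (5.0 is SpamAssassin's default `required_score`), not KRZN's settings.

```python
# Rule scores taken from the X-Spam-Report above.
RULE_HITS = {
    "RCVD_IN_BL_SPAMCOP_NET": 2.0, "SPF_SOFTFAIL": 0.6, "SUBJ_ALL_CAPS": 2.1,
    "DEAR_SOMETHING": 1.6, "BAYES_50": 0.0, "RAZOR2_CHECK": 0.5,
    "RAZOR2_CF_RANGE_E4_51_100": 1.5, "RAZOR2_CF_RANGE_51_100": 0.5,
    "PYZOR_CHECK": 3.7, "DIGEST_MULTIPLE": 0.0, "MSOE_MID_WRONG_CASE": 0.8,
    "FORGED_MUA_OUTLOOK": 3.1,
}

WARN_THRESHOLD = 5.0   # hypothetical warning threshold
KILL_THRESHOLD = 10.0  # hypothetical kill threshold

def apply_thresholds(hits: dict[str, float], subject: str) -> tuple[str, str]:
    """Return (action, subject) for a mail, given the scores of the rules that hit."""
    total = sum(hits.values())
    if total > KILL_THRESHOLD:
        return "quarantine", subject
    if total > WARN_THRESHOLD:
        return "deliver", "[*Spam?*] " + subject
    return "deliver", subject
```

The per-rule scores in the report are shown with one decimal, so they add up to 16.4 rather than exactly the reported X-Spam-Score of 16.376; either way, the example mail lands far above both thresholds.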
Content Filter
SpamAssassin comes with a large set of anti-spam rules, but you can still add more. sa-update fetches rules from the SpamAssassin Rules Emporium (SARE) and various other sources, such as
- openprotect.com
- daryl.dostech.ca
Content Filter
sa-update example:
    sa-update -D --channelfile /etc/spamassassin/channels.text --gpgkeyfile /etc/spamassassin/keys.text
channels.text:
    updates.spamassassin.org
    saupdates.openprotect.com
    70_sare_stocks.cf.sare.sa-update.dostech.net
    70_sare_adult.cf.sare.sa-update.dostech.net
    [...more...]
Image-spam filter: FuzzyOCR
Spam containers
Spammers usually send text-only or HTML messages, but sometimes they use containers, such as
- images, e.g. animated GIFs
- PDFs
- Flash
- .doc, .rtf, .ppt
- MP3
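Spotting such containers is a matter of walking the MIME structure and looking at each part's content type. A minimal Python sketch with the standard `email` package; the set of content types is an assumed mapping of the list above (e.g. Flash as `application/x-shockwave-flash`), and the function only flags parts for closer inspection rather than deciding that a mail is spam.

```python
from email.message import EmailMessage, Message

# Content types treated as potential spam containers (assumed mapping).
CONTAINER_TYPES = {
    "application/pdf",
    "application/x-shockwave-flash",
    "application/msword",
    "application/rtf",
    "application/vnd.ms-powerpoint",
    "audio/mpeg",
}

def container_parts(msg: Message) -> list[str]:
    """Return the content types of all parts that could hold container spam."""
    found = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype.startswith("image/") or ctype in CONTAINER_TYPES:
            found.append(ctype)
    return found

if __name__ == "__main__":
    # Sample message with a GIF and a PDF attachment.
    demo = EmailMessage()
    demo["Subject"] = "Your offer"
    demo.set_content("See attachment.")
    demo.add_attachment(b"GIF89a", maintype="image", subtype="gif", filename="offer.gif")
    demo.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf", filename="offer.pdf")
    print(container_parts(demo))  # ['image/gif', 'application/pdf']
```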
Image Spam
Image to text
FuzzyOCR extracts text from images and feeds it into SpamAssassin's content filter. FuzzyOCR even works with images that are
- distorted,
- animated,
- only partly readable.
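OCR output from distorted images is rarely letter-perfect, so matching has to tolerate errors. One common way to do that is to compare each OCR'd word against a word list using the Levenshtein edit distance. This is a hypothetical Python sketch of that general idea, not FuzzyOCR's actual code; the word list and the distance limit are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

SPAM_WORDS = ["viagra", "mortgage", "investment"]  # hypothetical word list
MAX_DISTANCE = 1                                   # hypothetical tolerance

def fuzzy_hits(ocr_text: str) -> list[str]:
    """Return spam words found in the OCR text, allowing small OCR errors."""
    hits = []
    for token in ocr_text.lower().split():
        for word in SPAM_WORDS:
            if word not in hits and levenshtein(token, word) <= MAX_DISTANCE:
                hits.append(word)
    return hits
```

With these settings, `fuzzy_hits("Cheap V1AGRA and m0rtgage offers")` finds both "viagra" and "mortgage" despite the misread characters.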
Spam containers
Anti-Virus: ClamAV + ext. pattern sources
Virus Filter
With ClamAV, you can use virus patterns that you (or someone you trust) have made. These unofficial pattern files can be used to catch anything, not just viruses or malware. For example, they can be aimed at spam (surprise!), phishing and attachments that aren't exactly spam, but are unwanted nonetheless.
Virus Filter: third-party files
SaneSecurity and MSRBL provide pattern files for ClamAV and a shell script (unofficial-sigs.sh) to download them:
    rsync://rsync.sanesecurity.net/sanesecurity/phish.ndb
    rsync://rsync.sanesecurity.net/sanesecurity/scam.ndb
    rsync://rsync.sanesecurity.net/sanesecurity/junk.ndb
    rsync://rsync.sanesecurity.net/sanesecurity/rogue.hdb
    rsync://rsync.sanesecurity.net/sanesecurity/spear.ndb
    rsync://rsync.sanesecurity.net/sanesecurity/spamimg.hdb
    rsync://rsync.sanesecurity.net/sanesecurity/lott.ndb
    rsync://rsync.sanesecurity.net/sanesecurity/spam.ldb
    rsync://rsync.mirror.msrbl.com/msrbl/msrbl-images.hdb
    rsync://rsync.mirror.msrbl.com/msrbl/msrbl-spam.ndb
Your own AV patterns
    HTML.Phishing.Bank-66:3:*:6c696d6974656420616363657373
- HTML.Phishing.Bank-66: name (shows up in the logfile)
- 3: file type (3 = HTML)
- *: offset (* = anywhere)
- 6c696d6974656420616363657373: hex-encoded string, created with
    echo -n "limited access" | sigtool --hex-dump
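The hex-encoding step can also be scripted. This Python sketch builds a ClamAV .ndb line in the format shown above (name:type:offset:hex); the helper name is hypothetical, and the defaults for target type (3) and offset (`*`) are taken from the example.

```python
def ndb_signature(name: str, text: str, target_type: int = 3, offset: str = "*") -> str:
    """Build a ClamAV .ndb signature line: Name:TargetType:Offset:HexSignature."""
    # Lowercase hex of the raw bytes, without a trailing newline.
    hex_sig = text.encode("utf-8").hex()
    return f"{name}:{target_type}:{offset}:{hex_sig}"
```

`ndb_signature("HTML.Phishing.Bank-66", "limited access")` reproduces the signature line from the slide exactly.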
Your own AV patterns
Creating pattern files against container spam is even easier:
    sigtool --md5 thisisspam.gif >> /path/to/my-patterns.hdb
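`sigtool --md5` writes one line per file in the .hdb format MD5:FileSize:Name, using the file name as the signature name. A Python sketch that produces the same kind of line; both function names are hypothetical.

```python
import hashlib
from pathlib import Path

def hdb_line(data: bytes, name: str) -> str:
    """Build a ClamAV .hdb entry: MD5:FileSize:MalwareName."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"

def hdb_line_for_file(path: str) -> str:
    """Roughly like `sigtool --md5 FILE`: hash the file, use its name as the signature name."""
    p = Path(path)
    return hdb_line(p.read_bytes(), p.name)
```

Because the hash covers the whole file, such a signature only catches byte-identical copies of the container; any change to the image produces a different MD5.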
Body count
- DNSBL: kills 97% of incoming spam
- header checks
- address verification
- content filter: kills 3% of incoming spam
- image-spam filter
- anti-virus
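A back-of-the-envelope calculation with the figures from this talk shows why the cheap DNSBL step comes first. If the DNSBL stops 97% of roughly 6,000,000 spam mails per day, the more expensive later steps only have to deal with the remaining 3%. A Python sketch of that arithmetic; integer percentages avoid floating-point rounding.

```python
SPAM_PER_DAY = 6_000_000  # "Today: 6,000,000 spam mails per day"
DNSBL_KILL_PERCENT = 97   # "DNSBL kills 97% of incoming spam"

def remaining_after(total: int, kill_percent: int) -> int:
    """Number of mails left after a step that stops kill_percent of them."""
    return total * (100 - kill_percent) // 100

left = remaining_after(SPAM_PER_DAY, DNSBL_KILL_PERCENT)
print(f"{left:,} spam mails per day reach the later steps")  # 180,000
print(f"{left // 1440:,} per minute on average")              # 125
```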
Fighting Spam with open source software
Thank you!
Questions?