Why Content Filters Can't Eradicate Spam

WHITEPAPER

About Mimecast

Mimecast delivers cloud-based email management for Microsoft Exchange, including archiving, continuity and security. By unifying disparate and fragmented email environments into one holistic solution that is always available from the cloud, Mimecast minimizes risk and reduces cost and complexity, while providing total end-to-end control of email. Founded in the United Kingdom in 2003, Mimecast serves over 5,000 customers worldwide and has offices in Europe, North America, Africa and the Channel Islands. For more information, please email info@mimecast.com.

Contents

Defining the Problem
Current Spam Management Systems
Implications of Content Filtering for Spam Capture
Quarantining
False Positives
Connection Filtering
The Mimecast Approach to Spam: ARMed SMTP

Defining the Problem

Spam means different things to different people. Some people define any unwanted or unsolicited email as spam (even if they asked for it but don't want it anymore), while others consider all bulk email (both solicited and unsolicited newsletters and marketing announcements) to be spam. Many people label any email from people they don't know as spam. This range of perceptions, coupled with spammers' ability to vary message content to appear more or less like spam, makes it difficult for a software product to eliminate spam completely.

As a result, the fight against spam has become an arms race, with spammers and Secure Email Gateway vendors trying to outfox one another. Legal authorities have addressed the issue in different ways, from shutting down sympathetic hosting providers to working behind the scenes to kill botnets. While not wholly effective, these efforts have at least forced unscrupulous email senders to operate from the shadows to avoid the risk of prosecution and, more interestingly, have provided mechanisms for more easily telling friend from foe.

Recent statistics suggest that spam outweighs legitimate email by more than four to one. Practically all businesses now have spam filters in place, but most have not yet found a solution to the problem; often the implementation of a spam filter has merely moved the spam problem and created a number of other message delivery issues.

Current Spam Management Systems

The vanguard of current spam management systems has been the content filter. These spam filters apply various policy-driven techniques to the words and body content of an email in order to determine whether it should be delivered to the recipient. The most common content filtering techniques employ heuristic analysis and Bayesian analysis.
We can loosely describe the technical process behind these techniques as analyzing the words or makeup of an email in order to detect a pattern, then comparing that pattern to a database of known bad or malicious email content. Examining words or phrases commonly used in spam is a simple approach to the problem, but it is not completely effective because it assumes that those words and phrases will not appear in legitimate emails. More sophisticated techniques apply weighting scores to known bad words or phrases; if the total score for an email exceeds a certain limit, the filter considers the email to be spam. Some content filtering systems try to learn which types of email the recipient prefers, as distinct from those the recipient rejects; most others require regular updates of known bad phrases and settings. Some email security companies have even gone as far as to suggest that their systems are close to artificial intelligence. But the arms race continues as spammers learn how to circumvent the latest techniques designed to defeat them.
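
The weighted scoring technique described above can be illustrated with a minimal sketch. The phrase list, the weights and the threshold are hypothetical values chosen only for illustration; they are not taken from any real product, and real filters use far larger and regularly updated rule sets.

```python
# Hypothetical phrase weights and threshold, chosen only for illustration.
PHRASE_WEIGHTS = {
    "free money": 3.0,
    "act now": 2.0,
    "prescription": 1.5,
    "unsubscribe": 0.5,
}
SPAM_THRESHOLD = 4.0


def spam_score(body: str) -> float:
    """Sum the weights of every listed phrase that occurs in the body."""
    text = body.lower()
    return sum(weight for phrase, weight in PHRASE_WEIGHTS.items() if phrase in text)


def is_spam(body: str) -> bool:
    """Treat the email as spam when its total score exceeds the threshold."""
    return spam_score(body) > SPAM_THRESHOLD
```

With these assumed weights, a message such as "Free money! Act now" is marked as spam, but so is a plausible legitimate message such as "Act now to renew your prescription and claim free money back on your plan." That second result shows the weakness noted above: the scoring assumes that spam phrases will not appear in legitimate email.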

Implications of Content Filtering for Spam Capture

The problem with content filtering is not that it cannot block spam, but rather that it is the most expensive method, both in terms of technical processing and in the cost of deploying and maintaining the solution. Content filtering deals with an effectively unlimited number of variables, which makes it error prone. However, the key failure in the process is the flawed assumption that a software program can determine which emails an individual would like to receive. Systems that use content as the only consideration break easily when people subscribe to useful and legitimate email newsletters that have much in common with spam. A content filter will also mark many legitimate emails as spam (so-called false positives) if they include common spam words and phrases. This is a particular problem in industries such as healthcare and financial services, whose keywords are frequently the subject matter of real spam.

Quarantining

To combat the false positives that are symptomatic of content filters, vendors added another break in the SMTP chain, called quarantining. Almost all email security systems on the market today use a quarantine folder to store all emails marked as spam. Email administrators have traditionally been required to sift through these quarantines manually on a daily or hourly basis, looking for false positives. Often, the first indication that an email has been mistakenly identified as spam is when the sender calls to ask why there has been no response. In a business context, that can mean lost sales and clients. Some solutions require end users to review their own quarantine folders in order to locate incorrectly classified emails. In this scenario, they may as well receive the spam directly, because the time required to scan the quarantine is the same as the time required to review an overburdened email inbox.
Quarantines create new problems without solving the existing ones. They merely move email delivery problems around, placing an additional burden on the email administration team or the end user.

False Positives

The more aggressive a spam filter becomes, the more likely it is to reject legitimate emails along with spam. A legitimate email rejected in this way is called a false positive. Email security vendors have traditionally focused on the idea that NO spam should get through to the end user. However, this approach does not take into account the fact that a single incorrectly blocked email is many times more costly to deal with than one junk email received. It is therefore incorrect to assume that having 1% or even 0.1% of emails incorrectly classified as spam is acceptable. Email administrators have begun to realize that the cost of false positives greatly outweighs the cost of dealing with the spam itself. False positives represent a breakdown in communication, lost opportunities and productivity, and mistrust of email as a communications medium.
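
The arithmetic behind the 1% and 0.1% figures can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, and the mail volumes and per-message costs are inputs that each organization would have to supply for itself.

```python
def wrongly_blocked_per_day(legit_per_day: int, false_positive_rate: float) -> float:
    """Expected number of legitimate emails classified as spam each day."""
    return legit_per_day * false_positive_rate


def false_positives_cost_more(blocked_legit: float, delivered_spam: float,
                              cost_per_false_positive: float,
                              cost_per_junk_email: float) -> bool:
    """Compare the total cost of blocked legitimate mail with that of delivered junk.

    Both cost arguments are assumptions provided by the caller, in the same unit.
    """
    return blocked_legit * cost_per_false_positive > delivered_spam * cost_per_junk_email
```

For a hypothetical organization that receives 10,000 legitimate emails a day, a 1% false positive rate means roughly 100 wrongly blocked messages every day, and a 0.1% rate still means roughly 10. Whether that is tolerable depends on the per-message costs passed to `false_positives_cost_more`.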

Connection Filtering

More advanced Secure Email Gateway vendors, particularly those delivering their service from the cloud, are able to offer an additional level of protection before the email is processed by the content filtering engines. Connection filtering examines the source, and in some cases the destination, of the email to determine whether the sender already has a reputation within that system.

Reputations can be classified into a number of groups. Globally Known Bad implies that the sender already has a bad reputation maintained on the wider Internet, and so is listed by IP address in a Realtime Block List (RBL). Locally Known Bad implies that the anti-spam solution maintains its own database of known bad senders; some email administrators achieve this manually with organization-wide block lists. Good reputations are classified as Locally Known Good, where the anti-spam solution automatically maintains a database of known good communication pairs, i.e. the email addresses and IP addresses your users regularly send email to. Anything received from those external contacts can be assumed to be Known Good.

Building a reputation with a Secure Email Gateway is also possible, provided that the gateway supports RFC compliance checking techniques such as gray listing. Passing such a test implies that whatever is sending the email has queued and retried at RFC-compliant intervals, and so is likely to be a legitimate SMTP server.

The Mimecast Approach to Spam: ARMed SMTP

Mimecast blends advanced reputation and protocol connection techniques into a powerful and effective anti-spam system. This sophisticated capability is the result of the Mimecast platform's market-leading Mail Transfer Agent (MTA) architecture and the unique in-protocol and connection-level anti-spam tests that we apply.
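
To make the generic connection filtering categories described in the Connection Filtering section concrete, here is a minimal sketch. It is not Mimecast's ARMed SMTP implementation. The class name, the order of the checks, the use of (sender, recipient) pairs to stand in for known good communication pairs, and the 300-second retry window are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical minimum delay before a retry is accepted; real gateways choose their own.
MIN_RETRY_SECONDS = 300


@dataclass
class ConnectionFilter:
    global_blocklist: set = field(default_factory=set)  # IPs listed in an RBL
    local_blocklist: set = field(default_factory=set)   # locally maintained bad IPs
    known_good_pairs: set = field(default_factory=set)  # (sender, recipient) pairs
    first_seen: dict = field(default_factory=dict)      # gray listing bookkeeping

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return a verdict using connection-level information only."""
        if ip in self.global_blocklist:
            return "reject: globally known bad"
        if ip in self.local_blocklist:
            return "reject: locally known bad"
        if (sender, recipient) in self.known_good_pairs:
            return "accept: locally known good"
        # Gray listing: defer the first attempt and accept a sufficiently late retry.
        key = (ip, sender, recipient)
        seen_at = self.first_seen.setdefault(key, now)
        if now - seen_at >= MIN_RETRY_SECONDS:
            return "accept: passed gray listing"
        return "defer: retry later"
```

In this sketch an unknown sender is deferred on its first attempt and accepted only when the same IP address, sender and recipient reappear after the retry window, which is the queue-and-retry behavior expected of a legitimate SMTP server. None of the checks reads the message body.
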
While other vendors have supplemented their legacy content filtering approaches with some standard connection filtering features to shore up their solutions, Mimecast ARMed SMTP centers on this progressive methodology, reducing the reliance on content examination techniques for spam detection. Mimecast can intercept the majority of spam without examining the body content of an email because our experience and our technology let us identify patterns of SMTP delivery behavior that are typically exhibited by spammers. We identify these patterns by the way spammers deliver their emails, not by the content of the emails themselves.

This approach offers profound benefits. Mimecast leaves spam undelivered with the spammers, so there is no local bandwidth loading; it significantly reduces the overhead of managing quarantine folders; and it maintains the integrity of the SMTP protocol so that no legitimate messages go missing.

ARMed SMTP is a way of dealing with email and spam developed for the way email is used today, rather than how it was originally expected to be used twenty years ago. Our approach provides a more successful, robust, cost-effective and intelligent solution to the problem of spam and spam management.

© 2012 Mimecast. ALL RIGHTS RESERVED. WHI-WP-069-001