Software Engineering-Based Design for a Bayesian Spam Filter
|
|
- Evan Barton
- 2 years ago
- Views:
Transcription
1 Al-Khwarizmi Engineering Journal, Vol. 6, No. 2, PP (2010) Al-Khwarizmi Engineering Journal Software Engineering-Based Design for a Bayesian Spam Filter Mumtaz Mohammed Ali AL-Mukhtar College of Information Engineering/AL-Nahrain University (Received 17 March 2008; accepted 9 September 2009) Abstract The rapid spread and the easy availability of a free service have made it the medium of choice for the sending of unsolicited advertising and bulk in general. These messages, known as junk or spam mail, are an increasing problem to both Internet users and Internet service providers (ISPs). The research resolves one aspect of the spam problem by developing an appropriate filter for the client. The proposed filter is a combination of three forms of filters: Whitelist, Blacklist, and a Bayesian filter. Whitelist-based filter only accepts s from known addresses. Blacklist filter blocks s from addresses known to send out spam. Bayesian content-based filter makes estimations of spam probability based on the text and filters messages based on a pre-selected threshold. The Bayesian filter is selected to be the main filter. The Bayesian filter is manually trained on a set of gathered e- mails; some of them are spam and the others are legitimate based on the contents of an . Thereafter the classification phase has been implemented for new entered s. All the required databases are constructed in form of tables stored in the Structured Query Language (SQL) server. The filter at the client side can transparently access the database in order to carry on the intended filtering. The proposed system ( client interface and the filters) can manage messages written in both Arabic and English languages which is crucial for the users in our region. Software engineering principals are implemented throughout the design process to make the system less vulnerable to faults and easily maintained. The design steps have followed the Waterfall-model using the ASCENT software. A user-friendly interface has been developed to access the features of the spam filter at the client side. Visual Basic version 6 has been used to develop the system. As well, the SQL server has been implemented to build and process the database. A number of performance measurements have been carried out with asset of gathered s. The results are used to evaluate the performance of the filter and to prove its efficiency. Keywords: Spam, client , bayesian filter, SQL server, waterfall model. 1. Introduction In the past few years, Internet Technology has affected our daily communication style in a radical way: the electronic mail ( ) concept is used very extensively for communications nowadays. This technology makes it possible to communicate with many people simultaneously in a way so easy and cheap that it is currently considered the first worldwide into business sector [1]. However, the abuse of s has the drawback that the volume of s that show up in mailboxes has been exponentially increasing. Moreover, many s are received by users without there desire: spam mail (or junk mail or bulk mail ) is the general name used to denote these types of . Spam mails, by definition, are the electronic messages posted blindly to thousands of recipients, usually for advertisement, and represent one of the most serious and urgent information overload problems [1, 2]. Spam has caused some serious problems. Firstly, it wastes a mass of network resources that are very important for network users, especially those in enterprises or corporations. People need to spend a lot of time to deal with spam every day. Even worse, many current spam mails bring users
2 unexpected malicious attachments which would seriously crack the user s system. Therefore, spam is a headachy problem [3]. Spam filtering (i.e., distinguishing between spam and legitimate messages) is a commonly accepted technique for dealing with spam [4]. Spam filters vary in functionality from black-lists of frequent spammers to content-based filters. The latter are generally more powerful, as spammers often use fake addresses. Existing content-based filters search for particular keyword patterns in the messages. These patterns need to be crafted by hand, and to achieve better results they need to be tuned to each user and to constantly maintained [5,6]. In this paper the spam filtering has been addressed with the aid of Bayesian classifier, which learn to identify spam after receiving training on messages that have been classified as spam or non-spam (legitimate). 2. System Components is a set of mechanisms designed to allow computer users send messages to one another. At the most basic level the system can be divided into two principles components [7]: A. Servers: which are hosts that deliver, route, and store messages. B. Clients: which interface with users and allow users to read, compose, send, and store e- mail messages. At the lowest levels, it can get quite complicated. At the highest level it is quite simple, consisting of mail messages, a protocol for moving those messages from place to place, and an interfaces for users to perform various related tasks. An message by itself is of a limited usefulness. There is a need to be a way to move it from one location to another. This work is divided into several tasks [8]: A. MTA (Mail Transfer Agent): that routes e- mail. B. MDA (Mail Delivery Agent): that delivers it on behalf of an MTA. C. MUA (Mail User Agent): which provides an interface for the user to send and receive messages. 3. protocols To carrry out the project two main protocols are required which are SMTP, POP3 [9]. A ) SMTP (Simple Mail Transfer Protocol) It is implemented as a communication protocol between two machines, a client and a server. SMTP is also a store and forward protocol, meaning that it permits a message to be sent through a series of servers in order to be delivered on an end destination. Servers store incoming messages in a queue to wait attempts to transmit them to the next destination. The next destination could be a local user or another mail server. B ) POP3 (Post Office Protocol) This protocol allows a client machine to connect to a remote mail server, and download any mail to a local mailbox. At the highest level, POP3, like SMTP, is implemented as a communication protocol between two machines: a client and a server [9]. 4. Proposed Spam filter The proposed filter is designed to be a combination of three layers that work altogether. White-list layer only accepts s from known addresses. Black-list layer blocks s from addresses known to be spam. Content-based filter makes estimation of spam likelihood based on the text of the and filter messages according to a pre-selected threshold. The content based filter uses a Bayesian algorithm to estimate the probability of a message being a spam. A database is constructed in form of tables stored in SQL server. The filter at the client side can transparently access the database in order to carry on the intended filtering. 4.1 Whitelist versus Blacklist The whitelist makes a decision according to the message sender address. It compares this message address to the table in the SQL server that contains all of the message addresses that are to be accepted by the user. However, the blacklist compares the message address to the table in the SQL server that 84
3 contains all message addresses that are rejected by the user. When the user receives a spam message, he can add the address of the sender to his blacklist. 4.2 Bayesian Algorithm 1) Split in tokens. i. Need number of messages for spam and legitimate. ii. Need frequency of each word for each type. 2) Calculate probabilities i. P(legitimate) = word frequency /number of legitimate messages. ii. P(spam) = word frequency/ number of spam messages. 3) Calculate likelihood of being spam (spamicity) using a special form of Bayes Rule where likelihood = a/(a + b), where a is the probability of a legitimate word and b is the probability of spam word. 4) Choose tokens whose combine probability is farthest from 0.5 either way. This is because the farther it is from 0.5 (neutral), with more certainty we can say it belongs to either strategy. i. Do this for n numbers for instance choose to have 15 extremes. ii. Combine their probability to get a figure for message using Bayes Rule. In basic terms, Baye s Rule determine the probability of an event occurring based on the probabilities of two or more independent evidentiary events. For three evidentiary events a, b, and c, the probability is equal to [10]: iii. a b c a b c ( 1 a ) * ( 1 b ) * ( 1 c ) (1) In this fashion, the rule can be expanded to accommodate any number of evidentiary events. If the end result is closer to 1.0, then the message is classified as spam, and if it is closer to 0.0, the message is classified as legitimate. The cutoff range that has been specified for spam is that it should be greater than 0.85, but spam results are above Spam Blocking System Design Data flow diagrams are developed to give the details of constructing the whole spam blocking system. These data flow diagrams are constructed according to the waterfall model using the ASCENT software [11]. The system design is divided into 6 levels, which describe the operations (processes) involved and how data flow across the system. Figure 1 depicts the basic system level (level 0). Fig.1. System level 0. When a message enters the system, a decision has to be made by the spam blocking system where to put the message. If the message is spam, it will be submitted to the bulk box, otherwise (i.e., legitimate message) it is submitted to the inbox. Re-enter direction means that a spam message has been wrongly treated as legitimate, which causes the spam message to enter the inbox as the blocking system could not discern it correctly. However, it is up to the user to set this message as a spam and add it to the training corpus. So, for the next time if this message or a similar one enters the system, it will be recognized and added to the bulk box. A star symbol has been used in figure 1 and next figures to indicate that the designated block comprises more details to be exploded in the next level. Figure 2 illustrates level one in the system data flow diagram. This level has three processes: Mail Server: accepts the message from the client and delivers it to another client according to the address. Client: receives the message from the mail server and decides whether it is a spam to send to the bulk box or it is a legitimate to send to the inbox. 85
4 SQL Server: this process contains the database, which the client relies on to make the decision concerning the received message. So, by this process the database is read, updated, and changed. Fig.2. Level 1. Level 2 which is depicted in figure 3 is an exploding of the client process that contains two processes: Mail User Agent (MUA): sends and receives messages from the mail server, and it is responsible for providing an interface for users to manage their mail. This management typically includes viewing messages, managing mail folders, and composing new s, as well as replying to a message and sending an existing message to other users. Filtering Program: checks the received message from the MUA whether it is a spam or ham. Thereafter the SQL server database is updated. Fig.3. Level 2. Level three, which is exploding of the MUA is illustrated in figure 4. It is comprised of two processes: Fig.4. Level 3. 86
5 POP3: provides a protocol for MUA to download the user's mailbox from a remote server. SMTP: implements the communication protocol between two machines that is a client and a server. In its simplest form, a message is sent from a user on one machine to a mailbox in another machine. Figure 5 depicts level four, which is the exploding of white list and blacklist checking that contains two processes: Whitelist Checking: in this process the message ID is compared with the one stored in the SQL server, which contains the addresses of the IDs of allowed users. If a match occurs, the SQL server database will be updated and the message is added to the inbox. Otherwise, in case of no match, the ID will go through another checking which is the black list. Blacklist Checking: this process makes a comparison between the messages ID that is not found in the white list table in the SQL server database with the one that contains all the addresses that are not accepted by the user. If a match occurs, the SQL server database will be updated and the message is added to the bulk box. Otherwise, the message enters to the reduction message process. Fig.5. Level 4. Figure 6, which constitutes level five, is the exploding of filtering program that contains four processes, which are: Black List & white List Checking: this process is mentioned earlier in level four. Bayesian Filter: this process is the main part in the blocking system. It does several activities: accepts the message from the reduction process, reads the database from the SQL server to make its decision for spam or legitimate, updates the database for the existence words in the message, and changes the database if a false positive Move to Spam: sometimes the message enters the inbox as legitimate, but it is really misclassified by the filtering, i.e., a false positive case has encountered. 87
6 TP R TP FN (3) When filtering spam, it is worth noting that misclassifying a legitimate mail as spam is much more severe than letting a spam message pass the filter. Letting a spam go through the filter generally does no harm while blocking an important personal mail as spam can be a real disaster. The usual precision/recall measures tell little about a filter s performance when false positive and false negative are weighted differently. To introduce some cost sensitive evaluation measures that assign false positive a higher cost than false negative, a weighted accuracy (WAcc) and weighted error rate (Werr =1-Wacc ) measures specially tailored for this scenario can be used. Wacc and Werr was introduced by Androutsopoulos et al. [13] as: n L L n S WAcc NL NS n L S n S WErr NL NS S L (4) (5) Fig.6. Level 5. In this case the user could decide that such a message will be considered as a spam in future classifications by activating the re-enter the message to the filter again. So, the database in the SQL server will be updated accordingly. 6. Performance Measures The performance measures used in this paper are introduced. Let S and L stand for spam and legitimate message, respectively. n L L, n S S denote the numbers of legitimate and spam messages correctly classified by the system i.e., true positive (TP) and true negative (TN) respectively. n L S represents the number of legitimate messages misclassified as spam i.e., false positive (FP), and n S L is the number of spam messages wrongly treated as legitimate i.e., false negative (FN). Then spam precision (P), and spam recall (R) are defined as follows [12]: where NL is the total number of legitimate messages, and NS denotes the total number of spams. WAcc treats each legitimate message as if it were λ messages: when false positive occurs, it is counted as λ errors; and when it is classified correctly, this counts as λ successes. The higher λ is, the more cost is penalized on false positives. Androutsopoulos et al. [11] also introduced three different values of λ: λ = 1, 9, and 999. When λ is set to 1, spam and legitimate mails are weighted equally; when λ is set to 9, a false positive is penalized nine times more than a false negative; for the setting of λ = 999, more penalties are put on false positive: misclassification a legitimate mail is as bad as letting 999 spam messages pass the filter. Such a high value of λ is suitable for scenarios where messages marked as spam are deleted directly. In practice, when λ is assigned a high value (such as λ = 999), WAcc can be so high that it tends to be easily misinterpreted. To avoid this problem, it is better to compare the weighted accuracy and error rate to a simplistic baseline i.e., legitimate messages are never blocked and spams can always pass the filter. P TP...(2) TP FP 88
7 7. System Evaluation To examine the performance of the proposed spam blocking system, a testing corpus of 800 e- mails; 400 for spam and 400 for ham s has been used. In order to evaluate the effect of Whitelist, Blacklist, and the Bayesian filter that are used for detecting spam, each layer is tested separately as shown in figure (7). the other two methods. This is because the Bayesian filter tests the message body rather than the addresses. Figure (8) reveals that the spam detection percentage increases with the increasing number of incoming s. This is because the statistical filter would learn more with more various features that arrive with the tested s. Fig.7. Spam Filter Detection Versus Layer. The results obtained are summarized as follows: Spam Detected by Bayesian Filter: the system is in a state of testing and this means that the system has already been trained. For that reason the detection percentage starts from a good percentage, which is 82%. Thereafter, as more learning is carried out, spam detection could approach 100% percentages. Spam detected by Blacklist: this method depends on the addresses of spammers that are stored in the Blacklist table. Initially it starts from zero and as much as the new s are entered as much as the detection increases. That is because when the new is detected as spam its address is automatically saved in a Blacklist table. Legitimate detected by Whitelist: this method depends on the s headers that are addresses of non-spammers. The detection line starts from 10% and it vibrates around this percentage. This is because of that the new e- mails entered are from persons that were initially entered in the whitelist. It may be concluded that the Bayesian filter had detected more Spam percentages rather than Fig.8. Spam Detected by the Blocking System. Figure (9) examines the effect of four measures TN, TP, FN, and FP on the tested corpus. It reveals that TN and TP increase as more messages are tested whereas FN and FP decrease as the filter has learned more. Fig.9. Percentages of Spam &Legitimate Classification. The recent results of the filter have been used to construct table (1) and figure (10) to summarize the classification results. 89
8 Table 1, Classified s by the Proposed Filter Classification TN L L TP S S FN S L FP L S number Fig.10. Percentages of Filter Classification. Table (2) shows the results of classification with three values of. As can be seen from the table WErr is acceptable low, and it is noticed that with increasing value the WErr decreases. That means the filter has a good low cost resulted from L S error type. Also as can be observed that the smallest value of has TCR 9.3, which indicates the high performance of the filter True Negative (TN) True Positive (TP) False Negative (FN) 0.04 False Positive (FP) Table 2, Performance Evaluation Results of the Proposed Filter with Three Values of Description No. WAcc WErr SP SR TCR s evaluation performance Conclusions Based on the research undertaken, the analysis and design processes of the filter and the implementation of it, several key results have been identified: The proposed filter is an aggregation of whitelist, blacklist in addition to the Bayesian filter. This has been contributed in increasing filter efficiency, decreasing false positive, and reduces classification time. Bayesian filtering includes the elements of changeability. That is it is constantly selfupdating which makes it very difficult for spammers to circumvent. The Bayesian method is multilingual. So, it is efficiently adapted in this work to manage English, Arabic, or mixed linguistic messages. The proposed solution is a client side-filtering approach that adds convenience to the user. This allows the users to custom the filter based on their interests and needs. A number of performance measurements have been carried out. The results ensure performance of the filter and its efficiency. 9. References [1] Lorenzo Lazzari, Marco Mari and Agostino Poggi, CAFÉ Collaborative Agents for Filtering s, Proceedings of the 14 th IEEE International Workshops on Enabling Technologies (WETICE 05), [2] Xiao-Lin Wang and Ian Cloete, Learning to Classify A Survey, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, pp , August [3] Yang Li, Binxing Fang, Li Guo and Shen Wang, Research of a Novel Anti-Spam Technique Based on Usere s Feedbach and Improved Naïve Bayesian Approach,
9 [4] Calto Pu, Steve Webb, Oleg Kolesnikov, Wenke Lee, and Richard Lipton, Towards the Integration of Diverse Spam filtering Techniques, Proceedings of the 22 nd International Conference on Data Engineering Workshops (ICDW"06), [5] Takamichi Saito, "Anti-Spam System: Another Way of Preventing Spam", Proceeding of the 16 th International Workshop on Database Systems Applications (DEXA"05), [6] Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina, "Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges", IEEE Internet Computing, November [7] Behrouz A. Forouzan, Data Communication and Networking, McGraw-Hill, [8] Fred Halsall, Computer Networking and the Internet, Addison-Wesley, [9] Kevin Johnson, Internet Protocols, Developers Guide, Addison-Wesley Longman Inc., [10] A. McCallum and K. Nigam, A Comparison of Event Models for Naïve Bayes Text Classification, Journal on Machine Learning Research 3, pp , [11] Yeates D., and Cadle J., Project Management for Information Systems, 2 nd edition London: Finintial Times Management. [12] Wen-Feng Hsiao, Te-Ming Chang, and Guo-Hsin Hu A Cluster-based Approach to Filtering Spam under Skewed Class Distributions, Proceedings of the 40 th Hawaii International Conference on System Science, pp.1-7, [13] I. Androutspoulos, G.Sakkis, C.D. Spyropoulos, and P. Stamatopoulos, "Learning to Filter Spam s: A comparison of a Naïve Bayesian and a Memory-Based Approach", Proceedings of the Workshop: Machine Learning and Textual Information Access, pp.1-13,
10 ممتاز محمد علي المختار مجلة الخوارزمي الھندسیة المجلد 6 العدد 2 صفحة (2010) تصمیم مرشح Bayesian لرساي ل الدعایة یعتمد ھندسة البرامجیات ممتاز محمد علي المختار كلیة ھندسة المعلومات/ جامعة النھرین الخلاصة الانتشار االسریع و توفر السھل لخدمة البرید الاكتروني المجاني جعلا منھ وسطا مختارا لارسال برید الاعلانات الغیر مرغوبة و برید الدعایة بشكل عام. ھذه الرساي ل والمعروفة بالبرید التافھ او (spam) مشكلة متزایدة لكل من المستعملین و مزودي خدمة الانترنت.(ISP) یقدم البحث حلا لاحدى جوانب مشكلة رساي ل الدعایة البیضاء (Whitelist) القاي مة السوداء (spam) من خلال تطویر مرشح ملاي م لبرید المستفید client).( المرشح المقترح یتكون من ثلاثة اجزاء تعمل معا: القاي مة.Bayesian و مرشح (Blacklist) یسمح مرشح القاي مة البیضاء باستقبال الرساي ل البریدیة من عناوین معروفة للمستفید. بینما یمنع مرشح القاي مة السوداء استقبال الرساي ل البریدیة من عناوین عرفت بارسالھا لرساي ل الدعایة. یعتمد مرشح Bayesian في تقدیراتھ على محتوى الرساي ل ویرشح ھذه الرساي ل نسبة ال معیار (threshold) محدد سلفا. تم بناء قواعد البیانات المطلوبة بشكل جداول تخزن في خادم ال.SQL المرشح المقترح للمستفید یمكن ان یصل الى قواعد البیانات ھذه بشكل شفاف لكي یتمكن من تنفیذ الترشیح المطلوب. النظام المقترح یتعامل مع رساي ل الدعایة التي تكتب في كلتا اللغتین العربیة و الانكلیزیة و الذي یعتبر امرا ھاما للمستفیدین في منطقتنا. تم اعتماد مبادئ ھندسة البرامجیات خلال تصمیم النظام مما یجعل النظام اقل عرضة للاخطاء وادامتھ اسھل. خطوات التصمیم نفذت باستحدام نموذج Visual تم تطویر واجھة للمستفید سھلة الاستخدام للحصول على مزایا المرشح المقترح. تم استخدام بیي ة Basic 6.ASCENT وبرامجیات Waterfall لبناء النظام كما استخدم SQL Server لبناء وتنفیذ قواعد البیانات المطلوبة. تم استخدام عدد من مقاییس الاداء و استحصال النتاي ج التجریبیة مع مجموعة من البرید المجموع لتقییم الاداء للمرشح المقترح واثبات كفاي تھ. 92
AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH
AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH MUMTAZ MOHAMMED ALI AL-MUKHTAR College of Information Engineering, AL-Nahrain University IRAQ ABSTRACT The spam has now become a significant security issue
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Technology : CIT 2005 : proceedings : 21-23 September, 2005,
eprism Email Security Appliance 6.0 Intercept Anti-Spam Quick Start Guide
eprism Email Security Appliance 6.0 Intercept Anti-Spam Quick Start Guide This guide is designed to help the administrator configure the eprism Intercept Anti-Spam engine to provide a strong spam protection
Bayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
Savita Teli 1, Santoshkumar Biradar 2
Effective Spam Detection Method for Email Savita Teli 1, Santoshkumar Biradar 2 1 (Student, Dept of Computer Engg, Dr. D. Y. Patil College of Engg, Ambi, University of Pune, M.S, India) 2 (Asst. Proff,
An Efficient Spam Filtering Techniques for Email Account
American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-02, Issue-10, pp-63-73 www.ajer.org Research Paper Open Access An Efficient Spam Filtering Techniques for Email
AN EFFECTIVE SPAM FILTERING FOR DYNAMIC MAIL MANAGEMENT SYSTEM
ISSN: 2229-6956(ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 212, VOLUME: 2, ISSUE: 3 AN EFFECTIVE SPAM FILTERING FOR DYNAMIC MAIL MANAGEMENT SYSTEM S. Arun Mozhi Selvi 1 and R.S. Rajesh 2 1 Department
Anti Spamming Techniques
Anti Spamming Techniques Written by Sumit Siddharth In this article will we first look at some of the existing methods to identify an email as a spam? We look at the pros and cons of the existing methods
Achieve more with less
Energy reduction Bayesian Filtering: the essentials - A Must-take approach in any organization s Anti-Spam Strategy - Whitepaper Achieve more with less What is Bayesian Filtering How Bayesian Filtering
About this documentation
Wilkes University, Staff, and Students have a new email spam filter to protect against unwanted email messages. Barracuda SPAM Firewall will filter email for all campus email accounts before it gets to
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang Graduate Institute of Information Management National Taipei University 69, Sec. 2, JianGuo N. Rd., Taipei City 104-33, Taiwan
Adaptive Filtering of SPAM
Adaptive Filtering of SPAM L. Pelletier, J. Almhana, V. Choulakian GRETI, University of Moncton Moncton, N.B.,Canada E1A 3E9 {elp6880, almhanaj, choulav}@umoncton.ca Abstract In this paper, we present
escan Anti-Spam White Paper
escan Anti-Spam White Paper Document Version (esnas 14.0.0.1) Creation Date: 19 th Feb, 2013 Preface The purpose of this document is to discuss issues and problems associated with spam email, describe
IMF Tune Opens Exchange to Any Anti-Spam Filter
Page 1 of 8 IMF Tune Opens Exchange to Any Anti-Spam Filter September 23, 2005 10 th July 2007 Update Include updates for configuration steps in IMF Tune v3.0. IMF Tune enables any anti-spam filter to
Anti-Spam Methodologies: A Comparative Study
Anti-Spam Methodologies: A Comparative Study Saima Hasib, Mahak Motwani, Amit Saxena Truba Institute of Engineering and Information Technology Bhopal (M.P),India Abstract: E-mail is an essential communication
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
Why Bayesian filtering is the most effective anti-spam technology
Why Bayesian filtering is the most effective anti-spam technology Achieving a 98%+ spam detection rate using a mathematical approach This white paper describes how Bayesian filtering works and explains
REVIEW AND ANALYSIS OF SPAM BLOCKING APPLICATIONS
REVIEW AND ANALYSIS OF SPAM BLOCKING APPLICATIONS Rami Khasawneh, Acting Dean, College of Business, Lewis University, khasawra@lewisu.edu Shamsuddin Ahmed, College of Business and Economics, United Arab
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of
Detecting E-mail Spam Using Spam Word Associations
Detecting E-mail Spam Using Spam Word Associations N.S. Kumar 1, D.P. Rana 2, R.G.Mehta 3 Sardar Vallabhbhai National Institute of Technology, Surat, India 1 p10co977@coed.svnit.ac.in 2 dpr@coed.svnit.ac.in
Configuring MDaemon for Centralized Spam Blocking and Filtering
Configuring MDaemon for Centralized Spam Blocking and Filtering Alt-N Technologies, Ltd 2201 East Lamar Blvd, Suite 270 Arlington, TX 76006 (817) 525-2005 http://www.altn.com July 26, 2004 Contents A Centralized
6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME
INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology ENGINEERING (IJCET), ISSN 0976-6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET)
Why Bayesian filtering is the most effective anti-spam technology
GFI White Paper Why Bayesian filtering is the most effective anti-spam technology Achieving a 98%+ spam detection rate using a mathematical approach This white paper describes how Bayesian filtering works
Antispam Security Best Practices
Antispam Security Best Practices First, the bad news. In the war between spammers and legitimate mail users, spammers are winning, and will continue to do so for the foreseeable future. The cost for spammers
How to block NDR spam
How to block NDR spam Spam generates an enormous amount of traffic that is both time-consuming to handle and resource intensive. Apart from that, a large number of organizations have been victims of NDR
International Journal of Research in Advent Technology Available Online at: http://www.ijrat.org
IMPROVING PEFORMANCE OF BAYESIAN SPAM FILTER Firozbhai Ahamadbhai Sherasiya 1, Prof. Upen Nathwani 2 1 2 Computer Engineering Department 1 2 Noble Group of Institutions 1 firozsherasiya@gmail.com ABSTARCT:
AN EVALUATION OF FILTERING TECHNIQUES A NAÏVE BAYESIAN ANTI-SPAM FILTER. Vikas P. Deshpande
AN EVALUATION OF FILTERING TECHNIQUES IN A NAÏVE BAYESIAN ANTI-SPAM FILTER by Vikas P. Deshpande A report submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Computer
A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters Wei-Lun Teng, Wei-Chung Teng
Analysis of Spam Filter Methods on SMTP Servers Category: Trends in Anti-Spam Development
Analysis of Spam Filter Methods on SMTP Servers Category: Trends in Anti-Spam Development Author André Tschentscher Address Fachhochschule Erfurt - University of Applied Sciences Applied Computer Science
Objective This howto demonstrates and explains the different mechanisms for fending off unwanted spam e-mail.
Collax Spam Filter Howto This howto describes the configuration of the spam filter on a Collax server. Requirements Collax Business Server Collax Groupware Suite Collax Security Gateway Collax Platform
Software Engineering 4C03 SPAM
Software Engineering 4C03 SPAM Introduction As the commercialization of the Internet continues, unsolicited bulk email has reached epidemic proportions as more and more marketers turn to bulk email as
Copyright Information. Confidentiality Notice. Anti-Spam Evaluation Guide Confidential November 2009 Page 2 of 16
Copyright Information Kaspersky is a registered trademark of Kaspersky Lab. Other trademarks found in this publication have been used for identification purposes only and may be the trademarks of their
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
Journal of Machine Learning Research 7 (2006) 2699-2720 Submitted 3/06; Revised 9/06; Published 12/06 Spam Filtering Based On The Analysis Of Text Information Embedded Into Images Giorgio Fumera Ignazio
How to keep spam off your network
What features to look for in anti-spam technology A buyers guide to anti-spam software, this white paper highlights the key features to look for in anti-spam software and why. GFI Software www.gfi.com
Solutions IT Ltd Virus and Antispam filtering solutions 01324 877183 Info@solutions-it.co.uk
Contents Reduce Spam & Viruses... 2 Start a free 14 day free trial to separate the wheat from the chaff... 2 Emails with Viruses... 2 Spam Bourne Emails... 3 Legitimate Emails... 3 Filtering Options...
COAT : Collaborative Outgoing Anti-Spam Technique
COAT : Collaborative Outgoing Anti-Spam Technique Adnan Ahmad and Brian Whitworth Institute of Information and Mathematical Sciences Massey University, Auckland, New Zealand [Aahmad, B.Whitworth]@massey.ac.nz
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study Xiuyi Jia 1,4, Kan Zheng 2,WeiweiLi 3, Tingting Liu 2, and Lin Shang 4 1 School of Computer Science and Technology, Nanjing University
AntiSpam QuickStart Guide
IceWarp Server AntiSpam QuickStart Guide Version 10 Printed on 28 September, 2009 i Contents IceWarp Server AntiSpam Quick Start 3 Introduction... 3 How it works... 3 AntiSpam Templates... 4 General...
Intercept Anti-Spam Quick Start Guide
Intercept Anti-Spam Quick Start Guide Software Version: 6.5.2 Date: 5/24/07 PREFACE...3 PRODUCT DOCUMENTATION...3 CONVENTIONS...3 CONTACTING TECHNICAL SUPPORT...4 COPYRIGHT INFORMATION...4 OVERVIEW...5
CAFE - Collaborative Agents for Filtering E-mails
CAFE - Collaborative Agents for Filtering E-mails Lorenzo Lazzari, Marco Mari and Agostino Poggi Dipartimento di Ingegneria dell Informazione Università degli Studi di Parma Parco Area delle Scienze 181/A,
Image Content-Based Email Spam Image Filtering
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
An Overview of Spam Blocking Techniques
An Overview of Spam Blocking Techniques Recent analyst estimates indicate that over 60 percent of the world s email is unsolicited email, or spam. Spam is no longer just a simple annoyance. Spam has now
Some fitting of naive Bayesian spam filtering for Japanese environment
Some fitting of naive Bayesian spam filtering for Japanese environment Manabu Iwanaga 1, Toshihiro Tabata 2, and Kouichi Sakurai 2 1 Graduate School of Information Science and Electrical Engineering, Kyushu
A Collaborative Approach to Anti-Spam
A Collaborative Approach to Anti-Spam Chia-Mei Chen National Sun Yat-Sen University TWCERT/CC, Taiwan Agenda Introduction Proposed Approach System Demonstration Experiments Conclusion 1 Problems of Spam
EnterGroup offers multiple spam fighting technologies so that you can pick and choose one or more that are right for you.
CONFIGURING THE ANTI-SPAM In this tutorial you will learn how to configure your anti-spam settings using the different options we provide like Challenge/Response, Whitelist and Blacklist. EnterGroup Anti-Spam
Spam Filtering A WORD TO THE WISE WHITE PAPER BY LAURA ATKINS, CO- FOUNDER
Spam Filtering A WORD TO THE WISE WHITE PAPER BY LAURA ATKINS, CO- FOUNDER 2 Introduction Spam filtering is a catch- all term that describes the steps that happen to an email between a sender and a receiver
Spam Filtering using Naïve Bayesian Classification
Spam Filtering using Naïve Bayesian Classification Presented by: Samer Younes Outline What is spam anyway? Some statistics Why is Spam a Problem Major Techniques for Classifying Spam Transport Level Filtering
On Attacking Statistical Spam Filters
On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Paper review by Deepak Chinavle
A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
Typical spam characteristics
Typical spam characteristics How to effectively block spam and junk mail By Mike Spykerman CEO Red Earth Software This article discusses how spam messages can be distinguished from legitimate messages
Journal of Information Technology Impact
Journal of Information Technology Impact Vol. 8, No., pp. -0, 2008 Probability Modeling for Improving Spam Filtering Parameters S. C. Chiemeke University of Benin Nigeria O. B. Longe 2 University of Ibadan
OIS. Update on the anti spam system at CERN. Pawel Grzywaczewski, CERN IT/OIS HEPIX fall 2010
OIS Update on the anti spam system at CERN Pawel Grzywaczewski, CERN IT/OIS HEPIX fall 2010 OIS Current mail infrastructure Mail service in numbers: ~18 000 mailboxes ~ 18 000 mailing lists (e-groups)
Purchase College Barracuda Anti-Spam Firewall User s Guide
Purchase College Barracuda Anti-Spam Firewall User s Guide What is a Barracuda Anti-Spam Firewall? Computing and Telecommunications Services (CTS) has implemented a new Barracuda Anti-Spam Firewall to
Feature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
PANDA CLOUD EMAIL PROTECTION 4.0.1 1 User Manual 1
PANDA CLOUD EMAIL PROTECTION 4.0.1 1 User Manual 1 Contents 1. INTRODUCTION TO PANDA CLOUD EMAIL PROTECTION... 4 1.1. WHAT IS PANDA CLOUD EMAIL PROTECTION?... 4 1.1.1. Why is Panda Cloud Email Protection
Why Content Filters Can t Eradicate spam
WHITEPAPER Why Content Filters Can t Eradicate spam About Mimecast Mimecast () delivers cloud-based email management for Microsoft Exchange, including archiving, continuity and security. By unifying disparate
Spam Testing Methodology Opus One, Inc. March, 2007
Spam Testing Methodology Opus One, Inc. March, 2007 This document describes Opus One s testing methodology for anti-spam products. This methodology has been used, largely unchanged, for four tests published
Serial and Parallel Bayesian Spam Filtering using Aho- Corasick and PFAC
Serial and Parallel Bayesian Spam Filtering using Aho- Corasick and PFAC Saima Haseeb TIEIT Bhopal Mahak Motwani TIEIT Bhopal Amit Saxena TIEIT Bhopal ABSTRACT With the rapid growth of Internet, E-mail,
Spam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
BARRACUDA. N e t w o r k s SPAM FIREWALL 600
BARRACUDA N e t w o r k s SPAM FIREWALL 600 Contents: I. What is Barracuda?...1 II. III. IV. How does Barracuda Work?...1 Quarantine Summary Notification...2 Quarantine Inbox...4 V. Sort the Quarantine
SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2
International Journal of Computer Engineering and Applications, Volume IX, Issue I, January 15 SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2
Filtering Junk Mail with A Maximum Entropy Model
Filtering Junk Mail with A Maximum Entropy Model ZHANG Le and YAO Tian-shun Institute of Computer Software & Theory. School of Information Science & Engineering, Northeastern University Shenyang, 110004
How to keep spam off your network
GFI White Paper How to keep spam off your network What features to look for in anti-spam technology A buyer s guide to anti-spam software, this white paper highlights the key features to look for in anti-spam
Towards Eradication of SPAM: A Study on Intelligent Adaptive SPAM Filters
Towards Eradication of SPAM: A Study on Intelligent Adaptive SPAM Filters Tarek Hassan (B.Sc. Computer Science, Egypt) This thesis is presented for the degree of Master of Computer Science Murdoch University,
Evaluation of Anti-spam Method Combining Bayesian Filtering and Strong Challenge and Response
Evaluation of Anti-spam Method Combining Bayesian Filtering and Strong Challenge and Response Abstract Manabu IWANAGA, Toshihiro TABATA, and Kouichi SAKURAI Kyushu University Graduate School of Information
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT M.SHESHIKALA Assistant Professor, SREC Engineering College,Warangal Email: marthakala08@gmail.com, Abstract- Unethical
MessageLabs. Spam Manager User Guide
MessageLabs Spam Manager User Guide Copyright and permitted use 2004, by MessageLabs Ltd. All rights reserved. This publication is the exclusive property of MessageLabs and may not be reproduced, distributed,
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
Email Deliverability:
Email Deliverability: A guide to reaching your audience Email deliverability is a key factor in email campaign performance. Understanding how email deliverability is calculated and how it affects your
An Approach to Detect Spam Emails by Using Majority Voting
An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,
Introduction. Friday, June 21, 2002
This article is intended to give you a general understanding how ArGoSoft Mail Server Pro, and en Email, in general, works. It does not give you step-by-step instructions; it does not walk you through
Email Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
CONFIGURING FUSEMAIL ANTI-SPAM
CONFIGURING FUSEMAIL ANTI-SPAM In this tutorial you will learn how to configure your anti-spam settings using the different options we provide like FuseFilter, Challenge/Response, Whitelist and Blacklist.
INSTITUTE of TECHNOLOGY CARLOW Intelligent Anti-Spam Technology. User Manual
INSTITUTE of TECHNOLOGY CARLOW Intelligent Anti-Spam Technology User Manual Author: CHEN LIU (C00140374) Supervisor: Paul Barry Date: 16 th April 2010 Content 1. System Requirement... 3 2. Interface and
SpamPanel Email Level Manual Version 1 Last update: March 21, 2014 SpamPanel
SpamPanel Email Level Manual Version 1 Last update: March 21, 2014 SpamPanel Table of Contents Incoming... 1 Incoming Spam Quarantine... 2 Incoming Log Search... 4 Delivery Queue... 7 Report Non-Spam...
A Case-Based Approach to Spam Filtering that Can Track Concept Drift
A Case-Based Approach to Spam Filtering that Can Track Concept Drift Pádraig Cunningham 1, Niamh Nowlan 1, Sarah Jane Delany 2, Mads Haahr 1 1 Department of Computer Science, Trinity College Dublin 2 School
MINIMIZING THE TIME OF SPAM MAIL DETECTION BY RELOCATING FILTERING SYSTEM TO THE SENDER MAIL SERVER
MINIMIZING THE TIME OF SPAM MAIL DETECTION BY RELOCATING FILTERING SYSTEM TO THE SENDER MAIL SERVER Alireza Nemaney Pour 1, Raheleh Kholghi 2 and Soheil Behnam Roudsari 2 1 Dept. of Software Technology
Configuring Your Gateman Email Server
Configuring Your Gateman Email Server Your Gateman Lifestyle Server includes an Email Server that provides users access to email via an email client and via your web browser using your laptop and mobile
Bayesian Learning Email Cleansing. In its original meaning, spam was associated with a canned meat from
Bayesian Learning Email Cleansing. In its original meaning, spam was associated with a canned meat from Hormel. In recent years its meaning has changed. Now, an obscure word has become synonymous with
Dealing with spam mail
Vodafone Hosted Services Dealing with spam mail User guide Welcome. This guide will help you to set up anti-spam measures on your email accounts and domains. The main principle behind dealing with spam
Eiteasy s Enterprise Email Filter
Eiteasy s Enterprise Email Filter Eiteasy s Enterprise Email Filter acts as a shield for companies, small and large, who are being inundated with Spam, viruses and other malevolent outside threats. Spammer
White Paper X-Spam for Exchange 2000-2003 Server
White Paper X-Spam for Exchange 2000-2003 Server X-Spam for Exchange 2000-2003 (X-Spam) is a highly adaptive Anti-Spam Software that protects the Microsoft Exchange 2000-2003 servers from Spam. X-Spam
Email Marketing Do s and Don ts A Sprint Mail Whitepaper
Email Marketing Do s and Don ts A Sprint Mail Whitepaper Table of Contents: Part One Email Marketing Dos and Don ts The Right Way of Email Marketing The Wrong Way of Email Marketing Outlook s limitations
Groundbreaking Technology Redefines Spam Prevention. Analysis of a New High-Accuracy Method for Catching Spam
Groundbreaking Technology Redefines Spam Prevention Analysis of a New High-Accuracy Method for Catching Spam October 2007 Introduction Today, numerous companies offer anti-spam solutions. Most techniques
Content Filtering With MDaemon 6.0
Content Filtering With MDaemon 6.0 Alt-N Technologies, Ltd 1179 Corporate Drive West, #103 Arlington, TX 76006 Tel: (817) 652-0204 2002 Alt-N Technologies. All rights reserved. Product and company names
It is designed to resist the spam in the Internet. It can provide the convenience to the email user and save the bandwidth of the network.
1. Abstract: Our filter program is a JavaTM 2 SDK, Standard Edition Version 1.5.0 (J2SE) based application, which can be running on the machine that has installed JDK 1.5.0. It can integrate with a JavaServer
Introduction. How does email filtering work? What is the Quarantine? What is an End User Digest?
Introduction The purpose of this memo is to explain how the email that originates from outside this organization is processed, and to describe the tools that you can use to manage your personal spam quarantine.
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift
ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift Sarah Jane Delany 1 and Pádraig Cunningham 2 and Barry Smyth 3 Abstract. While text classification has been identified for some time
Email. Daniel Zappala. CS 460 Computer Networking Brigham Young University
Email Daniel Zappala CS 460 Computer Networking Brigham Young University How Email Works 3/25 Major Components user agents POP, IMAP, or HTTP to exchange mail mail transfer agents (MTAs) mailbox to hold
A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach
International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering
Whitepaper Spam and Ham. Spam and Ham. A Simple Guide. Fauzi Yunos
Whitepaper Spam and Ham Spam and Ham A Simple Guide Fauzi Yunos 12 Page2 Executive Summary People tend to be much less bothered by spam slipping through filters into their mail box (false negatives), than
Adaption of Statistical Email Filtering Techniques
Adaption of Statistical Email Filtering Techniques David Kohlbrenner IT.com Thomas Jefferson High School for Science and Technology January 25, 2007 Abstract With the rise of the levels of spam, new techniques
Filtering Spam Using Search Engines
Filtering Spam Using Search Engines Oleg Kolesnikov, Wenke Lee, and Richard Lipton ok,wenke,rjl @cc.gatech.edu College of Computing Georgia Institute of Technology Atlanta, GA 30332 Abstract Spam filtering
Prevention of Spam over IP Telephony (SPIT)
General Papers Prevention of Spam over IP Telephony (SPIT) Juergen QUITTEK, Saverio NICCOLINI, Sandra TARTARELLI, Roman SCHLEGEL Abstract Spam over IP Telephony (SPIT) is expected to become a serious problem
Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo
Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable
ModusMail Software Instructions.
ModusMail Software Instructions. Table of Contents Basic Quarantine Report Information. 2 Starting A WebMail Session. 3 WebMail Interface. 4 WebMail Setting overview (See Settings Interface).. 5 Account
The Growing Problem of Outbound Spam
y The Growing Problem of Outbound Spam An Osterman Research Survey Report Published June 2010 SPONSORED BY! #$!#%&'()*(!!!!"#$!#%&'()*( Osterman Research, Inc. P.O. Box 1058 Black Diamond, Washington 98010-1058