Spam Filtering using Spam Mail Communities

Deepak P¹, Jyothi John¹, Sandeep Parameswaran²
¹ Model Engineering College, Kochi, Kerala, India
² IBM Global Services India Pvt. Ltd., Bangalore, India
deepak-p@eth.net, jyothijohn@mec.ac.in, sandeep_potty@yahoo.com

Abstract

We might have heard quite a few people say, on seeing some new mails in their inboxes, "Oh! That spam again." People who observe the kinds of spam messages they receive would perhaps be able to group similar spam mails into communities. This property of spam messages can be used to filter spam. This paper describes an approach to spam filtering that exploits the fact that spam messages can be classified into different communities, and describes the working of a possible implementation of the approach in detail. The approach makes no prior assumptions about what spam looks like, so it can also be used to block non-spam nuisance mails. It can also support users who want to block spam selectively, based on their interests. The approach is inherently user-centric, flexible and user-friendly. The results of some tests done to check the feasibility of the approach are also evaluated.

1. Introduction

Spam mail can be described as unsolicited, or unsolicited commercial, bulk e-mail. Spam is becoming a serious problem today, and survey reports show that in most cases more than 25% of received e-mail is spam [1]. Spam is considered a serious problem because it causes huge losses to organizations through bandwidth consumption, mail server processing load, and the productive time users spend responding to, deleting or forwarding it [1]. The same study estimates that each spam message received costs nearly $1. Spam mail is thus an increasing concern, and preventing it from continuing to clog mailboxes is assuming greater significance.

Spam mails are sent to addresses that spammers find by means of spiders that harvest addresses put up directly on web pages, by references from other people, or by guessing. People use different techniques to avoid spam; for example, they put up their e-mail addresses on web pages in forms that are not easily recognizable by machines, such as user(a)domain(.)com for the address user@domain.com. The focus of this study is to filter spam mail, shielding users from it so that the time they waste detecting and dealing with spam is eliminated, or at least reduced. The losses due to bandwidth consumption and mail server processing load are not considered here.

Section 2 enumerates the quality-of-service parameters for spam filters. Section 3 describes some of the current approaches to spam filtering. Section 4 evaluates how much consideration the current approaches give to spam communities. Section 5 describes and evaluates a new approach to spam filtering based on spam communities. Section 6 narrates some experiments conducted to evaluate the core concepts of the new approach. Section 7 lists some conclusions and possible future work, and Section 8 lists the references.

2. Considerations for spam filters

Spam filters are judged by certain quality parameters. Non-spam messages are usually called solicited or legitimate messages. Spam precision is the percentage of messages classified as spam that truly are spam. Spam recall is the proportion of actual spam messages that are classified as spam. Legitimate precision, analogously, is the percentage of messages classified as legitimate that truly are legitimate. Legitimate recall is the proportion of actual legitimate messages that are classified as legitimate [2].

A bit of thought reveals that spam precision is the parameter to be maximized: we do not want any legitimate message to be classified as spam, even if some errors occur the other way round. Put plainly, the number of false positives should be kept to a minimum. Paul Graham observes [3] that a filter that yields false positives is like an acne cure that carries a risk of death to the patient.

3. Approaches to filter spam

Current techniques filter spam mail by classifying each message as either spam or non-spam (legitimate). Most of them perform statistical filtering using methods such as identifying keywords and phrases. Some of these approaches are reviewed in the subsections below.

3.1 Naïve Bayesian filtering [2]

Here, a message is classified into one of two categories, spam or legitimate, based mostly on its content. The message is represented as a vector over the set of words occurring in it, and the probability of the message being spam given its vector is calculated using Bayes' theorem. The phrases in each message and various domain-specific details are also examined. Later studies conclude that additional safety nets are needed for the naïve Bayesian anti-spam filter to be viable in practice [4]. Bayesian approaches have also been described by Paul Graham ([3], [5]).

3.2 Memory-based approach

Different memory-based approaches have been tried, including TiMBL (Tilburg Memory-Based Learner) [6], a system based on memory-based learning. Studies on memory-based spam filtering ([7], [8]) usually represent a mail as a vector whose elements are based not on words but on various features of spam mails, such as the presence of words like "adult" or "sex", phrases like "be over 21" or "online pharmacy", and characters such as $ and !. During training, a set of vectors is built and stored for the spam and legitimate classes. Given a new mail, the k-nearest-neighbour algorithm is used to obtain a set of nearby vectors (based on Hamming distance, for instance). The mails of that set then either vote on the new mail, weighted by their similarity to it, or the new mail is put into the class to which the majority of the set belongs. The major advantage is that the features of every mail in the training set can be used without storing the entire mails. Moreover, the system can be made to keep learning during mail filtering, leading to the automatic framing of implicit, user-specific rules. The major challenge is to decide on the number of elements in the vector and on what each element represents; an example rule might be "add 1 to the 4th element for every occurrence of $".

3.3 Neural-network-based approach

A neural-network-based approach that focuses on building vocabularies for spam has also been tried [9], but its results seem much less generalizable. Such vocabulary-based approaches tend to be much more vulnerable to false positives in cases such as automatic newsletters.

3.4 Blacklist and whitelist approaches

Certain spam filters do just pattern matching: a mail containing more than k words from a given set, or coming from an address resembling one on a list, is considered junk. The whitelist approach uses lists for the opposite purpose, such as storing words that are common in legitimate mails (e.g., salutation by name) or storing known addresses. The blacklist approach can ensure no false positives, and the whitelist approach can ensure zero false negatives, but in each case the precision is very low. Nowadays spam mails increasingly include salutations such as "Dear xyz" in a mail directed to xyz@domain.com, which reduces the precision of such approaches to unacceptably low levels.

3.5 Miscellaneous approaches

There are various other approaches for filtering spam at various levels. The ISP of which the author is a client uses a crude filtering technique whereby mails directed to more than 10 addresses are deleted as spam. Currently, a lot of spam that contains exactly 10 addresses in the To field gets through; evidently, many spammers learn the behaviour of such filters fairly quickly. Several approaches ask the user to maintain more than one address, all forwarding mail to a single address where duplicates are deemed spam and deleted or flagged. Another approach, using extended addresses, has also been described [10]. Many such techniques require the user to perform certain activities periodically, so the filtering is no longer transparent to the user.

3.6 The best, optimal, always-win, impossible approach

The best approach to filtering would be collaborative filtering: people receiving similar e-mails are asked to judge whether an e-mail is spam or not, and the results are used for spam filtering. This is obviously next to impossible. Another form of collaborative filtering would be to present e-mails to different people (on porn-related sites, for instance, a mail could periodically be shown to the user, who must judge whether it is spam before going on to the next page or photograph) and use their judgments. This should succeed in all cases, but it is impossible because privacy would be lost completely, a concern that renders the entire discussion in this paragraph moot. Still, we have to realize that no machine-based approach will come anywhere near the accuracy of human judgment.

4. Spam mail communities and current approaches

It is a common observation that spam mails can be classified into various communities, such as online pharmacies, mortgages and vacation offers. Such communities are obvious and identifiable on visual inspection, but there may be many less explicit communities that are machine-identifiable, such as porn mails bearing links to xyz.com. None of the current approaches classify mails to such an extent. Some classify mails only as spam or legitimate, whereas others classify spam mails as porn-spam and other-spam. Memory-based approaches lend themselves naturally to such classifications: each element of the vector can indicate a class of spam, so the first element may indicate the probability of a message being porn-spam, the second the probability of it being get-rich spam, and so on. Clearly, though, the number of classes such techniques can impose is limited by the number of elements in the vector. The other methods, which mostly rely on statistical filtering, cannot easily be given such community-identification capabilities.

The communities need not be hard-wired into the system; a spam filter may be given the capability to identify such spam communities automatically. If the system is built into the client end, the communities can even be highly user-specific: a system filtering mail for a person who receives only online-prescription spam may build communities such as weight loss, anti-aging, sexual enhancement and hair loss. A person who wants to receive anti-aging advertisements may mark that community as non-spam. Identifying such communities can thus make the filter more flexible and more user-centric.

5. A community-based approach

5.1 Underlying concepts

The main assumption, and the foundation, of this approach is that spam mails can be classified into many communities. A rudimentary form of this approach has been used in some studies, where a mail is classified as legitimate, porn-spam or other-spam, and mails mapping to the latter two communities are labelled as spam.
Communities of mails may be as precise as "mails sent from addresses starting with abc and containing the word aging at least twice in bold capitals" (such descriptions would be implicit, since the communities are identified by the algorithm) or as general as just "porn-spam". The former kind of definition may be appropriate when the user receives spam from just two or three mailing lists.

Another goal of this approach is to make the spam filter as user-centric as possible. The approach is best implemented on the mail client, and however it is implemented, separate lists and tables have to be kept for each user. Yet another advantage of the approach is its flexibility. Nuisance mails (say, constant requests for help from a distant friend) can also be identified, since a system implementing this approach does not come hard-coded with rules such as "a mail containing the word sex is spam 99% of the time". Thus, even a person who would like to receive porn-spam but not other spam can be accommodated. The system need not have any preconceptions; it can learn from the user over time. This property allows it to evolve with the changing nature of spam.

5.2 The approach and how it works

The general working model of an application using this approach (and thus of the approach itself) is presented below. The different phases, and how the algorithm works in each, are presented under separate subheadings, along with possible implementations. The algorithms used in our test implementation are described in detail in the appropriate places.

5.2.1 The phase of ignorance

Upon installation of the application, the system has no notion of what spam is. The user has to mark the spam mails among the incoming ones, thus telling the system, "Hey, this is spam." The system records the entire message. This continues until about 50 messages have been accumulated. Even during this period, the system can automatically filter and accumulate mails using trivial heuristics such as "this is spam, since he marked a mail from this address as spam earlier".

5.2.2 The message similarity computation

One of the main algorithms here computes the similarity between two messages. It may use heuristics such as "add one to the similarity score if both messages have at least two names in common in their To addresses". Another efficient heuristic is to represent each message as a vector of the words occurring in it and take the dot product of the two messages' vectors. Heuristics such as the similarity between the images in the messages, which were not possible in approaches such as statistical filtering, can also be included here. Spam mail is becoming increasingly image-centric; a lot of the spam the author receives contains nothing other than the image(s), a salutation and a remove link.
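To make the dot-product heuristic concrete, here is a minimal Python sketch. It assumes lower-cased alphanumeric tokens and raw word counts as vector weights, neither of which the text specifies, and it adds the To-field rule as a separate additive bonus; all function names are hypothetical. (The test implementation itself used a simpler set-intersection score instead.)

```python
import re
from collections import Counter


def word_vector(message: str) -> Counter:
    """Represent a message as a vector of word counts (assumed tokenization)."""
    # Lower-cased runs of letters, digits and apostrophes; the paper fixes no tokenizer.
    return Counter(re.findall(r"[a-z0-9']+", message.lower()))


def dot_product_similarity(m1: str, m2: str) -> int:
    """Dot product of the word-count vectors of two messages."""
    v1, v2 = word_vector(m1), word_vector(m2)
    # Only words present in both messages contribute a non-zero term.
    return sum(count * v2[word] for word, count in v1.items() if word in v2)


def to_field_bonus(to1: set[str], to2: set[str]) -> int:
    """Example rule from the text: +1 if the To fields share at least two names."""
    return 1 if len(to1 & to2) >= 2 else 0


def message_similarity(m1: str, to1: set[str], m2: str, to2: set[str]) -> int:
    """Hypothetical combination of the two heuristics by simple addition."""
    return dot_product_similarity(m1, m2) + to_field_bonus(to1, to2)


if __name__ == "__main__":
    # "cheap" occurs twice in the first message and once in the second: 2*1 + 1*1 = 3
    print(dot_product_similarity("Cheap pills, cheap prices!", "cheap pills online"))
```

Raw counts let a word repeated many times dominate the score; normalizing the vectors (cosine similarity) would be an equally plausible choice under the same heuristic.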
Framing such mail similarity heuristics would require a good deal of research into the general nature of spam mail. In our test implementation, we used a naïve similarity computation algorithm, described below.

Algorithm Similarity-Score(Messages M1 and M2)
    Remove the repeated words in both messages to get word sets N1 and N2;
    Output the number of words common to N1 and N2 as the similarity score;

5.2.3 The identification of communities

After accumulating close to 50 spam messages on the advice of the user, the system can proceed to identify communities of similar messages. It can build a graph with the messages as nodes, each undirected edge connecting two messages being labelled with the similarity score between them. The system should then find densely connected communities of mails based on some threshold. This computation of densely connected communities is an NP-complete problem, so suitable approximation algorithms can be used. The following algorithm was used in our test implementation.

Algorithm Community-Identification()
    Build a graph with the 50-odd messages as nodes and undirected edges between them, labelled by the similarity scores of the messages they connect;
    Prune all edges whose label is below a threshold T, possibly disconnecting the graph;
    Enumerate the connected components of the graph as a set of communities N;
    For each pair of communities in N
        If every similarity score between a message in one community and a message in the other is not less than a threshold T1, merge the two communities;
    Let N1 be the set of communities resulting from these merges;
    Output N1 as the set of communities of messages;

The initial threshold T may be set higher than T1, because we do not want unrelated messages to be falsely grouped into a community in N. We therefore expect N to consist of highly coherent communities. However, our urge to avoid false communities may well have split communities that are logically coherent (coherent to the level of detail we expect). The second step, refining N into N1, merges such communities: two communities are merged if they are coherent enough that every message in one bears at least some similarity (enforced by T1) to every message in the other. This step could be avoided by setting T to a low value, but the risk of creating spurious communities in doing so is obvious.

5.2.4 Community cohesion scores and signatures

We have to compute a score for each community that indicates the cohesion within it. (Such a score could also be used in the identification of communities in Section 5.2.3.) It can be computed using heuristics such as the sum of the weights of all edges within the community divided by the number of nodes in the community. Evidently, the aim should be to give high scores to communities of high cohesion. We can also assign signatures to communities, which may consist of a set of words that occur very frequently in the community. The signature could also be a set of messages from the community, and that set should be as varied as possible: if a community consists of 3 sets of 10 identical messages each, the signature should contain at least one representative from each set. In other words, the signature set should not be computed as the densest connected subset (connected by strong edges) of the community, but perhaps as one of its sparsely connected subsets (connected by weak edges). Although computing community cohesion scores is expected to improve precision, we chose not to implement it in our test implementation, since our aim was to demonstrate the feasibility of the approach rather than to build a workable prototype.
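The Similarity-Score and Community-Identification algorithms, together with the edge-weight cohesion heuristic mentioned for cohesion scores, can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: whitespace tokenization, repeating the merge step until no pair qualifies, and all function names are our choices; discarding unmerged singletons follows the experimental setup described in Section 6.

```python
from itertools import combinations


def similarity_score(m1: str, m2: str) -> int:
    """Similarity-Score: number of distinct words common to both messages."""
    # Whitespace tokenization is an assumption; the sets drop repeated words.
    return len(set(m1.split()) & set(m2.split()))


def connected_components(n: int, adj: dict[int, set[int]]) -> list[set[int]]:
    """Enumerate connected components with an iterative depth-first search."""
    seen: set[int] = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def merge(communities: list[set[int]], sim: list[list[int]], t1: int) -> list[set[int]]:
    """Merge two communities when every cross pair has similarity >= T1.

    Repeats until no pair qualifies (one reading of the pseudocode)."""
    communities = [set(c) for c in communities]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(communities)), 2):
            if all(sim[i][j] >= t1 for i in communities[a] for j in communities[b]):
                absorbed = communities.pop(b)  # b > a, so index a is unaffected
                communities[a] |= absorbed
                merged = True
                break
    return communities


def identify_communities(messages: list[str], t: int, t1: int):
    """Community-Identification: prune edges below T, take components, merge with T1."""
    n = len(messages)
    sim = [[similarity_score(p, q) for q in messages] for p in messages]
    # Keep only edges whose label is at least T (edges below T are pruned).
    adj = {i: {j for j in range(n) if j != i and sim[i][j] >= t} for i in range(n)}
    communities = merge(connected_components(n, adj), sim, t1)
    # Singletons that could not be merged with anything are discarded.
    return [c for c in communities if len(c) > 1], sim


def cohesion(community: set[int], sim: list[list[int]]) -> float:
    """Sum of intra-community edge weights divided by the number of nodes."""
    total = sum(sim[i][j] for i, j in combinations(sorted(community), 2))
    return total / len(community)
```

For example, with the toy messages "cheap pills online pharmacy order now", "cheap pills online pharmacy best price" and "meeting agenda for monday project review", `identify_communities(msgs, t=3, t1=2)` returns the single community {0, 1}: the two pharmacy messages share four words and the meeting note shares none. The cohesion of that community is 4 / 2 = 2.0. Realistic thresholds, such as T = 12 and T1 = 6 used in the experiments, assume full-length mails.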
Our implementation does, however, refine the communities obtained in the previous step by eliminating copies of nearly identical messages. The algorithm used is outlined below.

Algorithm Refine(Set N1)
    while (true)
        For each message pair P and Q within a community
            Eliminate duplicate words in each message to form P1 and Q1, the sets of words in P and Q;
            If (|P1 ∩ Q1| > |P1 Δ Q1|, where Δ denotes the symmetric difference)
                Choose P or Q arbitrarily and eliminate it from the community;
        If no message could be eliminated in a complete pass, break out of the loop;
    Return the newly formed set of messages N2, whose cardinality is less than or equal to that of N1;

Copies of nearly identical messages are eliminated because they would not be of much use in the actual spam filtering process. Many users consistently receive messages that are nearly identical, differing only in the random string that occurs at the beginning and/or end of most spam messages. Such messages arriving during the actual filtering process can readily be identified, since they would have a very high similarity score with one or more messages in a community. Eliminating nearly identical messages saves space in the spam filter database and reduces the amount of computation to be done.

5.2.5 Spam identification

Each incoming message is tested against the signatures of each spam community; if it is found worthy of inclusion in a community, the system tests whether its inclusion would enhance the cohesion within that community. The message can be added to the community and marked as spam if it either increases the cohesion of the community or has a very high similarity score with one or more of the community members. Otherwise, it is marked as legitimate and passed to the user. Our test implementation used the following algorithm for the actual spam filtering process.

Algorithm Test(Message K)
    For each community C in N2
        worthy-of-inclusion score of C = the mean similarity score between K and the messages in C;
    If (the maximum worthy-of-inclusion score obtained exceeds a threshold T2)
        Include K in the community with which the maximum worthy-of-inclusion score was obtained and flag K as spam;
    else
        Flag K as legitimate;
    If (K was included in a community)
        Perform the Refine algorithm on N2 (or, more specifically, on the community in which K was included) and assign the new set of communities to N2;
        Perform the Merge algorithm on N2 and assign the new set of communities to N2;

The Merge algorithm is the same as the merging procedure in the Community-Identification algorithm; we reproduce it here for completeness.

Algorithm Merge(N2)
    For each pair of communities in N2
        If every similarity score between a message in one community and a message in the other is not less than the threshold T1, merge the two communities;
    Let N3 be the set of communities resulting from these merges;
    Output N3;

5.2.6 Maintenance

If the user indicates that a message delivered as legitimate was actually spam (a false negative), it can be added to the community it fits best, or it can form a single-member community. Periodically, if small communities proliferate, they can be gathered and processed just like the initial set of 50-odd spam messages to identify larger communities. If the user indicates that a message marked as spam was legitimate (the dreaded false positive), the system can inspect the communities for messages very similar to the one in question and delete them from the database of spam messages. It can also show the user the community into which the false positive was put and ask whether that community is actually of interest to him. As more and more messages are identified as spam, they are added to the database, so the database has to be cleaned periodically.
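A compact Python sketch of the Refine, Merge and Test algorithms of Sections 5.2.4 and 5.2.5 follows. It is an illustration under assumptions, not the authors' code: messages are whitespace-tokenized strings, communities are plain lists of messages, Refine always keeps the earlier message of a near-duplicate pair, and the names `refine`, `merge` and `classify` are hypothetical. As in the test implementation, no cohesion check is performed.

```python
from itertools import combinations


def words(message: str) -> set[str]:
    """Eliminate duplicate words: the set of whitespace-separated tokens (assumed)."""
    return set(message.split())


def similarity_score(m1: str, m2: str) -> int:
    return len(words(m1) & words(m2))


def refine(community: list[str]) -> list[str]:
    """Refine: drop one message of any pair P, Q with |P1 ∩ Q1| > |P1 Δ Q1|."""
    msgs = list(community)
    while True:
        removed = False
        for i, j in combinations(range(len(msgs)), 2):
            p1, q1 = words(msgs[i]), words(msgs[j])
            if len(p1 & q1) > len(p1 ^ q1):
                del msgs[j]  # the "arbitrary" choice here: keep the earlier message
                removed = True
                break  # indices have shifted, so start a new pass
        if not removed:  # a complete pass eliminated nothing
            return msgs


def merge(communities: list[list[str]], t1: int) -> list[list[str]]:
    """Merge: join two communities whose every cross pair scores >= T1."""
    communities = [list(c) for c in communities]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(communities)), 2):
            if all(similarity_score(p, q) >= t1
                   for p in communities[a] for q in communities[b]):
                absorbed = communities.pop(b)  # b > a, so index a is unaffected
                communities[a].extend(absorbed)
                merged = True
                break
    return communities


def classify(k: str, communities: list[list[str]], t1: int, t2: float) -> bool:
    """Algorithm Test(K): returns True (spam) and updates `communities` in place."""
    if not communities:
        return False
    # Worthy-of-inclusion score: mean similarity between K and each community's members.
    scores = [sum(similarity_score(k, m) for m in c) / len(c) for c in communities]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] <= t2:
        return False  # flag K as legitimate
    communities[best].append(k)  # include K in its best community, flag as spam
    communities[best] = refine(communities[best])
    communities[:] = merge(communities, t1)
    return True
```

With the community ["cheap pills online pharmacy order now", "cheap pills online pharmacy best price"] and a small toy threshold t2 = 3, the message "cheap pills online pharmacy order today" scores a mean of 4.5 and is flagged as spam, after which Refine drops it again as a near-copy of the first member. The experiments used T2 = 13 on real mails.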
The database can be cleaned by considering each community, finding very dense subsets within it and deleting some of the messages that are connected to the rest of the community by dense edges. This is particularly useful for purging identical messages from the set, which is evidently harmless. Periodically, the system can also do a warm reboot, dissolving all communities and re-identifying them from the entire set of messages using the techniques used to process the initial set of 50-odd messages. A cold reboot would, of course, empty the database.

Our test implementation worked in an environment with no interaction from the user. It was supplied with a set of 50 known spam messages and then with a set of messages to be identified as either spam or legitimate. The proliferation of messages in spam communities was avoided by the periodic application of the Merge and Refine algorithms presented in Sections 5.2.4 and 5.2.5. When implemented as a workable prototype, however, more specialized algorithms for handling user input may have to be implemented.

5.2.7 Adaptation

The system has to adapt to the changing nature of spam. It can do so by identifying and deleting communities that have had no admissions for a long time: perhaps the user has been taken off the spammer's list, or the nature of the spam sent by the spammer has changed. In either case, keeping the community in the database would serve no purpose. The user could also be given options to clean up or delete communities manually. Although handling adaptation would not be too difficult, we did not handle it in our implementation, since the tests were performed on spam messages received within a short period, during which no significant change in the nature of spam would have occurred.

5.3 Advantages

The system starts with an empty memory and learns from the user what spam is. The user is free to point to some nuisance mail (such as mail from an old lover who is no longer of interest) and mark it as spam. If the heuristics used for similarity computation give high weight to the sender's address (or perhaps even to the content), the user stands a good chance of not being troubled by that nuisance mail in the future. The initially empty memory provides some further advantages. A person who welcomes some particular spam category, e.g., porn-spam, can keep receiving it simply by not marking it as spam during the ignorance phase. The system provides little help in the phase of ignorance, but, more importantly, it does not get in the way. Further, even after the ignorance phase, the user can view the communities and mark one he is interested in as non-spam. In cases where spam reaches a user from only a few spammers, each community might map precisely to a single spammer. In such cases, small changes a spammer makes to his mails would not turn them into false negatives, thus improving spam recall over conventional statistical filters. Finally, as the system is implemented per user, the implicit rules may be more user-specific, giving the user more flexibility.

5.4 Disadvantages

The user is given little or no support during the ignorance phase. The mails themselves are stored in the database, which increases storage requirements. Bandwidth wastage is not prevented. Initially, the user has to mark the spam himself, so there is no sign of a filter being present, at least in the early stages. The system might take a long time before it starts filtering mails effectively.

6. Experiments and results

The main aim of the experiments was to test the feasibility of applying the concept of community clustering of spam mails to spam filtering.
The implementation was tested in a non-interactive environment, with no user input possible during the process. Testing was done on 2 test sets of 100 mails each, referred to hereafter as Set 1 and Set 2. In each set, 50 mails were marked as spam to be used as the initial set; the remaining messages were a mix of spam and legitimate messages and are referred to as the test set. The values of T and T1 were set to 12 and 6 respectively (Section 5.2.3), and the value of T2 was set to 13 (Section 5.2.5). Isolated nodes were treated as singleton communities in N, and singleton communities that could not be merged with any other were discarded in N1. The remaining algorithms have no parameters and were used as described. Each message apart from the initial set of 50 was subjected to the algorithm Test, and the results were logged. The table below gives the values obtained from the log file. The number of communities does not change in the course of the algorithm, as no user input is sought in real time; this test therefore only demonstrates the feasibility of the approach.

Table 1. Results of the tests on Set 1 and Set 2

Measure                                      Set 1    Set 2
Number of communities in N1                  10       9
Total messages in N1 initially               42       39
Total messages in N1 after Refine            37       35
Proportion of initial set clustered          74%      70%
Number of spam messages in test set          35       40
Number of legitimate messages in test set    15       10
Spam precision                               84.0%    89.3%
Legitimate precision                         44.0%    31.8%
Spam recall                                  60.0%    62.5%
Legitimate recall                            73.3%    70.0%
False positives                              4        3
False negatives                              10       15

We consider the spam precision results very good, given that no hard-coded rules were used.
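The four measures defined in Section 2 follow directly from the false-positive and false-negative counts. The Python sketch below (the class and method names are hypothetical) recomputes the reported Set 2 results, inferring the true positives as 40 - 15 spam messages caught and the true negatives as 10 - 3 legitimate messages passed.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts on a test set; a 'positive' is a message flagged as spam."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate mail flagged as spam (false positive)
    fn: int  # spam passed on as legitimate (false negative)
    tn: int  # legitimate mail passed on as legitimate

    def spam_precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def spam_recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def legitimate_precision(self) -> float:
        return self.tn / (self.tn + self.fn)

    def legitimate_recall(self) -> float:
        return self.tn / (self.tn + self.fp)


# Set 2: 40 spam and 10 legitimate test messages, 3 false positives, 15 false negatives.
set2 = Confusion(tp=40 - 15, fp=3, fn=15, tn=10 - 3)
for name in ("spam_precision", "spam_recall", "legitimate_precision", "legitimate_recall"):
    print(f"{name}: {100 * getattr(set2, name)():.1f}%")
```

Running it prints 89.3%, 62.5%, 31.8% and 70.0%, matching the reported Set 2 figures. The same class can be used to cross-check any other row of results against its false-positive and false-negative counts.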
Very low legitimate precision is in fact not much of a concern, since false negatives would not have disastrous consequences. The legitimate recall is somewhat lower than expected, and the number of false positives is a cause for concern; the algorithm needs fine-tuning to reduce false positives. The spam precision suggests that the approach is feasible in the real world. Further, in the real world, the database could be tuned based on user input to provide better results. These experiments also considered only the text of the messages; image similarity measures and subject-line similarity computations may well enhance the performance.

The next experiment was conducted to test whether including an unrelated message in a community would decrease its cohesion. The test was conducted on a community of 5 messages taken from community 1 of the experiments above. A matrix was formed in which element (i, j) holds a measure of similarity between the i-th and j-th messages. The matrix is obviously symmetric, and its principal diagonal elements are of no use. The similarity measure used was the number of words common to the two messages, which, although crude, gives a rough idea of the situation. The matrix formed by the community of 5 messages is given below.

Table 2. Similarity matrix of the community

The row sums (which are equal to the column sums), expressed as a tuple, give a justifiable estimate of the cohesion within the community. The tuple for this matrix is <177, 156, 112, 155, 146>. The first message was then replaced by an unrelated message, and the similarity matrix changed as shown below.

Table 3. Similarity matrix after replacement of a message by a non-community message

The cohesion indicator tuple has evidently changed to <44, 122, 88, 111, 115>. These values are much weaker, and the first element of the tuple is very low, indicating that the first message does not deserve to be a member of the community.
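The row-sum cohesion indicator used in this experiment can be sketched in Python as below. The matrices are hypothetical toy values, not those of Tables 2 and 3, and `weakest_member`, which picks the message with the smallest row sum, is an illustrative addition rather than part of the experiment.

```python
def cohesion_tuple(matrix: list[list[int]]) -> tuple[int, ...]:
    """Row sums of a symmetric similarity matrix, skipping the principal diagonal."""
    return tuple(
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    )


def weakest_member(matrix: list[list[int]]) -> int:
    """Index of the message with the smallest row sum (least tied to the rest)."""
    sums = cohesion_tuple(matrix)
    return min(range(len(sums)), key=sums.__getitem__)


# Hypothetical 4-message community (toy values, not the experiment's data).
community = [
    [0, 30, 25, 28],
    [30, 0, 27, 31],
    [25, 27, 0, 26],
    [28, 31, 26, 0],
]
# The same community after message 0 is replaced by an unrelated message.
replaced = [
    [0, 3, 2, 4],
    [3, 0, 27, 31],
    [2, 27, 0, 26],
    [4, 31, 26, 0],
]
print(cohesion_tuple(community))  # (83, 88, 78, 85)
print(cohesion_tuple(replaced))   # (9, 61, 55, 61)
print(weakest_member(replaced))   # 0
```

As in the experiment, the replaced message's own row sum collapses, and every other row sum drops by exactly the similarity it used to share with the removed message.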
Such experiments were performed on a number of communities, and each showed a similarly sharp deviation when unrelated messages were included.

7. Conclusions and future work

The experiments indicate that community-based spam detection can be a useful technique. It can be implemented as a mail client add-on, so that the computationally intensive matching algorithms run on the client machine; running such algorithms on the mail server may be unattractive. The experiments above suggest that the approach described in Section 5.2 may be feasible.

Future work may be directed towards developing better algorithms for computing the similarity between spam messages, for selecting victims to purge in order to limit the database size, and for enabling the system to adapt to the changing nature of spam, as well as towards approximation algorithms for identifying communities in a corpus.

This approach treats spam and legitimate mail asymmetrically: it clusters spam into communities but does not handle legitimate mail in any sophisticated manner. Further study is needed to determine whether legitimate mail can be handled in the same manner, that is, by building communities. The feasibility of such an approach depends on the clusterability of legitimate mail, which, even if it exists, is not obvious.

8. References

[1]. Surf-Control's Anti-Spam Prevalence Study 2002, URL: Spam_Study_v2.pdf
[2]. A Bayesian approach to filtering junk e-mail, Sahami, Dumais, Heckerman & Horvitz, Learning for Text Categorization: Papers from the 1998 Workshop, Madison, Wisconsin.
[3]. A plan for spam, Paul Graham, August 2002, URL: URL:
[4]. An evaluation of naïve Bayesian anti-spam filtering, Androutsopoulos et al., Proc. of the Workshop on Machine Learning in the New Information Age, 2000.
[5]. Better Bayesian filtering, Paul Graham, January 2003, URL:
[6]. TiMBL: Tilburg Memory Based Learner, version 4.0, Reference Guide, Daelemans et al., 2001.
[7]. Learning to filter spam e-mail: A comparison of a naïve Bayesian and a memory-based approach, Androutsopoulos et al., Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000).
[8]. A learning content-based spam filter, Tim Hemel.
[9]. Junk Detection using neural networks, Michael Vinther, URL: n.pdf
[10]. Curbing junk mail via secure classification, Bleichenbacher et al., Financial Cryptography, 1998, pp
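The community-based idea summarized in Section 7 (match incoming mail against communities of similar spam on the client, and purge victims to keep the database bounded) can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure and victim selection the paper actually uses are not reproduced here. The sketch assumes Jaccard similarity over lower-cased word sets, a fixed threshold of 0.3, a cap on the total number of stored messages, and a least-recently-matched victim policy; all names (`CommunityFilter`, `report_spam`, `is_spam`, `Community`) are hypothetical.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> frozenset[str]:
    """Lower-cased word set: a deliberately simple stand-in tokenizer (assumption)."""
    return frozenset(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity len(a & b) / len(a | b); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass(eq=False)  # identity comparison, so list.remove() drops exactly this object
class Community:
    """A group of mutually similar spam messages, stored as word sets."""
    messages: list[frozenset[str]] = field(default_factory=list)
    last_hit: int = 0  # logical time of the most recent match or addition

    def similarity(self, tokens: frozenset[str]) -> float:
        # Mean similarity of a message to every member of the community.
        return sum(jaccard(tokens, m) for m in self.messages) / len(self.messages)


class CommunityFilter:
    """Hypothetical client-side filter; threshold and size cap are assumptions."""

    def __init__(self, threshold: float = 0.3, max_messages: int = 1000):
        self.threshold = threshold
        self.max_messages = max_messages
        self.communities: list[Community] = []
        self.clock = 0

    def _closest(self, tokens: frozenset[str]):
        # Most similar community and its score; (None, 0.0) if nothing overlaps.
        best, best_score = None, 0.0
        for c in self.communities:
            score = c.similarity(tokens)
            if score > best_score:
                best, best_score = c, score
        return best, best_score

    def is_spam(self, text: str) -> bool:
        """Flag a message that is close enough to an existing spam community."""
        self.clock += 1
        best, score = self._closest(tokenize(text))
        if best is not None and score >= self.threshold:
            best.last_hit = self.clock
            return True
        return False

    def report_spam(self, text: str) -> None:
        """User marks a message as spam: join the closest community or start one."""
        self.clock += 1
        tokens = tokenize(text)
        best, score = self._closest(tokens)
        if best is not None and score >= self.threshold:
            best.messages.append(tokens)
            best.last_hit = self.clock
        else:
            self.communities.append(Community([tokens], self.clock))
        self._purge()

    def _size(self) -> int:
        return sum(len(c.messages) for c in self.communities)

    def _purge(self) -> None:
        # Assumed victim policy: drop the oldest message of the least recently
        # matched community until the database fits within max_messages.
        while self._size() > self.max_messages:
            victim = min(self.communities, key=lambda c: c.last_hit)
            victim.messages.pop(0)
            if not victim.messages:
                self.communities.remove(victim)


if __name__ == "__main__":
    f = CommunityFilter(threshold=0.3, max_messages=3)
    f.report_spam("cheap meds online buy now")
    f.report_spam("buy cheap meds online today")
    print(f.is_spam("buy cheap meds now"))         # True
    print(f.is_spam("meeting agenda for monday"))  # False
```

Because matching is confined to `Community.similarity`, a different similarity measure could be substituted there without changing the rest of the sketch.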


More information

Emails and anti-spam Page 1

Emails and anti-spam Page 1 Emails and anti-spam Page 1 As the spammers become increasing aggressive more and more legit emails get banned as spam. When you send emails from your webcrm system, we use the webcrm servers to send emails

More information

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images Journal of Machine Learning Research 7 (2006) 2699-2720 Submitted 3/06; Revised 9/06; Published 12/06 Spam Filtering Based On The Analysis Of Text Information Embedded Into Images Giorgio Fumera Ignazio

More information

How To Stop Spam From Being A Problem

How To Stop Spam From Being A Problem Solutions to Spam simple analysis of solutions to spam Thesis Submitted to Prof. Dr. Eduard Heindl on E-business technology in partial fulfilment for the degree of Master of Science in Business Consulting

More information