How To Filter Spam With A Poa
|
|
|
- Suzan Simmons
- 5 years ago
- Views:
Transcription
1 A Multiobjective Evolutionary Algorithm for Spam Filtering A.G. López-Herrera 1, E. Herrera-Viedma 2, F. Herrera 2 1.Dept. of Computer Sciences, University of Jaén, E-23071, Jaén (Spain), [email protected] 2.Dept. of Computer Sciences and A.I., University of Granada, E-18071, Granada (Spain), {viedma, herrera}@decsai.ugr.es Abstract Unsolicited Commercial , also known as spam, has been a major problem on the Internet. In this paper a well known Multiobjective Evolutionary Algorithm, NSGA-II, is first time used for spam filtering. NSGA-II is adapted to use Genetic Programming components to achieve a set of filtering rules with different profiles. 1. Introduction The Unsolicited Commercial , also known as spam, is commonplace everywhere in communication. Spam is a major and growing problem. It is estimated that in the month of May 2006, for example, 86% of all s sent were spam [1]. Spam is a costly problem and many experts agree it is only getting worse [2, 3, 4, 5, 6]. Because of the economics of spam and the difficulties inherent in stopping it, it is unlikely to go away soon. Consequently, a large amount of effort has been expended on devising effective filters to identify spam s. In recent years, personalized anti-spam filters of client applications based on content filters have now become the standard for spam filters [7, 8]. Spam filters may be implemented using rule-based filters [9], nearest neighbor classifiers [10], decision trees [11] and Bayesian classifiers [12], etc. s are filtered inconsistently across different users irrespective of the user s interest. Since users have different interests or business needs, a good antispam filtering system should take into account of the different users needs and interests into consideration and influence the overall decisions and behavior [13]. Spam filter design problem is naturally multiobjective. Given a spam filter, stopping as many spam s as possible is in direct conflict with preventing the filtering of legitimate s. In fact, the conflicting nature of decreasing the number of false /08/$ IEEE positives (fraction labelled as spam from the non-spam class) and the increasing the number of true positives (fraction labelled correctly as spam) is a very general problem in pattern recognition. In other words, it is difficult to design a filter which simultaneously optimises the precision and recall. The Spam Filtering (SF) problem can be considered as a specific Information Retrieval (IR) one. In IR there are usually two kinds of documents, relevant and non-relevant. From this IR point of view, spam s can be treated as non-relevant documents, and nonspam s as relevant ones. Defining a query for information retrieval is analogous to build a rule for spam filtering. In IR, a query is used to get relevant documents, whereas, in SF a rule is set for blocking undesirable s. Evolutionary Algorithms (EAs) have been used for IR purposes [14], being the query definition problem one of the most studied one. Concretely, the use of Multiobjective EAs (MOEAs) in the query definition problem has been proved as advantageous, due to MOEAs can learn a set of queries with good precisionrecall trade-off in a unique run [26]. This feature of MOEAs, applied to SF, would be used to get a set of filtering rules, each one defining a different profile, from very strong rule (high recall) to weak rules (high precision). In this paper, we treat the SF problem using concepts from the query definition problem. A new MOEA for SF is presented in this paper. It is called SPAM-NSGA-II-GP, and it is built on the basis of a well known MOEA, the Non-dominated Sorting Genetic Algorithm (NSGA-II) [15]. The performance of SPAM-NSGA-II-GP in the automatic building rule problem is analyzed using a public spam dataset. To do so, this paper is structured as follows. In Section 2, we first briefly describe the query definition problem. In Section 3, SPAM-NSGA-II-GP is introduced. Section 4 shows the experimental results. Finally, Section 5 summarizes several concluding remarks.
2 2. The Query Definition Problem This is the most extended group of applications of EAs to IR. Every proposal in this group use EAs either like a relevance feedback technique or like an Inductive Query by Example (IQBE) algorithm. The basis of relevance feedback lie in the fact that either users normally formulate queries composed of terms that do not match the terms used to index the relevant documents to their needs, or they do not provide the appropriate weights for the query terms. The operation mode involving modifying the previous query adding and removing terms or changing the weights of the existing query term staking into account the relevance judgements of the documents retrieved by it, constitutes a good way to solve the latter two problems and to improve the precision, and especially the recall, of the previous query [16]. IQBE was proposed in [17] as a process in which searchers provide sample documents (examples) and the algorithms induce (or learn) the key concepts in order to find other relevant documents. This way, IQBE is a process for assisting the users in the query formulation process performed by machine learning methods. It works taking a set of relevant documents (and optionally non-relevant documents) provided by the user and applying an automatic learning process to generate a query that describes the user information needs (represented by the previous set of documents). The query that is obtained can be executed to obtain new relevant documents. Several IQBE EAs for different IR models have been proposed and revised in [9]. The most well known, in the context of Boolean IR [16], IQBE approach is that of Smith & Smith [18], which is based on Genetic Programming (GP) [19], with queries being represented by expression syntax trees and where the algorithms are articulated on the basis of the classic operators: cross, mutation and selection. It is called Boolean IQBE-GP, and it is able to derive Boolean queries. As it is usual in the field [14], this approach is guided by a weighted fitness function combining the classical retrieval accuracy criteria, precision and recall [16]. In [20], Kraft et al. propose an IQBE technique to learn the whole composition of extended Boolean queries for Fuzzy IR systems. The algorithm is based on GP and the fuzzy queries are encoded in expression trees, whose terminal nodes are query terms with their respective weights and whose inner nodes are the Boolean operators. In this paper, the IQBE paradigm will be used, with spam s playing the role of non-relevant documents. 3. A new MOEA for Spam Filtering Following sections introduce: the classical performance measures used in SF, the objectives and essential notions of pareto set used in this paper, and the new MOEA for SF proposed Performance Measures There are several ways to measure the quality of a SF system, such as the system efficiency and effectiveness, and several subjective aspects related to user satisfaction. Traditionally, the filtering effectiveness is based on the judgement with respect to the user s needs or interests. There are different criteria to measure this aspect, but Precision ( ) and Recall ( ) [16] are the most used. Precision is the ratio between the spam s filtered by the SF system in response to a filtering rule and the total number of s filtered, whilst recall is the ratio between the number of spam s filtered and the total number of spam s in the user s inbox. The mathematical expression of each of them is: Esf Esf P ; R E E tf where E is the number of spam s filtered, sf is the total number of s filtered and is the total number of spam s in the user s inbox. and are defined in [0,1], being 1 the optimal value. We notice that the only way to know all the spam e- mails existing in an inbox (value used in the R measure) is to evaluate all s. Due to this fact and tacking into account that spam labelling is subjective, there are some public spam datasets: LingSpam [21], Spambase [22], SpamAssassin [23] and PU1 [24, 25], each one with a set of s being labelled as spam or nonspam, so that they can be used to verify the new proposals in the field. In this paper, we use the PU1 dataset Objectives and Pareto Set ts E ts Etf In multiobjective optimization problems, the definition of the quality concept is substantially more complex than in singleobjective ones, since the optimization processes imply several different objectives.
3 The key concepts to evaluate MOEAs are the dominance relation and the Pareto sets. The MOEA presented in this paper assume the two classical criteria to evaluate SF systems, i.e., Precision and Recall whose expressions are introduced in Subsection 3.1. The proposed MOEA assumes that all objectives have to be maximized. The solutions are represented by objective vectors, which are compared according to the dominance relation defined below and displayed in Figure 1. Definition 1. (Dominance relation) Let. Then is said to dominate, denoted as, iff 1., 2.. Based on the concept of dominance, the Pareto set can be defined as follows. Figure 1: Concept of dominance. Definition 2. (Pareto set) Let be a set of vectors. Then the Pareto set (see Figure 2) of is defined as follows: contains all vectors which are not dominated by any vector, i.e.: 3.3. SPAM-NSGA-II-GP NSGA-II [15] is a MOEA that incorporates a preservation strategy of an elite population and uses an explicit mechanism (crowded comparison operator) to preserve diversity. Figure 2: Concept of pareto set. NSGA-II works with an offspring s population Q t, which is created using the predecessor population P t. Both populations (Q t and P t ) are combined to form an unique population R t, with a size 2 M, that is examined in order to extract the front of the Pareto. Then, an arrangement on the non-dominated individuals is done to classify the R t population. Although this implies a greater effort compared with the arrangement of the set Q t, it allows a global verification of the non-dominated solutions, that belong as well as to the population of offsprings or to the one of the predecessors. Once the arrangement of the non-dominated individuals finishes, the new generation (population) P t+1 is formed with solutions of the different nondominated fronts ( ), taking them alternatively from each of the fronts. It begins with the best front of non-dominated individuals and continues with the solutions of the second one, and so on. Since the R t size is 2 M, it is possible that some of the front solutions have to be eliminated to form the new population. In the last states of the execution, it is usual that the majority of the solutions are in the best front of nondominated solutions. It is also probable that the size of the best front of the combined population R t is bigger than M. It is then, when the previous algorithm assures the selection of a diverse set of solutions of this front by means of the crowded comparison operator (the NSGA-II procedure is shown in Figure 3). When the whole population converges to the Pareto-optimal frontier, the algorithm continues, so that the best distribution between the solutions is assured.
4 Initial population: All individuals of the first generation are generated in a random way. The population is created including all the terms in the spam s given by the user. Those that appear in more spam s will have greater probability of being selected. 4. Experimental Study This subsection is divided into two parts according to the following contents: experimental framework and experimental results Experimental Framework Figure 3: NSGA-II procedure. In this paper NSGA-II will be adapted to use GP components. It will be denoted as SPAM-NSGA-II-GP. The SPAM-NSGA-II-GP proposed in this paper has the following components: Codification scheme: Boolean rules are encoded in expression syntax trees, whose terminal nodes are terms and whose inner nodes are the Boolean operators AND, OR and NOT. Hence, the natural representation is to encode the rule within a tree and to work with a GP algorithm [19] to evolve it, as done by previous approaches devoted to the derivation of Boolean queries in the field of IR [18, 26]. An example of rule can be seen in Figure 4. OR AND OR Viagra Cialis Pharma $ Figure 4. Example of spam filtering rule. Crossover operator: Subtrees are randomly selected and crossed-over in two randomly selected rules. Mutation operator: A randomly selected term or operator is changed in a randomly selected tree. The experimental study has been developed using the PU1 dataset [24]. PU1 is a mixture of 481 spam messages and 618 legitimate messages received by a particular user, after replacing each token (i.e., word, number, punctuation mark, etc.) by a unique number throughout the corpus. Only the earliest five legitimate messages of each sender are retained. Attachments, HTML tags, and duplicate spam messages received on the same day are not included. For more information see [25]. The role of the user who provides spam s will be played by different sets of spam s. In PU1 each (spam or legitimate) is stored in a separated file. Filenames with the form *spmsg*.txt are spam messages. Files whose names have the form *legit*.txt are legitimate messages. The first number in each filename is random; it was originally 1 used to shuffle the messages. The second number was the initial identifier of the message; it does not reflect the order in which the messages were received. To bypass privacy issues, the messages are encoded, as explained in the paper [25]. The second part in the filename of spam files has an initial letter (A, B or C). Three spam groups have been created. Filenames with letter A are associated to group A, those with letter B are colleted by the group B, and finally, filenames with letter C are grouped by group C. The size of these groups is 166, 146 and 169 respectively. In this way, for example, if there are 166 spam s in group A, this situation will mimic a situation in which the user provides 166 spam s related with his/her undesirable commercial s. Besides, the remaining 933 s ( ) will be considered as non-spam s for the IQBE process. 1 This is not used in this paper.
5 SPAM-NSGA-II-GP generates a set of rules from the spam and the non-spam s sets. The SPAM-NSGA-II-GP in this contribution has been run 30 times for each spam group (a total of 90 runs) with different initializations. The number of chromosome evaluations is per run. A 2.0GHz Pentium Core 2 Duo computer with 1Gb of RAM was used. The common parameter values considered are shown in Table 1. Table 1. Common parameters. Parameter Value Cross probability 0.8 Mutation probability 0.2 Maximum nodes per rule 19 Population size (M) 800 In this contribution, a MOEA for spam filtering purposes has been developed. This MOEA has been called SPAM-NSGA-II-GP. It has been tested using a public spam dataset, and the benefits of using MOEAs in the SF process under the IQBE parading have been also proved. SPAM-NSGA-II-GP provides a flexible way to set a filtering rule profile in a SF system. User can decide the level of anti-spam security it is desirable in his/her client. SPAM-NSGA-II-GP would be continuously learning from the user s s, getting new filtering rules SPAM-NSGA-II-GP Experimental Results In Table 2 we present the average number of different rules (in the decision space) for the 30 runs for each spam group for the PU1 dataset. We can observe SPAM-NSGA-II-GP achieves an extraordinarily number of different rules in a unique run. Table 2. Average number of different rules (in the decision space) for each spam group on PU1 dataset. Group Rules A 270,2 B 328,3 C 232,2 In Figure 5, we can see the Pareto in a representative run of SPAM-NSGA-II-GP. We can appreciate that SPAM-NSGA-II-GP achieves 27 filtering rules profiles, from very strong filtering rules (high recall and low precision) to weak rules (high precision and low recall). A set of intermediate profile is also get. With these different filtering rules profiles, user can decide how strong the SF systems will be set. For example, using strong filtering rules, the SF system will block as many spam s as possible, although a high number of legitimate s will be also labelled as spam (false positives). The inverse situation can be also set, i.e., a minimum portion of spam s will be labelled. 5. Concluding Remarks Recall Precision Figure 5. Example of the Pareto front along one representative run of SPAM-NSGA-II-GP on PU1 (group A). Acknowledgments This paper has been supported by the projects: FUZZY-LING, Cod. TIN and Proyecto de Excelencia de la Junta de Andalucía SAINFOWEB, Cod References [1] G. J. Koprowski. Spam accounts for most traffic, Tech News World (2006). Available: [2] L. F. Cranor and B. A. LaMacchia. Spam! CACM, 41:8 (1998), pp [3] S. Machlis. Uh-oh: Spam's getting more sophisticated. Computerworld, Jan [4] J. Gleick. Tangled up in spam. New York Times, Feb Available: magazine/09spam.html. [5] K. Schneider. Fighting spam in real time. In Proceedings of the 2003 Spam Conference, Jan Available:
6 Spam_Conference/. [6] L. Weinstein. Inside risks: Spam wars. CACM, 46:8 (2003), pp [7] S. Hinde. Spam: the evolution of a nuisance. Computer Security 22(6) (2003), pp [8] Z. Gyongyi, H. Garcia-Molina. Spam: it s not just for inboxes anymore. IEEE Computer 38:10 (2005), pp [9] Mason J. The SpamAssassin homepage. (2004). Available: [10] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. Spyropoulos, P. Stamatopoulos. A memory based approach to anti-spam filtering for mailing lists. Information Retrieval 6 (2003), pp [11] X. Carreras, L. Marquez. Boosting trees for anti-spam filtering. In: Proceedings of RANLP-01, 4th international conference on recent advances in natural language processing (2001). [12] Androutsopoulos I, Koutsias J, Chandrinos KV, Paliouras G, Spyropoulos CD (2000) An evaluation of naive Bayesian antispam filtering. In: Proceedings of the workshop on machine learning in the new information age. 11th European Conference on Machine Learning, Barcelona, Spain ( 2000), pp [13] X. Yue, A. Abraham, Z.X. Chi, Y.Y. Hao, H. Mo. Artificial immune system inspired behavior-based anti-spam filter. Soft Computing 11:8 (2007), pp [14] O. Cordón, E. Herrera-Viedma, C. López-Pujalte, M. Luque, and C. Zarco, A review on the application of evolutionary computation to information retrieval, International Journal of Approximate Reasoning 34 (2003), pp [15] K. Deb, A. Pratap, S. Agrawal, and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6 (2002), pp [16] C.J. van Rijsbergen, Information retrieval, Butterworth, [17] H. Chen, G. Shankaranarayanan, L. She, and A. Iyer, A machine learning approach to inductive query by example: An experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing, Journal of the American Society for Information Science 49:8 (1998), pp [18] M. P Smith and M. Smith, The use of genetic programming to build Boolean queries for text retrieval through relevance feedback, Journal of Information Science 23:6 (1997), pp [19] J. Koza, Genetic programming. on the programming of computers by means of natural selection, The MIT Press, [20] D.H. Kraft, F.E. Petry, B.P. Buckles, and T. Sadasivan. Genetic algorithms and fuzzy logic systems, ch. Genetic algorithms for query optimization in information retrieval: relevance feedback, pp , Sanchez, E. and Shibata, T. and Zadeh, L.A., Eds., (World Scientific), [21] LingSpam dataset. Donwloadable from [22] Spambase dataset. Donwloadable from [23] SpamAssasin dataset. Donwloadable from [24] PU1 dataset. Donwloadable from pu1_encoded.tar.gz [25] I. Androutsopoulos, J. Koutsias, K.V. Chandrinos, C.D. Spyropoulos. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal messages. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (2000), pp [26] O. Cordón, E. Herrera-Viedma, and M. Luque. Improving the learning of boolean queries by means a multiobjective IQBE evolutionary algorithm, Information Processing & Management 42:3 (2006), pp
Bayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary [email protected] http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.
Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada
Detecting E-mail Spam Using Spam Word Associations
Detecting E-mail Spam Using Spam Word Associations N.S. Kumar 1, D.P. Rana 2, R.G.Mehta 3 Sardar Vallabhbhai National Institute of Technology, Surat, India 1 [email protected] 2 [email protected]
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Technology : CIT 2005 : proceedings : 21-23 September, 2005,
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT M.SHESHIKALA Assistant Professor, SREC Engineering College,Warangal Email: [email protected], Abstract- Unethical
1 Introductory Comments. 2 Bayesian Probability
Introductory Comments First, I would like to point out that I got this material from two sources: The first was a page from Paul Graham s website at www.paulgraham.com/ffb.html, and the second was a paper
[email protected]. Keywords - Algorithm, Artificial immune system, E-mail Classification, Non-Spam, Spam
An Improved AIS Based E-mail Classification Technique for Spam Detection Ismaila Idris Dept of Cyber Security Science, Fed. Uni. Of Tech. Minna, Niger State [email protected] Abdulhamid Shafi i
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study
Three-Way Decisions Solution to Filter Spam Email: An Empirical Study Xiuyi Jia 1,4, Kan Zheng 2,WeiweiLi 3, Tingting Liu 2, and Lin Shang 4 1 School of Computer Science and Technology, Nanjing University
Introduction To Genetic Algorithms
1 Introduction To Genetic Algorithms Dr. Rajib Kumar Bhattacharjya Department of Civil Engineering IIT Guwahati Email: [email protected] References 2 D. E. Goldberg, Genetic Algorithm In Search, Optimization
Multi-Objective Optimization using Evolutionary Algorithms
Multi-Objective Optimization using Evolutionary Algorithms Kalyanmoy Deb Department of Mechanical Engineering, Indian Institute of Technology, Kanpur, India JOHN WILEY & SONS, LTD Chichester New York Weinheim
Electric Distribution Network Multi objective Design Using Problem Specific Genetic Algorithm
Electric Distribution Network Multi objective Design Using Problem Specific Genetic Algorithm 1 Parita Vinodbhai Desai, 2 Jignesh Patel, 3 Sangeeta Jagdish Gurjar 1 Department of Electrical Engineering,
An Evolutionary Algorithm in Grid Scheduling by multiobjective Optimization using variants of NSGA
International Journal of Scientific and Research Publications, Volume 2, Issue 9, September 2012 1 An Evolutionary Algorithm in Grid Scheduling by multiobjective Optimization using variants of NSGA Shahista
A Multi-Objective Performance Evaluation in Grid Task Scheduling using Evolutionary Algorithms
A Multi-Objective Performance Evaluation in Grid Task Scheduling using Evolutionary Algorithms MIGUEL CAMELO, YEZID DONOSO, HAROLD CASTRO Systems and Computer Engineering Department Universidad de los
An evolutionary learning spam filter system
An evolutionary learning spam filter system Catalin Stoean 1, Ruxandra Gorunescu 2, Mike Preuss 3, D. Dumitrescu 4 1 University of Craiova, Romania, [email protected] 2 University of Craiova, Romania,
Package NHEMOtree. February 19, 2015
Type Package Package NHEMOtree February 19, 2015 Title Non-hierarchical evolutionary multi-objective tree learner to perform cost-sensitive classification Depends partykit, emoa, sets, rpart Version 1.0
Abstract. Find out if your mortgage rate is too high, NOW. Free Search
Statistics and The War on Spam David Madigan Rutgers University Abstract Text categorization algorithms assign texts to predefined categories. The study of such algorithms has a rich history dating back
Multi-objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems
Multi-objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems Zai Wang 1, Ke Tang 1 and Xin Yao 1,2 1 Nature Inspired Computation and Applications Laboratory (NICAL), School
Multiobjective Multicast Routing Algorithm
Multiobjective Multicast Routing Algorithm Jorge Crichigno, Benjamín Barán P. O. Box 9 - National University of Asunción Asunción Paraguay. Tel/Fax: (+9-) 89 {jcrichigno, bbaran}@cnc.una.py http://www.una.py
Naive Bayes Spam Filtering Using Word-Position-Based Attributes
Naive Bayes Spam Filtering Using Word-Position-Based Attributes Johan Hovold Department of Computer Science Lund University Box 118, 221 00 Lund, Sweden [email protected] Abstract This paper
An Efficient Three-phase Email Spam Filtering Technique
An Efficient Three-phase Email Filtering Technique Tarek M. Mahmoud 1 *, Alaa Ismail El-Nashar 2 *, Tarek Abd-El-Hafeez 3 *, Marwa Khairy 4 * 1, 2, 3 Faculty of science, Computer Sci. Dept., Minia University,
Email Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II
182 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 6, NO. 2, APRIL 2002 A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II Kalyanmoy Deb, Associate Member, IEEE, Amrit Pratap, Sameer Agarwal,
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
An Approach to Detect Spam Emails by Using Majority Voting
An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,
Evolutionary Detection of Rules for Text Categorization. Application to Spam Filtering
Advances in Intelligent Systems and Technologies Proceedings ECIT2004 - Third European Conference on Intelligent Systems and Technologies Iasi, Romania, July 21-23, 2004 Evolutionary Detection of Rules
Journal of Information Technology Impact
Journal of Information Technology Impact Vol. 8, No., pp. -0, 2008 Probability Modeling for Improving Spam Filtering Parameters S. C. Chiemeke University of Benin Nigeria O. B. Longe 2 University of Ibadan
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
MULTI-OBJECTIVE OPTIMIZATION USING PARALLEL COMPUTATIONS
MULTI-OBJECTIVE OPTIMIZATION USING PARALLEL COMPUTATIONS Ausra Mackute-Varoneckiene, Antanas Zilinskas Institute of Mathematics and Informatics, Akademijos str. 4, LT-08663 Vilnius, Lithuania, [email protected],
How To Reduce Spam To A Legitimate Email From A Spam Message To A Spam Email
Enhancing Scalability in Anomaly-based Email Spam Filtering Carlos Laorden DeustoTech Computing - S 3 Lab, University of Deusto Avenida de las Universidades 24, 48007 Bilbao, Spain [email protected] Borja
Feature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
Simple Population Replacement Strategies for a Steady-State Multi-Objective Evolutionary Algorithm
Simple Population Replacement Strategies for a Steady-State Multi-Objective Evolutionary Christine L. Mumford School of Computer Science, Cardiff University PO Box 916, Cardiff CF24 3XF, United Kingdom
A Case-Based Approach to Spam Filtering that Can Track Concept Drift
A Case-Based Approach to Spam Filtering that Can Track Concept Drift Pádraig Cunningham 1, Niamh Nowlan 1, Sarah Jane Delany 2, Mads Haahr 1 1 Department of Computer Science, Trinity College Dublin 2 School
Index Terms- Batch Scheduling, Evolutionary Algorithms, Multiobjective Optimization, NSGA-II.
Batch Scheduling By Evolutionary Algorithms for Multiobjective Optimization Charmi B. Desai, Narendra M. Patel L.D. College of Engineering, Ahmedabad Abstract - Multi-objective optimization problems are
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
Differential Voting in Case Based Spam Filtering
Differential Voting in Case Based Spam Filtering Deepak P, Delip Rao, Deepak Khemani Department of Computer Science and Engineering Indian Institute of Technology Madras, India [email protected],
A New Multi-objective Evolutionary Optimisation Algorithm: The Two-Archive Algorithm
A New Multi-objective Evolutionary Optimisation Algorithm: The Two-Archive Algorithm Kata Praditwong 1 and Xin Yao 2 The Centre of Excellence for Research in Computational Intelligence and Applications(CERCIA),
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang
Sender and Receiver Addresses as Cues for Anti-Spam Filtering Chih-Chien Wang Graduate Institute of Information Management National Taipei University 69, Sec. 2, JianGuo N. Rd., Taipei City 104-33, Taiwan
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Improving the Performance of Heuristic Spam Detection using a Multi-Objective Genetic Algorithm. James Dudley
Improving the Performance of Heuristic Spam Detection using a Multi-Objective Genetic Algorithm James Dudley This report is submitted as partial fulfilment of the requirements for the Honours Programme
Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model
AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different
Multi-Objective Optimization to Workflow Grid Scheduling using Reference Point based Evolutionary Algorithm
Multi-Objective Optimization to Workflow Grid Scheduling using Reference Point based Evolutionary Algorithm Ritu Garg Assistant Professor Computer Engineering Department National Institute of Technology,
Adaptive Filtering of SPAM
Adaptive Filtering of SPAM L. Pelletier, J. Almhana, V. Choulakian GRETI, University of Moncton Moncton, N.B.,Canada E1A 3E9 {elp6880, almhanaj, choulav}@umoncton.ca Abstract In this paper, we present
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Impact of Feature Selection Technique on Email Classification
Impact of Feature Selection Technique on Email Classification Aakanksha Sharaff, Naresh Kumar Nagwani, and Kunal Swami Abstract Being one of the most powerful and fastest way of communication, the popularity
Filtering Junk Mail with A Maximum Entropy Model
Filtering Junk Mail with A Maximum Entropy Model ZHANG Le and YAO Tian-shun Institute of Computer Software & Theory. School of Information Science & Engineering, Northeastern University Shenyang, 110004
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Anti Spamming Techniques
Anti Spamming Techniques Written by Sumit Siddharth In this article will we first look at some of the existing methods to identify an email as a spam? We look at the pros and cons of the existing methods
An Efficient Spam Filtering Techniques for Email Account
American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-02, Issue-10, pp-63-73 www.ajer.org Research Paper Open Access An Efficient Spam Filtering Techniques for Email
Multi-Objective Genetic Test Generation for Systems-on-Chip Hardware Verification
Multi-Objective Genetic Test Generation for Systems-on-Chip Hardware Verification Adriel Cheng Cheng-Chew Lim The University of Adelaide, Australia 5005 Abstract We propose a test generation method employing
Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms
Symposium on Automotive/Avionics Avionics Systems Engineering (SAASE) 2009, UC San Diego Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms Dipl.-Inform. Malte Lochau
GPSQL Miner: SQL-Grammar Genetic Programming in Data Mining
GPSQL Miner: SQL-Grammar Genetic Programming in Data Mining Celso Y. Ishida, Aurora T. R. Pozo Computer Science Department - Federal University of Paraná PO Box: 19081, Centro Politécnico - Jardim das
Email Filters that use Spammy Words Only
Email Filters that use Spammy Words Only Vasanth Elavarasan Department of Computer Science University of Texas at Austin Advisors: Mohamed Gouda Department of Computer Science University of Texas at Austin
GA as a Data Optimization Tool for Predictive Analytics
GA as a Data Optimization Tool for Predictive Analytics Chandra.J 1, Dr.Nachamai.M 2,Dr.Anitha.S.Pillai 3 1Assistant Professor, Department of computer Science, Christ University, Bangalore,India, [email protected]
Combining SVM classifiers for email anti-spam filtering
Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and
Spam Filter: VSM based Intelligent Fuzzy Decision Maker
IJCST Vo l. 1, Is s u e 1, Se p te m b e r 2010 ISSN : 0976-8491(Online Spam Filter: VSM based Intelligent Fuzzy Decision Maker Dr. Sonia YMCA University of Science and Technology, Faridabad, India E-mail
An Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Hiroyuki Sato. Minami Miyakawa. Keiki Takadama ABSTRACT. Categories and Subject Descriptors. General Terms
Controlling election Area of Useful Infeasible olutions and Their Archive for Directed Mating in Evolutionary Constrained Multiobjective Optimization Minami Miyakawa The University of Electro-Communications
Non-Parametric Spam Filtering based on knn and LSA
Non-Parametric Spam Filtering based on knn and LSA Preslav Ivanov Nakov Panayot Markov Dobrikov Abstract. The paper proposes a non-parametric approach to filtering of unsolicited commercial e-mail messages,
SpamNet Spam Detection Using PCA and Neural Networks
SpamNet Spam Detection Using PCA and Neural Networks Abhimanyu Lad B.Tech. (I.T.) 4 th year student Indian Institute of Information Technology, Allahabad Deoghat, Jhalwa, Allahabad, India [email protected]
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
Journal of Machine Learning Research 7 (2006) 2699-2720 Submitted 3/06; Revised 9/06; Published 12/06 Spam Filtering Based On The Analysis Of Text Information Embedded Into Images Giorgio Fumera Ignazio
Programming Risk Assessment Models for Online Security Evaluation Systems
Programming Risk Assessment Models for Online Security Evaluation Systems Ajith Abraham 1, Crina Grosan 12, Vaclav Snasel 13 1 Machine Intelligence Research Labs, MIR Labs, http://www.mirlabs.org 2 Babes-Bolyai
Dr. D. Y. Patil College of Engineering, Ambi,. University of Pune, M.S, India University of Pune, M.S, India
Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Effective Email
A Hybrid ACO Based Feature Selection Method for Email Spam Classification
A Hybrid ACO Based Feature Selection Method for Email Spam Classification KARTHIKA RENUKA D 1, VISALAKSHI P 2 1 Department of Information Technology 2 Department of Electronics and Communication Engineering
A Three-Way Decision Approach to Email Spam Filtering
A Three-Way Decision Approach to Email Spam Filtering Bing Zhou, Yiyu Yao, and Jigang Luo Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {zhou200b,yyao,luo226}@cs.uregina.ca
Less naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:[email protected] also CoSiNe Connectivity Systems
A Binary Model on the Basis of Imperialist Competitive Algorithm in Order to Solve the Problem of Knapsack 1-0
212 International Conference on System Engineering and Modeling (ICSEM 212) IPCSIT vol. 34 (212) (212) IACSIT Press, Singapore A Binary Model on the Basis of Imperialist Competitive Algorithm in Order
Multilingual Rules for Spam Detection
Multilingual Rules for Spam Detection Minh Tuan Vu 1, Quang Anh Tran 1, Frank Jiang 2 and Van Quan Tran 1 1 Faculty of Information Technology, Hanoi University, Hanoi, Vietnam 2 School of Engineering and
Genetic Algorithms for Bridge Maintenance Scheduling. Master Thesis
Genetic Algorithms for Bridge Maintenance Scheduling Yan ZHANG Master Thesis 1st Examiner: Prof. Dr. Hans-Joachim Bungartz 2nd Examiner: Prof. Dr. rer.nat. Ernst Rank Assistant Advisor: DIPL.-ING. Katharina
Spam Filtering with Naive Bayesian Classification
Spam Filtering with Naive Bayesian Classification Khuong An Nguyen Queens College University of Cambridge L101: Machine Learning for Language Processing MPhil in Advanced Computer Science 09-April-2011
Image Spam Filtering by Content Obscuring Detection
Image Spam Filtering by Content Obscuring Detection Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli Dept. of Electrical and Electronic Eng., University of Cagliari Piazza d Armi, 09123 Cagliari,
On Attacking Statistical Spam Filters
On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Abstract. The efforts of anti-spammers
Genetic Algorithm. Based on Darwinian Paradigm. Intrinsically a robust search and optimization mechanism. Conceptual Algorithm
24 Genetic Algorithm Based on Darwinian Paradigm Reproduction Competition Survive Selection Intrinsically a robust search and optimization mechanism Slide -47 - Conceptual Algorithm Slide -48 - 25 Genetic
Simple Language Models for Spam Detection
Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to
Multiobjective Optimization and Evolutionary Algorithms for the Application Mapping Problem in Multiprocessor System-on-Chip Design
358 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 10, NO. 3, JUNE 2006 Multiobjective Optimization and Evolutionary Algorithms for the Application Mapping Problem in Multiprocessor System-on-Chip
2. NAIVE BAYES CLASSIFIERS
Spam Filtering with Naive Bayes Which Naive Bayes? Vangelis Metsis Institute of Informatics and Telecommunications, N.C.S.R. Demokritos, Athens, Greece Ion Androutsopoulos Department of Informatics, Athens
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
Numerical Research on Distributed Genetic Algorithm with Redundant
Numerical Research on Distributed Genetic Algorithm with Redundant Binary Number 1 Sayori Seto, 2 Akinori Kanasugi 1,2 Graduate School of Engineering, Tokyo Denki University, Japan [email protected],
14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)
Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
