Spam Filter: VSM based Intelligent Fuzzy Decision Maker


IJCST Vol. 1, Issue 1, September 2010. ISSN : (Online)

Dr. Sonia, YMCA University of Science and Technology, Faridabad, India

Abstract: E-mail has become one of the most popular means of communication over the internet. Unsolicited and undesired e-mails, called spam or junk mail, increase day by day. To filter spam from legitimate e-mails, classification approaches using context-based techniques have been proposed. This paper discusses an efficient and effective classification approach for separating spam from legitimate e-mail. In the first phase, a Vector Space Model (VSM) based classification method is developed: the input mail is converted into a matrix and, on the basis of term frequency, a similarity coefficient is computed. In the second phase, an intelligent fuzzy decision maker categorizes the e-mail for the user's decision. Real legitimate e-mail and real spam can then be separated using the fuzzy decision maker.

Keywords: Spam, Spam Filter, Vector Space Model, Fuzzy Decision Maker.

I. Introduction

Junk e-mails arrive in the inbox daily and are a real headache for most people. Employees waste valuable time browsing through spam e-mails. Spam filtering removes junk e-mails from the inbox and so saves a great deal of time [15,18,19]. With spam filtering, the user receives only the genuine e-mails that are intended for reading. E-mail spam is used here to mean unsolicited electronic mail, usually advertising some product, service, business, scheme, website, etc. Spamming via other means is a different problem, which is not tackled here [4,6,8,15]. The spam problem is a complex system and should be dealt with by developing strategies that address it holistically. Such strategies must embrace both technical and legal realities simultaneously in order to be successful [3,9].
A spam filter is a program that detects unsolicited and unwanted e-mail and prevents those messages from reaching a user's inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases its judgments [1]. For example, the simplest and earliest versions (such as the one available with Microsoft's Hotmail) can be set to watch for particular words in the subject line of messages and to exclude these from the user's inbox [7,10,11,13]. This method is not especially effective, too often omitting perfectly legitimate messages.

Spam filtering is somewhat similar to Information Retrieval (IR), where the spam e-mails are retrieved from the user's mailbox [4,17]. One of the information retrieval methods that has been used to filter spam is the Vector Space Model (VSM). Vector space models are just one subclass of the retrieval techniques that have been studied in recent years. To efficiently satisfy the user query requirements in information retrieval, a query optimization algorithm for spam retrieval is proposed. In this paper we propose a VSM spam filtering technique based on a text-based vector space model. This method constructs the spam detection model from the contents of various kinds of mail and finds spam more efficiently. An algorithm is proposed that can be applied to a bulk of messages to detect the spam mails. The architecture of the VSM Spam Filter is also proposed. Moreover, an efficient classification engine is defined to explain the filtering process.
One taxonomy labels the class of techniques that resemble vector space models "formal, feature-based, individual, partial match" retrieval techniques, since they typically rely on an underlying formal mathematical model for retrieval, model the mail documents as sets of terms that can be individually weighted and manipulated, perform queries by comparing the representation of the query to the representation of each mail document in the space, and can retrieve spam mail documents that do not necessarily contain one of the search terms [35,36].

In the first phase of this paper, a VSM method based on similarity is defined: the input mail is converted into a matrix and the similarity coefficient is computed on the basis of term frequency. In the second phase, an intelligent fuzzy decision maker is developed that categorizes the e-mail for the user's decision. Real legitimate e-mail and real spam can then be separated using the fuzzy decision maker.

II. Literature Survey

Spam mails vary significantly in content and roughly belong to the following categories: money-making scams, fat loss, business improvement, sexually explicit content, making friends, service provider advertisements, etc. [5,14]. Among the proposed methods, much interest has focused on machine learning techniques for spam filtering. They include rule learning [23,34,38], Naive Bayes [24,37], decision trees [39], support vector machines [2,21,28] and combinations of different learners [12,22]. The common concept of these approaches is that they do not require any rules to be specified explicitly to filter out spam mails. Instead, a set of training samples (pre-classified e-mails) is needed. Sahami et al. [30,37] employed a Bayesian classification technique to filter junk e-mails.
By making use of the extensible framework of Bayesian modeling, they can not only employ traditional document classification techniques based on the text of the e-mail, but can also easily incorporate domain knowledge aimed at filtering spam e-mails. Androutsopoulos et al. [23-25] presented a series of papers that extended the Naïve Bayes (NB) filter proposed by Sahami et al. [37] by investigating the effect of different numbers of features and training-set sizes on the filter's performance. Drucker et al. [21] used a support vector machine (SVM) for classifying e-mails according to their contents and compared its performance with Ripper, Rocchio, and boosting decision trees. Sakkis et al. [20] proposed a memory-based approach to spam filtering for mailing lists. Zhang and Yao [32] presented a maximum entropy based approach to junk mail filtering. They showed that, compared with two Naïve Bayes classifiers, the maximum entropy method produced comparable or better results, making use of domain-specific features provided by SpamAssassin [16]. A fuzzy process can also be included to detect spam; it comprises fuzzification, membership functions and inference rules [27,33].

International Journal of Computer Science and Technology

III. Vector Space Model Filter Technique

The vector space model procedure can be divided into three stages. The first stage is document indexing, where content-bearing terms are extracted from the document text [26]. The second stage is the weighting of the indexed terms to enhance retrieval of documents relevant to the user. The last stage ranks the documents with respect to the query according to a similarity measure.

The collection is split into a matrix: a collection of n messages can be represented in the vector space model by a term-message matrix. An entry in the matrix corresponds to the weight of a term in the message; zero means the term has no significance in the message or simply does not occur in it.

Term Weights: Term Frequency

More frequent terms in a message are more important, i.e. more indicative of the topic. Let $f_{ij}$ be the frequency of term i in message j. The term frequency (tf) may be normalized by dividing by the frequency of the most common term in the message:

$$tf_{ij} = \frac{f_{ij}}{\max_i f_{ij}}$$

Term Weights: Inverse Message Frequency

Terms that appear in many different messages are less indicative of the overall topic. Let $mf_i$ be the message frequency of term i, i.e. the number of messages containing term i. The inverse message frequency of term i is

$$imf_i = \log_2\frac{N}{mf_i}$$

where N is the total number of mail documents. It is an indication of a term's discrimination power. The logarithm is used to dampen the effect relative to tf.

TF-IMF Weighting

A typical combined term importance indicator is tf-imf weighting:

$$w_{ij} = tf_{ij} \cdot imf_i = tf_{ij} \log_2\frac{N}{mf_i}$$

A term occurring frequently in a mail document but rarely in the rest of the collection is given a high weight. Many other ways of determining term weights have been proposed. Experimentally, tf-imf has been found to work well.
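The weighting scheme above can be illustrated with a short sketch. It assumes whitespace tokenization and lowercasing, neither of which the paper specifies, and the function name `tf_imf_weights` and the three toy messages are hypothetical, chosen only for illustration. It applies the normalized term frequency and the base-2 inverse message frequency exactly as defined in this section.

```python
import math
from collections import Counter


def tf_imf_weights(messages):
    """Return one {term: weight} dict per message, with w_ij = tf_ij * log2(N / mf_i)."""
    # Assumption: terms are lowercase whitespace-separated tokens.
    tokenized = [message.lower().split() for message in messages]
    n_messages = len(tokenized)
    # mf_i: number of messages that contain term i at least once.
    message_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        if not counts:  # an empty message has no terms to weight
            weights.append({})
            continue
        max_count = max(counts.values())  # frequency of the most common term
        weights.append({
            term: (count / max_count) * math.log2(n_messages / message_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy collection, not taken from the paper.
messages = [
    "win free prize now",
    "free prize prize inside",
    "meeting notes for monday",
]
weights = tf_imf_weights(messages)
for message, vector in zip(messages, weights):
    print(message, {term: round(w, 3) for term, w in vector.items()})
```

In the second message, "prize" is the most common term, so its tf is 1, but it occurs in two of the three messages and therefore receives a lower imf than "inside", which occurs in only one.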
Similarity Measure

The similarity coefficient between a query Q and a message $M_j$ is calculated from the tf and imf values as

$$SC(Q, M_j) = \sum_i w_{iq}\, w_{ij}, \qquad w_{ij} = tf_{ij} \cdot imf_i$$

where $w_{iq}$ is the weight of term i in the query.

IV. Proposed Work

To efficiently satisfy the user query requirements in information retrieval, a query optimization algorithm for spam retrieval is proposed. The VSM spam filtering technique uses a text-based vector space model. This method constructs the spam detection model from the contents of various kinds of mail and finds spam more efficiently. An algorithm is proposed that can be applied to a bulk of messages to detect the spam mails.

A. VSM based Spam Filter Algorithm

o See which terms exist in the inverted index and calculate the similarity coefficient.
o The number of messages is needed for spam detection.
o The term frequency of each word in a message is needed:

$$TF(m,t) = \begin{cases} 0 & \text{if } n(m,t) = 0 \\ 1 + \log(1 + n(m,t)) & \text{otherwise} \end{cases}$$

o Calculate the inverse message frequency

$$IMF(t) = \log\frac{1 + |M|}{|M_t|}$$

where M is the message collection and $M_t$ is the set of messages containing term t. Sort the terms alphabetically.
o Combine TF and IMF into the complete vector space model. The coordinate of message m on axis t is given by $m_t = TF(m,t) \cdot IMF(t)$.
o Calculate the similarity coefficient

$$SC(Q, M_i) = \sum_{j=1}^{t} w_{qj}\, m_{ij}$$

where $M_i = (m_{i1}, \ldots, m_{it})$ is the weight vector of message i over the t terms of the collection and $Q = (w_{q1}, \ldots, w_{qt})$ holds the weights of the terms found in the query.
o Pass each weight into the fuzzy decision maker. Using these weights, make the rule sets for the intelligent fuzzy decision.
o Report the result on the basis of the decision maker; the user chooses the proper action.
o Repeat for all n messages.

Certain optimizations have been implemented in addition to this algorithm. They are as follows:

o Using Rocchio's approach with the vector space model to find the more relevant spam message document, which is used as

relevance feedback.
o Using a stop-word list of the most common words, which speeds up classification and learning, as used in the Google search engine [3].

B. VSM based Spam Filter Example

Consider a query and a message collection consisting of three mails:

Q: Free ticket click
M1: Guaranty of free shirt in a mall
M2: Delivery of ticket movie in a ticket click
M3: Guaranty of free movie in a click

The collection contains three mail documents, so n = 3. If a term appears in only one of the three mail documents, its imf is log(n/mf_j) = log(3/1) = 0.477. If the term appears in two of the three mail documents, its imf is log(3/2) = 0.176, and if it appears in all three documents its imf is log(3/3) = 0. Base-10 logarithms are used. The imf values for the terms in the three mail documents are given below.

1. imf(a) = 0
2. imf(click) = 0.176
3. imf(delivery) = 0.477
4. imf(free) = 0.176
5. imf(guaranty) = 0.176
6. imf(in) = 0
7. imf(mall) = 0.477
8. imf(movie) = 0.176
9. imf(of) = 0
10. imf(shirt) = 0.477
11. imf(ticket) = 0.477

Mail document vectors can now be constructed. Since eleven terms appear in the mail document collection, an eleven-dimensional mail document vector is constructed. The alphabetical ordering given above is used to construct the mail document vector. The weight for term i in vector j is computed as $imf_i \times tf_{ij}$. The mail document vectors are:

Table 1. Terms Appearing in the Collection

Term       M1      M2      M3      Q
a          0       0       0       0
click      0       0.176   0.176   0.176
delivery   0       0.477   0       0
free       0.176   0       0.176   0.176
guaranty   0.176   0       0.176   0
in         0       0       0       0
mall       0.477   0       0       0
movie      0       0.176   0.176   0
of         0       0       0       0
shirt      0.477   0       0       0
ticket     0       0.954   0       0.477

SC(Q, M1) = (0)(0) + (0)(0.176) + (0)(0) + (0.176)(0.176) + (0.176)(0) + (0)(0) + (0.477)(0) + (0)(0) + (0)(0) + (0.477)(0) + (0)(0.477) = 0.031

Similarly,

SC(Q, M2) = (0.176)(0.176) + (0.954)(0.477) = 0.486
SC(Q, M3) = (0.176)(0.176) + (0.176)(0.176) = 0.062

Hence the mail document M2 has a higher weight than M1 and M3. On the basis of the similarity coefficient, the fuzzy decision maker differentiates real legitimate e-mail from real spam.

C. The Architecture of the VSM based Spam Filter

The architecture of the VSM Spam Filter is defined in Fig. 1.

Fig. 1. The Deployment Diagram of VSM Spam Filter

o A bulk of e-mails is received.
o The e-mails pass through the POP server.
o The e-mails go into the VSM Spam Filter.
o The result of the spam filter is sent to the intelligent decision maker.
o The user can choose the option.

Fig. 2. VSM Spam Filter Architecture

1. The VSM Spam Filter is shown in Fig. 2.
a. A letter is received.
b. It is used as plain text, including body and header.
c. The letter is represented as a matrix.
d. The term frequency is calculated.
e. The inverse message frequency is calculated.
f. The similarity coefficient (SC) is calculated using the term frequency and the inverse message frequency.

Fig. 3. Intelligent Decision Maker Architecture

2. The Intelligent Fuzzy Decision Maker is described in Fig. 3.
a. The result of the SC calculation is used as input.
b. Fuzzification is done.
c. The e-mail is categorized.

d. The rule set is constructed to take the final decision about the e-mail.
e. The final output is presented for the user's choice.

D. The VSM based Filter Classification Engine

The classification engine used to filter spam with the vector space model is defined in Fig. 4. The various parts of the classification engine are explained in detail below.

Fig. 4. VSM Classification Engine

E-mail Data Source: The data source represents the input. The source should be a source of text such as an article, or an e-mail for that matter.

Matrix Conversion: The matrix conversion tokenizes the text into words and passes those words on to the inverse message frequency calculation engine for further processing.

Inverse Message Frequency Calculation: The inverse message frequency is calculated by the following equation:

$$imf_j = \log\frac{m}{mf_j}$$

where m is the total number of mail documents and $mf_j$ is the number of documents that contain the term or word $t_j$.

Sorting of Inverse Message Frequency: Message vectors can now be constructed from all the terms that appear in the mail document collection. The terms are sorted alphabetically to construct the mail document vector.

Similarity Coefficient (Weight) Calculation: The weighting factor (m) for a term in a document is defined as a combination of term frequency (tf) and inverse message frequency (imf). To compute the value of the jth entry in the vector corresponding to mail document i, the following equation is used:

$$m_{ij} = tf_{ij} \cdot imf_j$$

A simple similarity coefficient (SC) between the query Q and a mail $M_i$ is defined by the inner product of the two vectors. Since a query vector is similar in length to the mail vector, this same measure is often used to compute the similarity between two documents.

$$SC(Q, M_i) = \sum_{j=1}^{t} w_{qj}\, m_{ij}$$

Here a vector $M_i = (m_{i1}, m_{i2}, \ldots, m_{it})$ of size t is constructed for each mail and filled with the term weights. Similarly, a vector $Q = (w_{q1}, w_{q2}, \ldots, w_{qt})$ is constructed for the terms found in the query.

Intelligent Fuzzy Decision Maker: Once all tokens have been assigned a weight, it is time to classify. The deployment of the fuzzy decision maker is shown in Fig. 5. The fuzzy decision maker is based on soft computing and takes its decisions using a fuzzy rule set, which is close to real decision making.

Fig. 5. Deployment of Fuzzy Decision Maker

Fuzzy logic provides a convenient way of converting existing results into fuzzy logic rules, and the e-mail is categorized as legitimate, unlikely spam, likely spam or spam. The user can respond to these categories with no check, essential check, rigorous check and discard, respectively. The input values related to each message are translated into linguistic concepts. In previous work, e-mails were classified into two forms, spam and legitimate. In this work the e-mails are categorized on the basis of weight. The fuzzy decision maker places each e-mail in one of four categories, legitimate, unlikely spam, likely spam and spam, according to its weight. The legitimate category is defined over the range 0 to 0.4, unlikely spam over the range 0.2 to 0.7, likely spam over the range 0.5 to 0.7, and definite spam from 0.8 onwards. On the basis of these observations, the user can select no check, essential check, rigorous check and discard, respectively.

V. Concluding Remarks

As e-mail has become a popular means of communication over the internet, the problem of receiving unsolicited and undesired e-mails, called spam or junk, has grown. To filter spam from e-mails, automatic classification approaches using text mining techniques have been proposed. The vector space model is a popular retrieval model and is used here in the detection of spam. The main advantages of the vector space spam filter are that its term weighting scheme improves filtering performance, its partial matching strategy allows retrieval of messages that approximate the query conditions, and its weighting scheme sorts the messages according to their degree of similarity to the query.
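As an illustration of how the weighting scheme sorts messages by similarity to the query, the sketch below recomputes the three-message example from subsection B. It is a minimal sketch that follows the definitions used in that example (raw term frequency, base-10 logarithm, inner-product similarity); the helper names `imf_table`, `weight_vector` and `similarity` are hypothetical, and whitespace tokenization on lowercased text is an assumption.

```python
import math
from collections import Counter

# Query and messages from the example in subsection B (lowercased).
QUERY = "free ticket click"
MESSAGES = {
    "M1": "guaranty of free shirt in a mall",
    "M2": "delivery of ticket movie in a ticket click",
    "M3": "guaranty of free movie in a click",
}


def imf_table(messages):
    """imf(t) = log10(n / mf_t), where mf_t counts the messages containing t."""
    n = len(messages)
    message_freq = Counter(t for text in messages.values() for t in set(text.split()))
    return {term: math.log10(n / count) for term, count in message_freq.items()}


def weight_vector(text, imf):
    """Sparse vector of raw term frequency times imf; unknown terms weigh 0."""
    return {term: count * imf.get(term, 0.0) for term, count in Counter(text.split()).items()}


def similarity(query_vec, message_vec):
    """Inner product of the query vector and a message vector."""
    return sum(weight * message_vec.get(term, 0.0) for term, weight in query_vec.items())


imf = imf_table(MESSAGES)
query_vec = weight_vector(QUERY, imf)
scores = {
    name: similarity(query_vec, weight_vector(text, imf))
    for name, text in MESSAGES.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
# prints:
# M2: 0.486
# M3: 0.062
# M1: 0.031
```

M2 ranks first because "ticket" occurs twice in it and in no other message, so it carries both a high term frequency and the highest imf among the query terms.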
The vector space model is fairly cheap to compute and yields decent effectiveness. Moreover, it is popular and widely used. We proposed a VSM spam filtering technique using a text-based vector space model. This method constructs the spam detection model from the contents of various kinds of mail and finds spam more efficiently. The paper presents the calculation of term weights with an example. An algorithm is proposed that can be applied to a bulk of messages to detect the spam mails. The architecture of the VSM Spam Filter is also proposed. Moreover, an efficient classification engine is defined to explain the filtering process.
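The decision stage of the classification engine can be sketched as follows. The category ranges (legitimate 0 to 0.4, unlikely 0.2 to 0.7, likely 0.5 to 0.7, spam from 0.8) and the four user actions are taken from the paper; everything else is an assumption made for illustration: the weight is assumed to be normalized to the interval from 0 to 1, the membership functions are assumed to be linear shoulders and triangles peaking at the midpoint of each range, the spam membership is assumed to ramp up between 0.7 and 0.8 so that no weight is left uncovered, and the rule set is reduced to picking the category with the highest membership. The function names are hypothetical.

```python
def left_shoulder(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def triangle(x, low, peak, high):
    """Membership rising from `low` to 1 at `peak`, falling to 0 at `high`."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def right_shoulder(x, zero, full):
    """Membership 0 up to `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def fuzzify(weight):
    """Degrees of membership in the four categories (shapes are assumptions)."""
    return {
        "legitimate": left_shoulder(weight, 0.0, 0.4),  # range 0 to 0.4
        "unlikely": triangle(weight, 0.2, 0.45, 0.7),   # range 0.2 to 0.7
        "likely": triangle(weight, 0.5, 0.6, 0.7),      # range 0.5 to 0.7
        "spam": right_shoulder(weight, 0.7, 0.8),       # 0.8 onwards, assumed ramp from 0.7
    }


# User actions attached to each category, as listed in the paper.
ACTIONS = {
    "legitimate": "no check",
    "unlikely": "essential check",
    "likely": "rigorous check",
    "spam": "discard",
}


def decide(weight):
    """Simplified rule set: choose the category with the largest membership."""
    memberships = fuzzify(weight)
    category = max(memberships, key=memberships.get)
    return category, ACTIONS[category]


for weight in (0.05, 0.45, 0.6, 0.9):
    print(weight, decide(weight))
```

A weight of 0.6 has non-zero membership in both the unlikely and the likely category, which is where the overlapping ranges matter: the sketch resolves the overlap in favour of the stronger membership, whereas a fuller rule set could weigh both.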

The vector space model is considered for spam detection under the assumption that terms are independent. The similarity measure is important because it is used by the retrieval system to decide which mail documents are shown to the user as spam and which as not spam. Moreover, the vector space spam filtering model can be applied to all features of an e-mail, i.e. header, subject and body, which is expected to improve performance.

The e-mail is categorized into four classes: legitimate, unlikely, likely and spam. Legitimate mail is useful mail that the user need not check for spam. Unlikely mails need an essential check, while for likely mails a rigorous check is carried out to decide whether the mail is spam or not. Finally, when the weight is very high, discarding the mail indicates that it is spam. For this purpose an intelligent fuzzy decision maker is created, and using both models together is intended to achieve higher accuracy. First, the crisp value is converted into a fuzzy value by fuzzification. Fuzzification makes it possible to describe an e-mail as legitimate, unlikely, likely or spam. This fuzzy value is used as the input to the decision maker, which produces the final result for the user's choice. With this method, not only are the e-mails categorized, but the characteristics of the output are also used for the user's choice. The method differentiates real legitimate e-mail from real spam by combining the fuzzy logic model with the vector space model through the decision maker, which is easy to use and is intended to give higher accuracy.

VI. Acknowledgement

This work was supported by Dr. A. K. Sharma and Dr. C. K. Nagpal. The author is very thankful to Mr. Niranjan Kumar for his support. The author would like to thank the anonymous referees for their valuable comments and suggestions.

References

[1] Graham P., A Plan for Spam, [Online] Available:
[2] Kołcz A., Alspector J., SVM-based filtering of e-mail spam with content-specific misclassification costs, Proc. of TextDM'01 Workshop on Text Mining,
[3] Berger A., Pietra S.D., and Pietra V.D., A maximum entropy approach to Natural Language Processing, Computational Linguistics, Vol. 22(2), pp. 39-71 (1996).
[4] Dudani A., The distance-weighted k-nearest neighbor rule, IEEE Transactions on Systems, Man and Cybernetics, Vol. 6(4), pp. (1976).
[5] Schwartz A. and Garfinkel S., Stopping Spam: Stamping Out Unwanted E-mails and News Postings, O'Reilly (1998).
[6] Paul N.C. and Monitor C.S., New strategies aimed at blocking spam e-mail, [Online] Available: technology/ story/655215p c.html.
[7] Mueller S.H., Ant-Spam Abuse site, [Online] Available:
[8] McGlynnaria T., [Online] Available: arizona.edu/courses/ tutorials/class/html/class.html.
[9] Clement T., CAUCE, How do you define Spam?, [Online] Available:
[10] Williams C., Ferris D., The cost of spam false positives, private communication, August
[11] Wolpert D., Stacked Generalization, Neural Networks, Vol. 5(2), pp. (1992).
[12] Espiner T., Demand for Anti-Spam Products to increase, UK, ZDNet, June
[13] .net basic describer of the way Spam is handled by this organization
[14] Smadja F., Tumblin H., Automatic spam detection as a text classification task, in: Proc. of Workshop on Operational Text Classification Systems,
[15] Falk J.D., Fight Spam on Internet: What is Spam, [Online] Available: whatsspam.shtml.
[16] Fuzzy Logic Toolbox User's Guide, The MathWorks Inc. (2004).
[17] Klir G.J. and Yuan B., Fuzzy Sets and Fuzzy Logic, Prentice Hall PTR, Upper Saddle River, New Jersey, USA (1995).
[18] Sakkis G., Androutsopoulos I., Paliouras G., Karkaletsis V., Spyropoulos C., and Stamatopoulos P., Stacking Classifiers for Anti-Spam Filtering of E-mail, Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, pp. (2001).
[19] Ioannidis J., Fighting Spam by Encapsulating Policy in E-mail Addresses, 10th Network and Distributed System Security Symposium (2003).
[20] Sakkis G., Androutsopoulos I., Paliouras G., Karkaletsis V., Spyropoulos C., and Stamatopoulos P., A memory-based approach to Anti-Spam Filtering for mailing lists, Information Retrieval, 6, pp. 49-73 (2003).
[21] Drucker H., Wu D., Vapnik V.N., Support vector machines for spam categorization, IEEE Trans. Neural Netw., Vol. 10 (No. 5), pp. (1999).
[22] Tony Clement, shtml.
[23] Androutsopoulos I., Paliouras G., Karkaletsis V., Sakkis G., Spyropoulos C.D., Stamatopoulos P., Learning to filter spam e-mail: a comparison of a Naïve Bayesian and a memory-based approach, in: Proc. of the workshop: Machine Learning and Textual Information Access, pp. 1-13 (2000).
[24] Androutsopoulos I., Koutsias J., Chandrinos K., Paliouras G., Spyropoulos C.D., An evaluation of naive Bayesian anti-spam filtering, in: Proc. of the Workshop on Machine Learning in the New Information Age: 11th European Conference on Machine Learning, pp. 9-17 (2000).
[25] Androutsopoulos I., Koutsias J., Chandrinos K., Paliouras G., Spyropoulos C.D., An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with encrypted personal e-mail messages, in: Proc. of the 23rd Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. (2000).
[26] Mitra S., Information Retrieval.
[27] Baldwin J.F., Fuzzy logic and fuzzy reasoning, in: Fuzzy Reasoning and Its Applications, Mamdani E.H. and Gaines B.R. (eds.), London: Academic Press (1981).
[28] Provost J., Naïve-Bayes vs. rule-learning in classification of e-mail, Technical report, University of Texas at Austin,
[29] Carpinter J., Hunt R., Tightening the Net: A Review of Current and Next Generation Spam Filtering Tools, Computers & Security, Vol. 25 (
[30] Praed J., Latest trends in the Legal Fight Against Spammers, Spam Conference
[31] Kojima H., Chung C., Westen C., Strategy on the landslide type analysis based on the expert knowledge and the quantitative prediction model, ISPRS, Amsterdam
[32] Zhang L. and Yao T., Filtering Junk Mail with a Maximum Entropy Model, in: Proceedings of the 20th International Conference on Computer Processing of Oriental Languages (2003).
[33] Zadeh L.A., Fuzzy algorithms, Info. & Ctl., Vol. 12, pp. (1968).
[34] Zadeh L.A., Making computers think like people, IEEE Spectrum, 8, pp. 26-32 (1984).
[35] Zadeh L.A., Fuzzy Sets, Information and Control,
[36] Zadeh L.A., Outline of A New Approach to the Analysis of Complex Systems and Decision Processes,
[37] Sahami M., Dumais S., Heckerman D., Horvitz E., A Bayesian approach to filtering junk e-mail, in: Learning for Text Categorization, Papers from the AAAI Workshop, pp. (1998).
[38] Cohen W.W., Learning rules that classify e-mail, in: Proc. of AAAI Spring Symposium on Machine Learning in Information Access, pp. (1996).
[39] Carreras X., Màrquez L., Boosting trees for anti-spam e-mail filtering, in: Proc. of Fourth Int'l Conf. on Recent Advances in Natural Language Processing, pp. (2001).

Dr. Sonia is presently working as an Assistant Professor at YMCA University of Science and Technology, Faridabad, Haryana, India. She received her M.Sc. and PhD from Jamia Millia Islamia, New Delhi, India. She has also completed her M.C.A. and M.Tech. (Computer Engineering) degrees at MDU, Rohtak, Haryana, India. Her research interests include Web Mining, Data Mining and Spam Filtering.


More information

Adaptive Filtering of SPAM

Adaptive Filtering of SPAM Adaptive Filtering of SPAM L. Pelletier, J. Almhana, V. Choulakian GRETI, University of Moncton Moncton, N.B.,Canada E1A 3E9 {elp6880, almhanaj, choulav}@umoncton.ca Abstract In this paper, we present

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

Fuzzy Logic for E-Mail Spam Deduction

Fuzzy Logic for E-Mail Spam Deduction Fuzzy Logic for E-Mail Spam Deduction P.SUDHAKAR 1, G.POONKUZHALI 2, K.THIAGARAJAN 3,R.KRIPA KESHAV 4, K.SARUKESI 5 1 Vernalis systems Pvt Ltd, Chennai- 600116 2,4 Department of Computer Science and Engineering,

More information

ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift

ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift Sarah Jane Delany 1 and Pádraig Cunningham 2 and Barry Smyth 3 Abstract. While text classification has been identified for some time

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

Automatic Web Page Classification

Automatic Web Page Classification Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

More information

ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift

ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift Sarah Jane Delany 1 and Pádraig Cunningham 2 Abstract. While text classification has been identified for some time as a promising application

More information

Abstract. Find out if your mortgage rate is too high, NOW. Free Search

Abstract. Find out if your mortgage rate is too high, NOW. Free Search Statistics and The War on Spam David Madigan Rutgers University Abstract Text categorization algorithms assign texts to predefined categories. The study of such algorithms has a rich history dating back

More information

FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE

FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE Ms. S.Revathi 1, Mr. T. Prabahar Godwin James 2 1 Post Graduate Student, Department of Computer Applications, Sri Sairam

More information

A Case-Based Approach to Spam Filtering that Can Track Concept Drift

A Case-Based Approach to Spam Filtering that Can Track Concept Drift A Case-Based Approach to Spam Filtering that Can Track Concept Drift Pádraig Cunningham 1, Niamh Nowlan 1, Sarah Jane Delany 2, Mads Haahr 1 1 Department of Computer Science, Trinity College Dublin 2 School

More information

Spam detection with data mining method:

Spam detection with data mining method: Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

More information

Shafzon@yahool.com. Keywords - Algorithm, Artificial immune system, E-mail Classification, Non-Spam, Spam

Shafzon@yahool.com. Keywords - Algorithm, Artificial immune system, E-mail Classification, Non-Spam, Spam An Improved AIS Based E-mail Classification Technique for Spam Detection Ismaila Idris Dept of Cyber Security Science, Fed. Uni. Of Tech. Minna, Niger State Idris.ismaila95@gmail.com Abdulhamid Shafi i

More information

Journal of Information Technology Impact

Journal of Information Technology Impact Journal of Information Technology Impact Vol. 8, No., pp. -0, 2008 Probability Modeling for Improving Spam Filtering Parameters S. C. Chiemeke University of Benin Nigeria O. B. Longe 2 University of Ibadan

More information

Spam Detection System Combining Cellular Automata and Naive Bayes Classifier

Spam Detection System Combining Cellular Automata and Naive Bayes Classifier Spam Detection System Combining Cellular Automata and Naive Bayes Classifier F. Barigou*, N. Barigou**, B. Atmani*** Computer Science Department, Faculty of Sciences, University of Oran BP 1524, El M Naouer,

More information

Combining SVM classifiers for email anti-spam filtering

Combining SVM classifiers for email anti-spam filtering Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and

More information

6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME

6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET) January- February (2013), IAEME INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology ENGINEERING (IJCET), ISSN 0976-6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 4, Issue 1, (IJCET)

More information

SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2

SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2 International Journal of Computer Engineering and Applications, Volume IX, Issue I, January 15 SURVEY PAPER ON INTELLIGENT SYSTEM FOR TEXT AND IMAGE SPAM FILTERING Amol H. Malge 1, Dr. S. M. Chaware 2

More information

Image Spam Filtering Using Visual Information

Image Spam Filtering Using Visual Information Image Spam Filtering Using Visual Information Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli, Dept. of Electrical and Electronic Eng., Univ. of Cagliari Piazza d Armi, 09123 Cagliari, Italy

More information

Effectiveness and Limitations of Statistical Spam Filters

Effectiveness and Limitations of Statistical Spam Filters Effectiveness and Limitations of Statistical Spam Filters M. Tariq Banday, Lifetime Member, CSI P.G. Department of Electronics and Instrumentation Technology University of Kashmir, Srinagar, India Abstract

More information

An Imbalanced Spam Mail Filtering Method

An Imbalanced Spam Mail Filtering Method , pp. 119-126 http://dx.doi.org/10.14257/ijmue.2015.10.3.12 An Imbalanced Spam Mail Filtering Method Zhiqiang Ma, Rui Yan, Donghong Yuan and Limin Liu (College of Information Engineering, Inner Mongolia

More information

AN EFFECTIVE SPAM FILTERING FOR DYNAMIC MAIL MANAGEMENT SYSTEM

AN EFFECTIVE SPAM FILTERING FOR DYNAMIC MAIL MANAGEMENT SYSTEM ISSN: 2229-6956(ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 212, VOLUME: 2, ISSUE: 3 AN EFFECTIVE SPAM FILTERING FOR DYNAMIC MAIL MANAGEMENT SYSTEM S. Arun Mozhi Selvi 1 and R.S. Rajesh 2 1 Department

More information

Image Content-Based Email Spam Image Filtering

Image Content-Based Email Spam Image Filtering Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among

More information

An Efficient Spam Filtering Techniques for Email Account

An Efficient Spam Filtering Techniques for Email Account American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-02, Issue-10, pp-63-73 www.ajer.org Research Paper Open Access An Efficient Spam Filtering Techniques for Email

More information

Naïve Bayesian Anti-spam Filtering Technique for Malay Language

Naïve Bayesian Anti-spam Filtering Technique for Malay Language Naïve Bayesian Anti-spam Filtering Technique for Malay Language Thamarai Subramaniam 1, Hamid A. Jalab 2, Alaa Y. Taqa 3 1,2 Computer System and Technology Department, Faulty of Computer Science and Information

More information

Image Spam Filtering by Content Obscuring Detection

Image Spam Filtering by Content Obscuring Detection Image Spam Filtering by Content Obscuring Detection Battista Biggio, Giorgio Fumera, Ignazio Pillai, Fabio Roli Dept. of Electrical and Electronic Eng., University of Cagliari Piazza d Armi, 09123 Cagliari,

More information

SpamNet Spam Detection Using PCA and Neural Networks

SpamNet Spam Detection Using PCA and Neural Networks SpamNet Spam Detection Using PCA and Neural Networks Abhimanyu Lad B.Tech. (I.T.) 4 th year student Indian Institute of Information Technology, Allahabad Deoghat, Jhalwa, Allahabad, India abhimanyulad@iiita.ac.in

More information

A Three-Way Decision Approach to Email Spam Filtering

A Three-Way Decision Approach to Email Spam Filtering A Three-Way Decision Approach to Email Spam Filtering Bing Zhou, Yiyu Yao, and Jigang Luo Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {zhou200b,yyao,luo226}@cs.uregina.ca

More information

Savita Teli 1, Santoshkumar Biradar 2

Savita Teli 1, Santoshkumar Biradar 2 Effective Spam Detection Method for Email Savita Teli 1, Santoshkumar Biradar 2 1 (Student, Dept of Computer Engg, Dr. D. Y. Patil College of Engg, Ambi, University of Pune, M.S, India) 2 (Asst. Proff,

More information

Differential Voting in Case Based Spam Filtering

Differential Voting in Case Based Spam Filtering Differential Voting in Case Based Spam Filtering Deepak P, Delip Rao, Deepak Khemani Department of Computer Science and Engineering Indian Institute of Technology Madras, India deepakswallet@gmail.com,

More information

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College

More information

Efficient Spam Email Filtering using Adaptive Ontology

Efficient Spam Email Filtering using Adaptive Ontology Efficient Spam Email Filtering using Adaptive Ontology Seongwook Youn and Dennis McLeod Computer Science Department, University of Southern California Los Angeles, CA 90089, USA {syoun, mcleod}@usc.edu

More information

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

More information

Naive Bayes Spam Filtering Using Word-Position-Based Attributes

Naive Bayes Spam Filtering Using Word-Position-Based Attributes Naive Bayes Spam Filtering Using Word-Position-Based Attributes Johan Hovold Department of Computer Science Lund University Box 118, 221 00 Lund, Sweden johan.hovold.363@student.lu.se Abstract This paper

More information

Hoodwinking Spam Email Filters

Hoodwinking Spam Email Filters Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 533 Hoodwinking Spam Email Filters WANLI MA, DAT TRAN, DHARMENDRA

More information

Accelerating Techniques for Rapid Mitigation of Phishing and Spam Emails

Accelerating Techniques for Rapid Mitigation of Phishing and Spam Emails Accelerating Techniques for Rapid Mitigation of Phishing and Spam Emails Pranil Gupta, Ajay Nagrale and Shambhu Upadhyaya Computer Science and Engineering University at Buffalo Buffalo, NY 14260 {pagupta,

More information

An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization

An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization International Journal of Network Security, Vol.9, No., PP.34 43, July 29 34 An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization Jyh-Jian Sheu Department of Information Management,

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Email Spam Detection Using Customized SimHash Function

Email Spam Detection Using Customized SimHash Function International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

More information

Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information

Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Technology : CIT 2005 : proceedings : 21-23 September, 2005,

More information

A Multiobjective Evolutionary Algorithm for Spam E-mail Filtering

A Multiobjective Evolutionary Algorithm for Spam E-mail Filtering A Multiobjective Evolutionary Algorithm for Spam E-mail Filtering A.G. López-Herrera 1, E. Herrera-Viedma 2, F. Herrera 2 1.Dept. of Computer Sciences, University of Jaén, E-23071, Jaén (Spain), aglopez@ujaen.es

More information

AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH

AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH AN E-MAIL SERVER-BASED SPAM FILTERING APPROACH MUMTAZ MOHAMMED ALI AL-MUKHTAR College of Information Engineering, AL-Nahrain University IRAQ ABSTRACT The spam has now become a significant security issue

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

An Efficient Three-phase Email Spam Filtering Technique

An Efficient Three-phase Email Spam Filtering Technique An Efficient Three-phase Email Filtering Technique Tarek M. Mahmoud 1 *, Alaa Ismail El-Nashar 2 *, Tarek Abd-El-Hafeez 3 *, Marwa Khairy 4 * 1, 2, 3 Faculty of science, Computer Sci. Dept., Minia University,

More information

Rough Set Theory Approach for Filtering Spams from boundary messages in a Chat System

Rough Set Theory Approach for Filtering Spams from boundary messages in a Chat System Rough Set Theory Approach for Filtering Spams from boundary messages in a Chat System Sanjiban Sekhar Roy 1, Saptarshi Charaborty 1, Swapnil Sourav 1 and Ajith Abraham 2,3 1 School of Computing Science

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

An incremental cluster-based approach to spam filtering

An incremental cluster-based approach to spam filtering Available online at www.sciencedirect.com Expert Systems with Applications Expert Systems with Applications 34 (2008) 1599 1608 www.elsevier.com/locate/eswa An incremental cluster-based approach to spam

More information

Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations

Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations ISSN: 2278-181 Vol. 2 Issue 9, September - 213 Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations Author :Sushama Chouhan Author Affiliation: MTech Scholar

More information

Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques

Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques 52 The International Arab Journal of Information Technology, Vol. 6, No. 1, January 2009 Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques Alaa El-Halees

More information

International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Tweaking Naïve Bayes classifier for intelligent spam detection

Tweaking Naïve Bayes classifier for intelligent spam detection 682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. araturi@uci.edu 2 School of Computing, Information

More information

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and

More information

International Journal of Research in Advent Technology Available Online at: http://www.ijrat.org

International Journal of Research in Advent Technology Available Online at: http://www.ijrat.org IMPROVING PEFORMANCE OF BAYESIAN SPAM FILTER Firozbhai Ahamadbhai Sherasiya 1, Prof. Upen Nathwani 2 1 2 Computer Engineering Department 1 2 Noble Group of Institutions 1 firozsherasiya@gmail.com ABSTARCT:

More information

Data Pre-Processing in Spam Detection

Data Pre-Processing in Spam Detection IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain

More information

Spam Filtering with Naive Bayesian Classification

Spam Filtering with Naive Bayesian Classification Spam Filtering with Naive Bayesian Classification Khuong An Nguyen Queens College University of Cambridge L101: Machine Learning for Language Processing MPhil in Advanced Computer Science 09-April-2011

More information

Detecting Spam in VoIP Networks

Detecting Spam in VoIP Networks Detecting Spam in VoIP Networks Ram Dantu, Prakash Kolan Dept. of Computer Science and Engineering University of North Texas, Denton {rdantu, prk2}@cs.unt.edu Abstract Voice over IP (VoIP) is a key enabling

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Spam Filtering using Naïve Bayesian Classification

Spam Filtering using Naïve Bayesian Classification Spam Filtering using Naïve Bayesian Classification Presented by: Samer Younes Outline What is spam anyway? Some statistics Why is Spam a Problem Major Techniques for Classifying Spam Transport Level Filtering

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

Data duplication: an imbalance problem?

Data duplication: an imbalance problem? Data duplication: an imbalance problem? Aleksander Kołcz Abdur Chowdhury Joshua Alspector AOL, Inc., 44900 Prentice Drive, Dulles, VA 20166 USA a.kolcz@ieee.org cabdur@aol.com jalspector1@aol.com Abstract

More information

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Email Filter for Spam Mail: A Review

Email Filter for Spam Mail: A Review Email Filter for Spam Mail: A Review Amar V. Sable 1 and Prof. Vijay S. Gulhane 2 1,2 Computer Science & Engineering Department, Sant Gadge Baba University, Amravati SIPNA College of Engineering & Technology,

More information

agoweder@yahoo.com ** The High Institute of Zahra for Comperhensive Professions, Zahra-Libya

agoweder@yahoo.com ** The High Institute of Zahra for Comperhensive Professions, Zahra-Libya AN ANTI-SPAM SYSTEM USING ARTIFICIAL NEURAL NETWORKS AND GENETIC ALGORITHMS ABDUELBASET M. GOWEDER *, TARIK RASHED **, ALI S. ELBEKAIE ***, and HUSIEN A. ALHAMMI **** * The High Institute of Surman for

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Spam or Not Spam That is the question

Spam or Not Spam That is the question Spam or Not Spam That is the question Ravi Kiran S S and Indriyati Atmosukarto {kiran,indria}@cs.washington.edu Abstract Unsolicited commercial email, commonly known as spam has been known to pollute the

More information

Decision Trees for Mining Data Streams Based on the Gaussian Approximation

Decision Trees for Mining Data Streams Based on the Gaussian Approximation International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Decision Trees for Mining Data Streams Based on the Gaussian Approximation S.Babu

More information

On Attacking Statistical Spam Filters

On Attacking Statistical Spam Filters On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Abstract. The efforts of anti-spammers

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

Mining a Corpus of Job Ads

Mining a Corpus of Job Ads Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department

More information

Dr. D. Y. Patil College of Engineering, Ambi,. University of Pune, M.S, India University of Pune, M.S, India

Dr. D. Y. Patil College of Engineering, Ambi,. University of Pune, M.S, India University of Pune, M.S, India Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Effective Email

More information

Paper Classification for Recommendation on Research Support System Papits

Paper Classification for Recommendation on Research Support System Papits IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.5A, May 006 17 Paper Classification for Recommendation on Research Support System Papits Tadachika Ozono, and Toramatsu Shintani,

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information