

AN ANTI-SPAM SYSTEM USING ARTIFICIAL NEURAL NETWORKS AND GENETIC ALGORITHMS

ABDUELBASET M. GOWEDER *, TARIK RASHED **, ALI S. ELBEKAIE ***, and HUSIEN A. ALHAMMI ****

* The High Institute of Surman for Comperhensive Professions, Surman-Libya, agoweder@yahoo.com
** The High Institute of Zahra for Comperhensive Professions, Zahra-Libya, tarmanlib@yahoo.com
*** The High Institute of Computer Technology, Tripoli-Libya, ali.elbekai@yahoo.co.uk
**** The High Institute of Zawia for Comperhensive Professions, Zawia-Libya, h1974hami@yahoo.com

Abstract

Nowadays, e-mail is becoming one of the fastest and most economical forms of communication. Thus, e-mail is prone to misuse. One such misuse is the posting of unsolicited, unwanted e-mails known as spam or junk e-mails. This paper presents and discusses an implementation of an anti-spam filtering system, which uses a Multi-Layer Perceptron (MLP) as a classifier and a Genetic Algorithm (GA) as a training algorithm. Standard genetic operators and advanced GA techniques are used to train the MLP. The implemented filtering system achieved an accuracy of about 94% in detecting spam e-mails and 89% in detecting legitimate e-mails.

Keywords: Artificial Neural Networks, Genetic Algorithms, Spam E-mails, Legitimate E-mails, Arabic Spam, Text Classification.

1 INTRODUCTION

E-mail spam is becoming an increasingly large problem. Many Internet Service Providers (ISPs) receive over a billion spam messages per day. Many of these e-mails are filtered before they reach end users. Content-based filtering is a key technological method for e-mail filtering. Spam e-mail contents usually contain common words called features. The frequency of occurrence of these features inside an e-mail gives an indication of whether the e-mail is spam or legitimate [1, 11, 26, 28].

Spam filtering is a highly sensitive application of the text classification (TC) task. Because spam e-mails contain considerable noise and redundant data intended to bypass filtering systems, pre-processing of e-mails is required in order to separate the contents of e-mails from HTML tags (e-mail structure) and to decide which information to use. The information is organized in an e-mail as a set of fields, for example: From, To, Cc, Subject, and Body fields. In addition, we should handle the cases when some words appear in different forms (e.g.: CLICK, C*L*I*C*K, N-O-W, now!). In other languages such as Arabic, some words also occur in different forms (e.g.: ألتحق (Altehk, "Join"), ألتحق! (Altehk!, "Join!"), and إضغط* (Edkat*, "Click*")).

For Arabic spam e-mails, some of the challenges we encountered in the feature reduction and selection phases are as follows. Some Arabic letters have several orthographical forms, as in ألان (Alan, "NOW"), إلان (Elan, "NOW"), and الان (Alan, "NOW"). In addition, Arabic e-mails often include English words, which need to be considered when designing and implementing an Arabic spam filtering system.

This paper is organized as follows: Section 2 gives a theoretical background for the research and a review of relevant recent work. Section 3 provides a description of Genetic Algorithms. Section 4 describes the Multi-Layer Feed-Forward Artificial Neural Networks. Section 5 discusses the experimental work and the results of the experiments conducted, and includes an analysis of these results. Section 6 presents the conclusions drawn by the researchers.

2 BACKGROUND AND LITERATURE REVIEW

The success of statistical-probabilistic algorithms and machine learning algorithms in text categorization (TC) has led researchers to explore applying these algorithms to anti-spam filtering [9, 10, 18]. Various techniques to extract features from e-mail have been proposed and implemented. Payne and Edwards [20] used features consisting of words in the From and Subject fields. Segal et al. [23] developed the MailCat system, which used the information in the To, Cc, Subject, From, and Body fields. Jason and Rennie [12] developed the ifile system and used the words found in the From, Subject, and Body fields.
Graham [11] extracted features from all fields in the Header and Body of e-mails. In this paper, we have used the features found in the From, Subject, and Body fields.

There are three common and intuitive representations found in text categorization: Term Frequency (TF), Term Frequency Inverse Document Frequency (TF-IDF) weight representation, and semantic approaches. Jason and Rennie [12] and Boone [3] used the TF representation in the ifile filtering system. Segal et al. [23] used the TF-IDF weighting scheme to develop the MailCat text classifier. Boone [3] showed that the TF-IDF weighting scheme captures the idea that the Subject words will occur frequently in a document on a given topic. Liao et al. [15] compared the TF and TF-IDF feature representations and concluded that the TF-IDF representation is better than the TF representation. Scott and Matwin [24] discussed the semantic approach to representation in text classification. Their approach focused on word meanings by clustering together words which have the same meaning. The TF-IDF representation has a greater advantage over semantic approaches and TF, because TF-IDF shows the degree of information represented by feature occurrences in e-mails.

Feature reduction is often applied to reduce the size of the feature set extracted from e-mails. Almost all feature reduction techniques include stop-word removal. Normalizing some Arabic letters is a very useful and necessary reduction step. It converts letters such as Alef (ا) with hamza (ء) above or below or Madda (~) above into the Arabic letter Alef (ا), and Waw (و) with hamza (ء) or Madda (~) above into the Arabic letter Waw (و).

Feature selection approaches are usually employed to reduce the size of the feature set and to select a subset of the original features. The Chi-square test is used as a selection method [15, 25, 30]. Boone [3] and Salton [25] used TF-IDF as a feature selection and weighting scheme. They found that the TF-IDF scheme is useful for reducing the number of features. Joachims [13] used information gain to select a subset of features. Liao et al. [15] showed that TF-IDF has similar performance to the information gain and Chi-square test methods. The TF-IDF feature selection method is proposed to select the most discriminative features while eliminating irrelevant ones among arbitrarily constructed feature sets.

Several algorithms have been developed to classify and filter e-mails. The RIPPER algorithm [4] is a rule-based algorithm for filtering e-mails. Drucker et al. [8] proposed an SVM algorithm for spam categorization. Jason [12] and Rennie [14] demonstrated that the SVM is costly to train and requires significant time to classify. Sahami et al. [22] proposed a Bayesian junk e-mail filter using the bag-of-words representation and the Naïve Bayes algorithm. Graham [11] described a simple implementation of the Naïve Bayes algorithm. Chuan et al. [7] proposed an anti-spam approach based on a Learning Vector Quantizer (LVQ) neural network. Özgür et al. [17] proposed an anti-spam filtering method based on ANN and Bayesian networks for English languages in general and for Turkish in particular. Clark et al. [5] used the bag-of-words representation and an ANN for an automated spam filtering system. Previous research has shown that ANNs can achieve very accurate results that are sometimes more accurate than those of other TC classifiers [27].

Some researchers have used GAs as an alternative approach for training ANNs [16]. Branke [2] discussed how the genetic algorithm can be used to assist in designing and training neural networks. Riley [21] described a method of utilizing genetic algorithms to train fixed-architecture feed-forward and recurrent neural networks. Yao and Liu [29] reviewed the different combinations of ANN and GA, and used the GA to evolve ANN connection weights, architectures, learning rules, and input features. Prados [19] reported that the GA-based training algorithm is more useful for training ANNs, especially when a simple ANN topology is used.

3 A GENETIC ALGORITHM

A GA is used in the system proposed in this paper for training the MLP. Training the MLP with the GA benefits from a key property of the GA: the parallel interaction among the information (genes) of a number of different chromosomes in the population pool of candidate solutions. This interaction leads to the creation of several new chromosomes.
In this paper, the GA chromosome of the MLP is encoded as a vector of weights (w_1, w_2, ..., w_n), where n is the number of MLP connections and each gene is a real number in the interval [-, ].

There are two genetic operators. The first one is the uniform crossover operator. Its occurrence is based on the crossover probability (Cp): crossover occurs if a generated random number in [0, 1] is greater than or equal to Cp. The second genetic operator is mutation, which simply involves changing a gene value by adding a uniformly generated random number to it. Mutation occurs with a probability equal to one for a chromosome that has not been crossed, and with a probability equal to (1 - Cp) for a chromosome that has been crossed. The mutation value is computed according to the following equations:

\[ \text{value} = \text{random\_value}[0,1] \times (\text{Min\_bound} - \text{Max\_bound}) + \text{Max\_b} \tag{3.1} \]

Where:

\[ \text{Min\_bound} = \text{Min\_b} \times \text{Random\_value}[0,1] \times \text{Generation\_Rate} \tag{3.2} \]

\[ \text{Max\_bound} = \text{Max\_b} \times \text{Random\_value}[0,1] \times \text{Generation\_Rate} \tag{3.3} \]

Min_b: lower interval value = -3. Max_b: upper interval value = +3.

\[ \text{Generation\_Rate} = \log(\text{Max\_Gen}) - (\text{Cur\_Gen}) \,/\, \log(\text{Max\_Gen}) \tag{3.4} \]

Max_Gen: maximum number of generations. Cur_Gen: current generation.

3.1 A FITNESS FUNCTION

The fitness function is the sum of the absolute differences between the actual and desired outputs of a chromosome over all training data. It is computed by the following equation:

\[ \text{fitness} = \sum_{c=1}^{C} \left| \text{desired\_output}(i) - \text{actual\_output}(i) \right| \tag{3.5} \]

Where:
C: is the number of chromosomes in the population pool.
desired_output(i): indicates the class, which is either spam (represented by the value .1) or legitimate (represented by the value .9), for an e-mail i.
actual_output(i): is the expected output value of chromosome c over all e-mails in the training data.

3.2 ELITISM STRATEGY AND A RANK-BASED SELECTION

A rank-based selection is needed to make a few copies of a set of the best chromosomes. Equation 3.6 was used to calculate the number of copies for each chromosome depending on its position in an ordered set.

\[ \text{Copies} = \left( q - (\text{Chr\_order} - 1) \times p \right) \times \text{Chrs} \tag{3.6} \]

Where:
Chr_order: is the order number of the chromosome in the population pool list.
Chrs: is the number of chromosomes in the population pool list.
q = 2 / Chrs.
p = q / (Chrs - 1).

4 AN ARTIFICIAL NEURAL NETWORK (ANN)

The ANN used in our system is the key component that performs the filtering operation. The MLP architecture is a fully connected feed-forward network whose number of inputs depends on the number of selected features. Each input corresponds to a single feature, which is converted to its TF-IDF weight and organized into a TF-IDF feature vector with a class label of spam (.1) or legitimate (.9). The MLP has a single output. Training is done by constructing one target output for legitimate or spam e-mails, and training with the appropriate output value for the input data. By observation, a threshold value of .6 is chosen. An output value less than .6 is thresholded to .1; otherwise the value is thresholded to .9.

Training the MLP is performed using one and two hidden layers. A number of hidden neurons and one output neuron with a sigmoid activation function are used. The English and Arabic data sets are tested on different combinations (5,,, 2, and 3) of hidden neurons. Training the MLP is achieved through the use of the GA described in Section 3. The training procedure starts with 2 chromosomes. Other experiments are conducted with different numbers of chromosomes (e.g.: 4 and 6).
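The linear ranking rule of Equation 3.6 can be checked numerically. The following is a minimal sketch, not the authors' implementation (which was written in Visual Basic .NET); the function name `rank_based_copies` and the population size used in the example are assumptions made for illustration. It shows that the best-ranked chromosome receives two copies, the worst receives none, and the copies add up to the population size.

```python
def rank_based_copies(chrs: int) -> list[float]:
    """Copies per chromosome rank, following Equation 3.6.

    Rank 1 is the best chromosome. The name and the list-returning
    interface are illustrative assumptions, not part of the paper.
    """
    if chrs < 2:
        raise ValueError("Equation 3.6 needs at least two chromosomes")
    q = 2 / chrs          # q = 2 / Chrs
    p = q / (chrs - 1)    # p = q / (Chrs - 1)
    return [(q - (order - 1) * p) * chrs for order in range(1, chrs + 1)]


# Example with an assumed pool of 5 chromosomes.
copies = rank_based_copies(5)
print([round(c, 2) for c in copies])  # [2.0, 1.5, 1.0, 0.5, 0.0]
print(round(sum(copies), 6))          # 5.0, the pool size is preserved
```

The equation yields fractional values. How the original system rounded them into whole copies is not stated in the text, so the sketch leaves them unrounded.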
Initial chromosome gene values were real numbers in the interval [-, ]. The training procedure was repeated many times with many different training data, over several generations, until one of the following conditions was met:
1. The maximum number (set to be 5,) of generations is reached.
2. The fitness value (the MLP error) is less than or equal to .5.

5 THE EXPERIMENTAL WORK

In this section, we first present the data sets that we used to conduct our evaluation experiments. Next, the pre-processing of our data and the implementation of our system are described. Then, the evaluation measures used to assess our system are given. Finally, a set of experiments is presented, followed by the results and their discussion.

5.1 THE DATA SETS

Three different data sets are used to conduct our experiments. These data sets are collected from different sources [6, 31]. Table 5.1 shows these three data sets.

Table 5.1: The Data Sets (corpora).
Corpus Name | No. of Spam E-mails | No. of Legitimate E-mails | Total
SpamAssassin | | |
TREC | | |
The Arabic Corpus | | |

5.2 TRAINING AND TEST DATA

Each data set was split equally into two sets (50% for training and 50% for test data). Table 5.2 shows the training and test data for each corpus (data set).

Table 5.2: Training and Test Data.
 | SpamAssassin Corpus: Training set | SpamAssassin Corpus: Test set | The Arabic Corpus: Training set | The Arabic Corpus: Test set | TREC Corpus: Training set | TREC Corpus: Test set
Spam | | | | | |
Legitimate | | | | | |
Total | | | | | |

5.3 DATA PRE-PROCESSING

Data pre-processing is an analysis of the textual data and an extraction of information from e-mails. The general procedure for data pre-processing can be described by the following steps:
(i) Deletion: Remove irrelevant elements of e-mails, and select segments suitable for processing (e.g., Subject and Body fields).
(ii) Normalization: For Arabic e-mails, convert Arabic letters which have the same shape, such as Alef (ا) with hamza (ء) above or below or Madda (~) above, into the Arabic letter Alef (ا), and Waw (و) with hamza (ء) or Madda (~) above into the Arabic letter Waw (و).
(iii) Tokenization: Divide the message into semantically coherent segments (e.g.: words, other character strings).
(iv) Representation: Convert the message into a vector of values, where each value in this vector represents an e-mail feature.
(v) Selection: Delete the least predictive features using the TF-IDF weighting scheme. The features with the highest TF-IDF values are selected to represent the set of training features.

5.4 IMPLEMENTATION

We have implemented an anti-spam system that runs on the Windows XP platform. The code is written in Visual Basic .NET. The system was built from scratch without using any ANN or GA libraries. The system has three main modules:
(1) A features extraction and reduction module.
(2) A features weighting and selection module.
(3) A classifier module, which consists of an MLP classifier and a GA.

5.4.1 THE FEATURES EXTRACTION AND REDUCTION MODULE

This module is concerned with feature extraction and reduction. It first tokenizes each e-mail included in the training data set. Then, a bag-of-words is created for each data set. No stemming was applied. Next, words that appear three times or fewer in each corpus were discarded. Finally, words that are 20 characters in length or longer were removed from the e-mail. As a result, the initial number of unique features is reduced from about 48 to 981 for Arabic and English corpus. For the SpamAssassin corpus, the initial number of features is reduced from 22 to 32, while for the TREC corpus, the features are reduced from 29 to

5.4.2 THE FEATURES WEIGHTING AND SELECTION MODULE

Feature selection using the TF-IDF scheme was carried out after the construction of the bag-of-words. The best features are selected by sorting the TF-IDF features in descending order. We then decide how many features to include.
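The normalization, tokenization, weighting, and selection steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' Visual Basic .NET code: the paper does not give its TF-IDF formula, so the common form tf × log(N / df) is assumed, and ranking each term by its highest TF-IDF weight in any e-mail is also an assumption. The names `ARABIC_MAP`, `tokenize`, and `select_features` are hypothetical, and the rare-word and long-word filters of the extraction module are left out for brevity.

```python
import math
import re
from collections import Counter

# Assumed mapping for step (ii): Alef variants -> Alef, Waw with hamza -> Waw.
ARABIC_MAP = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ؤ": "و"})


def tokenize(text: str) -> list[str]:
    """Normalize Arabic letter variants, lowercase, and split into word tokens."""
    return re.findall(r"\w+", text.translate(ARABIC_MAP).lower())


def select_features(docs: list[str], k: int) -> list[str]:
    """Return the k terms with the highest TF-IDF weight found in any document.

    Assumes weight = tf * log(N / df); ties are broken alphabetically.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many e-mails each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    best: dict[str, float] = {}
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            weight = tf * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    # Sort in descending order of weight and keep the top k features.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Toy corpus, invented for illustration only.
emails = ["free free free offer now", "meeting now", "offer now"]
print(select_features(emails, 2))  # ['free', 'meeting']
print(tokenize("ألان"))            # ['الان']
```

In the toy corpus, "now" occurs in every e-mail, so its weight is zero and it is ranked last, which is the behaviour the selection step relies on.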
The experiments are conducted using different numbers of selected features.

5.4.3 THE CLASSIFIER MODULE

The MLP architecture is a fully connected feed-forward network whose number of inputs depends on the number of selected features. Each input is converted to its TF-IDF weight and organized into a TF-IDF feature vector with a class label of spam (.1) or legitimate (.9). Two matrices are used to calculate the outputs of every layer. The first matrix holds the MLP inputs organized as vectors, where each vector consists of a set of TF-IDF values. The second matrix contains a set of chromosomes which represent the weights associated with every MLP input.

5.5 EVALUATION MEASURES

The performance of spam filtering techniques is determined by two well-known measures used in text classification. These measures are precision and recall [5, ], which can be computed as follows:

\[ \text{Spam Precision (SP)} = \frac{N_{SS}}{N_{SS} + N_{LS}} \tag{5.1} \]

\[ \text{Legitimate Precision (LP)} = \frac{N_{LL}}{N_{SL} + N_{LL}} \tag{5.2} \]

\[ \text{Spam Recall (SR)} = \frac{N_{SS}}{N_{SL} + N_{SS}} \tag{5.3} \]

\[ \text{Legitimate Recall (LR)} = \frac{N_{LL}}{N_{LL} + N_{LS}} \tag{5.4} \]

Where:
N_SS = the number of spam messages correctly classified as spam.
N_SL = the number of spam messages incorrectly classified as legitimate.
N_LL = the number of legitimate messages correctly classified as legitimate.
N_LS = the number of legitimate messages incorrectly classified as spam.

5.6 EXPERIMENTS

The purpose of these experiments is to evaluate the performance of the MLP in spam filtering and the efficiency of the GA in training the MLP. A series of tests is performed on a small problem (the XOR) to discover the GA parameters (e.g., mutation probabilities, crossover probabilities, and population size) that give the best performance of the MLP. The best GA parameters obtained are then used to train our MLP classifier.

5.6.1 THE XOR PROBLEM

The XOR problem was the first problem to be solved using the MLP trained by the GA. This problem has become a standard example used by many researchers to explain the training process.
Table 5.3 shows the values of the mutation probability (Mp), crossover probability (Cp), and population size (Ps) used in each experiment. It clearly shows that experiment 2 recorded the minimum time to train the MLP using the GA for the XOR problem.
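To make the XOR experiment concrete, the following is a minimal sketch of a GA training a small MLP on XOR. It is not the authors' implementation. The 2-2-1 topology, the 0/1 targets, the gene limit of 10, the population size of 20, the 200 generations, and all function names are assumptions chosen for illustration. The crossover and mutation conditions follow the description in Section 3, rank weights follow the shape of Equation 3.6, and the absolute-error fitness follows Section 3.1, but the mutation range here shrinks linearly over the generations, which is a simplification swapped in for Equations 3.1 to 3.4.

```python
import math
import random

XOR = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]
N_WEIGHTS = 9  # assumed 2-2-1 MLP: 2 hidden neurons x 3 weights + 3 output weights


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(w: list[float], x1: float, x2: float) -> float:
    """Decode a chromosome as MLP weights and compute the single output."""
    h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
    h2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])


def fitness(w: list[float]) -> float:
    """Sum of absolute output errors over the training data (lower is better)."""
    return sum(abs(target - forward(w, *x)) for x, target in XOR)


def train(pop_size=20, generations=200, cp=0.7, limit=10.0, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-limit, limit) for _ in range(N_WEIGHTS)]
           for _ in range(pop_size)]
    # Linear rank weights with the shape of Equation 3.6 (best rank first).
    q = 2 / pop_size
    p = q / (pop_size - 1)
    rank_weights = [max(0.0, q - rank * p) for rank in range(pop_size)]
    history = []
    for gen in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        scale = 3.0 * (1.0 - gen / generations)  # simplified shrinking range

        def mutate(w):
            # Add a uniform random number to every gene, clipped to the limit.
            return [max(-limit, min(limit, g + rng.uniform(-scale, scale)))
                    for g in w]

        next_pop = [pop[0]]  # elitism: the best chromosome survives unchanged
        while len(next_pop) < pop_size:
            a, b = rng.choices(pop, weights=rank_weights, k=2)
            if rng.random() >= cp:  # crossover condition as stated in Section 3
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
                if rng.random() < 1.0 - cp:  # crossed: mutate with prob. 1 - Cp
                    child = mutate(child)
            else:  # not crossed: mutate with probability one
                child = mutate(a)
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = train()
print(f"error in first generation: {history[0]:.3f}, at the end: {history[-1]:.3f}")
```

Because of elitism, the best error never increases from one generation to the next. Whether a given run actually solves XOR depends on the seed and the parameters, so the sketch reports the error rather than assuming convergence.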

Table 5.3: The GA Parameters for the XOR Problem.
Experiment name | Mp | Cp | Ps | Training Time in Seconds (s)
Experiment 1 | Mp=.3 | Cp=.7 | Ps= | s
Experiment 2 | Mp=.3 | Cp=.7 | Ps=2 | 5s
Experiment 3 | Mp=.3 | Cp=.7 | Ps=4 | s
Experiment 4 | Mp=.3 | Cp=.7 | Ps=6 | >2s
Experiment 5 | Mp=.5 | Cp=.7 | Ps= | s
Experiment 6 | Mp=.5 | Cp=.7 | Ps=2 | s
Experiment 7 | Mp=.5 | Cp=.7 | Ps=4 | >2s
Experiment 8 | Mp=.5 | Cp=.7 | Ps=6 | 4s

5.6.2 THE MLP AND GA CLASSIFIER

A series of experiments was conducted to train our MLP using the GA parameters obtained from experiment 2, described in the previous section. These experiments are intended to train our MLP using the GA on three different data sets. Although the training process was completed, some combinations of the MLP parameters led to a failure because of the low rates of the SR, SP, LR, and LP evaluation measures. Other combinations of the MLP parameters were ignored and their training processes were terminated because the training time exceeded 6 hours while the MLP errors improved only slightly. One of the terminated training processes is the experiment that used 25 features as input, with the first layer containing 3 neurons and the second layer containing neurons. The following sections present the results of the experiments conducted on the three different data sets.

THE SPAMASSASSIN DATA SET RESULTS

In this experiment, we trained the MLP using the GA on the SpamAssassin data set. This section presents the results obtained using and 2 different features. Tables 5.4 and 5.5 show the SR, SP, LR, and LP values using and 2 input features respectively. It can be observed from Table 5.4 that the best results, as highlighted, are obtained using the MLP which consists of one hidden layer with 3 neurons. These results were achieved through a gradual improvement of the initial error rate, which was 123 in the first generation, and it took about generations to reach the error value of .49. Table 5.5 also shows that the best results, as highlighted, are obtained using the MLP which consists of one hidden layer with 3 neurons.
These results were achieved through a gradual improvement of the initial error rate, which was in the first generation, and it took about generations to reach the error value of .432.

Table 5.4: The Results of SpamAssassin Data Set: ( input features)
Table 5.5: The Results of SpamAssassin Data Set: (2 input features)

THE TREC DATA SET RESULTS

In this experiment, the MLP was trained using the GA on the TREC data set. The results obtained using and 2 different features are given in Tables 5.6 and 5.7. These tables show the SR, SP, LR, and LP values using and 2 input features respectively. In general, the results show low rates, because the TREC corpus contains a large number of spam e-mails that are highly similar to legitimate e-mails (hard spam). It can be observed from Table 5.6 that the best results, as highlighted, are obtained using the MLP which consists of two hidden layers with 3 neurons in the first layer and neurons in the second one. These results were achieved through a gradual improvement of the initial error rate, which was 124 in the first generation, and it took about generations to reach the error value of .498. Table 5.7 also shows that the best results, as highlighted, are obtained using the MLP which consists of two hidden layers with 3 neurons in the first layer and neurons in the second one. These results were achieved through a gradual improvement of the initial error rate, which was 112 in the first generation, and it took about generations to reach the error value of .46.

Table 5.6: The Results of TREC Data Set: ( input features)
Table 5.7: The Results of TREC Data Set: (2 input features)

THE ARABIC DATA SET RESULTS

In this experiment, we trained the MLP using the GA on the Arabic data set. This section presents the results obtained using 5 and 9 different features. Tables 5.8 and 5.9 show the SR, SP, LR, and LP values using 5 and 9 input features respectively. It can be observed from Table 5.8 that the best results, as highlighted, are obtained using the MLP which consists of one hidden layer with neurons. These results were achieved through a gradual improvement of the initial error rate, which was 8 in the first generation, and it took about 3546 generations to reach the error value of .48. Table 5.9 also shows that the best results, as highlighted, are obtained using the MLP which consists of one hidden layer with neurons. These results were achieved through a gradual improvement of the initial error rate, which was 7 in the first generation, and it took about 4546 generations to reach the error value of

Table 5.8: The Results of Arabic Data Set: (5 input features)
Table 5.9: The Results of Arabic Data Set: (9 input features)

5.7 THE OVERALL PERFORMANCE

The results of our experiments indicate that our implemented MLP classifier trained with the GA performed well. The overall accuracy rate is about 94% for detecting spam e-mails and about 89% for detecting legitimate e-mails.

5.8 DISCUSSION OF THE RESULTS

An analysis of the results and a close study of the experiments produced the following remarks:
(1) The best input features for English e-mails were , which generate the best results compared to the 2 input features. For Arabic e-mails, 9 input features are considered to be the best. This implies that the success rates are influenced by the number of input features.
(2) Words in legitimate e-mails are as important as words in spam e-mails for the filtering process.
By observation, most misclassifications were e-mails containing only one or two words, or Arabic e-mails in which Arabic is mixed with English words.
(3) A wise setting of the number of hidden layers and the number of neurons can significantly decrease the MLP error rates.
(4) The initial parameters that were used during the development of the GA were Mp = .3, Cp = .7, and Ps = 2, and the maximum number of generations was set to 5,. These settings were suitable for the e-mail filtering domain. Increasing the population size gives good chromosomes less chance of appearing in the next generation under rank-based selection. The GA works better with many inputs (spam filtering) than with a few inputs (the XOR problem).

6 CONCLUSION

An anti-spam filtering system was proposed which uses a multi-layer artificial neural network trained by a genetic algorithm. The results clearly show that the Subject and Body fields can contain enough information to classify e-mails as spam or legitimate. The results have also shown that an MLP with -3 neurons in the first hidden layer is sufficient to filter both easy spam and easy legitimate e-mails. The MLP architecture used to develop our system is good for filtering e-mails, if we do not take into account the long time needed to train the MLP. We have also investigated the effects of several GA parameters. The parameters found to be the most significant to the performance of the classifier are the size of the population pool, the crossover and mutation probabilities, and the mutation method. It is important to remember that e-mail filtering is a highly sensitive application of the text classification problem. The classifier must be able to handle many input features, with low false positive and low false negative rates.

ACKNOWLEDGEMENT

We would like to express our gratitude to the Libyan General Secretariat for Human Resources and Training for supporting this work.

REFERENCES

[1] Bruening, P., "Technological Responses to the Problem of Spam: Preserving Free Speech and Open Internet Values", First Conference on E-mail and Anti-Spam, 2004.
[2] Branke, J., "Evolutionary algorithms for neural network design and training", In Proceedings 1st Nordic Workshop on Genetic Algorithms and its Applications, Finland.
[3] Boone, G., "Concept Features in Re:Agent, an Intelligent E-mail Agent", The Second International Conference on Autonomous Agents, 1998.
[4] Cohen, W., "Learning Rules that Classify E-mail", In AAAI Spring Symposium on Machine Learning in Information Access, California.
[5] Clark, et al., "A Neural Network Based Approach to Automated E-mail Classification", IEEE/WIC International Conference on Web Intelligence, 2003.
[6] Cormack, G., Lynam, T., "Spam Corpus Creation for TREC", Second Conference on E-mail and Anti-Spam, 2005.
[7] Chuan, Z., et al., "A LVQ-based neural network anti-spam e-mail approach", Proceedings of the 5th International Conference, Singapore, 2004.
[8] Drucker, H., et al., "Support Vector Machines for Spam Categorization", In IEEE Transactions on Neural Networks.
[9] Flavio, D., et al., "Spam Filter Analysis", University of Nijmegen, the Netherlands, 2003.
[10] Goodman, J., "Spam: Technologies and Policies", Microsoft Research, 2003.
[11] Graham, P., "A Plan for Spam", MIT Conference on Spam, 2003.
[12] Jason, D., Rennie, M., "ifile: An Application of Machine Learning to E-Mail Filtering", Text Mining Workshop, Boston, U.S.A., 2000.
[13] Joachims, T., "Text Categorization with Support Vector Machines: Learning with Many Relevant Features", Proceedings of ECML-98, 10th European Conference on Machine Learning.
[14] Kolcz, A., Alspector, J., "SVM-based filtering of e-mail spam with content-specific misclassification costs", In Proceedings of the Workshop on Text Mining, IEEE International Conference on Data Mining, San Jose, California, 2001.
[15] Liao, C., Alpha, S., Dixon, P., "Feature Preparation in Text Categorization", Oracle Corporation, 2004.
[16] Montana, D., Davis, L., "Training feed-forward neural networks using genetic algorithms", In Proceedings of the 11th International Joint Conference on Artificial Intelligence, 1989.
[17] Özgür, L., et al., "Adaptive Turkish Anti-spam Filtering", International Twelfth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN), 2003.
[18] Oda, T., White, T., "Increasing the Accuracy of a Spam-detecting Artificial Immune System", In the Congress on Evolutionary Computation Proceedings, Canberra, Australia, 2003.
[19] Prados, D., "Training multilayered neural networks by replacing the least fit hidden neurons", In Proceedings IEEE SOUTHEASTCON 2002, 2002.
[20] Payne, T., Edwards, P., "Interface Agents that Learn: An Investigation of Learning Issues in a Mail Agent Interface", Applied Artificial Intelligence.
[21] Riley, J., "An evolutionary approach to training Feed-Forward and Recurrent Neural Networks", Master thesis of Applied Science in Information Technology, Department of Computer Science, Royal Melbourne Institute of Technology, Australia, 2002.
[22] Sahami, M., et al., "A Bayesian Approach to Filtering Junk E-mail", In Learning for Text Categorization, AAAI Technical Report, U.S.A.

[23] Segal, R., et al., "MailCat: An intelligent assistant for organizing e-mail", Proceedings of the Third International Conference on Autonomous Agents.
[24] Scott, S., Matwin, S., "Feature engineering for text classification", Proceedings of ICML-99, 16th International Conference on Machine Learning.
[25] Salton, G., Buckley, C., "Term Weighting Approaches in Automatic Text Retrieval", Information Processing and Management, Vol. 24, No. 5, p. 513.
[26] Fürnkranz, J., "A Study Using n-gram Features for Text Categorization", Austrian Research Institute.
[27] Vinther, M., "Intelligent junk mail detection using Neural networks", URL: kdetection.pdf, 2002.
[28] William, S., et al., "A Unified Model of Spam Filtration", MIT Spam Conference, Cambridge, 2005.
[29] Yao, X., Liu, Y., "A new evolutionary system for evolving artificial neural networks", IEEE Transactions on Neural Networks.
[30] Yang, Y., Pedersen, J., "A comparative study on feature selection in text categorization", In Proceedings of ICML-97, 14th International Conference on Machine Learning, U.S.A.
[31] URL: